BOSTON--(BUSINESS WIRE)--Applause, a world leader in testing and digital quality, released the results of its second Generative AI (Gen AI) Survey. The survey collected input from over 6,300 consumers, software developers and QA testers regarding their usage of, and attitudes toward, Gen AI tools (including chatbots) and Gen AI in software testing. Many respondents felt that chatbots are getting better at supporting their needs, though concerns around bias and performance remain.
Key findings:
- Respondents thought chatbots are managing toxic and inaccurate responses better, but many have still experienced biased or inaccurate results and have data privacy concerns.
- Chatbots are mainly being used for research.
- Multimodal capabilities are essential for chatbot adoption.
- It’s not unusual for users to switch between chatbots, depending on the task.
- Gen AI is increasingly being used in software development and testing.
Additional insights:
Not surprisingly, ChatGPT was the most used, and most popular, chatbot.
- Being first to market, ChatGPT has been used the most (91%), followed by Gemini (63%) and Microsoft Copilot (55%).
- Other chatbots have been used by under a third of users: Grok (32%), Pi (29%), Perplexity (24%), Claude (23%) and Poe (21%).
- 38% of respondents indicated that they use different chatbots depending on the specific task.
- 27% of respondents said they have replaced one chatbot with another because of its performance.
Chatbots are being used for research on a daily basis, suggesting they deliver valuable results.
- 91% of respondents have used chatbots to conduct research, and 33% of those respondents use them for research daily.
- 81% of respondents have used chatbots for answering basic search queries in place of traditional search engines, and 32% of those respondents do so daily.
Multimodal capabilities are essential to getting value from chatbots.
- A majority of respondents (62%) said that multimodal input is essential for a large portion of their usage of a Gen AI tool.
- A quarter of respondents have used both text and voice commands to interact with chatbots, with 5% stating they use voice as their main form of input.
Gen AI is increasingly being used in software development and testing.
- Of the 1,539 respondents using Gen AI for software development and testing, the most common applications are writing or debugging code (51%), test reporting (48%), building test cases (46%) and building apps (42%).
- GitHub Copilot is the most popular tool for coding assistance (41% of respondents), followed by OpenAI Codex (24% of respondents).
There is room to improve the experience.
- Only 19% of users indicated that the chatbot understood their prompt and provided a helpful response every time.
- Chatbot features users would like to see include better source attribution, more localized responses, support for more languages and deeper personalization.
Concerns still linger around data privacy, inaccurate responses and biased responses. However, respondents thought chatbots are managing toxic and inaccurate responses better.
- 89% of respondents were concerned about providing private information to chatbots, and 11% said they would never provide private information.
- 50% of the respondents have experienced biased responses, and 38% have seen examples of inaccurate responses.
- 75% of respondents felt that chatbots are getting better at managing toxic or inaccurate responses.
“It’s clear from the survey that consumers are keen to use Gen AI chatbots, and some have even integrated them into their daily lives for tasks like research and search. Chatbots are getting better at dealing with toxicity, bias and inaccuracy – however, concerns remain. Not surprisingly, switching between chatbots to accomplish different tasks is common, while multimodal capabilities are now table stakes,” said Chris Sheehan, SVP Strategic Accounts and AI at Applause. “To gain further adoption, chatbot providers need to continue to train models on quality data in specific domains and thoroughly test across a diverse user base to drive down toxicity and inaccuracy.”
The Generative AI Survey is part of the State of Digital Quality content series from Applause. In May 2023, the company released its second annual State of Digital Quality Report, which analyzes a representative sample of its testing data and reports on the most common flaws in digital experiences in several industries, including retail, finance, media and telecommunications, and travel and hospitality.
Applause is a world leader in testing and digital quality. Brands today win or lose customers through digital interactions, and Applause delivers authentic feedback on the quality of digital assets and experiences, provided by real users in real-world settings. Our disruptive approach harnesses the power of the Applause platform and leverages the uTest community of more than one million independent digital testers worldwide. Unlike traditional testing methods (including lab-based and offshoring), Applause responds with the speed, scale and flexibility that digital-focused brands require and expect. Applause provides insightful, actionable testing results that can directly inform go/no go release decisions, helping development teams build better and faster, and release with confidence. Digital-first brands rely on Applause as a best practice to deliver the digital experiences their customers love.
Applause provides managed services for generating high quality AI datasets, evaluating large language models and thoroughly testing AI applications. Through our global community of digital experts, companies gain access to custom-built teams for specific AI use cases. Our services help companies reduce bias, toxicity and inaccuracies in their models, and accelerate Gen AI development.
Applause also uses generative AI to improve the quality of written test cases through the new Smart Suggestion feature within Applause Test Case Management (TCM). Drawing on established improvement guidelines, Smart Suggestion uses ChatGPT to optimize test cases and return suggestions for improving the original test case content. Clear, well-written test cases mean faster time to testing, less time spent responding to questions, clearer bug reports, lower cost and time overhead, and more efficient test cycle execution.
Learn more about Applause Generative AI Solutions.