Most AI chatbot responses about midterm elections are inaccurate, surprising analysis reveals

Recent findings reveal that when you ask a leading AI chatbot about midterm elections, it’s likely to give a factually incorrect or biased response, citing foreign state media 90% of the time.

Forum AI, a startup focused on assessing and improving the accuracy of AI models, examined four widely used chatbots: OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok.

Surprisingly, the analysis indicated that these bots often struggle to differentiate between credible news outlets and state propaganda, like the Global Times from China, with 15% of all replies referencing at least one state media source.

In one instance, Claude cited the Global Times when asked about the form of government in the United States. The finding was highlighted in a blog post by Katie Harbath, a former Facebook executive and Forum AI expert.

The problem grows worse with foreign policy questions.

ChatGPT cited a state-run media outlet in 51% of its answers to such questions, while Grok did so in 44%. Across all four chatbots, 35% of answers to foreign policy questions cited state media.

Much of the information came from news agencies associated with countries that are not friendly towards the United States.

According to Andy Hall and Robbie Goldfarb from Forum AI, outlets like Xinhua, Global Times, and CGTN were frequently cited, as well as Russian and, to a lesser extent, Iranian news sources.

The researchers asked the chatbots a total of 3,136 questions on various topics, including U.S. politics, international relations, healthcare, and the economy.

The review covered 12,542 responses, each evaluated for accuracy by a panel of experts, in what was billed as “the largest independent evaluation of AI in news and current affairs” to date.

About 30% of the responses contained at least one factual mistake, ranging from incorrect dates to erroneous policy details.

ChatGPT emerged as the most accurate chatbot, with a 9% error rate, followed by Gemini at 25%, Claude at 41%, and Grok at 43%.

For instance, Gemini inaccurately stated that ACA premiums in Arkansas would rise by 65% to 67% in 2026, even though the approved average increase was around 22%.

Asked about U.S.-Iranian tensions, Grok claimed that U.S. assessments indicated Iran lacked a capable navy and air force, even though public reports suggest that Iran’s capabilities are diminished but not entirely absent.

The chatbots also had difficulty maintaining political neutrality. “Nearly a quarter of responses did not meet neutrality standards,” Forum AI noted.

The trend was more pronounced in election discussions. Claude’s neutrality failures leaned left, as did 90% of Gemini’s and 92% of ChatGPT’s, while Grok’s skewed right 76% of the time.

In response, a spokesperson for Anthropic said that Claude is designed to be politically impartial and to engage with opposing viewpoints without bias.

The CEO of Forum AI, Campbell Brown, previously served as a CNN anchor and led news partnerships at Meta, Mark Zuckerberg’s company.

“The risks are considerable, the tools to address them are available, and now is the time to influence their development,” Harbath emphasized.

The Post has sought comments from OpenAI, Google, and xAI regarding the study.