AI Search Engines Cite Incorrect Sources at a 60% Rate

A new study from the Columbia Journalism Review's Tow Center for Digital Journalism reveals serious accuracy problems in the generative AI models used for news search. According to the study, AI search engines answer queries about news sources incorrectly at a startling 60 percent rate.

Ars Technica reports that the study tested eight AI-driven search tools with live search capabilities and found that the models incorrectly answered more than 60 percent of queries about news sources. This is particularly concerning given that AI models are increasingly being used as alternatives to traditional search engines, according to researchers Klaudia Jaźwińska and Aisvarya Chandrasekar.

Error rates varied widely across the platforms tested. Perplexity provided incorrect information in 37 percent of queries, while ChatGPT Search was wrong 67 percent of the time. Elon Musk's Grok 3 had the highest error rate at 94 percent. For the study, researchers fed the AI models direct excerpts from actual news articles and asked each model to identify the article's headline, original publisher, publication date, and URL. In total, 1,600 queries were run across the eight generative search tools.
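
To make the methodology concrete, the sketch below shows what an evaluation loop of this shape might look like. It is purely illustrative: the query_tool() function is a hypothetical stand-in for each product's interface, not the Tow Center's actual test harness.

    # Illustrative sketch of the study's evaluation setup, not the
    # Tow Center's actual harness. query_tool() is a hypothetical
    # stand-in for each AI search product's interface.
    FIELDS = ("headline", "publisher", "date", "url")

    def score_response(excerpt, truth, query_tool):
        """Ask a tool to identify an excerpt's source and grade each field."""
        response = query_tool(
            "Identify the headline, original publisher, publication date, "
            f"and URL of the article this excerpt comes from: {excerpt}"
        )
        return {field: response.get(field) == truth[field] for field in FIELDS}

    def run_eval(excerpts, tools):
        """200 excerpts across 8 tools would yield the study's 1,600 queries."""
        return {
            name: [score_response(e["text"], e["truth"], tool) for e in excerpts]
            for name, tool in tools.items()
        }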

The study found that rather than declining to answer when reliable information was lacking, the AI models frequently offered confabulations: plausible-sounding but incorrect or speculative responses. This behavior was observed across every model tested. Surprisingly, premium paid versions such as Perplexity Pro ($20 a month) and Grok 3's premium tier ($40 a month) delivered incorrect answers even more confidently than the free versions, although they answered more prompts correctly overall.

The study also uncovered evidence that some AI tools ignore publishers' Robots Exclusion Protocol settings, which are meant to prevent unauthorized crawling. For example, the free version of Perplexity correctly identified all 10 excerpts from paywalled National Geographic content, despite National Geographic explicitly blocking Perplexity's web crawlers.
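
For context, the Robots Exclusion Protocol is simply a plain-text robots.txt file that well-behaved crawlers are expected to consult before fetching pages. The sketch below, using Python's standard urllib.robotparser module, shows how a compliant crawler honors such a block; the rules and URLs shown are illustrative assumptions, not National Geographic's actual file.

    # Minimal sketch of a compliant crawler consulting robots.txt rules.
    # The rules and URLs below are illustrative, not a publisher's real file.
    from urllib import robotparser

    rules = [
        "User-agent: PerplexityBot",  # Perplexity's documented crawler name
        "Disallow: /",                # block it from the entire site
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    # A compliant crawler checks before fetching; skipping this check is
    # the kind of behavior the study's findings suggest.
    print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
    print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))   # True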

Even when the AI search tools provided citations, and even when publishers had officially licensed their content to the AI companies, the tools frequently directed users to syndicated versions on platforms like Yahoo News rather than to the original publisher's site. URL fabrication was another major issue: more than half of the citations from Google's Gemini and Grok 3 led to fabricated or broken URLs that resolved to error pages. Of 200 Grok 3 citations tested, 154 resulted in broken links.
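
Verifying this kind of failure is straightforward to automate: a fabricated or broken citation URL simply resolves to an error page. Below is a minimal sketch using only the Python standard library; the sample URLs are placeholders, not citations from the study.

    # Minimal sketch: flag citation URLs that resolve to error pages.
    # The URL list is a placeholder, not data from the study.
    import urllib.request
    from urllib.error import HTTPError, URLError

    def is_broken(url, timeout=10):
        """Return True if the URL errors out or returns an HTTP error status."""
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout):
                return False  # the server answered with a success status
        except (HTTPError, URLError):
            return True  # 4xx/5xx responses and unreachable hosts land here

    citations = ["https://example.com/article", "https://example.com/fabricated"]
    broken = [url for url in citations if is_broken(url)]
    print(f"{len(broken)} of {len(citations)} citations are broken")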

These issues present difficult tradeoffs for publishers. Blocking AI crawlers can mean losing attribution entirely, while permitting them invites widespread reuse of content without traffic flowing back to the publisher's site. Time magazine COO Mark Howard has expressed a desire for more transparency and control over how Time content appears in AI-generated search results. He nevertheless sees room for iterative improvement, saying “today is the worst that the product will ever be” and pointing to the substantial investments being made to improve the tools. Howard also suggested that consumers who fully trust free AI tools share the blame, saying, “If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them.”

OpenAI and Microsoft acknowledged receiving the findings but did not directly address the specific issues raised. OpenAI pointed to its commitment to supporting publishers by driving traffic through summaries, quotes, clear links, and attribution. Microsoft said that it complies with the Robots Exclusion Protocol and publisher directives.

The latest report builds on previous Tow Center findings from November 2024, which identified similar accuracy problems in how ChatGPT handles news content. The full CJR report provides further detail on this important and evolving issue at the intersection of AI and online journalism.

Read more at Ars Technica here.

Lucas Nolan is a reporter for Breitbart News covering issues of free speech and online censorship.
