Tow Center for Digital Journalism: New study finds AI search tools give wrong answers 60% of the time

AI models are not always accurate. Hallucinations and the repetition of false information have long been thorny issues for developers. Because use cases vary so widely, it has been difficult to put a number on how accurate AI actually is. A team of researchers claims they now have the numbers.

The Tow Center for Digital Journalism recently studied eight AI search engines: ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. They tested the accuracy of each tool and recorded how often it refused to answer.

The researchers randomly selected 200 news articles from 20 news publishers (10 articles each), making sure that each article appeared among the top three Google results when an excerpt from it was searched. They then ran the same excerpt query in each AI search tool and rated each answer on whether it correctly cited A) the article, B) the news organization, and C) the URL.

The researchers then labeled each response on a scale running from “completely correct” to “completely incorrect.” As you can see in the graph below, every tool except the two versions of Perplexity performed poorly. Overall, the AI search engines were inaccurate 60% of the time. Worse, the tools presented these incorrect results with unwarranted confidence, which made them all the more convincing.

This study is fascinating because it confirms with hard numbers what we have known for years: that LLMs are “the most sophisticated liars ever.” They state with complete authority that what they say is true, even when it is not, and when challenged they sometimes argue or invent further false claims.

In an anecdotal 2023 article, Ted Gioia (The Honest Broker) pointed to dozens of ChatGPT responses in which the bot confidently “lied.” While some of the prompts were adversarial, many were ordinary questions.

Even after admitting it was wrong, ChatGPT went on to provide more false information. The LLM seemed to be programmed to answer every user input at all costs. The researchers' data supports this hypothesis: ChatGPT Search was the only AI tool that answered all 200 article queries. However, it was completely accurate only 28% of the time and completely inaccurate 57% of the time.

ChatGPT wasn’t the worst. Both versions of X’s Grok AI performed poorly, with Grok-3 Search inaccurate 94% of the time. Microsoft’s Copilot wasn’t much better: it refused to answer 104 of the 200 queries. Of the remaining 96, only 16 were “completely correct,” 14 were “partially correct,” and 66 were “completely wrong,” giving it an error rate of about 70%.
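To make the Copilot figures concrete, here is a small Python sketch that recomputes the shares from the reported counts. The function `query_breakdown` is hypothetical and ours, not the researchers'; it assumes refusals are counted against all 200 queries, while the three answer categories are counted only against the queries Copilot actually answered.

```python
def query_breakdown(total, declined, correct, partial, wrong):
    """Return per-category shares for one tool's results.

    Assumption (ours, not the study's): 'declined' is a share of all
    queries, while the three answer categories are shares of the
    queries the tool actually answered.
    """
    answered = total - declined
    if correct + partial + wrong != answered:
        raise ValueError("answer categories must sum to the answered count")
    return {
        "answered": answered,
        "declined": declined / total,
        "completely_correct": correct / answered,
        "partially_correct": partial / answered,
        "completely_wrong": wrong / answered,
    }


# Copilot's counts as reported above.
copilot = query_breakdown(total=200, declined=104, correct=16, partial=14, wrong=66)

for category, value in copilot.items():
    if category == "answered":
        print(f"{category}: {value}")
    else:
        print(f"{category}: {value:.1%}")
```

Under that assumption, 66 of 96 answered queries is roughly 69% completely wrong, which is where the approximate 70% figure comes from, and refusals alone account for 52% of all queries.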

Arguably the craziest part of all this is that the companies behind these tools are not transparent about this lack of accuracy while charging the public $20 to $200 per month. Moreover, the premium Perplexity Pro ($20/month) and Grok-3 Search ($40/month) answered slightly more queries correctly than their free counterparts (Perplexity and Grok-2 Search), yet also had significantly higher error rates (see the graph above).

Not everyone agrees, though. Lance Ulanoff of TechRadar said he may never use Google again after trying ChatGPT Search. He described the tool as fast, clear, and accurate, with a clean interface and no ads.
