Over the years, using the internet has increasingly meant surrendering control over the content we see. This is particularly evident on social media platforms, where information is algorithmically curated rather than actively sought by users.
Even though search engines still allow individuals to independently seek and select information by clicking directly on the links they want, this approach is slowly fading. Google’s recent testing of ‘AI Mode’ in Search suggests a future where AI-based curation becomes the default method of information retrieval.
Similarly, Perplexity, Grok, and ChatGPT heavily promote AI-driven search tools, and the push appears to be working: one in four Americans now uses AI instead of traditional search engines.
However, a new report from Columbia University has highlighted a critical flaw in these AI-based search tools: poor citation accuracy, the very aspect AI labs emphasise to build user confidence.
Study Exposes Gaps in AI Search Accuracy
Columbia University’s Tow Center for Digital Journalism evaluated the search tools of ChatGPT, Perplexity, Grok, DeepSeek Search, and Google’s Gemini. Ten articles were randomly selected from each of twenty publishers, and direct excerpts from those articles were used as input for each AI tool.
Each tool was then asked to identify the article’s headline, original publisher, publication date, and URL. The study found that, collectively, these search tools provided incorrect answers to more than 60% of queries. Notably, Perplexity answered 37% of queries incorrectly, while Grok 3 answered 94% incorrectly.
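The study’s method can be pictured as a simple evaluation loop: feed each excerpt to a tool, collect its answer, and compare it against ground truth. The Python sketch below is a hypothetical illustration of how such a harness might grade responses; the `Article` fields, the `ask_tool` callable, and the three-level grading rule are assumptions made for illustration, not the Tow Center’s actual code or rubric.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Article:
    excerpt: str    # verbatim passage fed to the search tool
    headline: str   # ground-truth headline
    publisher: str  # ground-truth original publisher
    date: str       # ground-truth publication date
    url: str        # ground-truth canonical URL

def grade(answer: dict, truth: Article) -> str:
    """Compare one tool response against ground truth (hypothetical rubric)."""
    hits = sum([
        answer.get("headline", "").strip().lower() == truth.headline.lower(),
        answer.get("publisher", "").strip().lower() == truth.publisher.lower(),
        answer.get("date", "").strip() == truth.date,
        answer.get("url", "").strip().rstrip("/") == truth.url.rstrip("/"),
    ])
    if hits == 4:
        return "correct"
    return "partially correct" if hits > 0 else "incorrect"

def evaluate(articles: list[Article], ask_tool: Callable[[str], dict]) -> dict:
    """Run every excerpt through one tool and return the share of each grade."""
    tally = {"correct": 0, "partially correct": 0, "incorrect": 0}
    for art in articles:
        tally[grade(ask_tool(art.excerpt), art)] += 1
    total = max(len(articles), 1)
    return {label: round(100 * count / total, 1) for label, count in tally.items()}
```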

[Chart omitted. Source: Tow Center for Digital Journalism]
“Most of the tools we tested presented inaccurate answers with alarming confidence,” the study read, highlighting that outputs rarely used phrases like ‘it appears’, ‘it’s possible’ and ‘I couldn’t locate the exact article’, all of which signal knowledge gaps and uncertainty.
The research also revealed that more than half of the responses from Gemini and Grok 3 cited broken links.

[Chart omitted. Source: Tow Center for Digital Journalism]
Moreover, these AI tools often failed to identify the original source of the content. “For instance, despite its partnership with The Texas Tribune, Perplexity Pro cited syndicated versions of Tribune articles for three of the ten queries. In contrast, Perplexity cited an unofficial republished version for one,” the report added.
These issues persist despite the continued efforts of companies like OpenAI and Perplexity to partner with publishers and provide reliable, accurate outputs. The study observed multiple instances of these chatbots giving inaccurate responses sourced from the very websites they had teamed up with.

[Chart omitted. Source: Tow Center for Digital Journalism]
These results are alarming, to say the least. “Seems pretty misleading to advertise a capability as search/retrieval if it provides incorrect answers and links over 40% of the time,” Narasimha Chari, a product manager, said in a post on X citing the study.
Google Has a Responsibility to Fulfill
While AI systems and products continue to improve, there has been an increasingly strong push to adopt AI for search. Given the results above, this might seem premature. Google recently announced that AI Overviews are being rolled out to more users, who no longer need to sign in to access the feature.
In line with the study’s findings, several users have recently expressed frustration with AI Overviews and their inaccurate responses. While Google calls AI Overviews “one of the most popular search features ever”, there also appears to be no way to disable them.
For instance, Mehdi Sadaghdar, who runs the popular YouTube channel ElectroBOOM, found Google’s AI giving a confusing response to a rather straightforward question. When he searched for the amount of energy contained in a lightning bolt, the AI Overview first answered “1 gigajoules”, while another result showed “approximately 5 gigajoules”.
“I feel it is dangerous for Google AI answers to be the first result in the searches. I found myself accepting what it says as fact, but then with inaccuracies…it could be spreading false information that would result in inaccurate responses,” Sadaghdar added in a post on X.
Kind of useless google AI overview! pic.twitter.com/9pfrFsf6ki
— Mehdi Sadaghdar (@ElectroBOOMGuy) January 27, 2025
Google is also testing an ‘AI Mode’ in Google Search which, according to its demonstration video, appears as the first tab users see. As per Google, it comes with enhanced reasoning, multimodal understanding, and higher-quality responses powered by Gemini 2.0.
That said, Google has been on an impressive run recently with its newly released Gemini models and their multimodal features. It is only fair to expect further refinements to AI Overviews in Search, the company’s product that reaches the largest number of users.
Moreover, a report from Statista suggests that over 90 million online users in the United States are expected to rely primarily on AI for browsing the web. AI makers will certainly need to take on greater responsibility, as false information can cause anything from mild inconvenience to fatal consequences in some situations.