Published on February 28, 2025
In Global Tech

OpenAI Offers GPT-4.5 With 40% Fewer Hallucinations, 30x Higher Cost

The model trades power for personality.

by Supreeth Koundinya

The rapid release of advanced AI models in the past few days has been impossible to ignore. With the launch of Grok-3 and Claude 3.7 Sonnet, two leading AI companies, xAI and Anthropic, have significantly accelerated the pace of innovation in the field.

As rumours about OpenAI’s newest model circulated, anticipation surged. However, when GPT-4.5 was released, OpenAI said it wasn’t a frontier model and was less powerful than the company’s o3-mini model and many others in the competition.

It doesn’t excel at coding, reasoning, or similar capabilities either, because it isn’t meant to. This time, OpenAI has focused more on the model’s usability than anything else.

Fewer Hallucinations on GPT-4.5

OpenAI tested GPT-4.5 on SimpleQA, a benchmark that evaluates the factual accuracy of AI models in answering short, fact-seeking questions. The model recorded a hallucination rate of 37.1%, compared with more than 70% for o3-mini and 61% for GPT-4o.

This amounts to a relative reduction of roughly 40% in the hallucination rate compared to its predecessor, GPT-4o. In accuracy on the SimpleQA benchmark, GPT-4.5 scored 62.5%, higher than OpenAI’s o3-mini (15%), o1 (47%), and GPT-4o (38.2%). It also outscored many competing models: Grok-3 scored 43.6%, Gemini 2.0 Pro scored 44.3%, and Claude 3.5 Sonnet scored 28.4%.
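The 40% figure is a relative reduction, not a drop in percentage points. A minimal sketch of the arithmetic, using only the two SimpleQA hallucination rates quoted in this article (the function and variable names are illustrative):

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Return the fractional drop from old_rate to new_rate."""
    return (old_rate - new_rate) / old_rate


# SimpleQA hallucination rates in percent, as quoted in this article.
gpt_4o_rate = 61.0
gpt_45_rate = 37.1

drop = relative_reduction(gpt_4o_rate, gpt_45_rate)
absolute_drop = gpt_4o_rate - gpt_45_rate

print(f"Relative reduction: {drop:.1%}")
print(f"Absolute reduction: {absolute_drop:.1f} percentage points")
```

On these inputs, the relative reduction works out to about 39%, which the headline rounds to 40%, while the absolute gap is just under 24 percentage points.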

OpenAI also released a system card for the GPT-4.5 model, which evaluates its safety concerns and associated risks. In an evaluation called PersonQA, which tests the model for hallucinations, GPT-4.5 was more accurate and showed a lower hallucination rate than the o1 and GPT-4o models.

With the model available on the $200/month Pro plan, several users backed OpenAI’s claims of reduced hallucinations.

Aaron Levie, CEO of the cloud storage company Box, revealed that GPT-4.5 significantly improved over the GPT-4o in extracting data fields from enterprise content, like important details in a contract. “We found a 19 pt [point] improvement in single shot extraction. This is a huge improvement for any mission-critical enterprise workflow,” he said in a post on X.

Early testers of the model also gave high praise for the model’s verbal and emotional intelligence. “I found it to be by far the highest verbal intelligence model I’ve ever used. It’s an outstanding writer and conversationalist,” said Theo Jaffee, who had early access to the GPT-4.5 model.

‘First Model That Feels Like Talking to a Thoughtful Person’

While CEO Sam Altman was absent from the launch event, he said on X that GPT-4.5 “is the first model that feels like talking to a thoughtful person to me.”

“I have had several moments where I’ve sat back in my chair and been astonished at getting actually good advice from an AI,” added Altman, and said that the model offers a different kind of intelligence. There’s a magic to it that he hasn’t felt before.

The model supposedly excels at creative and emotional thinking. Ethan Mollick, a professor at The Wharton School, said on X, “It can write beautifully, is very creative, and is occasionally oddly lazy on complex projects.” He even joked that the model took a “lot more” classes in the humanities.

Andrej Karpathy, the former OpenAI researcher and founder of Eureka Labs, recalled that when he tested GPT-4 two years ago, its word choice was more creative and it showed an improved understanding of the nuances of the prompt compared to GPT-3.5. Karpathy said that he has a similar feeling about GPT-4.5. “Everything is a little bit better,” he said.

OpenAI, in the model’s system card, said internal testers reported GPT-4.5 as warm, intuitive, and natural. “When tasked with emotionally charged queries, it knows when to offer advice, defuse frustration, or simply listen to the user,” the report read.

Overall, GPT-4.5 isn’t a mind-blowing model, and it isn’t the best model on benchmarks either. For example, it is worse than the recently released Claude 3.7 Sonnet on coding benchmarks and offers only a marginal improvement over GPT-4o.

Altman also confirmed earlier that the company plans to release the GPT-5 model soon, combining general purpose and reasoning capabilities in a single model.

Comes at an Exponential Cost

However, if the company aims to make GPT-4.5 available to the masses, there’s bad news. It isn’t available yet on the free version or even the $20/month plan. If it were to be deployed on other platforms via the API, it would be the most expensive model, with pricing many times that of GPT-4o or even o3-mini.

The GPT-4.5 Preview costs $75 per 1 million input tokens and $150 per 1 million output tokens. In comparison, GPT-4o costs $2.50 and $10 per million input and output tokens, respectively.
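To put those prices in perspective, here is a rough sketch that compares the two models using only the per-million-token figures quoted above. The dictionary keys are labels for this example, and the request size of 10,000 input and 2,000 output tokens is a hypothetical workload chosen for illustration.

```python
# Per-million-token API prices in USD, as quoted in this article.
# The keys are labels for this sketch only.
PRICES = {
    "gpt-4.5-preview": {"input": 75.00, "output": 150.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


input_ratio = PRICES["gpt-4.5-preview"]["input"] / PRICES["gpt-4o"]["input"]
output_ratio = PRICES["gpt-4.5-preview"]["output"] / PRICES["gpt-4o"]["output"]

# Hypothetical workload, chosen only for illustration.
cost_45 = request_cost("gpt-4.5-preview", 10_000, 2_000)
cost_4o = request_cost("gpt-4o", 10_000, 2_000)

print(f"Input price ratio:  {input_ratio:.0f}x")
print(f"Output price ratio: {output_ratio:.0f}x")
print(f"GPT-4.5 request: ${cost_45:.3f}, GPT-4o request: ${cost_4o:.3f}")
```

The 30x figure in the headline matches the input-token ratio; output tokens are 15x the GPT-4o price, so the overall multiple for a real workload depends on its mix of input and output.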

Clement Delangue, CEO of Hugging Face, said, “IMO [in my opinion], if GPT 4.5 was released as an open-source base model (that everyone can distill), it would be the most impactful release of the year,” and added that he isn’t a fan of the API either.

“Making a few hundred million [dollars] now from it via API doesn’t move the needle compared to the 10x more usage/visibility/goodwill/talent they could get by open-sourcing it,” he added.

OpenAI will have to watch out for the launch of DeepSeek-R2 and Meta’s Llama 4, which are expected to be out in a few months.

Moreover, if OpenAI is marketing the model for its creative and empathetic outputs, those are subjective measures at the end of the day. Karpathy conducted a poll on X to check whether users prefer the outputs of GPT-4.5 or GPT-4o, and many users preferred the latter. It will be interesting to see how many users will be truly pleased with GPT-4.5 once it is more widely available.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.
