Published on February 26, 2025
In Global Tech

Grok 3 vs Claude 3.7 Sonnet vs o3-mini vs Gemini 2.0

Each of these models excels in different areas, reflecting the diverse strategies employed by their developers.

Illustration by Nalini Nirad

by Siddharth Jindal

The LLM battle in 2025 is off to a strong start, with frontier models already vying for dominance. Elon Musk’s xAI recently launched Grok 3, which has reportedly impressed users worldwide. Meanwhile, Anthropic introduced Claude 3.7 Sonnet, and earlier this year, OpenAI launched o3-mini with plans to release GPT-4.5 soon.

Google has expanded its Gemini 2.0 lineup with the introduction of Gemini 2.0 Flash and Gemini 2.0 Pro models.

Coding and Reasoning

With a 70.3% score on SWE-bench Verified, Claude 3.7 Sonnet outperforms o3-mini, which scores 49.3%, making it a strong choice for coding. Meanwhile, Grok 3 is also gaining recognition as a competitive coding model. In a blog post, xAI stated that on LiveCodeBench (v5), Grok 3 mini beta (Think) scored 80.4, while o3-mini scored 74.1.

The reasoning models are available through the Grok app. Users can prompt Grok 3 to ‘Think’ or, for more complex inquiries, activate ‘Big Brain’ mode, which uses extra computational power for deeper reasoning.

Besides coding and reasoning, Grok 3 can generate images and supports voice conversation mode. However, access to these features requires a Premium+ or SuperGrok subscription.

Claude 3.7 Sonnet is ideal for complex software tasks. Its extended thinking mode enhances math and science capabilities. However, it does not support voice or video processing.

The model is a ‘hybrid’, meaning it can simultaneously function as both a standard LLM and a reasoning LLM. In extended thinking mode, the model reviews its reasoning before generating a response, leading to improved performance in math, physics, coding, instruction-following, and other complex tasks.

When using Claude 3.7 Sonnet through the API, users can control the number of tokens allocated for reasoning, up to a maximum of 1,28,000 tokens. This allows them to manage the trade-off between response speed, cost, and output quality.

The model can accept text and images as input. This means it can process and analyse text-based data and images to generate responses or perform tasks like code generation and problem-solving. However, it lacks image generation capabilities and does not support voice conversation.

Similarly, OpenAI’s o3-mini is suitable for competitive programming, coding challenges, and cost-sensitive applications. The company released the model in response to DeepSeek’s R1, an open-source alternative to OpenAI’s o1, which was developed at a fraction of the cost.

Unlike Anthropic, where users can set a fixed number of tokens for reasoning, OpenAI provides three reasoning effort levels – low, medium, and high – allowing developers to adjust processing based on their needs.

This feature lets o3-mini allocate more processing power for complex problems or prioritise speed when low latency is required. However, o3-mini does not support vision-related tasks, so developers should continue using OpenAI o1 for visual reasoning.

Like o1, o3-mini comes with a larger context window of 2,00,000 tokens and a max output of 1,00,000 tokens in the API.

Few can match Google Gemini when it comes to multimodality and longer context windows. Gemini 2.0 Flash offers a range of features, including native tool use, a 1 million-token context window, and multimodal input. While it currently supports text output, image and audio output, along with the Multimodal Live API, will be available soon.

For coding, Google has introduced Gemini 2 Pro. The tech giant says the model excels at coding capabilities and processing complex prompts with improved comprehension and reasoning. It also features Google’s largest-ever context window of 2 million tokens, allowing for in-depth analysis of extensive information.

Availability and Pricing

Grok 3 is integrated into X and available for free to all users. However, advanced features like voice mode are exclusive to Premium+ subscribers. Users can interact with Grok 3 directly through the X app or website. X Premium+ is currently available in India for ₹3,470 per month.

xAI’s SuperGrok subscription costs $30 per month or $300 per year when purchased through the iOS app. This standalone app provides access to advanced Grok 3 features like DeepSearch and reasoning modes.

The company also announced that in the coming weeks, Grok 3 and Grok 3 mini will be available through its API platform, offering access to both standard and reasoning models. Moreover, DeepSearch will be released to Enterprise partners via the API.

On the other hand, OpenAI’s o3-mini is available to all free ChatGPT users. The model is accessible through the Chat Completions API, Assistants API, and Batch API for select developers in API usage tiers 3-5.

OpenAI’s o3-mini is a small, cost-efficient reasoning model optimised for coding, math, and science. It supports tools and Structured Outputs and offers a context length of 2,00,000 tokens.

The model’s pricing is set at $1.10 per million input tokens, with a discounted rate of $0.55 per million cached input tokens. Output tokens are priced at $4.40 per million tokens.

Claude 3.7 Sonnet is available across all Claude plans, including Free, Pro, Team, and Enterprise, as well as through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI. The model is priced the same as its predecessors at $3 per million input tokens and $15 per million output tokens, including thinking tokens.

Meanwhile, Gemini 2.0 Pro is now available as an experimental model to developers in Google AI Studio and Vertex AI and to Gemini Advanced users in the model drop-down on desktop and mobile.

In the free tier, users can process inputs and generate outputs at no cost. In the paid tier, input processing costs $0.10 per million tokens for text, images, and videos, while audio inputs are priced at $0.70 per million tokens. Output generation is available at $0.40 per million tokens.

Moreover, context caching is free in the free tier. In the paid tier, however, it costs $0.025 per million tokens for text, image, and video data and $0.175 per million tokens for audio.

In Conclusion

Each of these models excels in different areas, reflecting the diverse strategies employed by their developers. The choice between these models should be based on specific needs and the type of tasks intended for them.

Grok 3 stands out with its multimodal capabilities and advanced reasoning, while Claude 3.7 Sonnet shines in coding and complex problem-solving. OpenAI’s o3-mini offers cost-efficient reasoning and flexibility, whereas Google’s Gemini 2.0 boasts an extensive context window and strong multimodal capabilities.

📣 Want to advertise in AIM? Book here

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.

Google Brings New Features to NotebookLM and Gemini

No One is Going to be GPU Poor Anymore

Google to Acquire Cloud Platform Startup Wiz for $32 Billion

Anthropic Launches Claude 2.1, Surpasses GPT-4 Turbo in Context Length

Anthropic to Launch Voice Mode Soon, More Features Incoming for Business Users

Google’s New AI Model Might Make Photoshop Redundant for Beginners

Musk’s xAI Acquires GenAI Startup Hotshot to Scale Text-to-Video Generation

Is MCP the New HTTP for AI?

The Startup that is Winning Over Investors, Tech Giants, and Developers

Association of Data Scientists

GenAI Corporate Training Programs

Our Upcoming Conference

Happy Llama 2025

India's Biggest Conference on AI Startups

April 25, 2025 | 📍 Hotel Radisson Blu, Bengaluru

Download the easiest way to
stay informed

‘Most Data Centres Are Not Ready for Liquid Cooling’, says Oracle Exec on NVIDIA Blackwell

Siddharth Jindal

Built on the Blackwell architecture introduced last year, Blackwell Ultra features the NVIDIA GB300 NVL72 rack-scale solution and the NVIDIA HG B300 NVL16 system.