For weeks, it has been a running joke that Anthropic is stuck in a cycle of releasing blogs and research reports while its competitors sprint ahead with innovative AI models.
Now, the company has finally released a new version of its flagship model: Claude 3.7 Sonnet.
Despite the questionable nomenclature, with the version number jumping from 3.5 to 3.7 and skipping 4.0 altogether, users embraced its coding capabilities in no time.
People were actively engaged in building fun games, animations, user interfaces, and other such projects. One user on X effectively summed up the overall sentiment.
*openai releases a model*
literally beats every existing benchmark, sith dark lord vibes, ASI timeline accelerated
*claude releases a model*
plays pokemon, happy vibes, everyone starts vibecoding
— atlas (@creatine_cycle) February 24, 2025
Mckay Wrigley, founder of the AI-based upskilling platform Takeoff AI, said on X, “Claude 3.7 Sonnet is the best model in the world for code.”
Even on benchmarks, the model tops the list. It scored 62.3% accuracy on SWE-bench, while OpenAI’s o3-mini (high) scored 49.3%. Artificial Analysis, a platform that independently analyses AI models, called it the best non-reasoning model for coding.
Besides benchmarks and first impressions, users were quick to build several projects. Deedy Das, principal at Menlo Ventures, built an app for the popular board game Connect 4 using Claude 3.7 Sonnet and said that the model could write around 5,000 lines of code in just 30 minutes. “It is the closest thing to AGI (Artificial General Intelligence) I’ve seen,” he said. Notably, Menlo Ventures is an investor in Anthropic.
In another instance, Ethan Mollick, a professor at The Wharton School, threw a challenge at the model, asking it to create a sketch of a control panel of a ‘futuristic spaceship’ using p5.js, a JavaScript library for creative coding. He declared that Claude 3.7 Sonnet was the winner.
Here is Claude 3.7 on my long-standing challenge: "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future"
(every other model is in the quoted tweets, not close)
— Ethan Mollick (@emollick) February 25, 2025
“Honestly, the gap here is pretty insane, even compared to the o1 models and Grok 3. The dashboard was fully interactive as well; no other model came close,” he said.
In a similar test, Derek Nee, an AI engineer and CEO at flowith, compared Claude 3.7 Sonnet with models like OpenAI’s o1, DeepSeek-R1, and Claude 3.5 Sonnet on a task to write Scalable Vector Graphics (SVG) code for the cover of a science fiction book. In his evaluation, Claude 3.7 Sonnet created the most visually pleasing image, and Nee said that it crushed the other models.
AIM also tested the model by redesigning the Hacker News homepage using Apple’s Human Interface Guidelines. In just two iterations, we were able to build an interactive website with front-end libraries.
Even OpenAI Agrees Anthropic Makes Better Coding Models
Anthropic has earned a reputation for excelling in code-based tasks. This isn’t just a claim from the company or its fans. Recently, even its competitor, OpenAI, publicly acknowledged that it lags behind Anthropic in this area.
OpenAI introduced a benchmark called SWE-Lancer to test whether AI models can successfully complete real-world software engineering tasks sourced from Upwork. The benchmark comprises over 1,400 tasks spanning various aspects of software development.
The results revealed that Claude 3.5 Sonnet performed better than GPT-4o and the o1 reasoning model in several tasks.

That said, the Sonnet 3.7 model isn’t free from criticism. It is still expensive to use, costing roughly three times as much as OpenAI’s o3-mini. Claude 3.7 Sonnet costs $3 per million input tokens and a hefty $15 per million output tokens.
OpenAI’s o3-mini, which is comparable to Claude 3.7 Sonnet on benchmarks, costs $1.10 per million input tokens and $4.40 per million output tokens.
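To put those per-million-token prices in concrete terms, here is a minimal sketch of what a single request might cost under each pricing scheme. The prices are the published figures above; the token counts per request are hypothetical examples chosen only for illustration.

```python
# Rough cost comparison using the published per-million-token prices.
# The workload numbers (tokens per request) are hypothetical examples.

PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "claude-3.7-sonnet": (3.00, 15.00),
    "o3-mini": (1.10, 4.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request for the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a coding task with 10k tokens of context and 5k generated tokens.
sonnet = request_cost("claude-3.7-sonnet", 10_000, 5_000)  # $0.105
o3mini = request_cost("o3-mini", 10_000, 5_000)            # $0.033
print(f"Sonnet: ${sonnet:.3f}, o3-mini: ${o3mini:.3f}, ratio: {sonnet/o3mini:.1f}x")
```

For this output-heavy example the gap works out to a bit over 3x, driven mostly by the $15 output-token rate; input-heavy workloads land closer to the 2.7x input-price ratio.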
Jeremy Chone, a YouTuber who teaches programming, said on X that Sonnet 3.7 “struggles with instructions”. He added that it tends to deviate from recommended coding practices, for instance by splitting Rust code into separate files.
3.7 Sonnet Available on All Popular AI Coding Tools
Sonnet 3.7 excels at coding but doesn’t rank as highly as a general-purpose model. Moreover, users already have access to dedicated AI coding tools like Cursor and Windsurf, which raises the question of what Anthropic hopes to achieve here.
However, AI models like Claude remain the foundational layer for these coding tools, and nearly every popular platform has already integrated Claude 3.7 Sonnet.
The model is now available on Replit Agent, GitHub Copilot, Cursor, Windsurf, and many other platforms.
Cursor, while announcing the new model’s availability on its platform, said, “We’ve been very impressed by its coding ability, especially on real-world agentic tasks. It appears to be the new state of the art.”
However, these tools face an incoming threat from Anthropic itself. Along with the 3.7 Sonnet, the company also launched an ‘agentic’ coding tool called Claude Code. This tool functions as an active collaborator that can read code, edit files, commit, and push code to GitHub. It is currently available as a research preview.
“In early testing, Claude Code completed tasks in a single pass that would normally take over 45 minutes of manual work, reducing development time and overhead,” the company said.
It will be interesting to see how a coding agent built on a foundational model takes on successful wrappers like Cursor, Windsurf, or even Devin.