For weeks, it has been a running joke that Anthropic is stuck in a cycle of releasing blogs and research reports while its competitors sprint ahead with innovative AI models.
Now, the company has finally released a new version of its flagship model: Claude 3.7 Sonnet.
Despite the questionable nomenclature, with the version number jumping from 3.5 to 3.7 and skipping 4.0 altogether, users embraced its coding capabilities in no time.
People were actively engaged in building fun games, animations, user interfaces, and other such projects. One user on X effectively summed up the overall sentiment.
*openai releases a model*
literally beats every existing benchmark, sith dark lord vibes, ASI timeline accelerated
*claude releases a model*
plays pokemon, happy vibes, everyone starts vibecoding
— atlas (@creatine_cycle) February 24, 2025
Mckay Wrigley, founder of the AI-based upskilling platform Takeoff AI, said on X, “Claude 3.7 Sonnet is the best model in the world for code.”
Even on benchmarks, the model tops the list. It scored 62.3% accuracy on SWE-bench, while OpenAI’s o3-mini (high) scored 49.3%. Artificial Analysis, a platform that independently analyses AI models, called it the best non-reasoning model for coding.
Besides benchmarks and first impressions, users were quick to build several projects. Deedy Das, principal at Menlo Ventures, built an app for the popular board game Connect 4 using Claude 3.7 Sonnet and said that the model could write around 5,000 lines of code in just 30 minutes. “It is the closest thing to AGI (Artificial General Intelligence) I’ve seen,” he said. Notably, Menlo Ventures is an investor in Anthropic.
In another instance, Ethan Mollick, a professor at The Wharton School, threw a challenge at the model, asking it to create a sketch of a control panel of a ‘futuristic spaceship’ using p5.js, a JavaScript library for creative coding. He declared that Claude 3.7 Sonnet was the winner.
Here is Claude 3.7 on my long-standing challenge: "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future"
(every other model is in the quoted tweets, not close)
— Ethan Mollick (@emollick) February 25, 2025
“Honestly, the gap here is pretty insane, even compared to the o1 models and Grok 3. The dashboard was fully interactive as well; no other model came close,” he said.
In a similar test, Derek Nee, an AI engineer and CEO at flowith, compared Claude 3.7 Sonnet with models like OpenAI’s o1, DeepSeek-R1, and Claude 3.5 Sonnet on a task to write Scalable Vector Graphics (SVG) code for the cover of a science fiction book. In his evaluation, Claude 3.7 Sonnet created the most visually pleasing image, and Nee said that it crushed the other models.
AIM also tested the model by redesigning the Hacker News homepage using Apple’s Human Interface Guidelines. In just two iterations, we were able to build an interactive website with front-end libraries.
Even OpenAI Agrees Anthropic Makes Better Coding Models
Anthropic has earned a reputation for excelling in code-based tasks. This isn’t just a claim from the company or its fans. Recently, even its competitor, OpenAI, publicly acknowledged that it lags behind Anthropic in this area.
OpenAI introduced a benchmark called SWE-Lancer to test whether AI models can successfully complete real-world software engineering tasks sourced from Upwork. The benchmark comprises over 1,400 tasks spanning various aspects of software development.
The results revealed that Claude 3.5 Sonnet performed better than GPT-4o and the o1 reasoning model in several tasks.

That said, the Sonnet 3.7 model isn’t free from criticism. It is still expensive to use, costing roughly three times as much as OpenAI’s o3-mini. Claude 3.7 Sonnet costs $3 per million input tokens and a hefty $15 per million output tokens.
OpenAI’s o3-mini, which is comparable to Claude 3.7 Sonnet on benchmarks, costs $1.10 per million input tokens and $4.40 per million output tokens.
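To put those per-million-token prices in concrete terms, here is a minimal sketch of what a single request might cost under each pricing scheme. The prices are the published figures above; the token counts per request are hypothetical examples chosen only for illustration.

```python
# Rough cost comparison using the published per-million-token prices.
# The workload numbers (tokens per request) are hypothetical examples.

PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "claude-3.7-sonnet": (3.00, 15.00),
    "o3-mini": (1.10, 4.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request for the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a coding task with 10k tokens of context and 5k generated tokens.
sonnet = request_cost("claude-3.7-sonnet", 10_000, 5_000)  # $0.105
o3mini = request_cost("o3-mini", 10_000, 5_000)            # $0.033
print(f"Sonnet: ${sonnet:.3f}, o3-mini: ${o3mini:.3f}, ratio: {sonnet/o3mini:.1f}x")
```

For this output-heavy example the gap works out to a bit over 3x, driven mostly by the $15 output-token rate; input-heavy workloads land closer to the 2.7x input-price ratio.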
Jeremy Chone, a YouTuber who teaches programming, said on X that Sonnet 3.7 “struggles with instructions”. He added that it tends to deviate from recommended coding practices, for instance by splitting Rust code into separate files.
3.7 Sonnet Available on All Popular AI Coding Tools
Sonnet 3.7 excels at coding but doesn’t rank as highly as a general-purpose model. Moreover, users already have access to dedicated AI coding tools like Cursor and Windsurf, which raises the question of what Anthropic hopes to achieve here.
However, AI models like Claude remain the foundational layer for these coding tools, and nearly every popular platform has already integrated Claude 3.7 Sonnet.
The model is now available on Replit Agent, GitHub Copilot, Cursor, Windsurf, and many other platforms.
Cursor, while announcing the new model’s availability on its platform, said, “We’ve been very impressed by its coding ability, especially on real-world agentic tasks. It appears to be the new state of the art.”
However, these tools face an incoming threat from Anthropic itself. Along with the 3.7 Sonnet, the company also launched an ‘agentic’ coding tool called Claude Code. This tool functions as an active collaborator that can read code, edit files, commit, and push code to GitHub. It is currently available as a research preview.
“In early testing, Claude Code completed tasks in a single pass that would normally take over 45 minutes of manual work, reducing development time and overhead,” the company said.
It will be interesting to see how a coding agent built on a foundational model takes on successful wrappers like Cursor, Windsurf, or even Devin.