What Makes DeepSeek So Special?


Without drawing much attention, DeepSeek has made it clear that it means business. The China-based AI research lab recently released its new models, DeepSeek-R1 and DeepSeek-R1-Zero, which are on par with OpenAI’s o1.

The DeepSeek-R1 model is now available at chat.deepseek.com, along with an API, and its permissive open-source licence allows fine-tuning and distillation. Users can freely experiment and explore its capabilities. One of its most entertaining features is that, while generating responses, it also shares its internal monologue.
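For those who prefer the API over the chat interface, a minimal sketch of querying R1 through an OpenAI-compatible client might look like the following. The base URL, the "deepseek-reasoner" model name and the reasoning_content field are assumptions drawn from DeepSeek's public documentation and may change, so treat this as a starting point rather than official sample code.

```python
# Minimal sketch: querying DeepSeek-R1 through an OpenAI-compatible client.
# The base URL, model name and reasoning_content field are assumptions based
# on DeepSeek's public docs; verify them against the current documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",        # placeholder key
    base_url="https://api.deepseek.com",     # assumed DeepSeek endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",               # assumed R1 model identifier
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)

message = response.choices[0].message
# R1 is expected to return its chain of thought separately from the final answer.
print("Reasoning:", getattr(message, "reasoning_content", None))
print("Answer:", message.content)
```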

“The raw chain of thought from DeepSeek is fascinating. It really reads like a human thinking out loud. Charming and strange,” Ethan Mollick, professor at The Wharton School, said. Sharing similar sentiments, Matthew Berman, CEO of Forward Future, said, “DeepSeek-R1 has the most human-like internal monologue I’ve ever seen. It’s actually quite endearing.”

DeepSeek was not the only one. Another Chinese company, Moonshot, unveiled Kimi K1.5, an o1-level multimodal model.

“The Chinese ‘Open’AI companies are turning the Chinese New Year into a celebration for the entire global AI community,” AI researcher Wenhu Chen said.

DeepSeek’s success has motivated Perplexity AI chief Aravind Srinivas to explore building a similar startup in India. Expressing regret about not developing LLMs from scratch, he said, “I’m not in a position to run a DeepSeek-like company for India, but I’m happy to help anyone obsessed enough to do it and open-source the models.”

Reinforcement Learning for the Win

DeepSeek, in its research paper, revealed that the company bet big on reinforcement learning (RL) to train both of these models. DeepSeek-R1-Zero was developed using a pure RL approach without any prior supervised fine-tuning (SFT). This model utilised Group Relative Policy Optimisation (GRPO), which allows for efficient RL training by estimating baselines from group scores rather than requiring a separate critic model of similar size to the policy model. 
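To give a flavour of the idea (this is an illustrative sketch, not DeepSeek's implementation), the snippet below shows how group-relative advantages can be computed: several responses are sampled for the same prompt, each is scored by a reward function, and every reward is normalised against the group's mean and standard deviation, so no separate critic network is needed to supply a baseline.

```python
# Sketch of the group-relative baseline at the heart of GRPO (illustrative only).
# For one prompt, a group of responses is sampled and each reward is normalised
# by the group's statistics instead of a learned critic's value estimate.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each sampled response relative to its group."""
    rewards = np.asarray(rewards, dtype=float)
    baseline = rewards.mean()        # group mean replaces the critic's baseline
    scale = rewards.std() + eps      # normalise so advantages are comparable across prompts
    return (rewards - baseline) / scale

# Example: 4 responses sampled for the same prompt, scored by a rule-based reward
# (e.g. 1.0 if the final answer is correct, plus a small formatting bonus).
rewards = [1.1, 0.0, 1.0, 0.1]
print(group_relative_advantages(rewards))
# Responses scoring above the group average get positive advantages and are
# reinforced; the policy gradient weights each token's log-probability by this value.
```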

DeepSeek-R1 incorporates a multi-stage training approach and cold-start data. This method improved the model’s performance by refining its reasoning abilities while maintaining clarity in output. “The model has shown performance comparable to OpenAI’s o1-1217 on various reasoning tasks,” the company said. 

“This ‘aha moment’ in the DeepSeek-R1 paper is huge. Pure reinforcement learning (RL) enables an LLM to automatically learn to think and reflect,” Yuchen Jin, co-founder and CTO of Hyperbolic, said. 

He added that the excitement around DeepSeek is similar to that of the AlphaGo era. Just as AlphaGo used pure RL to play countless Go games and optimise its strategy to win, DeepSeek is using the same approach to advance its capabilities. “2025 could be the year of RL,” he said.

This method enables the model to explore reasoning capabilities autonomously without being constrained by supervised data.

“We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive – truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely,” Jim Fan, senior research manager and lead of Embodied AI (GEAR Lab) at NVIDIA, said.

“DeepSeek-R1 not only open-sources a barrage of models but also spills all the training secrets. They are perhaps the first OSS project that shows major, sustained growth of an RL flywheel,” he added. 

On the other hand, Kimi k1.5 utilises RL with both long and short chain-of-thought (CoT). The model supports a context length of up to 128k tokens. Moreover, according to its self-published report, it achieves state-of-the-art (SOTA) performance on benchmarks like AIME (77.5), MATH-500 (96.2), and LiveCodeBench (47.3).

By combining RL with long-CoT and multimodal strategies, Kimi k1.5 significantly improves reasoning, planning, and reflection across a wide range of tasks.

“DeepSeek does AlphaZero approach – purely bootstrap through RL without human input, i.e. ‘cold start’. Kimi does AlphaGo-Master approach – light SFT to warm up through prompt-engineered CoT traces,” Fan added.

DeepSeek doesn’t use techniques like Monte Carlo Tree Search (MCTS), Process Reward Model (PRM), or dense reward modelling. In contrast, AlphaGo and its successors, including AlphaGo Zero, utilise MCTS.

Alibaba recently launched its open-source reasoning model, Marco-o1. The model is powered by CoT fine-tuning, MCTS, reflection mechanisms, and innovative reasoning strategies to tackle complex real-world problems.

DeepSeek-R1 Throws OpenAI into the Water 

DeepSeek-R1 not only surpasses OpenAI o1 on several benchmarks but also proves far more cost-effective, delivering API cost savings of 96–98% across pricing categories.
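As a rough back-of-the-envelope check on how a figure in that range can arise, the snippet below compares per-million-token prices. The prices used are assumptions based on the providers' public pricing pages at the time of writing, not figures from DeepSeek's paper, and should be verified against current rate cards.

```python
# Back-of-the-envelope API cost comparison. Prices are illustrative assumptions
# (approximate public list prices per million tokens at the time of writing);
# verify against the providers' current pricing pages before relying on them.
prices_per_million_tokens = {
    "input":  {"deepseek_r1": 0.55, "openai_o1": 15.00},
    "output": {"deepseek_r1": 2.19, "openai_o1": 60.00},
}

for category, p in prices_per_million_tokens.items():
    savings = 1 - p["deepseek_r1"] / p["openai_o1"]
    print(f"{category}: {savings:.1%} cheaper")
# With these assumed prices, both categories come out roughly 96% cheaper,
# consistent with the 96-98% range cited above.
```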

Meanwhile, OpenAI CEO Sam Altman recently stated on X that the company has not yet developed AGI. “We are not gonna deploy AGI next month, nor have we built it,” he posted. The company, however, intends to release o3-mini within the next couple of weeks.

On the other hand, Google has launched an experimental update (gemini-2.0-flash-thinking-exp-01-21), which has brought improved performance across several key benchmarks in math, science, and multimodal reasoning. Notable results include AIME at 73.3%, GPQA at 74.2%, and MMMU at 75.4%. 

Moreover, it comes with a 1M-token context window, which allows users to perform deeper analysis of long-form texts like multiple research papers or extensive datasets.

In December last year, Google unveiled the Gemini 2.0 Flash Thinking model. Logan Kilpatrick, senior product manager at Google, said the model “unlocks stronger reasoning capabilities and shows its thoughts”.

Most recently, Google DeepMind published a study that introduced inference-time scaling for diffusion models. Following this, the lab published another paper introducing a technique called Mind Evolution to improve the efficiency of large language models (LLMs) during inference. This method involves using the model to generate possible responses, recombining different parts of those responses, and refining them to create better results.
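Broadly, the approach resembles an evolutionary search over candidate answers. The toy sketch below conveys the generate, recombine, refine loop; it is not Google DeepMind's implementation, and generate, combine, refine, and score are hypothetical placeholders for LLM calls and a task-specific evaluator.

```python
# Toy sketch of an evolutionary "generate, recombine, refine" loop in the spirit
# of Mind Evolution (not Google DeepMind's implementation). The callables passed
# in stand in for LLM calls and a task-specific fitness evaluator.
import random

def evolve(task, generate, combine, refine, score, population_size=8, generations=3):
    # 1. Generate an initial population of candidate responses.
    population = [generate(task) for _ in range(population_size)]
    for _ in range(generations):
        # 2. Keep the better half of the population (higher score = better).
        population.sort(key=score, reverse=True)
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            # 3. Recombine parts of two surviving responses, then refine the result.
            a, b = random.sample(survivors, 2)
            children.append(refine(combine(a, b)))
        population = survivors + children
    return max(population, key=score)

# Toy usage with stand-in functions (a real system would call an LLM here).
best = evolve(
    task="plan a 3-city trip",
    generate=lambda t: f"draft plan for: {t} (v{random.randint(0, 99)})",
    combine=lambda a, b: a + " | " + b,
    refine=lambda x: x.strip(),
    score=lambda x: len(x),   # placeholder fitness: longer plan wins
)
print(best)
```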
