How DeepSeek Made OpenAI Take India Seriously

OpenAI CEO Sam Altman believes that, while models are not cheap, India can still build its own reasoning model and become a leader.

Although it’s been a while since DeepSeek’s release, its impact has been profound, making AI both affordable and widely accessible; so much so that even OpenAI found itself under pressure.

Referring to a time when he said it was “hopeless” for India to build its own foundational model, OpenAI CEO Sam Altman clarified his previous statement during his visit to India and said, “That was a very specific time with scaling laws.”

“But we are now in a world where we have made incredible progress with distillation,” he said, speaking about the power of small models and reasoning models. According to him, while models are still not cheap, India can still build its own reasoning model and become a leader.

Recently, Altman published a blog post in which he stated that the cost to use a given level of AI falls about 10x every 12 months, and that lower prices lead to much more use. This will, in turn, require more compute.
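Altman's rule of thumb compounds quickly. A minimal sketch of what it implies, using a hypothetical starting price (the $10-per-million-tokens figure is illustrative, not from the blog post):

```python
# Altman's rule of thumb: the cost to use a given level of AI falls
# roughly 10x every 12 months. Starting price here is hypothetical.
start_cost = 10.0  # dollars per million tokens (illustrative)

for months in (0, 12, 24, 36):
    cost = start_cost * 0.1 ** (months / 12)
    print(f"after {months:2d} months: ${cost:.3f} per million tokens")
```

At that rate, a capability that costs $10 per million tokens today would cost about a cent per million tokens three years from now, which is why lower prices driving much more use is the expected outcome.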

DeepSeek’s success has left many wondering how China achieved this with limited resources. At MLDS 2025, Paras Chopra, founder of Lossfunk, shared how DeepSeek pulled it off.

He said that one of the major hurdles in scaling large AI models is managing the key-value (KV) cache. Its memory footprint grows with every token processed, and the attention computation behind it scales quadratically with sequence length, which limits the size of inputs and outputs. The conventional workarounds, such as linear attention or grouped-query attention, manage this by trading away some model quality. DeepSeek, however, found a more efficient solution.

“They came out with a low-rank approximation of it, what they called compressed latent KV,” Chopra explained. This approach allowed DeepSeek to process longer inputs more efficiently, resulting in improved performance and longer chains of reasoning without the need for excessive computational resources. 

By addressing the quadratic growth of the KV cache, DeepSeek made it possible to handle larger datasets without the usual computational costs.
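The core idea can be sketched with plain numpy. This is a simplified illustration of low-rank KV compression, not DeepSeek's actual implementation, and all dimensions here are hypothetical: instead of caching a full key and value vector per token, the model caches one small latent vector and up-projects it back into keys and values when attention needs them.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_tokens = 1024, 512   # hypothetical sizes for illustration
d_latent = 128                  # compressed latent width, much smaller than 2 * d_model

# Low-rank scheme: one down-projection to a shared latent,
# and separate up-projections to recover keys and values.
W_down = rng.standard_normal((d_model, d_latent)) * 0.02
W_up_k = rng.standard_normal((d_latent, d_model)) * 0.02
W_up_v = rng.standard_normal((d_latent, d_model)) * 0.02

hidden = rng.standard_normal((n_tokens, d_model))

# Standard attention would cache full K and V: 2 * d_model floats per token.
full_cache_floats = n_tokens * 2 * d_model

# Compressed scheme caches only the small latent per token.
latent = hidden @ W_down                  # (n_tokens, d_latent) -- all that is stored
latent_cache_floats = latent.size

k_recovered = latent @ W_up_k             # keys rebuilt on the fly at attention time
v_recovered = latent @ W_up_v             # values likewise

print(f"floats cached per token: full={2 * d_model}, latent={d_latent}")
print(f"cache shrink factor: {full_cache_floats / latent_cache_floats:.0f}x")
```

With these illustrative sizes the cache shrinks 16x, which is what lets the same memory budget hold far longer inputs and longer chains of reasoning.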

Besides, DeepSeek’s approach to the mixture-of-experts (MoE) architecture was another key factor. MoE activates only a subset of the model’s experts for each token, and those experts can be placed on different GPUs, saving compute and memory.

Chopra said that while others simply routed tasks to the best experts or a fixed number of experts, DeepSeek’s innovation was more dynamic. “They thought about intelligence as being comprised of two parts – shared experts and routed experts.”
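A toy sketch of that two-part split, under the assumption that shared experts run for every token while routed experts are chosen per token. The experts here are just single weight matrices and all counts are hypothetical; real MoE layers use full feed-forward networks and learned load balancing.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 64
n_shared, n_routed, top_k = 2, 8, 2   # hypothetical expert counts

# Each "expert" is reduced to one weight matrix for illustration.
shared_experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_shared)]
routed_experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_routed)]
router = rng.standard_normal((d_model, n_routed)) * 0.02

def moe_forward(x):
    """Combine always-on shared experts with top-k routed experts."""
    out = sum(x @ W for W in shared_experts)       # shared experts see every token
    logits = x @ router
    top = np.argsort(logits)[-top_k:]              # indices of the k best-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
    for w, idx in zip(weights, top):
        out = out + w * (x @ routed_experts[idx])  # only k of n_routed run per token
    return out, top

token = rng.standard_normal(d_model)
y, chosen = moe_forward(token)
print(f"active experts per token: {n_shared} shared + {top_k} of {n_routed} routed")
```

The point of the split is that common knowledge lives in the shared experts while the routed ones specialise, so each token pays for only a small slice of the model's total parameters.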

To further cut costs, Chopra said DeepSeek innovated at the hardware level as well. He shared that DeepSeek was the first to push the boundaries of Compute Unified Device Architecture (CUDA) and Parallel Thread Execution (PTX), NVIDIA’s intermediate language, to address memory bandwidth bottlenecks. “They were the first ones to also do FP8 precision training,” he said.

Using FP8 precision allowed DeepSeek to run its models on smaller, less expensive GPUs. Training with FP8 precision significantly lowered the memory requirements compared to traditional FP16 or FP32 training, which, in turn, reduced the costs associated with both training and inference.
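The memory arithmetic behind that claim is simple. A back-of-the-envelope sketch, using a hypothetical 7-billion-parameter model (the size is illustrative, not a DeepSeek figure):

```python
# Bytes per parameter at each floating-point precision.
BYTES = {"fp32": 4, "fp16": 2, "fp8": 1}

n_params = 7_000_000_000  # hypothetical model size for illustration

for fmt, b in BYTES.items():
    gib = n_params * b / 2**30
    print(f"{fmt}: {gib:.1f} GiB for the weights alone")
```

Halving the bytes per parameter relative to FP16, and quartering them relative to FP32, is what lets the same model fit on smaller, cheaper GPUs; in practice mixed-precision schemes keep some tensors at higher precision, so real savings are somewhat smaller.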

India Takes Inspiration

Chopra argues that for India to develop a state-of-the-art foundation model, sheer compute power might not be the most effective solution. “The human brain is an incredibly efficient AGI. It runs on potatoes. You don’t need a nuclear-powered data centre to operate an AGI,” he said.

Citing ISRO’s record of accomplishing several missions at a far lower cost than NASA’s, he added that India can do the same in AI.

“As a nation, we don’t have to look too far to see the amazing things we’ve already accomplished. We’ve done it in areas like space, and there’s no reason why we can’t do the same in AI.” Chopra’s company Lossfunk is also on a mission to build a state-of-the-art foundational reasoning model from India and is inviting applicants to join the effort. 

“Creativity is born out of constraints, and DeepSeek’s success proves that with the right approach, it’s possible to innovate and scale AI models without relying on endless financial resources,” Chopra further said.

In an interview with AIM, Harneet SN, founder of Rabbitt AI, said, “DeepSeek is the Jambavan moment for India in the sense that, just like in the Ramayana, Jambavan came and reminded Hanuman of his powers, DeepSeek has done the same for India’s AI community.”

The IndiaAI mission recently called for proposals to build India’s own foundational model, with finance minister Nirmala Sitharaman allocating ₹2,000 crore for the mission – nearly a fifth of the ₹10,370 crore announced for the scheme last year. 

Similarly, Gaurav Aggarwal, AI/ML lead at Jio, is inviting exceptional graduate students in the US working on challenging AI problems to join as research interns to build next-generation AI models for India and the world. “India has fallen behind in the race to develop its own cutting-edge LLMs – but we are changing that,” he said in a post on X. 

Is DeepSeek’s Success Exaggerated?

OpenAI vice president Srinivas Narayanan, during an interaction with IIT Madras professor Balaraman Ravindran, said that the success of DeepSeek has been greatly exaggerated.

“What we have learned from DeepSeek is that they’ve done some things that are efficient and from which we can learn, but the level of efficiency has been extremely exaggerated,” he said, adding that while people talk about the cost of building a single model, it is not the cost of running an entire AI lab.

“​​If you take the cost of a single model that OpenAI would train, maybe our most recent runs would be pretty comparable. But it’s much harder to lead—you have to run…a lot more experiments before you finally decide what model you’re going to train,” he said. 

Narayanan added that OpenAI’s latest model, o3-mini, is cheaper on inference than comparable models in the US. He believes there will not be much difference between closed-source and open-source models in terms of pricing in the future.

Similarly, Google DeepMind chief Demis Hassabis recently said that DeepSeek can do “extremely good engineering” and that it “changes things on a geopolitical scale”. However, from a technology point of view, Hassabis said it was not a big change.

“Despite the hype, there’s no actual new scientific advance…It’s using known techniques [in AI],” he said, adding that the hype around DeepSeek has been “exaggerated a little bit”.

Meanwhile, Amazon chief Andy Jassy, during the recent earnings call, said that with DeepSeek-like models, inference costs will come “meaningfully down”. “I think it will make it much easier for companies to infuse all their applications with inference and with generative AI.” 

He clarified that while people expected companies to spend less money on infrastructure overall, what actually happens is that they spend a lot less per unit of infrastructure, which is very useful for their businesses. Notably, AWS was the first cloud provider to host DeepSeek-R1 on Amazon Bedrock and SageMaker.



Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.