Groq Aims to Provide At Least Half of the World’s AI Compute, Says CEO

For Groq, it isn’t about overtaking NVIDIA but co-existing with it. 

When a company like NVIDIA becomes the preferred hardware option for most AI labs and rises to the status of the world’s most valuable company, does that imply the GPU maker excels in every domain? Probably not.

American AI infrastructure provider Groq once shared an amusing comment in a blog post. “GPUs are cool for training models, but for inference, they’re slowpokes, leading directly to the great-model-that-no-one-uses problem.”

The company is well on its way to beating NVIDIA at inference, the crucial process in which a pre-trained AI model applies what it has learned to generate outputs.

Groq’s language processing unit (LPU) is purpose-built for AI inference and handles it far more efficiently than a traditional graphics processing unit (GPU).

In a podcast interview with venture capitalist Harry Stebbings, Groq CEO Jonathan Ross explained that, instead of relying on external memory as GPUs do, LPUs keep all the model parameters directly on-chip.

“Imagine you were trying to build a factory, and it was only 1/100th of the size needed for the assembly line,” Ross said, using the analogy to describe how GPUs handle inference.

Such a factory would have to process small batches, dismantle the setup, and rebuild it again and again.

In contrast, Ross said, LPUs allow computation to flow smoothly through thousands of chips simultaneously, eliminating these inefficiencies and significantly improving speed.
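
Some back-of-the-envelope arithmetic shows the scale of the bottleneck Ross is describing. The sketch below uses illustrative numbers, a hypothetical 70B-parameter model in FP16 and a memory bandwidth roughly in the range of a modern datacenter GPU, not vendor specifications: at batch size one, every generated token requires streaming all the weights from external memory, so bandwidth, not raw compute, caps the token rate.

```python
# Illustrative, batch-size-1 estimate of why single-GPU inference is
# often memory-bandwidth bound. All numbers are assumptions for the
# sake of the sketch, not measured or vendor-quoted figures.

params = 70e9                 # hypothetical 70B-parameter model
bytes_per_param = 2           # FP16 weights
weight_bytes = params * bytes_per_param   # ~140 GB of weights

hbm_bandwidth = 3.0e12        # ~3 TB/s, roughly modern HBM territory

# Each generated token forces one full pass of the weights from
# external memory through the chip, so bandwidth sets an upper
# bound on the token rate, whatever the raw FLOPs available.
tokens_per_second = hbm_bandwidth / weight_bytes
print(f"Upper bound: ~{tokens_per_second:.0f} tokens/s per sequence")
# -> ~21 tokens/s. Keeping weights in on-chip memory, as Groq's LPU
# does, removes this round trip entirely.
```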

Even though more chips are involved, LPUs consume significantly less energy than GPUs.

Owing to this, Ross said, “We [Groq] need to be one of the most important compute providers in the world. Our goal by the end of 2027 is to provide at least half of the world’s AI inference compute.” 

Notably, NVIDIA CEO Jensen Huang said last year that one of the major challenges NVIDIA faces is generating tokens at incredibly low latency.

However, Groq’s mission should not be misread: the company is not competing with NVIDIA.

‘I Think NVIDIA Will Sell Every Single GPU They Make for Training’

At first glance, Ross’ statements and recent events surrounding NVIDIA might suggest that the GPU giant is in trouble, but it isn’t. Groq and other inference providers will coexist with NVIDIA.

“Training should be done on GPUs,” Ross said. “I think NVIDIA will sell every single GPU they make for training.” 

Ross added that if Groq were to deploy high volumes of lower-cost inference chips, the demand for training would increase. “The more inference you have, the more training you need, and vice versa,” he said. 

Ross also said Groq is considering selling its LPUs as a “nitro boost to GPUs”. The company has experimented with running portions of a model on an LPU and the rest on GPUs, which speeds up inference and makes the GPUs run far more economically.

Having said that, Ross and the company don’t really view NVIDIA as a competitor. “They [NVIDIA] don’t offer fast tokens and low-cost tokens. It’s a very different product, but what they do very well is training, and they do it better than anyone else,” he said. 

Demand for GPUs will not end even as the market for AI inference providers grows. “How are you going to do the training?” Ross asked.

“Buy the GPUs. Get every single one you can,” he said. 

Not Without Competition

That said, Groq competes with several other inference service providers. Most notably, Cerebras and SambaNova, also based in the United States, offer hardware products that directly target NVIDIA’s dominance. 

Recently, Perplexity AI and Mistral AI announced integrations of Cerebras Inference into their products. The latter, Mistral, calls its app Le Chat ‘the fastest AI assistant in the world’.

SambaNova, meanwhile, is the only inference provider among the trio capable of serving the Llama 3.1 405B model.

Groq, for its part, no longer sells its AI inference hardware; its proprietary technology is instead accessed through the models hosted on its cloud platform, GroqCloud.

The platform hosts multiple third-party models, including ones developed by Alibaba (Qwen), Meta (Llama), and DeepSeek (R1).
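
For a sense of what that access looks like in practice, here is a minimal sketch of a GroqCloud chat completion using Groq’s Python SDK. The model ID is an assumption based on GroqCloud’s published catalog and may change; check the console for the current list.

```python
# Minimal sketch of a GroqCloud request via the official Python SDK
# (pip install groq). The model ID below is an assumption; consult
# the GroqCloud console for currently available models.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # a Meta Llama model hosted on GroqCloud
    messages=[
        {"role": "user", "content": "In one sentence, what is an LPU?"}
    ],
)
print(response.choices[0].message.content)
```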

Groq has also announced that it is available on OpenRouter.ai, a platform that offers a unified interface for accessing numerous AI models. There, users can run DeepSeek-R1 distilled onto Meta’s Llama 70B at 1,000 tokens per second.
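
Because OpenRouter exposes an OpenAI-compatible API, the same model can, in principle, be reached with the standard OpenAI client pointed at OpenRouter’s endpoint. In the sketch below, the model slug and the provider-preference field are assumptions to verify against OpenRouter’s documentation.

```python
# Sketch of reaching the DeepSeek-R1 Llama-70B distill through
# OpenRouter's OpenAI-compatible endpoint (pip install openai).
# The model slug and the "provider" routing hint are assumptions;
# verify both against OpenRouter's docs.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1-distill-llama-70b",  # assumed slug
    extra_body={"provider": {"order": ["Groq"]}},    # prefer Groq's deployment
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```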

Recently, Saudi Arabia announced a $1.5 billion investment in Groq to expand AI infrastructure in the region. The funding builds on Groq’s previous work in the region, including the rapid deployment of the largest AI inference cluster in the Middle East in December 2024.

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.