Is Mistral’s Le Chat Truly the ‘World’s Fastest AI Assistant’? 

Le Chat’s Flash Answers feature uses Cerebras Inference, which is touted as the ‘fastest AI inference provider’.

French AI startup Mistral unveiled the Le Chat app for iOS and Android a few days ago. The app functions as an AI chatbot or assistant, rivalling ChatGPT, Claude, and Gemini, among others. It offers most of its features for free, with upgraded limits in the Pro tier, which costs $14.99 a month. 

Le Chat offers web search, image and document understanding, code interpretation, and image generation. 

Given the sheer number of AI assistant applications in the market, a new entrant must offer a clear differentiator. Mistral claims its low-latency models are powered by the ‘fastest inference engines on the planet’. The company also says Le Chat responds faster than any other chat assistant, at up to 1,100 words per second, via its Flash Answers feature. 

Thanks to Cerebras

Cerebras Inference, a service that delivers high-speed inference to AI applications, is the secret sauce behind Le Chat’s speed. 

According to the company, Cerebras Inference is the ‘world’s fastest AI inference provider’ and makes Le Chat 10 times faster than GPT-4o, Claude 3.5 Sonnet, and DeepSeek R1. Cerebras also revealed that the 123 billion-parameter Mistral Large model is behind Le Chat. 

Mistral and Cerebras compared Le Chat with Claude 3.5 Sonnet and ChatGPT-4o using a prompt asking the models to generate a snake game in Python. 

In Mistral’s YouTube video, ChatGPT output 85 tokens per second and Claude 120 tokens per second, while Le Chat outperformed both at 1,100 tokens per second. 

In a video by Cerebras, Le Chat took 1.3 seconds to complete the task, while Claude 3.5 Sonnet took 19 seconds and GPT-4o took 46 seconds. 

“This performance is made possible by the Wafer Scale Engine 3’s SRAM-based inference architecture in combination with speculative decoding techniques developed in collaboration with researchers at Mistral,” said Cerebras in a blog post.
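
Speculative decoding, broadly, pairs a small ‘draft’ model that cheaply proposes several tokens with the large model, which verifies those proposals in a single pass, so multiple tokens can be accepted per expensive step. The toy Python sketch below illustrates only the control flow; the draft model, acceptance rule, and vocabulary are stand-ins of our own, not Mistral’s or Cerebras’ actual implementation.

```python
import random

random.seed(0)
VOCAB = list("abcdefgh")

def draft_model(prefix, k):
    """Toy stand-in for a small, fast draft model: cheaply proposes k tokens."""
    return [random.choice(VOCAB) for _ in range(k)]

def target_accepts(prefix, token):
    """Toy stand-in for the large model's verification step: accept the proposed
    token with some probability (a real system compares the draft token against
    the large model's own distribution)."""
    return random.random() < 0.7

def speculative_decode(prompt, max_new_tokens=12, k=4):
    output = list(prompt)
    generated = 0
    while generated < max_new_tokens:
        proposals = draft_model(output, k)       # one cheap draft pass
        accepted = 0
        for tok in proposals:                    # verified together by the large model
            if not target_accepts(output, tok):
                break                            # discard the rest of the draft
            output.append(tok)
            accepted += 1
            generated += 1
        if accepted < k:
            # On rejection, the large model emits its own token, so every
            # iteration makes progress even when the draft is wrong.
            output.append(random.choice(VOCAB))
            generated += 1
    return "".join(output)

print(speculative_decode("seed:"))
```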

Several users have echoed these claims. A user named Marc said on X that the model is “mind-blowingly fast”, adding that it built a simple React application in less than five seconds. 

Here’s What We Found in Our Real-World Tests

We at AIM also conducted a real-time test of some of the leading models, albeit with a different prompt: a numerical chemistry problem from an IIT-JEE question paper, an examination often considered among the world’s most difficult. 

We considered OpenAI’s GPT-4o, o3-mini, and o3-mini-high, Anthropic’s Claude 3.5 Sonnet, DeepSeek R1, Google’s Gemini 2.0 Flash, and of course, Mistral’s Le Chat. 

The following question was used as input: “Ice at –10°C is to be converted into steam at 110°C. The mass of ice is 10⁻³ kg. What amount of heat is required?”
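
For reference, the textbook solution adds up the heat for five stages: warming the ice to 0°C, melting it, warming the water to 100°C, boiling it, and warming the steam to 110°C. The short Python sketch below works through that arithmetic with standard approximate constants, which are our own assumptions rather than values taken from any model’s answer.

```python
# Worked version of the test prompt, using standard textbook constants
# (approximate values we assumed; they are not taken from the article).
m = 1e-3            # mass of ice, kg
c_ice = 2100        # specific heat of ice, J/(kg*K)
L_fusion = 3.34e5   # latent heat of fusion, J/kg
c_water = 4186      # specific heat of water, J/(kg*K)
L_vap = 2.26e6      # latent heat of vaporisation, J/kg
c_steam = 2010      # specific heat of steam, J/(kg*K)

Q = (m * c_ice * 10        # warm ice from -10 C to 0 C
     + m * L_fusion        # melt the ice at 0 C
     + m * c_water * 100   # warm water from 0 C to 100 C
     + m * L_vap           # boil the water at 100 C
     + m * c_steam * 10)   # warm steam from 100 C to 110 C

print(f"Heat required: about {Q:.0f} J ({Q / 4.186:.0f} cal)")
# -> roughly 3,050 J, i.e. about 730 cal
```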

When we timed the results, Mistral’s Le Chat was the fastest model, but with a caveat.

Le Chat returned the output in under four seconds in three of the six runs. Google’s Gemini 2.0 Flash, on the other hand, returned the output in under six seconds in every run. 

This raises the question of whether Flash Answers actually kicked in on every run, even though it is enabled by default.

Note that we were using the free version of Le Chat; the Pro version provides a higher limit for the Flash Answers feature.
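
For anyone who wants to reproduce this kind of comparison programmatically, a small harness along the lines of the sketch below is one way to capture wall-clock latency consistently. The placeholder client function is our own assumption, not the setup used for the timings above.

```python
import time
import statistics

def time_responses(ask, prompt, runs=6):
    """Send the same prompt several times and report wall-clock latencies."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        ask(prompt)                                   # blocking call to the assistant
        latencies.append(time.perf_counter() - start)
    return min(latencies), statistics.median(latencies), max(latencies)

# Stand-in for a real API client; replace with calls to the assistant under test.
def fake_assistant(prompt):
    time.sleep(0.1)  # simulate network and generation time
    return "Q = ..."

print(time_responses(fake_assistant, "Ice at -10 C is to be converted into steam...", runs=3))
```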

Moreover, the speed at which these models respond also depends on the nature of the query. Reasoning models, with their lengthy chains of thought, prioritise accuracy and are bound to take more time. 

For instance, when we tested the prompt with DeepSeek R1, it took over a minute to complete the problem, with a chain of thought that included verification steps such as, “But let me check if all the values are correct. Did I use the right specific heat for steam?” 

It also spent a great deal of time ensuring the answer had the right number of decimal places. 

A test from Artificial Analysis found that OpenAI’s o3-mini was the fastest model among the competition, outputting 214 tokens per second, ahead of o1-mini at 167 tokens per second. 

According to Artificial Analysis, o3-mini also achieved a high score of 89 on its Quality Index, on par with o1 (90 points) and DeepSeek R1 (89 points). The index quantifies a model’s overall capabilities. 

OpenAI has prioritised inference-time scaling to deliver outputs at higher speeds. With Cerebras’ inference capabilities, Mistral seems to have joined the race. Meanwhile, an ongoing battle over token speeds is playing out between inference providers like Cerebras, Groq, and SambaNova. 

These ambitions to deliver high-speed responses align with what Jensen Huang, CEO of NVIDIA, said last year. He envisioned a future where AI systems perform various tasks, such as tree search, chain of thought, and mental simulations, reflecting on their own answers and responding in real-time—within a single second. 

Supreeth Koundinya

Supreeth is an engineering graduate who is curious about the world of artificial intelligence and loves to write stories on how it is solving problems and shaping the future of humanity.