DeepSeek, a Chinese artificial intelligence (AI) lab backed by the hedge fund High-Flyer, has kicked off its “Open Source Week” by releasing FlashMLA, a decoding kernel designed for Nvidia’s Hopper GPUs. It is optimised for processing variable-length sequences and is already in production.
The kernel supports BF16 and features a paged KV cache with a block size of 64. On the H800 GPU, it achieves up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations.
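A paged KV cache works much like virtual memory: a sequence’s key/value entries are stored in fixed-size blocks (64 tokens each in FlashMLA) scattered across a physical pool, and a per-sequence block table maps logical positions to physical blocks. The following is a minimal NumPy sketch of that indexing scheme only; the pool size, head dimension, and helper name are hypothetical, and FlashMLA itself implements this as a CUDA kernel rather than in Python.

```python
import numpy as np

BLOCK_SIZE = 64          # FlashMLA's paged KV cache block size
NUM_PHYS_BLOCKS = 8      # hypothetical size of the physical block pool
HEAD_DIM = 4             # tiny head dimension for illustration

# Physical KV pool: NUM_PHYS_BLOCKS blocks of BLOCK_SIZE token slots each.
kv_pool = np.arange(NUM_PHYS_BLOCKS * BLOCK_SIZE * HEAD_DIM, dtype=np.float32)
kv_pool = kv_pool.reshape(NUM_PHYS_BLOCKS, BLOCK_SIZE, HEAD_DIM)

# Block table for one sequence: logical block i lives in physical
# block block_table[i]; placement need not be contiguous.
block_table = np.array([5, 2, 7])   # hypothetical allocation
seq_len = 150                        # occupies 3 blocks, the last one partial

def gather_kv(pos):
    """Fetch the KV vector for logical token position `pos` (illustrative helper)."""
    logical_block, offset = divmod(pos, BLOCK_SIZE)
    phys_block = block_table[logical_block]
    return kv_pool[phys_block, offset]

# Token 130 -> logical block 2, offset 2 -> physical block 7, slot 2.
vec = gather_kv(130)
```

Because blocks are allocated on demand, variable-length sequences waste at most one partially filled block each, which is one reason paging suits the variable-length decoding workloads FlashMLA targets.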
DeepSeek says FlashMLA drew inspiration from projects such as FlashAttention 2 and 3 and NVIDIA’s CUTLASS. The kernel is available on GitHub for exploration and use.
“Honored to share FlashMLA – our efficient MLA decoding kernel for Hopper GPUs, optimised for variable-length sequences and now in production,” the company said in a post on X.
The release of FlashMLA, available on GitHub, is expected to improve computational efficiency in AI applications, with potential knock-on effects in compute-heavy sectors such as cryptocurrency trading algorithms.
DeepSeek recently announced it is launching five open-source repositories starting this week. “We’re a tiny team (at) DeepSeek exploring AGI (Artificial General Intelligence). Starting next week, we’ll be open-sourcing five repos, sharing our small but sincere progress with full transparency,” it said on X.
Currently, it has a collection of 14 open-source models and repositories on Hugging Face.
Recently, it released its DeepSeek-R1 and DeepSeek-V3 models, which offer state-of-the-art performance while being trained and deployed at a fraction of the cost of competing models.