ByteDance, the parent company of TikTok, has dropped a family of joint image-video generation models called Goku. The models seem to be named after the popular anime character ‘Goku’ from the Dragon Ball series.
This comes right after the company teased a video AI model that generates videos from images, dubbed OmniHuman-1.
Researchers claim the Goku models can create product videos featuring AI-generated influencers, marketing avatar clips, landscape demos, visualisations of Chinese poetry, portrait videos, and more.

The research paper attributes the model’s ability to generate high-quality videos to several key factors. These include a rectified flow (RF) formulation for joint image and video generation, and a 3D joint image-video VAE that compresses inputs into a shared latent space.
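The article doesn’t spell out the training objective, but rectified flow in general trains a model to predict a constant velocity along the straight line between a noise sample and a data sample. A minimal sketch of that training target, using NumPy and toy latent shapes (the array sizes here are illustrative, not Goku’s actual latent dimensions):

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Linear interpolation between noise x0 and data x1 at time t,
    plus the constant velocity target the model learns to predict."""
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight noise-to-data path
    v_target = x1 - x0              # velocity is constant along that path
    return x_t, v_target

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))   # noise sample (stand-in for a latent patch)
x1 = rng.standard_normal((4, 8))   # data sample in the shared latent space
x_t, v_target = rectified_flow_pair(x0, x1, 0.5)
```

In training, a network would be regressed onto `v_target` at random times `t`; at `t = 0.5` the interpolated point sits exactly halfway between noise and data.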
Moreover, the architecture features a Transformer network with full attention, enhanced with techniques like FlashAttention, sequence parallelism, Patch n’ Pack, 3D RoPE position embedding, and Q-K normalisation.
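Of the techniques listed, Q-K normalisation is simple enough to sketch: queries and keys are L2-normalised before the dot product, which bounds the attention logits and stabilises training. A minimal single-head version in NumPy (this is a generic illustration of the technique, not Goku’s implementation, which also uses FlashAttention, sequence parallelism, and 3D RoPE):

```python
import numpy as np

def qk_norm_attention(q, k, v, eps=1e-6):
    """Scaled dot-product attention with Q-K normalisation:
    queries and keys are L2-normalised along the feature axis,
    so every logit is bounded by 1/sqrt(d)."""
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    logits = q @ k.T / np.sqrt(q.shape[-1])
    # numerically stable softmax over the key axis
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(1)
q = rng.standard_normal((3, 4))    # 3 query tokens, head dim 4
k = rng.standard_normal((6, 4))    # 6 key tokens
v_in = rng.standard_normal((6, 5)) # value dim 5
out = qk_norm_attention(q, k, v_in)
```

Because the normalised dot products are cosine similarities, the logits stay bounded regardless of how large the raw activations grow.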
The paper also states that the Goku models demonstrate superior performance in both qualitative and quantitative evaluations, setting new benchmarks when compared to competitors like Luma, Open-Sora, Mira, and Pika.
Goku achieved 0.76 on GenEval, 83.65 on DPG-Bench for text-to-image generation, and 84.85 on VBench for text-to-video tasks. You can see the benchmark results below.

“We believe that this work provides valuable insights and practical advancements for the research community in developing joint image-and-video generation models,” the researchers said.
The models’ ability to generate high-quality product videos featuring AI-generated influencers and other realistic visuals could be a significant boon for content creators, marketers, and advertisers.