Published on July 11, 2024
Researchers Unveil MiraData for Longer Video Generation With Structured Captions

MiraData offers videos averaging 72.1 seconds with detailed captions of 318 words, surpassing other datasets in length and descriptive detail.

by Gopika Raj

OpenAI’s Sora video generation model has sparked a new wave of research in long-duration, high-quality video synthesis. In response, researchers have introduced MiraData, a large-scale video dataset designed to bolster the field of video generation.

MiraData features videos with an average duration of 72.1 seconds and detailed structured captions averaging 318 words, significantly surpassing existing datasets in both length and descriptive detail. The dataset was curated through a meticulous five-step process, including collection from diverse sources, video splitting and stitching, quality-based selection, and comprehensive captioning.

To demonstrate the effectiveness of MiraData, researchers developed MiraDiT, a Diffusion Transformer-based video generation model. When trained on MiraData, MiraDiT outperformed models trained on previous datasets, particularly in motion strength and 3D consistency.

The researchers also introduced MiraBench, an enhanced evaluation framework featuring 17 metrics across six key aspects of video generation, including temporal consistency, motion strength, and text-video alignment. This benchmark aims to provide a more comprehensive assessment of video generation models.

MiraData’s structured captions, which include detailed descriptions of main objects, backgrounds, camera movements, and video style, proved beneficial for increasing dynamics, enhancing temporal consistency, and improving text-video alignment in generated content.
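To make the idea of a structured caption concrete, here is a minimal Python sketch of how one clip's caption might be represented. The class name, field names, and sample text are assumptions made up for illustration; they are not MiraData's actual schema or file format.

```python
from dataclasses import dataclass, fields


@dataclass
class StructuredCaption:
    """Hypothetical container for one clip's caption (field names are assumptions)."""
    main_object: str
    background: str
    camera_movement: str
    style: str

    def to_prompt(self) -> str:
        # Join the non-empty fields into a single text prompt, in declaration order.
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return " ".join(p for p in parts if p)

    def word_count(self) -> int:
        # Count whitespace-separated words across all fields.
        return len(self.to_prompt().split())


# Invented sample text, far shorter than the 318-word average the article cites.
caption = StructuredCaption(
    main_object="A cyclist rides along a coastal road.",
    background="Cliffs and the sea stretch out behind the rider.",
    camera_movement="The camera tracks the cyclist from the side.",
    style="Realistic, daylight footage.",
)
print(caption.to_prompt())
print(caption.word_count())
```

Keeping each aspect in its own field, rather than in one free-form paragraph, would let a training pipeline use or drop individual parts of the description, such as the camera movement, independently.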

MiraData shows promise in advancing video generation, but it also carries potential limitations and societal risks, such as dataset biases and misuse for deepfakes. The researchers emphasize the need for ethical guidelines and robust privacy protections in its development and application.

Models such as Sora, Odyssey, and Pika are making strides in AI-powered video generation. As researchers release new datasets such as MiraData, video models of this kind stand to benefit from training on longer, more thoroughly captioned footage.
