Researchers Unveil MiraData for Longer Video Generation With Structured Captions

MiraData offers videos averaging 72.1 seconds with detailed captions of 318 words, surpassing other datasets in length and descriptive detail.

OpenAI’s Sora video generation model has sparked a new wave of research in long-duration, high-quality video synthesis. In response, researchers have introduced MiraData, a large-scale video dataset designed to bolster the field of video generation.


MiraData features videos with an average duration of 72.1 seconds and detailed structured captions averaging 318 words, significantly surpassing existing datasets in both length and descriptive detail. The dataset was curated through a meticulous five-step process, including collection from diverse sources, video splitting and stitching, quality-based selection, and comprehensive captioning.

To demonstrate the effectiveness of MiraData, researchers developed MiraDiT, a Diffusion Transformer-based video generation model. When trained on MiraData, MiraDiT outperformed models trained on previous datasets, particularly in motion strength and 3D consistency.

The researchers also introduced MiraBench, an enhanced evaluation framework featuring 17 metrics across six key aspects of video generation, including temporal consistency, motion strength, and text-video alignment. This benchmark aims to provide a more comprehensive assessment of video generation models.

MiraData’s structured captions, which include detailed descriptions of main objects, backgrounds, camera movements, and video style, proved beneficial for increasing dynamics, enhancing temporal consistency, and improving text-video alignment in generated content. 
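To make the idea of a structured caption concrete, here is a minimal sketch in Python. It assumes a caption can be represented as one record with a field for each aspect the article names (main object, background, camera movement, style), and that the fields can be flattened into a single text prompt. The field names, the `StructuredCaption` class, and the `to_prompt` helper are hypothetical illustrations; they are not MiraData's actual schema or tooling.

```python
from dataclasses import dataclass


@dataclass
class StructuredCaption:
    # Hypothetical fields: one per caption aspect named in the article.
    # MiraData's real caption format may differ.
    main_object: str
    background: str
    camera_movement: str
    style: str

    def to_prompt(self) -> str:
        """Flatten the per-aspect descriptions into one text prompt."""
        parts = [
            f"Main object: {self.main_object}",
            f"Background: {self.background}",
            f"Camera: {self.camera_movement}",
            f"Style: {self.style}",
        ]
        return " ".join(parts)

    def word_count(self) -> int:
        """Count whitespace-separated words in the flattened prompt."""
        return len(self.to_prompt().split())


# Illustrative usage with made-up descriptions (not taken from MiraData).
cap = StructuredCaption(
    main_object="A red kite flying",
    background="a grassy hill under a cloudy sky",
    camera_movement="slow pan left",
    style="realistic, natural light",
)
print(cap.to_prompt())
print(cap.word_count())
```

Separating the aspects lets a training pipeline (or a reader) see at a glance whether a caption actually says anything about camera motion or style, which a single free-form sentence can easily omit.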

MiraData shows promise for advancing video generation, but it also carries potential limitations and societal risks, such as dataset bias and misuse for creating deepfakes. The researchers emphasize the need for ethical guidelines and robust privacy protections in the dataset's development and application.

Models such as Sora, Odyssey, and Pika are pushing AI-powered video generation forward. As researchers release new datasets like MiraData, these models stand to benefit from training on longer, higher-quality video data.



Gopika Raj

With a Master's degree in Journalism & Mass Communication, Gopika Raj infuses her technical writing with a distinctive flair. Intrigued by advancements in AI technology and its future prospects, her writing offers a fresh perspective in the tech domain, captivating readers along the way.