AI4Bharat, the AI research lab at IIT Madras, has introduced IndicTrans3-beta, a state-of-the-art (SOTA) multilingual translation model that supports translation across 22 Indic languages.
The model is optimised for document-level machine translation (MT) and aims to deliver performance on par with leading global translation models.
Key features of IndicTrans3 include high-accuracy translation, support for multiple Indian languages, and optimisation for diverse real-world applications.
AI4Bharat has also announced plans to release the training data soon, further contributing to the open-source AI ecosystem.
Mitesh Khapra, the head of AI4Bharat, posted on LinkedIn, saying, “Over the past 4 years, we at AI4Bharat have been on a mission to accelerate Indian language AI —building large-scale datasets, models, and tools, and releasing everything open-source for the community. Now, all our contributions are available on Hugging Face!”
Khapra also thanked the EkStep Foundation, Nilekani Philanthropies, and Bhashini (MeitY) for supporting the work.
IndicTrans2, the previous version of the multilingual translation model, has been adopted by several Indian companies for AI research and development.
In November last year, AI4Bharat launched BhasaAnuvaad, a speech translation dataset tailored to Indian languages, covering 13 languages and approximately 44,400 hours of audio.
It is the largest publicly available speech translation resource of its kind for Indian languages.