
AI4Bharat Launches IndicTrans3 for 22 Indic Languages
AI4Bharat has also announced plans to release the training data soon, further contributing to the open-source AI ecosystem.
“India’s 5,000-year-old civilisation holds a wealth of valuable information that can be used to build better AI solutions for the population.”
AI4Bharat also developed Indic-Spontaneous-Synth, a synthetic evaluation set to highlight how current models, though effective on datasets like FLEURS, tend to underperform in realistic, spontaneous language translation scenarios, underscoring the need for more robust datasets.
MILU’s evaluation shows that GPT-4 achieved the highest accuracy among 40+ tested models, scoring 72%.
The collaboration will focus on developing and training ASR models that can seamlessly recognize and transcribe conversations in multiple languages.
The organisation has also revamped its website, making datasets, models, and tools more accessible.
The initiative offers contributors the chance to gain experience in a live project environment, build their portfolios, and contribute to the Indian Language AI landscape.
LLMs are used to evaluate the outputs of other LLMs, which can influence leaderboards. Finding Blind Spots will help evaluate these evaluators.
In a podcast with AIM, Indic AI developers said that what India needs is more initiatives like AI4Bharat, with industry-academia collaboration.
It covers 22 languages with 251 billion tokens and 74.8 million instruction-response pairs.
A substantial 1,639 hours have already been transcribed, with a median of 73 hours per language.
People+AI is actively involved in developing an open leaderboard for Indic large language models.
The four researchers triumphed in the LIMMITS ’24 challenge, which tasked participants with replicating a speaker’s voice in real time in different languages.
An open-source ecosystem needs to develop for this to happen.
“The aim is to reach a stage where Indic models become viable options for enterprise adoption,” said Ankit Bose.
The research lab has also released the instruction tuning datasets to enable further research for IndicLLMs.
A self-taught AI enthusiast and developer, Vik Paruchuri, believes his OCR model, Surya, would help create low-resource Indic language datasets and models.
“We are building foundational models from scratch and that is what is keeping us busy,” said Ganesh Ramakrishnan about how BharatGPT will mark India on the global AI map.
Ideal candidates are recent graduates with a bachelor’s degree in computer science or a related field, from 2022 onwards.
OpenAI decides to enter India at a time when India-focused AI models are on the rise.
With the UAE and China racing ahead in launching vernacular LLMs, where does India stand in the AI race?
As of 2022, India ranked sixth among countries by AI investment. However, in the current generative AI race, Indian investors are nowhere in sight. But why?
“It currently takes an average low-income Indian over 7 generations to make USD 1500 in savings and a Karya worker can make the same amount in less than a year.”
Sam Altman’s negative remark might have caused a stir, but India is indeed making progress in building foundational models catering to Indian markets.
During this year’s Microsoft Build Conference, Microsoft demonstrated the implementation of a Generative AI-powered multilingual chatbot developed in India.
BharatGPT is capable of handling rich data besides text, such as images, audio, video, and even maps.
“Our entire purpose in this project is to enable the community to build the technology.”
Nilekani Philanthropies is supporting the institute with a grant of INR 36 crores.
The first version of Shoonya is expected to release later this month.
© Analytics India Magazine Pvt Ltd & AIM Media House LLC 2025