How Krutrim Built Chitrarth for a Billion Indians

Chitrarth is designed to close the language gap by supporting Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese.
Illustration by Nalini Nirad

India has been aiming to develop its own frontier AI models to serve the country’s vast population in their native languages. However, the effort faces many problems, including the lack of digitised data in Indian languages and the unavailability of images on which such models need to be trained.

To further the effort of building AI for Bharat, Ola’s Krutrim AI Lab has introduced Chitrarth, a multimodal Vision-Language Model (VLM). By combining multilingual text in ten predominant Indian languages with visual data, Chitrarth aims to democratise AI accessibility for over a billion Indians.

Most VLMs struggle with linguistic inclusivity, as they are predominantly built on English datasets. This is also why BharatGen, the multimodal AI initiative supported by the Department of Science and Technology (DST), recently launched its e-vikrAI VLM for the Indian e-commerce ecosystem.

Similarly, Chitrarth is designed to close this language gap by supporting Hindi, Bengali, Telugu, Tamil, Marathi, Gujarati, Kannada, Malayalam, Odia, and Assamese. The model was built using Krutrim’s multilingual LLM as its backbone, ensuring it understands and generates content in these languages with high accuracy.

What is Unique About Chitrarth?

According to the research paper, Chitrarth is built on Krutrim-7B and incorporates SigLIP (siglip-so400m-patch14-384) as its vision encoder. Its architecture follows a two-stage training process: Adapter Pre-Training (PT) and Instruction Tuning (IT).
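The adapter-based design described above can be sketched in a few lines: patch embeddings from the vision encoder are projected into the LLM’s embedding space and prepended to the text tokens. This is a minimal illustration only; the projection shape, hidden sizes, and adapter internals below are assumptions, not details confirmed by the paper.

```python
import numpy as np

VISION_DIM = 1152   # siglip-so400m embedding width (assumed)
LLM_DIM = 4096      # typical hidden size for a 7B model (assumed)
NUM_PATCHES = (384 // 14) ** 2  # 384px image, 14px patches -> 729 tokens

rng = np.random.default_rng(0)

class VisionAdapter:
    """Projects vision-encoder patch embeddings into the LLM embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int):
        # In Stage 1 (Adapter Pre-Training), typically only these weights
        # are learned, with the vision encoder and LLM kept frozen.
        self.W = rng.standard_normal((vision_dim, llm_dim)) * 0.02
        self.b = np.zeros(llm_dim)

    def __call__(self, patch_embeddings: np.ndarray) -> np.ndarray:
        return patch_embeddings @ self.W + self.b

# Stand-in for SigLIP output on one image: (num_patches, vision_dim)
patch_embeddings = rng.standard_normal((NUM_PATCHES, VISION_DIM))

adapter = VisionAdapter(VISION_DIM, LLM_DIM)
visual_tokens = adapter(patch_embeddings)

# The projected visual tokens are prepended to the text-token embeddings
# and fed to the LLM, which generates the answer in the requested language.
print(visual_tokens.shape)  # (729, 4096)
```

Stage 2 (Instruction Tuning) would then fine-tune on multimodal instruction data so the combined model follows prompts across languages.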

Pre-training is conducted using a dataset chosen for superior performance in initial experiments. The dataset is translated into multiple Indic languages using an open-source model, ensuring a balanced split between English and Indic languages.

This approach maintains linguistic diversity, computational efficiency, and fairness in performance across languages. Fine-tuning is performed on an instruction dataset, enhancing the model’s ability to handle multimodal reasoning tasks.
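The balanced English/Indic split mentioned above can be sketched as a simple assignment step before translation. The 50/50 ratio and round-robin language assignment here are illustrative assumptions, not the paper’s exact recipe.

```python
import itertools

# The ten Indic languages Chitrarth supports (ISO-style codes assumed).
INDIC_LANGS = ["hi", "bn", "te", "ta", "mr", "gu", "kn", "ml", "or", "as"]

def assign_languages(captions):
    """Return (caption, target_language) pairs with a balanced split:
    half stay in English, the rest are spread evenly across Indic languages
    and would then be sent to the open-source translation model."""
    half = len(captions) // 2
    indic_cycle = itertools.cycle(INDIC_LANGS)
    plan = [(c, "en") for c in captions[:half]]
    plan += [(c, next(indic_cycle)) for c in captions[half:]]
    return plan

captions = [f"caption_{i}" for i in range(20)]
plan = assign_languages(captions)
print(sum(1 for _, lang in plan if lang == "en"))  # 10 stay in English
```

Spreading the translated half evenly across languages is one simple way to preserve the fairness across languages that the article describes.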

The dataset includes a vision-language component containing academic tasks, in-house multilingual translations, and culturally significant images. The training data includes images representing prominent personalities, monuments, artwork, and cuisine, ensuring the model understands India’s diverse cultural heritage.

Chitrarth excels in tasks such as image captioning, visual question answering (VQA), and text-based image retrieval. The model is trained on multilingual image-text pairs, allowing it to interpret and describe images in multiple Indian languages. 
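Text-based image retrieval, one of the tasks listed above, typically works by ranking images by similarity between a text embedding and image embeddings in a shared space. The sketch below uses random stand-in embeddings; a real system would use the model’s own encoders, and the 512-dimension and cosine metric are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Pre-computed image embeddings, indexed by filename (stand-in values).
image_embeddings = {f"img_{i}.jpg": rng.standard_normal(512) for i in range(5)}

def retrieve(text_embedding, image_embeddings, top_k=3):
    """Rank images by cosine similarity to the query text embedding."""
    scores = {name: cosine_similarity(text_embedding, emb)
              for name, emb in image_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

query = rng.standard_normal(512)  # e.g. the embedding of a Hindi query
top = retrieve(query, image_embeddings)
print(len(top))  # 3
```

Because the text encoder is multilingual, the same retrieval loop works whether the query arrives in Hindi, Tamil, or English.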

This makes Chitrarth a game-changer for applications in education, accessibility, and digital content creation, enabling users to interact with AI in their native language without relying on English as an intermediary.

Like BharatGen, Chitrarth’s capabilities enable it to support various real-world applications, including e-commerce, UI/UX analysis, monitoring systems, and creative writing.

For example, as presented in the blog, the team is targeting automated product descriptions and attribute extraction for online retailers like Myntra, AJIO, and Nykaa.

To evaluate Chitrarth’s performance across Indian languages, Krutrim developed BharatBench, a comprehensive benchmark suite designed for low-resource languages. BharatBench assesses VLMs on tasks such as VQA and image-text alignment, setting a new standard for multimodal AI in India. 
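A benchmark like BharatBench ultimately reduces to scoring model outputs per language. The snippet below shows one common VQA metric, exact-match accuracy after light normalisation; the metric choice and example data are assumptions for illustration, not BharatBench’s published protocol.

```python
def vqa_accuracy(predictions, references):
    """Exact-match VQA accuracy after simple normalisation (assumed metric)."""
    norm = lambda s: s.strip().lower()
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Toy Hindi/English answers for three questions (hypothetical data).
preds = ["ताज महल", "cat", "नीला"]
refs  = ["ताज महल", "dog", "नीला"]
print(round(vqa_accuracy(preds, refs), 2))  # 0.67
```

Running such a scorer separately for each of the ten languages is what makes per-language comparisons across VLMs possible.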

In addition, Chitrarth has been evaluated against other VLMs on academic multimodal tasks, consistently outperforming models like IDEFICS 2 (7B) and PALO 7B while maintaining competitive performance on the TextVQA and VizWiz benchmarks.

Despite its advancements, Chitrarth faces challenges such as biases in automated translations and the limited availability of high-quality training data for Indic languages.

The Road Ahead for Krutrim

Earlier this month, Ola chief Bhavish Aggarwal announced Krutrim AI Lab and the launch of several open-source AI models tailored to India’s unique linguistic and cultural landscape. In addition to Chitrarth, these include Dhwani, Vyakhyarth, and Krutrim Translate.

In partnership with NVIDIA, the lab will also deploy India’s first GB200 supercomputer by March, and plans to scale it into the nation’s largest supercomputer by the end of the year. 

This infrastructure will support the training and deployment of AI models, addressing challenges related to data scarcity and cultural context. The lab has committed to investing ₹2,000 crore into Krutrim, with a pledge to increase this to ₹10,000 crore by next year.

In an interview with Outlook Business, an Ola executive said they plan to release Krutrim’s third model on August 15. It is likely to be a Mixture of Experts model with 700 billion parameters. The team also has ambitious plans to develop its own AI chip, Bodhi, by 2028.


Mohit Pandey

Mohit writes about AI in simple, explainable, and sometimes funny words. He holds keen interest in discussing AI with people building it for India, and for Bharat, while also talking a little bit about AGI.