Soket AI Labs Partners with Google Cloud to Boost Pragna-1B Model

Pragna-1B, developed by Soket AI Labs and Google Cloud, delivers state-of-the-art performance for vernacular languages.
Illustration by Raghavendra Rao

Soket AI Labs, the Indian AI research firm behind Pragna-1B, India’s first open-source multilingual foundation model, has announced a new collaboration with Google Cloud to further enhance the model’s capabilities and reach. Pragna-1B, which was initially released on May 1, 2024, aims to enable the adoption of Generative AI in India by providing support for vernacular languages such as Hindi, English, Bengali, and Gujarati.

Abhishek Upperwal, Founder of Soket AI Labs, said, “By leveraging Google Cloud, Pragna-1B, despite being trained on fewer parameters, is efficient and compares performance in language processing tasks to similar category models.”

He further added, “Tailored specifically for vernacular languages, Pragna-1B offers balanced language representation and enables faster and more efficient tokenization suited for organisations seeking optimised operations and enhanced functionality.”

The collaboration also aims to make Pragna-1B more accessible to developers and organisations. Soket AI Labs plans to list its AI Developer Platform on the Google Cloud Marketplace and the Pragna series of models on the Google Vertex AI model registry. This integration will provide developers with a streamlined experience for fine-tuning models using high-performance resources like Vertex AI and TPUs.

The model has been designed specifically with Indian contexts in mind, ensuring transparency and clarity for enterprises integrating AI into their operations. Soket AI Labs leveraged Google Cloud’s AI infrastructure to achieve efficiency and cost-effectiveness in the development of Pragna-1B.

Google Cloud also plans to list Soket’s AI Developer Platform on the Google Cloud Marketplace and the Pragna series of models on the Google Vertex AI model registry.

The collaboration between Soket AI Labs and Google Cloud also extends to technical work on training large-scale models and curating high-quality datasets for Indian languages. This joint effort aims to promote AI innovation in India while ensuring transparency and clarity in the development process.

The story so far

Soket AI Labs, founded by Abhishek Upperwal in 2019, created ‘Bhasha,’ a series of high-quality datasets designed for training Indian language models. This includes ‘Bhasha-wiki,’ which consists of 44.1 million articles translated from English Wikipedia into six Indian languages, and ‘Bhasha-wiki-indic,’ a refined subset focusing on content relevant to India.

Pragna-1B features a decoder-only Transformer architecture with 1.25 billion parameters and a context length of 2048 tokens. Trained on approximately 150 billion tokens, with a focus on Hindi, Bangla, and Gujarati, Pragna-1B delivers state-of-the-art performance for vernacular languages in a small form factor.

In a recent LinkedIn post, Upperwal highlighted the improvements in GPT-4o’s tokenizer, whose vocabulary now spans 200k tokens. However, he noted that Pragna-1B’s tokenizer still outperforms GPT-4o’s for Kannada, Gujarati, Tamil, and Urdu, which serves as motivation for Soket AI Labs to improve support for Hindi and other Indian languages.
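
The article does not say how the tokenizer comparison was measured. One common yardstick is fertility, the average number of tokens a tokenizer produces per word, where lower is better. The Python sketch below illustrates that metric under stated assumptions: `word_tokenizer` and `byte_tokenizer` are hypothetical toy stand-ins, not the Pragna-1B or GPT-4o tokenizers, and the Gujarati sample phrase is arbitrary.

```python
from typing import Callable, Sequence


def fertility(tokenize: Callable[[str], Sequence], text: str) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenize(text)) / len(words)


# Hypothetical stand-in: one token per word (the best case, fertility 1.0).
def word_tokenizer(text: str) -> list:
    return text.split()


# Hypothetical stand-in: one token per UTF-8 byte of each word. This mimics
# a tokenizer with no vocabulary coverage for a script, which falls back
# to raw bytes.
def byte_tokenizer(text: str) -> list:
    return [b for word in text.split() for b in word.encode("utf-8")]


sample = "નમસ્તે દુનિયા"  # arbitrary Gujarati phrase, used only for illustration

for name, tok in [("word", word_tokenizer), ("byte", byte_tokenizer)]:
    print(f"{name}: {fertility(tok, sample):.1f} tokens per word")
```

To compare real tokenizers, each one's encode function would be passed in place of the toy functions and run over the same text in each language. Lower fertility means fewer tokens for the same text, which generally translates into cheaper inference and more content fitting into a fixed context window.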

Soket AI Labs is also experimenting with a Mixture of Experts model, expanding the languages supported and exploring different architectures for increased optimisation. 

K L Krithika

K L Krithika is a tech journalist at AIM. Apart from writing tech news, she enjoys reading sci-fi and pondering impossible technologies while trying not to confuse them with reality.