With its new AI agent, Operator, OpenAI has arguably delivered the ChatGPT moment for AI agents. Operator can carry out tasks on the web from a user’s instructions without further human intervention. Enterprises and startups alike are now focusing on AI agents that can perform tasks independently, and Indian founders are no exception: they are moving into the next phase of AI agents, this time powered by voice.
The Rise of AI Voice Agents
AI startups in India are actively pursuing a shift from text-based interaction to voice, where users trigger tasks by speaking and agents carry them out.
Sudarshan Kamath is the founder of smallest.ai, which builds text-to-speech models and voice agents. According to him, the company’s journey into voice AI began with the realisation that people differ widely in the voices they prefer. To address this, smallest.ai introduced voice cloning, which lets users create customised voices from reference audio.
Smallest.ai’s focus on AI agents is rooted in their potential to handle complex tasks in real time. “There are companies who are moving away from IVR-based systems to voice bot-based systems, and these voice bot-based systems are smarter, more interactive, and more realistic,” Kamath said while interacting with AIM.
He also pointed to content creation as a use case for these voice agents, for example companies producing product videos or marketing campaigns. “Or, it could be individuals who are influencers or social media accounts who are basically trying to create content on Instagram, YouTube,” he added.
Why Voice?

Kamath highlighted how large enterprises, including publicly listed ones, are increasingly exploring voice-based workflows. “This shift has happened because generative AI has made these voices much more realistic while maintaining very low latencies,” he said.
Kamath also believes that voice-based solutions offer a high return on investment (ROI) and significantly improve engagement and user experience. “So, investors are fairly bullish about the voice as the market itself is going to grow,” he said.
Bengaluru-based conversational AI and voice automation startup Gnani.ai claims to handle 30,000 concurrent conversations and several million voice AI conversations a day. Its voice-first small language models (SLMs) for Indian enterprises are trained on millions of hours of audio and billions of Indic-language conversations.
“Indian AI startups are focusing on building voice agents due to the country’s diverse linguistic landscape, the rapid adoption of smartphones, and the increasing demand for seamless customer interactions across industries,” Ganesh Gopalan, co-founder and CEO of Gnani.ai, told AIM.
“The rise of vernacular voice interfaces also aligns with the push for digital inclusion in India, enabling startups to cater to a broader audience while tapping into the growing demand for localised, AI-driven solutions,” he added.
Gnani.ai serves banking, financial services and insurance companies, helping them apply AI to tasks such as customer support, lead qualification, EMI collection and insurance renewals.
Gopalan noted that startups in the space are differentiating themselves in different ways. “Some focus on multilingual support with high accuracy in regional languages, while others emphasise industry-specific solutions, such as BFSI, healthcare, or retail, tailoring their AI to address niche requirements,” he said.
Another Bengaluru-based voice AI startup, Navana.ai, develops indigenous AI-powered speech recognition and natural language processing (NLP) solutions. The company has worked with institutions such as IISc Bangalore, IIT Madras and Bhashini on open-source data collection efforts and has co-authored academic papers. Its voice agents are integrated into applications for sectors such as BFSI, agriculture and government services, and its customers include Ujjivan and Bajaj Finserv.
“In the last year and a half to two years, the big shift came when LLMs came around and made telephony a very viable channel to reach all of India and plug in AI to do all sorts of services,” Raoul Nanavati, co-founder and CEO of Navana.ai, said during an interaction with AIM.
Nanavati believes that voice agents are gaining traction because they make digital services accessible to first-time internet users and address India’s linguistic diversity. “None of them [Google, Microsoft, AWS] worked for Indian languages at that time. Even today, most don’t work for non-major languages or low resource languages,” he emphasised.
What Next?
As voice gains prominence as a key interface for AI agents, founders are already looking at what comes next.
“After voice agents, the next trend in AI is likely to revolve around multimodal AI agents that integrate voice, text, and visual interactions for more immersive and context-aware experiences,” Gopalan said.
He believes that such systems can improve user engagement and offer richer, more intuitive interfaces.
“Additionally, the focus will shift toward hyper-personalisation powered by generative AI, where conversational agents predict and adapt to user needs in real time,” he concluded.
Similarly, Praveer Kochhar, co-founder of Kogo Tech Labs, which recently unveiled universal voice assistants for automobiles, believes agentic systems will move towards accomplishing larger goals. “This transition from task-oriented agentic flows to goal-oriented flows is the next big thing that you’ll start seeing, whether it’s in front office, back office, direct to customers, everywhere,” he told AIM.