Vikas Mishra, a platform cloud architect at Google Cloud, shared his insights on the complexities of deploying AI agents in production at MLDS 2025, India’s biggest developers summit, hosted by AIM Media House.
Mishra said that while the potential of large language models (LLMs) is widely recognised, translating that into real-world applications presents significant hurdles. “2025 is going to be the year of production,” he predicted, highlighting the increasing pressure to move beyond theoretical applications and implement tangible solutions.
“You could have multiple agents for both your internal and external use cases—things like your customer agent, employee agent, CX agent, and your code agent,” he said.
Mishra outlined the key challenges that hinder successful LLM deployment: performance, cost, latency, and safety. He said that “only about 50% of people are actually running them in production,” despite a much higher intent.
He argued that this gap stems from difficulties in evaluating, debugging, and ensuring the continuous functionality of agents, especially within complex multi-agent workflows. A minor change in a prompt or in model behaviour can degrade the user experience, creating a need for robust evaluation and monitoring tools.
Google Cloud Vertex AI to the Rescue
Mishra stressed the importance of selecting the right platform. “It’s not about the models,” he declared, “it’s about the platform.” He added that picking the right platform solves about 70% of the customer’s problems.
He introduced Google Cloud’s Vertex AI as a unified platform that addresses the challenges of productionising AI agents. Vertex AI provides access to a wide range of models, including Gemini, and offers tools for tuning, hosting, and managing them.
Mishra said that Google Cloud’s Model Garden, a platform for building, testing, and deploying models, hosts over 160 LLMs, including both closed- and open-source models. “Models like Llama, Claude, and DeepSeek are all available on Model Garden,” he said.
Speaking of Google’s Gemini and its unique capabilities, including a “2 million context window” (2 million tokens) that enables the processing of vast amounts of information, he gave a live demonstration of the model’s multimodal capabilities, interacting with it through both audio and visual inputs.
RAG in Agents
Mishra discussed the crucial role of grounding LLMs with real-time information. “You want access to real-time data,” he said, noting that Vertex AI allows grounding with Google Search and third-party data sources.
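The grounding idea Mishra described can be sketched in a few lines: retrieve relevant snippets for a query and prepend them to the prompt so the model answers from fresh data rather than its training set. The sketch below is purely illustrative and uses naive keyword-overlap retrieval; Vertex AI’s grounding with Google Search handles retrieval, ranking, and citations as a managed service.

```python
# Minimal sketch of grounding a prompt with retrieved context.
# Illustrative only -- not Vertex AI's API. Retrieval here is a
# naive keyword-overlap ranking over an in-memory document list.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the query, return top k."""
    terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(query: str, documents: list[str]) -> str:
    """Build a prompt instructing the model to answer from the snippets."""
    context = "\n".join(f"- {s}" for s in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

docs = [
    "Vertex AI supports grounding responses with Google Search.",
    "Model Garden hosts open and closed models.",
    "Cloud Trace helps debug multi-agent workflows.",
]
print(grounded_prompt("vertex ai grounding", docs))
```

In a production system the retrieval step would be a search index or vector store, but the structure — fetch, then stuff context into the prompt — is the same.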
He also touched upon the process of model adaptation, from prompt design to full training, and introduced various agent development tools, including Agent Builder, which offers low-code and no-code options.
Mishra emphasised that agent evaluation is critical but difficult.
He described Vertex AI’s online evaluation service and introduced the ‘Autorater’, a judge model used to score and benchmark AI outputs. He also addressed the importance of model observability and tracing, highlighting the use of Cloud Trace and Cloud Logging for debugging multi-agent workflows.
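The “LLM-as-judge” pattern behind an autorater can be sketched as follows. A judge scores each prompt–response pair against a rubric and flags cases that fall below a threshold. The judge here is a deliberately trivial stub (it rewards responses that mention the prompt’s topic word); in Vertex AI’s evaluation service a real model plays that role.

```python
# Sketch of the LLM-as-judge pattern behind autoraters: a judge
# scores candidate responses and flags low-scoring cases.
# The judge below is a stub, not a real model.

from typing import Callable

def evaluate(cases, judge: Callable[[str, str], float], threshold: float = 0.5):
    """Score each (prompt, response) pair and flag failures."""
    results = []
    for prompt, response in cases:
        score = judge(prompt, response)
        results.append(
            {"prompt": prompt, "score": score, "passed": score >= threshold}
        )
    return results

def stub_judge(prompt: str, response: str) -> float:
    # Hypothetical rubric: reward responses that mention the
    # prompt's leading topic word (punctuation stripped).
    topic = prompt.split()[0].strip(":?").lower()
    return 1.0 if topic in response.lower() else 0.0

cases = [
    ("refunds: how do I request one?", "Refunds can be requested in Settings."),
    ("shipping times?", "Please contact support."),
]
for r in evaluate(cases, stub_judge):
    print(r["passed"], r["score"])
```

Running the same evaluation after every prompt or model change is what turns this from a one-off benchmark into the regression guardrail Mishra described.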
Finally, he addressed the critical aspects of security and governance. “You own your data,” he said, assuring the attendees that Google Cloud does not use customer data to train its models. He also emphasised the importance of safety filters and content moderation tools.
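The shape of a safety filter is simple: screen model output against blocked categories before it reaches the user. The sketch below uses a hypothetical keyword blocklist purely for illustration; production systems such as Vertex AI’s configurable safety filters rely on classifier models rather than keyword matching.

```python
# Minimal sketch of an output safety filter. Illustrative only:
# real moderation uses classifier models, not keyword lists.

BLOCKLIST = {"ssn", "credit card number"}  # hypothetical sensitive terms

def moderate(text: str) -> tuple[bool, str]:
    """Return (allowed, text) or (blocked, redaction message)."""
    lowered = text.lower()
    for term in BLOCKLIST:
        if term in lowered:
            return False, "[response withheld by safety filter]"
    return True, text

allowed, out = moderate("Your order ships tomorrow.")
print(allowed, out)
```

Placing this check between the model and the user, on every response, is the governance pattern the talk pointed to.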
He concluded by reiterating that Vertex AI provides a comprehensive platform for taking AI agents from prototype to production, leveraging Google Cloud’s infrastructure and expertise.