Google DeepMind’s Gemini Robotics to Build AI for Robots of all Shapes and Sizes

Gemini Robotics-ER achieves a two to three times higher success rate than Gemini 2.0 in end-to-end settings.

Google DeepMind has introduced two new AI models, Gemini Robotics and Gemini Robotics-ER, designed to enhance robotic capabilities in the physical world.

These models are based on Gemini 2.0 and aim to enable robots to perform a broader range of real-world tasks. The company’s ultimate goal is “to develop AI that could work for any robot, no matter its shape or size”.

According to Google, for AI models to be useful in robotics, they must be “general, interactive, and dexterous” to adapt to various scenarios, understand commands, and perform tasks similar to human actions.

Gemini Robotics is a vision-language-action (VLA) model that allows robots to comprehend new situations and execute physical actions without specific training. For instance, it can handle tasks like folding paper or unscrewing a bottle cap. 

Notably, Figure AI recently cracked AI for humanoids with its Helix model, which allows robots to perform complex tasks using natural language.

Gemini Robotics-ER is designed for roboticists to develop their own models, offering advanced spatial understanding and using the embodied reasoning abilities of Gemini.

It enhances Gemini 2.0’s capabilities by improving 2D and 3D object detection and pointing. This model allows roboticists to integrate it with existing low-level controllers, enabling robots to perform complex tasks like grasping objects safely. 
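The pattern described here, where a reasoning model proposes a target (such as a grasp) and an existing low-level controller executes it, can be sketched roughly as follows. Every class and function name in this snippet is a hypothetical illustration of the architecture, not the actual Gemini Robotics API:

```python
from dataclasses import dataclass

@dataclass
class GraspProposal:
    """Hypothetical grasp target a reasoning model might emit."""
    x: float
    y: float
    z: float
    width_m: float  # required gripper opening, metres

class SpatialReasoner:
    """Stand-in for a model like Gemini Robotics-ER: given an object
    detection, it proposes where and how to grasp. Here it simply
    derives a grasp from a labelled detection dict."""
    def propose_grasp(self, detection: dict) -> GraspProposal:
        cx, cy, cz = detection["center"]
        # Grasp slightly below the object centre, with a small margin
        # on the gripper width (illustrative heuristic only).
        return GraspProposal(cx, cy, cz - 0.02, detection["width"] * 0.9)

class LowLevelController:
    """Stand-in for an existing robot controller that executes
    motion primitives; it just logs the commands it receives."""
    def __init__(self):
        self.log = []
    def move_to(self, x, y, z):
        self.log.append(("move_to", round(x, 3), round(y, 3), round(z, 3)))
    def close_gripper(self, width_m):
        self.log.append(("close_gripper", round(width_m, 3)))

def pick(reasoner, controller, detection):
    """High-level plan from the reasoner, executed by the controller."""
    g = reasoner.propose_grasp(detection)
    controller.move_to(g.x, g.y, g.z + 0.10)  # safe pre-grasp approach point
    controller.move_to(g.x, g.y, g.z)         # descend to the grasp pose
    controller.close_gripper(g.width_m)

detection = {"center": (0.40, -0.10, 0.25), "width": 0.08}
ctrl = LowLevelController()
pick(SpatialReasoner(), ctrl, detection)
print(ctrl.log)
```

The point of the split is that the reasoning model never drives motors directly; it only emits targets that an already-validated controller turns into safe motion, which is why the model can be bolted onto different robot platforms.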

Google DeepMind researchers said, “We trained the model primarily on data from the bi-arm robotic platform, ALOHA 2, but we also demonstrated that it could control a bi-arm platform, based on the Franka arms used in many academic labs.”

For example, when shown a coffee mug, Gemini Robotics-ER can determine an appropriate two-finger grasp and plan a safe approach trajectory. 

It achieves a two to three times higher success rate than Gemini 2.0 in end-to-end settings, and can use in-context learning based on human demonstrations when code generation is insufficient.

In a post on X, Google also mentioned partnering with Apptronik, a US-based robotics company, to develop the next generation of humanoid robots using these models. The company is also working with testers including Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools, which could open the door to more partnerships in the future.

The collaboration includes demonstrations of robots performing tasks such as connecting devices and packing lunchboxes in response to voice commands.

While the commercial availability of this technology has not been announced, Google continues to explore its capabilities. 

In the future, these models are expected to contribute significantly to developing more capable and adaptable robots.


Sanjana Gupta

An information designer who loves to learn about and try new developments in the field of tech and AI. She likes to spend her spare time reading and exploring absurdism in literature.