Google DeepMind’s Gemini Robotics to Build AI for Robots of all Shapes and Sizes

Gemini Robotics-ER achieves a two to three times higher success rate than Gemini 2.0 in end-to-end settings.

Google DeepMind has introduced two new AI models, Gemini Robotics and Gemini Robotics-ER, designed to enhance robotic capabilities in the physical world.

These models are based on Gemini 2.0 and aim to enable robots to perform a broader range of real-world tasks. The company’s ultimate goal is “to develop AI that could work for any robot, no matter its shape or size”.

According to Google, for AI models to be useful in robotics, they must be “general, interactive, and dexterous” to adapt to various scenarios, understand commands, and perform tasks similar to human actions.

Gemini Robotics is a vision-language-action (VLA) model that allows robots to comprehend new situations and execute physical actions without specific training. For instance, it can handle tasks like folding paper or unscrewing a bottle cap. 

Notably, Figure AI recently cracked AI for humanoids with its Helix model, which allows robots to perform complex tasks using natural language.

Gemini Robotics-ER is designed for roboticists to develop their own models, offering advanced spatial understanding and using the embodied reasoning abilities of Gemini.

It enhances Gemini 2.0’s capabilities by improving 2D and 3D object detection and pointing. This model allows roboticists to integrate it with existing low-level controllers, enabling robots to perform complex tasks like grasping objects safely. 
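The pattern described here, where a reasoning model proposes a target (such as a grasp) and an existing low-level controller executes it, can be sketched roughly as follows. Every class and function name in this snippet is a hypothetical illustration of the architecture, not the actual Gemini Robotics API:

```python
from dataclasses import dataclass

@dataclass
class GraspProposal:
    """Hypothetical grasp target a reasoning model might emit."""
    x: float
    y: float
    z: float
    width_m: float  # required gripper opening, metres

class SpatialReasoner:
    """Stand-in for a model like Gemini Robotics-ER: given an object
    detection, it proposes where and how to grasp. Here it simply
    derives a grasp from a labelled detection dict."""
    def propose_grasp(self, detection: dict) -> GraspProposal:
        cx, cy, cz = detection["center"]
        # Grasp slightly below the object centre, with a small margin
        # on the gripper width (illustrative heuristic only).
        return GraspProposal(cx, cy, cz - 0.02, detection["width"] * 0.9)

class LowLevelController:
    """Stand-in for an existing robot controller that executes
    motion primitives; it just logs the commands it receives."""
    def __init__(self):
        self.log = []
    def move_to(self, x, y, z):
        self.log.append(("move_to", round(x, 3), round(y, 3), round(z, 3)))
    def close_gripper(self, width_m):
        self.log.append(("close_gripper", round(width_m, 3)))

def pick(reasoner, controller, detection):
    """High-level plan from the reasoner, executed by the controller."""
    g = reasoner.propose_grasp(detection)
    controller.move_to(g.x, g.y, g.z + 0.10)  # safe pre-grasp approach point
    controller.move_to(g.x, g.y, g.z)         # descend to the grasp pose
    controller.close_gripper(g.width_m)

detection = {"center": (0.40, -0.10, 0.25), "width": 0.08}
ctrl = LowLevelController()
pick(SpatialReasoner(), ctrl, detection)
print(ctrl.log)
```

The point of the split is that the reasoning model never drives motors directly; it only emits targets that an already-validated controller turns into safe motion, which is why the model can be bolted onto different robot platforms.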

Google DeepMind researchers said, “We trained the model primarily on data from the bi-arm robotic platform, ALOHA 2, but we also demonstrated that it could control a bi-arm platform, based on the Franka arms used in many academic labs.”

For example, when shown a coffee mug, Gemini Robotics-ER can determine an appropriate two-finger grasp and plan a safe approach trajectory. 

It achieves a two to three times higher success rate than Gemini 2.0 in end-to-end settings, and can use in-context learning based on human demonstrations when code generation is insufficient.

In a post on X, Google also mentioned partnering with Apptronik, a US-based robotics company, to develop the next generation of humanoid robots using these models. The company is also working with testers including Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools, which could open the door to more partnerships in the future.

The collaboration includes demonstrations of robots performing tasks such as connecting devices and packing lunchboxes in response to voice commands.

While the commercial availability of this technology has not been announced, Google continues to explore its capabilities. 

In the future, these models are expected to contribute significantly to developing more capable and adaptable robots.


Sanjana Gupta

An information designer who loves to learn about and try new developments in the field of tech and AI. She likes to spend her spare time reading and exploring absurdism in literature.