Google DeepMind has introduced two new AI models, Gemini Robotics and Gemini Robotics-ER, which have been designed to enhance robotic capabilities in the physical world, the company announced.
These models are based on Gemini 2.0 and aim to enable robots to perform a broader range of real-world tasks. The company’s ultimate goal is “to develop AI that could work for any robot, no matter its shape or size”.
According to Google, for AI models to be useful in robotics, they must be “general, interactive, and dexterous” to adapt to various scenarios, understand commands, and perform tasks similar to human actions.
Gemini Robotics is a vision-language-action (VLA) model that allows robots to comprehend new situations and execute physical actions without specific training. For instance, it can handle tasks like folding paper or unscrewing a bottle cap.
Notably, Figure AI recently took a similar step with its Helix model, which allows humanoid robots to perform complex tasks from natural language instructions.
Gemini Robotics-ER is designed for roboticists to build their own models, offering advanced spatial understanding that draws on Gemini’s embodied reasoning abilities.
It enhances Gemini 2.0’s capabilities with improved 2D and 3D object detection and pointing. Roboticists can connect it to the low-level controllers they already use, enabling robots to perform complex tasks such as grasping objects safely.
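To make the division of labour concrete, here is a minimal sketch of that pattern: a high-level embodied-reasoning model proposes what to do (detected objects, grasp points, approach waypoints), while an existing low-level controller decides how to move. All class, function, and field names below are illustrative assumptions, not part of any published Gemini API.

```python
# Hypothetical sketch: high-level reasoning model + existing low-level controller.
# Names such as reasoning_model.query() and controller.move_to() are assumptions.

from dataclasses import dataclass

@dataclass
class GraspProposal:
    object_label: str    # e.g. "coffee mug"
    grasp_xyz: tuple     # 3D grasp point in the robot's base frame
    approach_xyz: tuple  # pre-grasp waypoint for a safe approach trajectory

def plan_grasp(image, prompt, reasoning_model) -> GraspProposal:
    """Ask the embodied-reasoning model to detect the object and point at a grasp."""
    result = reasoning_model.query(image=image, prompt=prompt)  # hypothetical call
    return GraspProposal(
        object_label=result["label"],
        grasp_xyz=result["grasp_point"],
        approach_xyz=result["approach_point"],
    )

def execute(proposal: GraspProposal, controller):
    """Hand the proposal to a low-level controller already running on the robot."""
    controller.move_to(proposal.approach_xyz)  # move to the safe approach point first
    controller.move_to(proposal.grasp_xyz)     # then move in for the grasp
    controller.close_gripper()                 # e.g. a two-finger grasp
```

The design choice this illustrates is the one the article describes: the reasoning model never drives the motors directly; it only supplies targets that the robot’s existing control stack executes.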
Google DeepMind researchers said, “We trained the model primarily on data from the bi-arm robotic platform, ALOHA 2, but we also demonstrated that it could control a bi-arm platform, based on the Franka arms used in many academic labs.”
For example, when shown a coffee mug, Gemini Robotics-ER can determine an appropriate two-finger grasp and plan a safe approach trajectory.
In end-to-end settings it achieves success rates two to three times higher than Gemini 2.0, and when code generation alone is insufficient, it can fall back on in-context learning from human demonstrations.
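The fallback behaviour described above can be pictured as a simple control loop: try a zero-shot, code-generation-style plan first, and only retry with human demonstrations placed in the model’s context if that fails. The function names, the demonstration format, and the retry policy below are assumptions made for this sketch, not a documented interface.

```python
# Illustrative control loop: zero-shot plan first, in-context learning as a fallback.
# model.generate_plan() and controller.run() are hypothetical calls for this sketch.

def attempt_task(task, scene, model, controller, demonstrations=None, max_retries=2):
    # 1) Zero-shot: ask the model to generate an executable plan for the task.
    plan = model.generate_plan(task=task, scene=scene)
    if controller.run(plan):  # assume run() returns True on success
        return True

    # 2) Fallback: retry with a few human demonstrations in the model's context.
    for _ in range(max_retries):
        if not demonstrations:
            break
        plan = model.generate_plan(task=task, scene=scene, examples=demonstrations)
        if controller.run(plan):
            return True
    return False
```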
Our model Gemini Robotics-ER allows roboticists to tap into the embodied reasoning of Gemini. 🌐 For example, if a robot came across a coffee mug, it could detect it, use ‘pointing’ to recognize parts it could interact with – like the handle – and recognize objects to avoid when… pic.twitter.com/HQMXvWLoJ5
— Google DeepMind (@GoogleDeepMind) March 12, 2025
In a post on X, Google also said it is partnering further with Apptronik, a US-based robotics company, to develop the next generation of humanoid robots using these models. The models are also being made available to a group of testers, including Agile Robots, Agility Robotics, Boston Dynamics and Enchanted Tools, which could pave the way for more partnerships in the future.
The collaboration includes demonstrations of robots performing tasks such as connecting devices and packing lunchboxes in response to voice commands.
While the commercial availability of this technology has not been announced, Google continues to explore its capabilities.
In the future, these models are expected to contribute significantly to developing more capable and adaptable robots.