Physical Intelligence Launches ‘Hi Robot’ to Help Robots Think Through Actions

The system allows the robot to take real-time feedback in natural language, and “talk to itself” as it performs tasks.

Researchers at Physical Intelligence, an AI robotics company, have developed a system called the Hierarchical Interactive Robot (Hi Robot). This system enables robots to process complex instructions and feedback using vision-language models (VLMs) in a hierarchical structure. 

The system allows robots to break down intricate tasks into simpler steps, much as humans reason through complex problems using Daniel Kahneman’s ‘System 1’ and ‘System 2’ modes of thinking.

In this context, Hi Robot uses a high-level VLM to reason through complex prompts and break them into atomic commands, and a low-level vision-language-action (VLA) policy to execute those commands.
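Concretely, the division of labour can be pictured as a simple loop: the high-level model periodically looks at the scene and the prompt and emits an atomic command, while the low-level policy turns that command into motor actions. The sketch below is illustrative only; the object and method names (vlm.generate, vla.predict_actions, the robot interface) are hypothetical stand-ins, not Physical Intelligence’s actual API.

```python
# Illustrative sketch of Hi Robot's two-level structure. All interfaces here
# (vlm.generate, vla.predict_actions, the robot object) are hypothetical
# stand-ins, not Physical Intelligence's actual API.

def high_level_step(vlm, image, prompt, user_feedback=None):
    """'System 2': a vision-language model reasons over the scene, the user's
    complex prompt, and any interjected feedback, and emits a simple atomic
    command such as "pick up the plate"."""
    text = prompt if user_feedback is None else f"{prompt}\nUser: {user_feedback}"
    return vlm.generate(image=image, text=text)

def low_level_step(vla, image, atomic_command):
    """'System 1': a vision-language-action policy maps the atomic command and
    the current observation directly to motor actions."""
    return vla.predict_actions(image=image, text=atomic_command)

def control_loop(vlm, vla, robot, prompt):
    """Run the hierarchy: re-plan at a low rate, act at a high rate."""
    while not robot.task_done():
        image = robot.camera_image()
        command = high_level_step(vlm, image, prompt, robot.pending_feedback())
        for _ in range(10):  # the low-level policy runs many steps per plan
            robot.execute(low_level_step(vla, robot.camera_image(), command))
```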

Testing and Training Using Synthetic Data

The researchers used synthetic data to train the robots to follow complex instructions, since real-world examples and atomic commands alone were not enough to teach them to handle multi-step tasks.

To address this, they created synthetic datasets by pairing robot observations with hypothetical scenarios and human feedback. This approach helps the model learn how to interpret and respond to complex commands.
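The recipe can be sketched roughly as follows. Everything below is a hypothetical illustration of the pairing idea: the feedback utterances, corrected commands, and field names are invented for the example, not taken from the actual dataset.

```python
import random

# Hypothetical (utterance, corrected atomic command) pairs; invented here
# purely to illustrate the pairing idea described above.
HYPOTHETICAL_FEEDBACK = [
    ("that's not trash", "put the object back down"),
    ("no pickles on mine", "skip the pickle jar"),
    ("use the other bin", "place the dish in the left bin"),
]

def make_synthetic_example(observation, prior_command):
    """Pair a real logged robot observation with an invented human
    interjection, labeled with the command the high-level model should emit."""
    utterance, corrected_command = random.choice(HYPOTHETICAL_FEEDBACK)
    return {
        "image": observation,                # real camera frame from robot logs
        "prior_command": prior_command,      # what the robot was about to do
        "user_utterance": utterance,         # synthetic human feedback
        "target_command": corrected_command, # training label for the high-level VLM
    }
```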

Hi Robot outperformed other methods, including GPT-4o and a flat vision-language-action (VLA) policy, following instructions more accurately and adapting better to real-time corrections. It achieved 40% higher instruction-following accuracy than GPT-4o, demonstrating better alignment with user prompts and real-time observations.


In real-world tests, Hi Robot performed tasks like clearing tables, making sandwiches, and grocery shopping. It effectively handled multi-stage instructions, adapted to real-time corrections, and respected constraints.

In this context, synthetic data highlights its potential in robotics: it can efficiently simulate diverse scenarios, reducing the need for extensive real-world data collection.

Hi Robot ‘Talks to Itself’

In one example, a robot trained to clean a table by disposing of trash and placing dishes in a bin can be directed to follow more intricate commands through Hi Robot.


This system allows the robot to reason through modified commands given in natural language, enabling it to “talk to itself” as it performs tasks. It can also interpret contextual comments from users and fold that real-time feedback into its actions: when a user says “that’s not trash”, the robot adjusts its behaviour accordingly.
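As a rough illustration of how such an interjection might be folded in, the high-level model can simply be re-queried with the user’s utterance in context, and its output both spoken back to the user and handed down as the new atomic command. Again, every name below is a hypothetical stand-in, not the system’s real interface.

```python
# Hypothetical sketch: a user correction re-prompts the high-level model,
# which replies verbally and issues a corrected atomic command.
def handle_user_correction(vlm, robot, prompt, utterance):
    image = robot.camera_image()
    reply = vlm.generate(
        image=image,
        text=f"{prompt}\nUser: {utterance}\nRespond, then give the next command.",
    )
    robot.say(reply.verbal_response)         # e.g. "Oh, sorry, I'll put it back."
    robot.set_command(reply.atomic_command)  # the low-level policy picks this up
```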

The system has been tested on various robotic platforms, including single-arm, dual-arm, and mobile robots, performing tasks like cleaning tables and making sandwiches.

“Can we get our robots to ‘think’ the same way, with a little ‘voice’ that tells them what to do when presented with a complex task?” the researchers said in the company’s official blog. This advancement could lead to more intuitive and flexible robot capabilities in real-world applications. 

The researchers plan to refine the system by combining the high-level and low-level models, allowing more adaptive processing of complex tasks.
