New Robotics Method AnyPlace Achieves Object Placement Through VLMs, Synthetic Data

This advancement addresses the challenges of object placement, which is often difficult due to variations in object shapes and placement arrangements.

Researchers have introduced AnyPlace, a new two-stage method for robotic object placement that predicts feasible placement poses across diverse objects and arrangements.

According to Animesh Garg, one of the researchers from the Georgia Institute of Technology, the work tackles the challenge of robot placement with a focus on generalisable rather than domain-specific solutions.

The system uses a vision language model (VLM) to propose potential placement locations, combined with depth-based models that predict the geometric placement pose.

“Our AnyPlace pipeline consists of two stages: high-level placement position prediction and low-level pose prediction,” the research paper stated.

The first stage uses Molmo, a VLM, and SAM 2, a large segmentation model, to segment objects and propose placement locations. Only the region around the proposed placement is fed into the low-level pose prediction model, which operates on point clouds of the object to be placed and of the candidate placement region.
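To make the two-stage split concrete, here is a minimal Python sketch of the idea. The helper functions (query_vlm, pixel_to_point, crop_region, predict_pose) are hypothetical stand-ins for illustration, not the authors' actual code or API.

```python
# Minimal sketch of a two-stage placement pipeline in the spirit of AnyPlace.
# All helpers below are hypothetical placeholders, not the paper's implementation.
import numpy as np


def query_vlm(rgb: np.ndarray, instruction: str) -> list:
    """Stand-in for the high-level stage (e.g. Molmo + SAM 2): propose placement pixels."""
    h, w, _ = rgb.shape
    return [(w // 2, h // 2)]  # dummy proposal at the image centre


def pixel_to_point(pixel: tuple, scene_cloud: np.ndarray) -> np.ndarray:
    """Stand-in: map a proposed pixel to a rough 3D location in the scene cloud."""
    return scene_cloud.mean(axis=0)  # placeholder


def crop_region(scene_cloud: np.ndarray, center: np.ndarray, radius: float = 0.1) -> np.ndarray:
    """Keep only scene points near the proposed placement location."""
    return scene_cloud[np.linalg.norm(scene_cloud - center, axis=1) < radius]


def predict_pose(object_cloud: np.ndarray, region_cloud: np.ndarray) -> np.ndarray:
    """Stand-in for the low-level pose model: here, simply align centroids and return a 4x4 pose."""
    pose = np.eye(4)
    pose[:3, 3] = region_cloud.mean(axis=0) - object_cloud.mean(axis=0)
    return pose


def place(rgb, scene_cloud, object_cloud, instruction="place the vial in the rack"):
    pixel = query_vlm(rgb, instruction)[0]                          # stage 1: high-level position
    region = crop_region(scene_cloud, pixel_to_point(pixel, scene_cloud))
    return predict_pose(object_cloud, region)                       # stage 2: low-level pose
```

The point the sketch tries to capture is that only the cropped region around the proposed location reaches the pose model, keeping the low-level prediction local to the relevant geometry.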

Synthetic Data Generation 

The creators of AnyPlace have built a fully synthetic dataset of 1,489 randomly generated objects covering insertion, stacking, and hanging tasks. In total, 13 object categories and 5,370 placement poses were generated, as per the paper.

This approach helps overcome limitations of real-world data collection, enabling the model to generalise across objects and scenarios.
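As a rough illustration of what such procedural generation can look like, the sketch below samples random box-shaped objects and derives stacking placement poses. The primitives and parameters are assumptions made for illustration, not the paper's actual generation recipe.

```python
# Rough sketch of procedural placement-pose generation for a stacking task,
# using simple parameterised boxes; illustrative only, not the paper's pipeline.
import numpy as np

rng = np.random.default_rng(0)


def random_box() -> dict:
    """Sample a box-shaped object with random dimensions (metres)."""
    return {"size": rng.uniform(0.02, 0.10, size=3)}


def stacking_pose(base: dict, top: dict) -> np.ndarray:
    """Placement pose that rests `top` centred on `base`, as a 4x4 transform."""
    pose = np.eye(4)
    pose[2, 3] = base["size"][2] / 2 + top["size"][2] / 2  # lift by half-heights
    return pose


# Generate a tiny synthetic dataset of stacking placements.
dataset = []
for _ in range(100):
    base, top = random_box(), random_box()
    dataset.append({"base": base, "top": top, "pose": stacking_pose(base, top)})

print(len(dataset), "synthetic stacking placements")
```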

Garg noted that for object placement, it is possible to generate highly effective synthetic data, allowing for the creation of a grasp predictor for any object using only synthetic data.

“The use of depth data minimises the sim-to-real gap, making the model applicable in real-world scenarios with limited real-world data collection,” Garg noted. The synthetic data generation process creates variability in object shapes and sizes, improving the model’s robustness.
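Because the low-level model consumes depth-derived point clouds, the same standard pinhole back-projection applies to both simulated and real depth images, which is one way to read the sim-to-real claim. The sketch below shows that conversion; the camera intrinsics are illustrative values, not taken from the paper.

```python
# Standard pinhole back-projection from a depth image to a point cloud.
# The intrinsics (fx, fy, cx, cy) are illustrative, not from the paper.
import numpy as np


def depth_to_cloud(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project an HxW depth image (metres) into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    cloud = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return cloud[cloud[:, 2] > 0]  # drop invalid (zero-depth) pixels


# Example: a flat synthetic 480x640 depth frame, as might be rendered in simulation.
depth = np.full((480, 640), 0.5)
cloud = depth_to_cloud(depth, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(cloud.shape)
```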

The model achieved an 80% success rate on a vial-insertion task, demonstrating robustness and generalisation. In simulation, the method outperforms baselines in success rate, coverage of placement modes, and fine-placement precision.

In real-world experiments, the method transfers directly from synthetic training to physical tasks, “succeeding where others struggle”.

Separately, recently released research introduces Phantom, a method for training robot policies without collecting any robot data, using only human video demonstrations.

Phantom turns human videos into “robot” demonstrations, making it significantly easier to scale up and diversify robotics data.
