The Meta research team has unveiled MLGym, an experimental framework, and MLGym-Bench, an accompanying benchmark, to train and evaluate AI research agents on a range of AI research tasks. This comes days after Google introduced its co-scientist system.
The researchers describe it as the first Gym environment for machine-learning tasks, enabling research on reinforcement learning algorithms for training AI research agents.
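In a Gym-style setup, an agent interacts with an environment through a reset-and-step loop. The sketch below is a minimal, self-contained illustration of that interaction pattern only; the environment, agent, and all names in it are toy assumptions, not MLGym's actual API.

```python
# Minimal sketch of a Gym-style interaction loop. The environment and agent
# here are toy stand-ins; MLGym's real task environments and agent harness
# differ, but the reset/step pattern is the same basic idea.
import random


class ToyResearchEnv:
    """Stand-in environment: the agent tunes one hyperparameter to maximise a score."""

    def reset(self):
        self.best_score = 0.0
        return {"baseline_score": self.best_score}  # initial observation

    def step(self, action):
        # Pretend the chosen learning rate determines validation accuracy.
        score = 1.0 - abs(action["learning_rate"] - 0.01) * 10
        score += random.uniform(-0.02, 0.02)  # noisy evaluation
        self.best_score = max(self.best_score, score)
        reward = score
        done = self.best_score > 0.95
        return {"last_score": score}, reward, done, {}


def random_search_agent(observation):
    # A trivial "agent" that proposes a random hyperparameter each step.
    return {"learning_rate": random.uniform(0.001, 0.1)}


env = ToyResearchEnv()
obs = env.reset()
for step in range(20):
    action = random_search_agent(obs)
    obs, reward, done, info = env.step(action)
    if done:
        break
print(f"best score after {step + 1} steps: {env.best_score:.3f}")
```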
As per the research paper, MLGym-Bench consists of 13 diverse and open-ended AI research tasks from various domains, including computer vision, natural language processing, reinforcement learning, and game theory. Solving these tasks requires real-world AI research skills such as generating new ideas and hypotheses, creating and processing data, implementing ML methods, training models, running experiments, analysing the results, and iterating through this process to improve on a given task.
“Our MLGym framework makes it easy to add new tasks, integrate and evaluate models or agents, generate synthetic data at scale, as well as develop new learning algorithms for training agents on AI research tasks,” the research team mentioned.
“We find that current frontier models can improve on the given baselines, usually by finding better hyperparameters, but do not generate novel hypotheses, algorithms, architectures, or substantial improvements,” the researchers added. With this framework, they aim to build LLM agents that can independently generate scientific hypotheses, write scientific papers, analyse results, and more.
The MLGym framework is designed to be modular and extensible, allowing researchers to easily add new tasks, datasets, and tools.
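As a rough illustration of what that kind of extensibility could look like, the sketch below registers a new task through a simple config-plus-registry pattern. All names here (ResearchTask, TASK_REGISTRY, register_task) are hypothetical and not taken from MLGym's codebase.

```python
# Hypothetical sketch of a modular task registry; the names and fields are
# illustrative assumptions, not MLGym's actual interfaces.
from dataclasses import dataclass, field


@dataclass
class ResearchTask:
    name: str
    domain: str                      # e.g. "computer vision", "game theory"
    dataset_path: str                # where the task's data lives
    baseline_score: float            # score the agent is asked to beat
    evaluation_metric: str = "accuracy"
    tools: list = field(default_factory=list)  # extra tools the agent may call


TASK_REGISTRY: dict[str, ResearchTask] = {}


def register_task(task: ResearchTask) -> None:
    """Add a task so the framework (and its agents) can discover it by name."""
    if task.name in TASK_REGISTRY:
        raise ValueError(f"task '{task.name}' is already registered")
    TASK_REGISTRY[task.name] = task


# Adding a new task is then a single call with a config object.
register_task(
    ResearchTask(
        name="cifar10-classification",
        domain="computer vision",
        dataset_path="data/cifar10",
        baseline_score=0.85,
    )
)
print(sorted(TASK_REGISTRY))
```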
The framework also provides a default agentic harness that can be used to evaluate any base model. The research team evaluated several LLMs on the MLGym-Bench benchmark, including Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro. In their tests, Gemini-1.5 Pro came out as the most cost-effective option for this kind of research.
The team hopes that open-sourcing the framework and benchmark will facilitate future research and advance the AI research capabilities of LLM agents. You can find the code on its GitHub page.
If the future of research is indeed agentic, frameworks and benchmarks like these will make a difference by providing a standardised way to evaluate and compare AI research agents.