Sakana.ai Introduces Transformer², a Self-Adaptive AI

The researchers explain that Transformer² can adapt like a living brain.

Sakana.ai, a Tokyo-based AI research and development startup, has released a self-adaptive AI system called Transformer². The company describes it as a machine learning system that dynamically adjusts its weights for different tasks.

Announcing the release in a video shared on X on Wednesday, the company said, “Adaptation is a remarkable natural phenomenon, like how the octopus can blend in with its environment, or how the brain rewires itself after injury.”

The system, introduced in the research paper ‘Transformer²: Self-adaptive LLMs’, offers a dynamic approach to task handling that sets it apart from traditional, static AI models.

Transformer² utilises a two-step process to adapt its weight matrices in real time, tailoring its operations for specific tasks such as mathematics, coding, reasoning, and visual understanding. 

“This model analyses incoming tasks, adjusts its weights, and delivers optimal results through task-specific adaptations,” the researchers said.

Check out the GitHub repository here: https://github.com/SakanaAI/self-adaptive-llms

The Brain of LLMs

The system uses a mathematical technique called Singular Value Decomposition (SVD) to identify which components of the model’s weight matrices matter for different tasks.

Left: The LLM’s “brain” (i.e., its weight matrices) is decomposed into independent components using SVD. | Right: RL trains the combination of these components for various tasks.
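In PyTorch terms, that decomposition looks roughly like the sketch below. This is a minimal illustration, not Sakana’s code: the matrix W is a random stand-in for one of an LLM’s weight matrices.

# Minimal sketch of the SVD step the paper describes; W is a random
# stand-in for one of an LLM's weight matrices.
import torch

W = torch.randn(512, 512)

# W = U @ diag(S) @ V^T: each singular triplet (u_i, s_i, v_i) is one
# independent component of the layer.
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# Recombining the components recovers the original weights.
W_rebuilt = U @ torch.diag(S) @ Vh
print(torch.allclose(W, W_rebuilt, atol=1e-3))  # True, up to float error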

It then uses a method combining SVD fine-tuning with reinforcement learning (RL) to create instructions for adjusting the model’s behaviour, represented as compact ‘z-vectors’.
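Conceptually, a z-vector acts as a dial on each singular component. The sketch below uses random z values purely for illustration; in Transformer², these values are learned with RL per task.

# Illustrative z-vector: one scalar per singular component. Random here;
# in Transformer² it is trained with reinforcement learning per task.
import torch

W = torch.randn(512, 512)
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

z = torch.ones_like(S) + 0.05 * torch.randn_like(S)
W_adapted = U @ torch.diag(S * z) @ Vh  # amplify or dampen each component

Because z holds just one scalar per component rather than a full matrix of weight updates, it stays compact.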

During inference, the system employs one of three strategies (prompt-based, classifier-based, and few-shot adaptation) to detect the task type and adjust accordingly.
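In control-flow terms, this amounts to a two-pass loop: a first pass identifies the task, and a second pass generates the answer with the matching z-vector applied. The helper names below (identify_task, z_vectors, generate_with) are hypothetical stand-ins, not the repository’s API.

# Hypothetical two-pass dispatch; all names are illustrative only.
def self_adaptive_answer(prompt, identify_task, z_vectors, generate_with):
    task = identify_task(prompt)      # pass 1: detect the task type
    z = z_vectors[task]               # pick the matching expert z-vector
    return generate_with(prompt, z)   # pass 2: answer with adapted weights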

The researchers noted, “This approach ensures robust and efficient adaptation, outperforming static systems like LoRA across a range of scenarios.”

Superior Performance

Tests on both the Llama and Mistral LLMs, across tasks including GSM8K (math), HumanEval (coding), and TextVQA (visual understanding), revealed superior performance, with significant gains in adaptability and efficiency.

One surprising discovery was that, when solving complex math problems, Transformer² combines different types of reasoning, drawing on programming and logical thinking as well as mathematics, much as humans do when tackling hard problems.

In an unexpected breakthrough, the researchers found that knowledge gained by one AI model could be transferred to another. 

When they transferred the learned z-vectors from one model (Llama) to another (Mistral), the second model showed improved performance on most tasks. However, the researchers note that this worked because both models share similar underlying architectures.
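Mechanically, such a transfer would amount to rescaling one model’s SVD components with a z-vector learned on the other, as in this hedged sketch (the function name is hypothetical, and it assumes both layers expose the same number of singular components):

# Hypothetical cross-model transfer of a learned z-vector. Only
# meaningful when the architectures line up, as the researchers caution.
import torch

def apply_foreign_z(z_from_llama: torch.Tensor, W_mistral: torch.Tensor):
    U, S, Vh = torch.linalg.svd(W_mistral, full_matrices=False)
    assert z_from_llama.shape == S.shape, "component counts must match"
    return U @ torch.diag(S * z_from_llama) @ Vh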

Left: Self-adaptation on unseen tasks. | Right: Interpolation weights of the learned z-vectors.

“This marks a significant step toward creating ‘living intelligence’ in AI systems,” the research team explained. They envision future AI systems that can continuously learn and adapt like living beings rather than remaining fixed after their initial training.

They concluded, “This marks a shift from static AI to dynamic models capable of lifelong learning and adaptation, redefining how we interact with intelligent systems.”

