How Sakana trained a 7B model to orchestrate GPT-5, Claude Sonnet 4 and Gemini 2.5 Pro

VentureBeat AI · May 7, 2026, 9:23 PM · Also reported by 2 other sources

Every LangChain pipeline your team hardcodes starts breaking the moment the query distribution shifts, and it always shifts. That bottleneck is what Sakana AI set out to eliminate.

Researchers at Sakana AI have introduced the "RL Conductor," a small language model trained via reinforcement learning to automatically orchestrate a diverse pool of worker LLMs. Conductor dynamically analyzes inputs, distributes labor among workers, and coordinates among agents.

This automated coordination achieves state-of-the-art results on difficult reasoning and coding benchmarks, outperforming individual frontier models like GPT-5 and Claude Sonnet 4 as well as expensive human-designed multi-agent pipelines. It does so at a fraction of the cost and with fewer API calls than competitors. RL Conductor is the backbone of Fugu, Sakana AI’s commercial multi-agent orchestration service.

The limitations of manual agentic frameworks

Large language models have strong latent capabilities, but tapping them fully is difficult. Doing so relies heavily on manually designed agentic workflows, which serve as critical components in commercial AI products. However, these frameworks fall short because they are inherently rigid and constrained.

In comments to VentureBeat, Yujin Tang, co-author of the paper, explained the exact breaking point of current systems: "While using frameworks with hard-coded pipelines like LangChain and Mixture-of-Agents can work well for specific use cases … In production, an inherent bottleneck arises when targeting domains with large user bases with very heterogeneous demands." Tang noted that achieving "real-world generalization in such heterogeneous applications inherently necessitates going beyond human-hardcoded designs."

Another bottleneck for building robust agentic systems is that no single model is optimal for all tasks. Different models are fine-tuned to specialize in distinct domains.
One m
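
To make the contrast between a hard-coded pipeline and per-query orchestration concrete, here is a minimal Python sketch. It is not Sakana AI's implementation: the worker names (`coder`, `reviewer`, `reasoner`, `generalist`), the `Step` type, and the keyword rule inside `conductor_plan` are all hypothetical, and the keyword rule merely stands in for the learned policy the article describes. The stub workers echo their prompt instead of calling a real model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a worker LLM: maps a prompt to a text reply.
Worker = Callable[[str], str]


@dataclass
class Step:
    worker: str       # name of the worker in the pool to call
    instruction: str  # what that worker is asked to do


def make_stub_worker(name: str) -> Worker:
    """Build a fake worker that echoes its prompt, tagged with its name."""
    def call(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return call


def hardcoded_pipeline(query: str) -> List[Step]:
    """Fixed plan: the same two workers in the same order for every query."""
    return [
        Step("coder", f"Solve: {query}"),
        Step("reviewer", "Review the previous answer"),
    ]


def conductor_plan(query: str) -> List[Step]:
    """Per-query plan. ASSUMPTION: a keyword rule stands in for a learned policy."""
    q = query.lower()
    if "prove" in q or "integral" in q:
        return [
            Step("reasoner", f"Solve step by step: {query}"),
            Step("reviewer", "Check the derivation"),
        ]
    if "function" in q or "bug" in q:
        return [Step("coder", f"Write code for: {query}")]
    return [Step("generalist", query)]


def run(plan: List[Step], pool: Dict[str, Worker]) -> List[str]:
    """Execute steps in order, passing each output to the next step as context."""
    outputs: List[str] = []
    context = ""
    for step in plan:
        prompt = step.instruction
        if context:
            prompt = f"{step.instruction}\n\nPrevious output:\n{context}"
        context = pool[step.worker](prompt)
        outputs.append(context)
    return outputs


pool = {n: make_stub_worker(n) for n in ("coder", "reviewer", "reasoner", "generalist")}
```

Calling `run(conductor_plan(query), pool)` makes one worker call for a simple coding request and two for a proof, while `hardcoded_pipeline` always makes two. In this toy setting, that difference is where a savings in API calls would come from.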

Article preview — originally published by VentureBeat AI. Full story at the source.

Aggregated and edited by the Scoop newsroom. We surface news from VentureBeat AI alongside other reporting so you can compare coverage in one place.
