Stanford's DeLM cuts multi-agent task costs 50% — without a central orchestrator

VentureBeat AI · Jun 16, 2026, 5:47 PM


One of the assumptions behind today’s AI frameworks is that agents require a “boss” at the center: an orchestrator that runs the show, routes requests, and makes sure the whole system doesn’t descend into chaos. That assumption may be wrong, and the cost of carrying it could be measured in inference dollars and coordination latency.

A new Stanford framework called a decentralized language model, or DeLM, is built on the premise that agents can coordinate directly, without routing every update through a central controller. DeLM’s shared knowledge base serves as a “common communication substrate” so that agents can build upon one another’s verified progress without having to route every interaction through a main agent to “merge, filter, and rebroadcast,” Yuzhen Mao and Azalia Mirhoseini, co-developers of the framework, explain in a research paper. It’s a system that’s not only possible, but desirable in certain instances. “Agents can build on prior findings, avoid repeated failures, preserve constraints, and recover detailed evidence only when needed.”

The challenges of traditional multi-agent systems

In a typical centralized multi-agent system, a main agent breaks tasks into subtasks, assigns them to multiple sub-agents in parallel, waits for responses, merges and summarizes intermediate progress, then launches the next wave of orders based on the collected context. While this is a natural way to scale LLM reasoning, the Stanford researchers argue that it scales poorly. Every useful finding, partial finding, and failure must be reported back to the main agent, which then determines what information to merge and rebroadcast to the agents below it.

“As the number of subtasks grows, this controller becomes a communication and integration bottleneck,” Mao and Mirhoseini write. Further, the main orchestrator may “dilute, omit, or distort” useful information, leading to lost progress. This bottleneck also occurs in long-context reasoning scenarios. Once it receives reports b
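
To make the shared-knowledge-base idea concrete, here is a minimal Python sketch of agents coordinating through a common store instead of a central controller. It is an illustration only, not DeLM's actual design or API: the class `SharedKnowledgeBase`, the `Entry` fields, the `kind` labels, and the helper `choose_approach` are all hypothetical names, and the `verified` flag stands in for whatever verification step the framework really uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    author: str    # agent that wrote the entry
    kind: str      # "finding", "failure", or "constraint" (illustrative labels)
    summary: str   # short text other agents read by default
    evidence: str  # longer detail, fetched only on request


@dataclass
class SharedKnowledgeBase:
    """Hypothetical stand-in for a shared substrate; not DeLM's real interface."""
    entries: List[Entry] = field(default_factory=list)

    def post(self, entry: Entry, verified: bool) -> Optional[int]:
        """Store an entry only if it was verified; return its id, else None."""
        if not verified:
            return None
        self.entries.append(entry)
        return len(self.entries) - 1

    def summaries(self, kind: str) -> Dict[int, str]:
        """Return the short summaries of all entries of one kind, keyed by id."""
        return {i: e.summary for i, e in enumerate(self.entries) if e.kind == kind}

    def evidence(self, entry_id: int) -> str:
        """Recover the detailed evidence for a single entry on demand."""
        return self.entries[entry_id].evidence


def choose_approach(kb: SharedKnowledgeBase, candidates: List[str]) -> Optional[str]:
    """Pick the first candidate that no agent has recorded as a failure."""
    failed = set(kb.summaries("failure").values())
    for candidate in candidates:
        if candidate not in failed:
            return candidate
    return None


kb = SharedKnowledgeBase()
# One agent records a verified dead end; an unverified guess is not shared.
kb.post(Entry("agent_a", "failure", "brute force", "timed out after 10k steps"), verified=True)
kb.post(Entry("agent_b", "finding", "unchecked guess", "no check run yet"), verified=False)

# Another agent reads the store directly and skips the known failure.
next_try = choose_approach(kb, ["brute force", "dynamic programming"])
```

In this toy setup no agent reports to a controller: each one writes verified entries to the store and reads the summaries it needs, pulling the longer `evidence` text only when a summary is not enough.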

Article preview — originally published by VentureBeat AI. Full story at the source.



