Scoopfeeds — Intelligent news, curated.
How RecursiveMAS speeds up multi-agent inference by 2.4x and reduces token usage by 75%

VentureBeat AI · May 15, 2026, 9:04 PM


One of the key challenges of current multi-agent AI systems is that agents communicate by generating and sharing text sequences, which introduces latency, drives up token costs, and makes it difficult to train the entire system as a cohesive unit. To overcome this, researchers at the University of Illinois Urbana-Champaign and Stanford University developed RecursiveMAS, a framework that lets agents collaborate and transmit information through embedding space instead of text. The change yields both efficiency and performance gains: experiments show that RecursiveMAS improves accuracy across complex domains such as code generation, medical reasoning, and search, while also increasing inference speed and slashing token usage. RecursiveMAS is also significantly cheaper to train than standard full fine-tuning or LoRA methods, making it a scalable, cost-effective blueprint for custom multi-agent systems.

The challenges of improving multi-agent systems

Multi-agent systems can tackle complex tasks that single-agent systems struggle to handle. When scaling multi-agent systems for real-world applications, a major challenge is enabling the system to evolve, improve, and adapt to different scenarios over time. Prompt-based adaptation improves agent interactions by iteratively refining the shared context provided to the agents: by updating the prompts, the system acts as a director, guiding the agents to generate responses that are more aligned with the overarching goal. The fundamental limitation is that the capabilities of the models underlying each agent remain static. A more sophisticated approach is to train the agents by updating the weights of the underlying models.
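The prompt-based adaptation loop described above can be sketched in a few lines. This is an illustrative toy, not code from the RecursiveMAS paper: all function names (run_agents, refine_prompt, adapt) and the string-based "director" step are placeholders standing in for whatever refinement strategy a real system would use. The key property is that only the shared prompt changes; the agent models stay frozen.

```python
# Minimal sketch of prompt-based adaptation: the shared context is
# iteratively refined while the underlying agent models stay frozen.
# All names here are hypothetical placeholders, not from the paper.

def run_agents(shared_prompt, task, agents):
    """Each frozen agent answers the task given the shared context."""
    return [agent(shared_prompt + "\n" + task) for agent in agents]

def refine_prompt(shared_prompt, responses, goal):
    """The 'director' step: fold feedback on the last round of
    responses back into the shared context (trivial stand-in here)."""
    feedback = f"Previous answers: {responses}. Stay focused on: {goal}"
    return shared_prompt + "\n" + feedback

def adapt(task, goal, agents, rounds=3):
    """Run several rounds, updating only the prompt between rounds."""
    prompt = "You are collaborating agents."
    responses = []
    for _ in range(rounds):
        responses = run_agents(prompt, task, agents)
        prompt = refine_prompt(prompt, responses, goal)
    return responses
```

Because the loop never touches model weights, it is cheap to run, which is exactly why its ceiling is the static capability of the frozen models.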
Training an entire system of agents is difficult because updating all the parameters across multiple models is computationally non-trivial. Even if an engineering team commits to training its models, the standard method of agents communicating through text-based interactions creates major bottlenecks…
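The embedding-space handoff the article describes can be illustrated with a toy numpy sketch. Everything here is an assumption for illustration: the dimensions, the tanh stand-ins for the agents, and the learned "bridge" projection are not from the RecursiveMAS paper. The point is only the shape of the protocol: agent A's hidden state flows directly into agent B's input space, with no decode-to-tokens and re-encode round trip in between.

```python
import numpy as np

# Toy sketch of latent (embedding-space) communication between two agents.
# Shapes, activations, and the bridge matrix are illustrative only.
rng = np.random.default_rng(0)
D_A, D_B = 64, 64                                  # hidden sizes of agents A and B
W_bridge = rng.standard_normal((D_A, D_B)) * 0.1   # learned bridge (hypothetical)

def agent_a_encode(x):
    """Stand-in for agent A producing a hidden-state summary of its work."""
    return np.tanh(x)                              # shape (D_A,)

def agent_b_consume(h):
    """Agent B receives the embedding directly: no tokens are generated,
    so there is no decoding latency and no token cost for the handoff."""
    return np.tanh(h @ W_bridge)                   # shape (D_B,)

h_a = agent_a_encode(rng.standard_normal(D_A))
h_b = agent_b_consume(h_a)                         # direct latent handoff
```

Because the handoff is a differentiable tensor operation rather than sampled text, gradients can flow through it, which is what makes training the system end to end tractable.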

Article preview — originally published by VentureBeat AI. Full story at the source.
Read full story on VentureBeat AI →
Aggregated and edited by the Scoop newsroom. We surface news from VentureBeat AI alongside other reporting so you can compare coverage in one place.