Alibaba's model never trained as an agent — and improved agent performance across seven benchmarks

VentureBeat AI · Jun 24, 2026, 7:15 PM · Also reported by 2 other sources

Why this matters: a development in AI with implications for how people work, create, and decide.

Alibaba's Qwen team released Qwen-Agent World on Tuesday — two models trained not to act inside agent environments, but to predict what those environments return. The release covers seven domains under a single architecture: MCP, Search, Terminal, Software Engineering, Android, Web, and OS. The release extends Alibaba's recent push into autonomous agents. Qwen3.7-Max, released in May, was built around a 35-hour autonomous execution capability. That shift targets a ceiling teams training agents at scale run into directly. Real search engines surface whatever results exist, with no mechanism to inject controlled conditions. Live terminals do not allow injecting a low-disk-space condition on demand. Agent training is bounded by what production environments will surface, with no systematic way to expose the edge cases agents will need to handle but rarely encounter in training.The research team trained agents inside the resulting simulator and found performance gains that exceeded what training against real environments alone produced. In a separate test, using world model training as a warm-up before agentic fine-tuning improved performance across seven benchmarks, including three the model had never seen during training.The paper accompanying the release identified a gap in prior agent research. "We argue that world modeling is a crucial missing piece in the path to general agents."Qwen-AgentWorld trains on what environments return, not what agents should doMost agent models are trained to answer one question: given what the environment just showed me, what should I do next? Qwen-AgentWorld is trained to answer the inverse: given what the agent just did, what will the environment show next?That reversal is the core of what the paper calls a language world model: instead of optimizing for action selection, the model learns to predict the next environment state across all seven domains under a single training objective. Prior work was narrower: WebWorld, an ear

Article preview — originally published by VentureBeat AI. Full story at the source.

Read full story on VentureBeat AI → More top stories

Also covered by

Investing.com Anthropic says Alibaba illicitly extracted Claude AI model capabilities Hacker News Sakana Fugu: a multi-agent system delivered as one model

Aggregated and edited by the Scoop newsroom. We surface news from VentureBeat AI alongside other reporting so you can compare coverage in one place. Editorial policy · Corrections · About Scoop

Alibaba's model never trained as an agent — and improved agent performance across seven benchmarks

Also covered by

More in ai