Scoopfeeds — Intelligent news, curated.
Z.ai’s open-weights GLM-5.2 beats GPT-5.5 on multiple long-horizon coding benchmarks for 1/6th the cost

VentureBeat AI · Jun 16, 2026, 9:26 PM

Today, Chinese AI startup Z.ai (formerly Zhipu AI) announced the immediate release of GLM-5.2, a 753-billion-parameter open-weights large language model (LLM) engineered specifically to dominate "long-horizon" autonomous coding and engineering tasks. Available now on Hugging Face, the Z.ai API, and more than 20 third-party coding environments, the model boasts a highly stable 1-million-token context window alongside enterprise subscription tiers starting at just $12.60 per month.

In excellent news for cost- and security-conscious businesses, Z.ai has released GLM-5.2's core weights under an unrestricted MIT open-source license, allowing enterprises to download the model freely from Hugging Face, customize or fine-tune it to their liking, and potentially run it locally or on virtual machines for only the cost of their compute and electricity.

This is an increasingly appealing option for enterprises, as state-of-the-art American proprietary models face an uncertain and potentially interrupted regulatory future following the Trump Administration's export control directive last week prohibiting foreign nationals from using Anthropic's new Claude Fable 5 model (the company responded by taking the models in question entirely offline for all users). For enterprise technical decision-makers, Z.ai's GLM-5.2 provides a highly capable path to host frontier-level AI locally, entirely bypassing the geographic fencing and commercial limitations.

IndexShare re-uses one indexer for every four sparse attention layers, reducing compute needs

Under the hood, GLM-5.2 operates with 753 billion parameters and introduces a major architectural optimization called "IndexShare". In standard large language models, recomputing attention across long documents is computationally expensive. IndexShare addresses this by reusing the same indexer across every four sparse attention layers. At the maximum 1-million-token context length, this single
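
To make the indexer-sharing idea concrete, here is a toy Python sketch. It is not Z.ai's implementation: the article does not describe how the indexer scores tokens, so `toy_indexer`, `select_top_k`, and `run_layers` are hypothetical names, and the sketch assumes that "reusing the indexer" means the token selection is computed once per group of four layers and then shared. The only detail taken from the article is the group size of four.

```python
import random

GROUP_SIZE = 4  # from the article: one indexer shared by every four sparse attention layers


def toy_indexer(num_tokens, seed):
    """Hypothetical stand-in for an indexer: one relevance score per token."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_tokens)]


def select_top_k(scores, k):
    """Return the positions of the k highest-scoring tokens, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_layers(num_layers, num_tokens, k, share=True):
    """Return (indexer call count, token subset each layer would attend to).

    With share=True (assumed behavior), the indexer runs only on the first
    layer of each group and the selection is reused by the rest of the group.
    With share=False, every layer runs its own indexer.
    """
    indexer_calls = 0
    selected = None
    per_layer = []
    for layer in range(num_layers):
        if not share or layer % GROUP_SIZE == 0:
            scores = toy_indexer(num_tokens, seed=layer)
            selected = select_top_k(scores, k)
            indexer_calls += 1
        per_layer.append(selected)
    return indexer_calls, per_layer


if __name__ == "__main__":
    shared_calls, _ = run_layers(num_layers=8, num_tokens=100, k=10, share=True)
    unshared_calls, _ = run_layers(num_layers=8, num_tokens=100, k=10, share=False)
    print(shared_calls, unshared_calls)  # 2 8
```

In this toy setup, eight layers need two indexer passes instead of eight. Any real savings would depend on details the preview does not cover.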

Article preview — originally published by VentureBeat AI. Full story at the source.
Aggregated and edited by the Scoop newsroom. We surface news from VentureBeat AI alongside other reporting so you can compare coverage in one place.