Analysis of Metastable States in the Transformer Activation Space

LessWrong · Jun 6, 2026, 9:30 PM

Part 1: Do Metastable Token Clusters Exist in Trained Transformers?

This is the first entry in a sequence of about ten posts, each aligned with a phase in the GitHub repository. The series works through a few humble experiments that test a mathematical theory of attention against real trained transformers.

A project summary: a recent paper by Geshkovski, Letrouit, Polyanskiy, and Rigollet models attention as a dynamical system on the sphere and proves that tokens cluster and drift toward consensus, with a metastable two-timescale structure along the way, under the assumption that Q, K, and V are all the identity. This sequence asks how much of that survives in trained models (most of it does), finds the one prediction that fails universally (the energy is not monotone), and then spends the rest of the series tracing that failure to its mechanism in the value matrix.

Links:

[GitHub repository]: all the code for every phase.
[Original paper]: Geshkovski et al., the theory this whole investigation is built on.
[YouTube walkthrough]: a video going over the original paper, for those who prefer video.

The results of the experiments

Metastability is real in trained transformers, but the mechanism the theory proposes for it is not. Tokens cluster into metastable groups the way the idealized model predicts; the monotonicity of the energy that the theory says drives the process is violated in every model tested; and how fast a model collapses is set by its value matrix, not by depth or width as the theory claims.

Three predictions of the idealized theory survive the move to trained models, and survive cleanly. Token representations cluster over layers (CONFIRMED, every model, every real prompt). The clusters persist across runs of layers, forming metastable plateaus, before reorganizing (CONFIRMED). And the two timescales of metastability, fast formation and slow merging, are genuinely separate above a depth threshold (CONFIRMED). Two predict
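For readers who want to see the idealized setting concretely, here is a minimal sketch of attention as a particle system on the sphere with Q, K, and V all set to the identity. It is not taken from the post's repository. The function names, the explicit Euler discretization, the parameter values, and the exact form of the energy (a sum of exp(beta * <x_i, x_j>) scaled by 1/(2 * beta * n^2)) are assumptions made for illustration.

```python
import numpy as np


def normalize(x):
    """Project each row back onto the unit sphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def attention_step(x, beta, dt):
    """One explicit Euler step of the idealized dynamics (Q = K = V = I).

    Hypothetical helper: the discretization and step size are assumptions.
    """
    logits = beta * (x @ x.T)                      # beta * <x_i, x_j>
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    target = weights @ x                           # attention output per token
    # Remove the radial component so the velocity is tangent to the sphere.
    radial = np.sum(target * x, axis=1, keepdims=True) * x
    return normalize(x + dt * (target - radial))


def interaction_energy(x, beta):
    """Assumed energy: sum_ij exp(beta * <x_i, x_j>) / (2 * beta * n^2)."""
    n = x.shape[0]
    return np.exp(beta * (x @ x.T)).sum() / (2.0 * beta * n * n)


def simulate(n_tokens=16, dim=3, beta=1.0, dt=0.1, n_steps=200, seed=0):
    """Run the toy system from random initial tokens and record the energy."""
    rng = np.random.default_rng(seed)
    x = normalize(rng.standard_normal((n_tokens, dim)))
    energies = [interaction_energy(x, beta)]
    for _ in range(n_steps):
        x = attention_step(x, beta, dt)
        energies.append(interaction_energy(x, beta))
    return x, np.array(energies)


if __name__ == "__main__":
    x_final, energies = simulate()
    print("energy at start:", energies[0])
    print("energy at end:  ", energies[-1])
    print("min pairwise cosine:", (x_final @ x_final.T).min())
```

In this toy system the recorded energy should rise as tokens drift together, which is the behavior the series reports as failing in trained models. Measuring the same quantity on real activations would require layer-wise hidden states from a trained transformer, which this sketch does not attempt.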

Article preview — originally published by LessWrong. Full story at the source.
