
Toy transformers may represent belief-state geometry optimally but not minimally

LessWrong · Jun 24, 2026, 4:39 AM

Methods note: The code used for the experiments and the related open-source repo were built with Claude. The experimental design and writeup are my own, with minimal editing and formatting amendments made with Claude.

Thesis

A toy transformer keeps provably predictively defunct belief-state data in its residual stream. This information is shed only under sufficient imposed capacity pressure, in which case the oldest predictively defunct information is shed first.

Results

I conducted a set of experiments extending the results presented in Shai et al. 2024. The original publication trains toy transformers on next-token prediction over sequences generated by Hidden Markov Models (HMMs), and then uses values in the residual stream to recover Bayesian belief-state representations of the HMMs on which the transformers were trained.

Optimal-prediction theory is the proposition that a system is likely to use the most energetically inexpensive ("optimal") method to represent the information required for next-token prediction. It is a meaningful top-down framework for this experiment because it fits neatly with the experiment's results: referencing representations of the predictively significant state information of the HMMs is less energetically demanding than, say, referencing a lookup table for any given string of tokens (in other words, bringing a single inefficient predictive system to bear on every HMM, agnostic to the states and algorithms that differentiate those HMMs).

My hypothesis was therefore as follows: according to optimal-prediction theory, a transformer should recognize and prune away information irrelevant to next-token prediction. Were we to introduce a stochastic token-generation element (a "coin") that meaningfully affects state for certain epochs but then becomes irrelevant or redundant, we would expect the transformer to prune away data about the coin.
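To make the notion of a belief state concrete, here is a minimal sketch of Bayesian belief updating over an HMM. The two-state HMM, its transition values, the uniform prior, and the function names are all hypothetical choices made for illustration; they are not the HMMs or code used in the post or in Shai et al. 2024. The sketch shows two different token histories collapsing to the same belief state, at which point the extra history carries no further predictive value.

```python
# Hypothetical 2-state HMM (states A=0, B=1; tokens 0 and 1).
# This is an illustrative assumption, not the HMM used in the post.
# T[x][i][j] = P(emit token x and move to state j | current state i).
T = {
    0: [[0.9, 0.0], [0.5, 0.0]],
    1: [[0.0, 0.1], [0.0, 0.5]],
}


def update_belief(belief, token):
    """One Bayesian filtering step: the new belief is proportional to belief @ T[token]."""
    n = len(belief)
    unnorm = [sum(belief[i] * T[token][i][j] for i in range(n)) for j in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("token has zero probability under the current belief")
    return [u / total for u in unnorm]


def next_token_probs(belief):
    """P(next token = x | belief), marginalising over current and next state."""
    return {
        x: sum(belief[i] * sum(T[x][i]) for i in range(len(belief)))
        for x in T
    }


def belief_after(tokens, prior=(0.5, 0.5)):
    """Fold a token history into a belief state, starting from an assumed uniform prior."""
    belief = list(prior)
    for token in tokens:
        belief = update_belief(belief, token)
    return belief


# Two different histories that end in the same belief state.
short_history = belief_after([0])
long_history = belief_after([1, 1, 0])
print(short_history, long_history)
print(next_token_probs(short_history))
```

In this toy case, anything that distinguishes the two histories beyond the shared belief state is irrelevant to next-token prediction, which is the kind of information the hypothesis above expects a transformer to prune.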
However, the first experiment immediately disproved the initial hypothesi

Article preview — originally published by LessWrong. Full story at the source.


