Matrix Orthogonalization Improves Memory in Recurrent Models

Hacker News · Jul 1, 2026, 5:13 AM

Key takeaways

For some domains, we can't afford the quadratic-attention overhead of transformers.
The best known RNN for associative recall is mLSTM, a variant of LSTM that maintains a matrix memory. mLSTMs demonstrate substantially improved recall over baselines on one benchmark, MQAR.
Since MQAR doesn't measure noisy associative recall (NAR), we can look at MAD's noisy AR task suite.

Transformers exhibit remarkable associative recall (AR) abilities: attention provides each token direct access to those preceding it, a mechanism that has been hard for other architectures, like recurrent neural networks (RNNs), to match.

But for some domains, we can't afford the quadratic-attention overhead of transformers. One example is long-horizon RL, in the style of Dreamer. For these kinds of applications, we need to make recurrent neural networks work, but don't want to give up on associative recall.

The best known RNN for associative recall is mLSTM, a variant of LSTM that maintains a matrix memory. mLSTMs demonstrate substantially improved recall over baselines on one benchmark, multi-query associative recall (MQAR). But pure recall may not be enough to measure how well a recurrent model performs. In fields where environment transitions can be noisy, a useful proxy test is noisy associative recall (NAR).
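To make "matrix memory" and "noisy associative recall" concrete, here is a minimal sketch in plain Python. It is not the article's method or an mLSTM implementation: the class `MatrixMemory`, the helper `one_hot`, and the function `noisy_recall_demo` are hypothetical names chosen for illustration. The sketch assumes the gates are fixed to 1 (a real mLSTM learns them) and uses one-hot keys, so stored pairs do not interfere with each other.

```python
import random


class MatrixMemory:
    """Toy key-value matrix memory (hypothetical helper, not from the article).

    Assumption: input and forget gates are fixed to 1, so each write is a
    plain outer-product update. A real mLSTM learns these gates.
    """

    def __init__(self, key_dim, value_dim):
        # C accumulates value * key^T; n accumulates the keys themselves.
        self.C = [[0.0] * key_dim for _ in range(value_dim)]
        self.n = [0.0] * key_dim

    def write(self, key, value):
        # Outer-product update: C += value key^T, n += key.
        for i, v_i in enumerate(value):
            for j, k_j in enumerate(key):
                self.C[i][j] += v_i * k_j
        for j, k_j in enumerate(key):
            self.n[j] += k_j

    def read(self, query):
        # Retrieve C @ query, normalized by how often the key was written.
        scale = max(abs(sum(n_j * q_j for n_j, q_j in zip(self.n, query))), 1.0)
        return [sum(c * q for c, q in zip(row, query)) / scale for row in self.C]


def one_hot(index, dim):
    # One-hot keys are mutually orthogonal, a simplifying assumption here.
    return [1.0 if j == index else 0.0 for j in range(dim)]


def noisy_recall_demo(num_writes=100, noise_std=0.5, seed=0):
    rng = random.Random(seed)
    memory = MatrixMemory(key_dim=4, value_dim=3)

    # Clean association: written once, without noise.
    memory.write(one_hot(0, 4), [1.0, 2.0, 3.0])

    # Noisy association: the same key is written many times, each time
    # with Gaussian noise added to the underlying target value.
    target = [1.0, 0.0, -1.0]
    for _ in range(num_writes):
        noisy_value = [t + rng.gauss(0.0, noise_std) for t in target]
        memory.write(one_hot(1, 4), noisy_value)

    clean_read = memory.read(one_hot(0, 4))
    noisy_read = memory.read(one_hot(1, 4))
    return clean_read, noisy_read, target


if __name__ == "__main__":
    clean_read, noisy_read, target = noisy_recall_demo()
    print("clean read:", clean_read)
    print("noisy read:", noisy_read, "target:", target)
```

In this toy setting the clean pair is recovered exactly, while the noisy pair comes back as the average of its noisy writes, which approaches the target as the number of writes grows. With keys that are not orthogonal, reads would also pick up contributions from other stored pairs, which this sketch deliberately avoids.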

Article preview — originally published by Hacker News. Full story at the source.


