Scoopfeeds — Intelligent news, curated.
A 0.12% parameter add-on gives AI agents the working memory RAG can't

VentureBeat AI · May 21, 2026, 7:00 PM


AI agents forget. Every time a coding assistant loses track of a debugging thread, or a data analysis agent re-ingests the same context it already processed, the team pays in latency, token costs, and brittle workflows. The fix most teams reach for, expanding the context window or adding more RAG, is increasingly expensive and still doesn't reliably work.

To address this, researchers from Mind Lab and several universities proposed delta-mem, an efficient technique that compresses the model’s historical information into a dynamically updated matrix without changing the model itself. The resulting module adds just 0.12% of the backbone model's parameters (compared to 76.40% for one leading alternative) while outperforming it on memory-heavy benchmarks. Delta-mem allows models to continuously accumulate and reuse historical data, reducing the reliance on massive context windows or complex external retrieval modules for behavioral continuity.

The long memory challenge

The conventional solution is to simply dump all the information into the model’s context window.

But as Jingdi Lei, co-author of the paper, told VentureBeat, current systems treat memory merely as a context-management problem. “Either we keep expanding the context window, or we retrieve more documents through RAG,” Lei explained. “These approaches are useful and will remain important, but they become increasingly expensive and brittle when agents need to operate over long-running, multi-step interactions, and they don't really [work] like human memory since they are more like looking up documents.”

In enterprise settings, the bottleneck is not just whether the model can access history, but whether it can reuse that history efficiently, continuously, and with low latency. Standard attention mechanisms incur a quadratic computational cost as the sequence length increases. Furthermore, expanding the context window does not guarantee the model will actually recall the information effectively. Mod
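To make the idea of compressing history into a dynamically updated matrix more concrete, here is a minimal sketch of a classic delta-rule associative memory. This is an assumption for illustration only: the preview does not give delta-mem's actual update rule, and the class name `DeltaRuleMemory`, its methods, and the `beta` step size are all hypothetical. What the sketch does show is the general property the article describes: the memory stays the same size no matter how many items are written to it.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so writes and reads use comparable keys."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class DeltaRuleMemory:
    """Hypothetical fixed-size associative memory (NOT the paper's implementation).

    Stores key -> value associations in one value_dim x key_dim matrix M.
    Each write applies the textbook delta rule:
        M <- M + beta * (value - M @ key) * key^T
    """

    def __init__(self, key_dim, value_dim, beta=1.0):
        self.beta = beta
        self.M = [[0.0] * key_dim for _ in range(value_dim)]

    def read(self, key):
        """Return M @ key, the value currently associated with this key."""
        k = normalize(key)
        return [sum(m * x for m, x in zip(row, k)) for row in self.M]

    def write(self, key, value):
        """Correct the matrix by the error between the target and current recall."""
        k = normalize(key)
        predicted = [sum(m * x for m, x in zip(row, k)) for row in self.M]
        for i, row in enumerate(self.M):
            error = value[i] - predicted[i]
            for j in range(len(row)):
                row[j] += self.beta * error * k[j]

    def size(self):
        """Number of stored floats; constant regardless of how many writes occur."""
        return len(self.M) * len(self.M[0])


mem = DeltaRuleMemory(key_dim=4, value_dim=2)
size_before = mem.size()

mem.write([1, 0, 0, 0], [1.0, 2.0])   # first fact
mem.write([0, 1, 0, 0], [3.0, 4.0])   # unrelated fact under an orthogonal key
mem.write([1, 0, 0, 0], [5.0, 6.0])   # update the first fact in place

recalled_first = mem.read([1, 0, 0, 0])
recalled_second = mem.read([0, 1, 0, 0])
size_after = mem.size()
```

In this toy setup the keys are orthogonal, so updating the first fact leaves the second one intact. With overlapping keys, writes interfere with each other, which is the usual trade-off of a fixed-size memory compared with storing every token in the context window.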

Article preview — originally published by VentureBeat AI. Full story at the source.
Aggregated and edited by the Scoop newsroom. We surface news from VentureBeat AI alongside other reporting so you can compare coverage in one place.