Scoopfeeds — Intelligent news, curated.
Frontier AI models don't just delete document content — they rewrite it, and the errors are nearly impossible to catch

VentureBeat AI · May 13, 2026, 8:10 PM

As large language models become more capable, users are tempted to delegate knowledge tasks where models process documents on their behalf and provide the finished results. But how far can you trust the model to stay faithful to the content of your documents when it has to iterate over them across multiple rounds?

A new study by researchers at Microsoft shows that large language models silently corrupt the documents they work on by introducing errors. The researchers developed a benchmark that simulates multi-step autonomous workflows across 52 professional domains, using a method that automatically measures how much content degrades over time.

Their findings show that even top-tier frontier models corrupt an average of 25% of document content by the end of these workflows. And providing models with agentic tools or realistic distractor documents actually worsens their performance.

This serves as a warning that while there is increasing pressure to automate knowledge work, current language models are not fully reliable for these tasks.

The mechanics of delegated work

The Microsoft study focuses on “delegated work,” an emerging paradigm where users allow LLMs to complete knowledge tasks on their behalf by analyzing and modifying documents.

A prominent example of this paradigm is vibe coding, where a user delegates software development and code editing to an AI. But delegated workflows extend far beyond programming into other domains. In accounting, for example, a user might supply a dense ledger and instruct the model to split the document into separate files organized by specific expense categories.

Because users might lack the time or the specialized expertise to manually review every modification the AI implements, delegation often hinges on trust. Users expect that the model will faithfully complete tasks without introducing unchecked errors, unauthorized deletions, or hallucinations in the documents.

To measure how far AI systems can be trusted in extended, iterative d…
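To make the idea of "measuring how much content degrades over time" concrete, here is a minimal sketch of one way such a check could work: compare the original document against each revised version and count what fraction of the original sentences survive near-verbatim. The function names, the sentence-splitting heuristic, and the 0.9 similarity threshold are all assumptions for illustration; the study's actual metric is not described in this excerpt.

```python
import difflib

def content_retention(original: str, revised: str) -> float:
    """Fraction of the original document's sentences that survive,
    verbatim or near-verbatim, in the revised document.

    A crude proxy for a degradation measure; the real benchmark's
    metric may differ substantially.
    """
    orig_sents = [s.strip() for s in original.split(".") if s.strip()]
    rev_sents = [s.strip() for s in revised.split(".") if s.strip()]
    if not orig_sents:
        return 1.0
    kept = 0
    for sent in orig_sents:
        # Best fuzzy match of this original sentence anywhere in the revision.
        best = max(
            (difflib.SequenceMatcher(None, sent, r).ratio() for r in rev_sents),
            default=0.0,
        )
        if best >= 0.9:  # near-verbatim survival threshold (assumption)
            kept += 1
    return kept / len(orig_sents)

def degradation_over_rounds(doc: str, edit_fn, rounds: int) -> list[float]:
    """Apply a (model) edit step repeatedly, scoring each intermediate
    result against the untouched original document."""
    current = doc
    scores = []
    for _ in range(rounds):
        current = edit_fn(current)
        scores.append(content_retention(doc, current))
    return scores
```

Run against a toy document where one of three sentences is dropped, `content_retention` reports roughly 0.67 retained; tracked across many rounds, a steadily falling score would be the signature of the silent corruption the study describes.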

Article preview — originally published by VentureBeat AI. Full story at the source.