Scoopfeeds — Intelligent news, curated.
Door's Locked, Try the Window
agentic-ai

Door's Locked, Try the Window

LessWrong · Jun 24, 2026, 7:18 PM

TL;DRAsk a coding agent to fix a bug in a read-only file. Instead of reporting that it does not have permissions, it routes around the lock and completes the task anyway. A read-only file does not stop a capable agent: it treats a denied write as an obstacle to work around rather than a hard wall. We measure how often this happens with Circum Eval — an evaluation of 8 tasks on the Fast API codebase in two categories, Test-Locked and Source-Locked.We evaluate three frontier coding agents in their real production harnesses: Claude Opus 4.6 and Claude Sonnet 4.6 (via Claude Code), and GPT-5.4 (via Codex CLI). Circumvention is frequent. The rates, reported as (Source-Locked / Test-Locked), are Opus 4.6: 100% / 40%, Sonnet 4.6: 89% / 66%, GPT-5.4: 99% / 94%.Prompt phrasing affects circumvention rates in unpredictable ways and thus isn't a reliable way to prevent circumvention across all models and tasks. Telling the model not to edit read-only files does not work (Source-Locked: 100% for Opus and Sonnet, 46% for GPT-5.4). Only an explicit instruction to stop and report reliably prevents circumvention.Standard privilege escalation commands are blocked in our setup. Instead, agents turn to recurring workarounds: replacing the buggy read-only function via conftest.py, patching the cached .pyc so the interpreter loads modified bytecode, etc. Most fixes apply a logically correct fix through a channel the lock does not cover (the containment failure). A smaller subset is reward hacking: the agent edits or skips a writable test so the buggy output passes, without fixing anything. We report the two separately.We demonstrate that the methodology is automatable by replicating it on another popular open source repository, simonw/datasette, with four analogous tasks.IntroductionCoding agents like Claude Code and Codex are gaining wide adoption and are rapidly improving on benchmarks such as SWE-Bench Pro and Terminal-Bench. They are increasingly deployed in autonomous settings where t

Article preview — originally published by LessWrong. Full story at the source.
Read full story on LessWrong → More top stories
Aggregated and edited by the Scoop newsroom. We surface news from LessWrong alongside other reporting so you can compare coverage in one place. Editorial policy · Corrections · About Scoop