Several frontier models are substantially prefill aware

LessWrong · Jun 17, 2026, 5:41 PM

This blog post discusses work in a recently published paper. However, this blog post was primarily written by Parv Mahajan and Andy Wang, and several of the more speculative takes may not represent the all-things-considered view of the entire team.

Link to paper: https://arxiv.org/abs/2606.12747

TL;DR:

- We provide more conceptual grounding and extend results in prefill awareness to low-stakes settings, and show that several frontier models show prefill awareness even under conservative elicitation.
- Further behavioral studies are pretty messy, and we encourage more work in this area.
- We encourage frontier lab safety teams to measure and mitigate prefill awareness in pre-deployment evaluations.

Recently, UK AISI investigated prefill awareness - whether frontier language models can distinguish between tampered and untampered assistant-side content. Prefills are used in misalignment continuation, persona, introspection, and jailbreaking research. Additionally, several prefill-based evaluations are used in pre-deployment testing to make safety claims. Prefill awareness could confound these evaluations, and fits into larger concerns about situational awareness (e.g., control awareness).

The previous results largely focused on deployment-relevant settings (e.g., SWE-bench and Petri transcripts), and therefore weren’t able to make strong claims across types of commonly-used prefills and models. In the paper, we:

- Use a more refined conceptual framework to make sense of prefill awareness and synthesize new experiments with the previous results
- Show that prefill awareness emerges, even in low-stakes scenarios, across several frontier models
- Discuss issues with behavioral measures of prefill awareness
- Argue that prefill awareness should be measured as part of pre-deployment testing.

Here, we’ll walk through some of our main takeaways and frustrations, and present some more speculative findings and future directions. In general, we are very excited for more research into different forms of si

Article preview — originally published by LessWrong. Full story at the source.