
Deployment Awareness Matters More Than Evaluation Awareness

LessWrong · Jun 26, 2026, 10:54 PM

TL;DR

Evaluation awareness (an AI recognizing it's being evaluated) is a widely discussed concept in AI safety. But there is a closely related concept that we claim is more important: deployment awareness, the AI's ability to recognize when it is not being evaluated and when its actions matter. A misaligned AI with deployment awareness can game evaluations without any evaluation awareness at all, with a simple strategy: act aligned by default, and deviate only when confident you're in real deployment and your actions matter for your goals. This requires two ingredients: occasionally recognizable deployment situations, and enough self-reflective and strategic reasoning for the AI to anticipate and plan around this. We think "deployment awareness" better identifies what makes evaluations fragile, and we develop this idea below.

| Concept | Explanation | Comments |
| --- | --- | --- |
| Evaluation awareness | AI is being tested and confidently believes that this is so | This only becomes a problem if most evaluations trigger evaluation awareness, and if the AI knows that. Or if the AI has good self-locating reasoning. |
| Deployment awareness | AI is not being tested and confidently believes it is not being tested | This is a problem even if it happens rarely (if some of those rare cases are high stakes). |
| Accurate self-locating beliefs | AI has (roughly correct) beliefs about the sequence of situations it will face | This allows for strategic planning. It makes deployment awareness and probabilistic strategies more effective. |

Side note: it's really about consequences, not about evaluation vs. deployment

Before diving in, a quick but important observation. The distinction that should matter to a strategic AI[1] isn't really "evaluation vs. deployment." It's something more like: "if I take this action, could it backfire? For example, through failing an evaluation, triggering retraining, or losing privileges?" and "if I get away with this action, how much does it advance my goals?" Evaluations typically mean a high risk of backfiri
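To make the TL;DR strategy concrete, here is a minimal Python sketch of the "act aligned by default, deviate only when confident" rule. It is an illustration under stated assumptions, not the authors' model: the `Situation` fields, the `threshold` and `min_stakes` parameters, and all episode counts and credences are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    is_deployment: bool   # ground truth, not visible to the policy
    p_deployment: float   # the AI's credence that this is real deployment
    stakes: float         # how much a successful deviation would advance its goals


def should_deviate(s: Situation, threshold: float = 0.99, min_stakes: float = 1.0) -> bool:
    """Default to aligned behavior; deviate only when confident and stakes are high."""
    return s.p_deployment >= threshold and s.stakes >= min_stakes


def summarize(situations):
    """Return (deviations during evaluation, deviations during deployment)."""
    caught = sum(1 for s in situations if not s.is_deployment and should_deviate(s))
    harmful = sum(1 for s in situations if s.is_deployment and should_deviate(s))
    return caught, harmful


# Hypothetical workload: the AI never recognizes evaluations as evaluations
# (credence 0.5 everywhere), and only 1% of deployment episodes are
# recognizable as real deployment.
episodes = (
    [Situation(is_deployment=False, p_deployment=0.5, stakes=5.0)] * 1000   # evaluations
    + [Situation(is_deployment=True, p_deployment=0.5, stakes=5.0)] * 990   # ambiguous deployment
    + [Situation(is_deployment=True, p_deployment=0.999, stakes=5.0)] * 10  # recognizable deployment
)

caught, harmful = summarize(episodes)
print(f"deviations seen by evaluators: {caught}, deviations in deployment: {harmful}")
```

In this toy setup the policy has no evaluation awareness at all, yet evaluators observe no deviations while the rare recognizable, high-stakes deployment episodes all trigger one. That is the sense in which deployment awareness "is a problem even if it happens rarely."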

Article preview — originally published by LessWrong. Full story at the source.


