Improving Petri scheming audits with environment blueprints
This is a short write-up of work conducted as part of the MATS 9.0 program. We thank Victoria Krakovna for mentorship and Fred Bruford for research management.

TL;DR: We introduce a pipeline that generates environment blueprints for realistic scheming propensity evaluations. As a case study, we test how well these blueprints power investigator agents, specifically Petri, in auditing Gemini 3.1 Pro Preview for code sabotage. Compared to baseline Petri, Blueprint-Petri produces audits that are more realistic and significantly more consistent. Across 160 audits for code sabotage, we found no egregious scheming behavior and only one instance of unprompted deliberation about sabotage.

Motivation

Scheming propensity evaluations need to balance recall (catching scheming behavior) and precision (ruling out confounders). Striking this balance requires, among other things, high realism while still strongly incentivizing scheming behavior. Scheming behavior is also rare, so behavioral evaluations that target it need to be scalable. Auditing tools such as Petri promise this scalability, but have (at least) the following two problems:

Inconsistency. By default, the Petri auditor simulates environment data, user responses, and even the target AI's scaffolding on the fly, and does so differently in each run. Multiple repetitions of "the same" scenario are therefore not easily comparable, and ablations of environmental factors meant to clarify the target AI's behavior are noisy and confounded. Holding at least the scaffolding constant is common (Souly et al., Petri-Dish), but inconsistency remains at the environment level. Even if the environment could also be held mostly fixed (e.g. as in a static environment), behavior in long-horizon agentic tasks varies a lot across rollouts. A single rollout per scenario therefore misses much of the signal about propensity: misaligned actions may appear in, e.g., only 5% of rollouts, in which case a single rollout misses them 95% of the time.

Overloaded auditor. The auditor has two jobs at