
Do AI Biorisk Thresholds Need Intermediate Warning Levels?

LessWrong · Jun 22, 2026, 1:09 AM

In the Claude Mythos/Fable 5 system card, Anthropic states that the model meets non-novel (CB-1) but falls short of novel (CB-2) biological/chemical weapons development capability thresholds. Despite this difference in conclusions, they introduce protective measures in response to both.

This is not the first time there's been a gap between threshold-triggered and actual governance decisions. In 2025, Anthropic activated AI Safety Level 3 (ASL-3) protections with the release of Claude Opus 4 despite being uncertain whether capability thresholds had been met. Anthropic's Responsible Scaling Policy (RSP) v3 discussion further elaborates that capability evaluations may not produce a clean line between "safe" and "dangerous" and labs may spend significant time in what it calls a "zone of ambiguity."

In practice, then, the highest biorisk thresholds are not the only thing triggering protective measures. Many real decisions are being made in this grey area, before a threshold is clearly crossed. If terminal thresholds like CB-1 and CB-2 are insufficient on their own to trigger protective measures, what should guide decisions in the grey area? My proposal is that frontier labs should keep those thresholds as red lines but add intermediate warning levels between them focused on bottleneck reduction rather than demonstrated end-to-end weaponization.

If a lab cannot rely on threshold definitions alone, it has to find substitutes to guide decision-making in the grey areas between them. This is very difficult. Benchmarks are appealing because they're cheap to run and useful in measuring theoretical capability. However, they don't measure whether a motivated actor could actually turn that raw capability into novel biological weapons. Uplift trials are informative, but expensive to run and not guaranteed to simulate the most likely bad actors and the infrastructure they may have access to.

While these and other substitutes are informative, they all only measure proxies of the final th

Article preview — originally published by LessWrong. Full story at the source.


