Thoughts on Claude Fable's silent safeguards

LessWrong · Jun 10, 2026, 11:35 PM · Also reported by 4 other sources

Citing concerns over potential cyber risk, Anthropic initially rolled out Mythos access to only a small number of select organizations. This controlled rollout gave Anthropic visibility into usage and allowed partners to use the new capabilities defensively (i.e., to find and patch vulnerabilities before attackers could exploit them).

In releasing Fable 5, Anthropic has now made a Mythos-level model accessible to the public. However, due to their concerns over potential risks from new capabilities, public access is restricted via new safeguards.

The new safeguards

The launch blog post enumerates three classes of safeguards: (1) cybersecurity, (2) biology and chemistry, and (3) distillation. Requests classified as falling within one of these three categories are processed by a weaker model (Opus 4.8), and this "fallback" behavior is made transparent to the user:

"When Fable's classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs."

However, Section 1.5 of Fable's system card describes another category of safeguards, one that the blog post omits completely. Here it is in full (emphasis mine; but I encourage reading all 3 paragraphs carefully):

We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks of accelerating the overall pace of AI dev

Article preview — originally published by LessWrong. Full story at the source.

Also covered by

ARY News: Anthropic launches Claude Fable 5 with new Opus safeguards
Hacker News: Claude Fable 5
Hacker News: Claude Mythos 5 / Fable 5
Hacker News: System Card: Claude Fable 5 and Claude Mythos 5 [pdf]
