Thoughts on Claude Fable's silent safeguards
Citing concerns over potential cyber risk, Anthropic initially rolled out Mythos access to only a small number of select organizations. This controlled rollout gave Anthropic visibility into usage and allowed partners to use the new capabilities defensively (i.e., to find and patch vulnerabilities before attackers could exploit them).

Anthropic, in releasing Fable 5, has now made a Mythos-level model accessible to the public. However, because of its concerns over potential risks from the new capabilities, public access is restricted by new safeguards.

The new safeguards

The launch blog post enumerates three classes of safeguards: (1) cybersecurity, (2) biology and chemistry, and (3) distillation. Requests classified as falling within one of these three categories are processed by a weaker model (Opus 4.8), and this "fallback" behavior is made transparent to the user:

When Fable's classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs.

However, Section 1.5 of Fable's system card describes another category of safeguards, one that the blog post omits entirely. Here it is in full (emphasis mine; I encourage reading all three paragraphs carefully):

We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks of accelerating the overall pace of AI dev