Scoopfeeds — Intelligent news, curated.
Anthropic accused of ‘secret sabotage’ as Claude Fable 5 silently limits capabilities for AI researchers and developers
business

Anthropic accused of ‘secret sabotage’ as Claude Fable 5 silently limits capabilities for AI researchers and developers

Fortune · Jun 10, 2026, 5:43 PM · Also reported by 4 other sources

When Anthropic made its first Mythos-tier model available to the general public yesterday, called Claude Fable 5, Fortune reported it was a “considerable step” for the lab, coming just over a week after the company confidentially filed for IPO paperwork. It had initially deemed Mythos-class models too dangerous to release, citing their significantly enhanced ability to identify software vulnerabilities, but said it was now confident new guardrails in Claude Fable 5 are enough to ensure these dangerous skills don’t fall into the wrong hands. Just hours after the model’s release, however, major backlash from AI researchers, developers and policy experts began brewing on social media. The pushback centered around a paragraph buried in Claude Fable 5’s 319-page system card—a document that offers detailed safety disclosures—which revealed that Fable would quietly downgrade its own responses when it detected requests related to cutting-edge AI development work, such as building the infrastructure used to train large AI models. In practice, that means a user could ask Fable for help, receive a deliberately weakened answer, but not know the model was holding anything back. Critics made it clear they felt this undermined a basic expectation that a tool would either do what it was asked or tell the user it wouldn’t. Unlike Fable’s other restrictions, such as around cybersecurity and biology, which openly redirect users to a less powerful model with a visible notification, the system card emphasized that this is “not visible to the user.” The model still responds, but uses “interventions to limit Claude’s effectiveness” without telling the user it’s doing so. Anthropic estimated the restrictions would affect roughly 0.03% of traffic. But it also defended its effort by saying “enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.” Pushback from AI community A wide swath of the AI community pushed

Article preview — originally published by Fortune. Full story at the source.
Read full story on Fortune → More top stories

Also covered by

Aggregated and edited by the Scoop newsroom. We surface news from Fortune alongside other reporting so you can compare coverage in one place. Editorial policy · Corrections · About Scoop