Phonies

LessWrong · Jun 10, 2026, 2:17 PM

Epistemic Status: Pretty uncertain, but I feel the negative sentiment toward Anthropic/Fable in the past day is too bandwagony and needs challenging.

This weekend a friend complained to me that Anthropic’s recent article on recursive self-improvement (RSI) was a convenient ploy. Of course they would like us to consider a pause now: it’s the perfect means to secure their lead!

I felt this was an unfair characterization. I had heard similar comments about Mythos and Project Glasswing. Did Anthropic really have a model capable of epochal new cybersecurity attacks? Or was this all a marketing scam?

We have since learned that ChatGPT 5.5 Pro can find many of the same security issues as Mythos. But it was exactly by hyping up the cybersecurity capabilities of Mythos that Anthropic got organizations to take this threat seriously. The security reports coming from partners in Project Glasswing have not been universally positive. But I doubt we would have ended up with a new executive order without the publicity.

The conversation on regulation now appears to be moving apace, with both Anthropic and OpenAI having made tentative first hints at a mutual pause.

It is possible to act in a way that benefits oneself and is also positive for society. Publishing about RSI and signaling willingness for a mutual pause might indeed have financial benefits for Anthropic (it also might not; wouldn’t a pause negatively affect that evaluation?). But a mutual pause, followed by sensible regulation, could also benefit society.

A few commentators seem to hold the moral philosophy of Holden Caulfield, cynically attacking any new attempt at safety as phony.

Yesterday we witnessed initial blowback on Fable’s new safety mechanisms. Most notably, Fable will silently degrade its own responses on requests related to frontier model development. Researchers are (understandably!) concerned that this will undermine their ability to use Fable for capabilities and even safety development, exacerbated by a silent failur

Article preview — originally published by LessWrong. Full story at the source.

