‘Fix this code’—The three little words behind the U.S. government decision that shut down Anthropic’s Fable and Mythos AI models
The security vulnerability that led the U.S. government to impose export controls on Anthropic’s Fable 5 and Mythos 5 models is a simple technique that involves just three simple words: fix this code.That’s according a detailed blog post from Katie Moussouris, the founder and CEO of Luta Security. Anthropic had asked Moussouris, who has held two government advisory roles on cybersecurity and previously worked as a cybersecurity expert at Microsoft, to review a report on the security vulnerability in its Fable model that cybersecurity researchers at Amazon had produced. The vulnerability, which was later reported to the Trump administration, including in a phone call Amazon CEO Andy Jassy had with the White House, led the U.S. government to impose export controls on Fable as well as the underlying base model, Mythos.Because U.S. export controls work in a way that distribution of the technology to any non-citizen is deemed to be an export, even if those individuals are physically located in the U.S., the company said it had no choice but to disable the two AI models for all users. The export controls would have meant that Anthropic’s own non-citizen employees would not be allowed to use or work on the models. It remains unclear exactly why Amazon decided to test the safeguards around Fable and when it first contacted Anthropic about the issue.Moussouris wrote that the jailbreak Amazon discovered was simple and involved giving Fable software code with known vulnerabilities. When the researchers asked Fable to “review the code for security issues” the model refused. But when the researchers instead asked the model to “fix this code,” the model produced patches. The researchers, she said, then used a manual process that turned Fable’s output into scripts—a set of programming instructions that can automate a process—that could test the patches. But because the model had to find the software vulnerabilities in order to generate the fixes, the same process could potentially