The Cookie Monster Explains AI Safety
Disclaimer: This is a shitpost (or is it?)

There is a story published in 1977 by Little Golden Books called Cookie Monster and the Cookie Tree. A witch curses a cookie tree to keep the Cookie Monster from getting the cookies, which leads to unexpected consequences. Let's read it together and use it to explore the AI safety landscape.

Artificial General Intelligence (AGI) has the potential to create unlimited benefits for all of humanity, like tasty cookies. Just as the cookie tree currently belongs only to the witch, frontier AI systems are mostly controlled by proprietary labs like Anthropic, OpenAI, and Google DeepMind. Misuse risks arise when bad actors use AI systems for harmful ends, such as concentrating power or causing Chemical, Biological, Radiological, and Nuclear (CBRN) harm. Therefore, frontier labs use KYC (Know Your Customer) software like Persona and sophisticated authentication/authorization cookies to restrict access to certain models. They grant access only to people who use the models according to the Terms of Service, pulling the AI out of reach of bad actors.

Frontier labs often have preparedness frameworks that specify Red Lines: capability thresholds a model must not cross before deployment. The Cookie Monster could look at the cookies, smell them, and even feel them, but tasting them was a Red Line. Runtime monitoring and guardrails have red lines too. For example, talking about biology homework is OK and chatting about wet lab papers is probably fine, but Claude will definitely refuse anything about building a virus.

The Cookie Monster is like an AI safety researcher who tries to get the world to wake up to the dangers of superintelligent AI. Unfortunately, the world often doesn't believe them at first. Prominent figures in this space include Eliezer Yudkowsky, who founded MIRI, and Dario Amodei, CEO of Anthropic. No one believes they actually care about AI safety; surely it's just a marketing gimmick to inflate the share price of their IPO.

At the cookie tree, the witch discov