Google DeepMind unveils plan to protect itself from its own rogue AI agents
Google has developed a new plan to police the increasingly capable AI agents it uses within its own AI research organization, and the company is publishing the plan, which it calls a roadmap, to help other AI labs counter the potential threat of rogue AI agents.

The Google DeepMind security plan involves a pivot away from the AI safety community’s typical focus on “the alignment problem,” the challenge of training an AI system so that its actions reliably match the intentions, values, and ethics of the humans who are managing it. While continuing to say that alignment is one key safety component, Google’s roadmap acknowledges that the alignment problem may never be fully solved, and instead creates a layered security system that treats AI agents as potential rogue insiders within an organization.

The 35-page technical report maps out a series of steps and procedures that are designed to catch potential adversarial behavior by AI agents. “If the first line of defense, alignment, fails, how can we mitigate harm anyway?” Rohin Shah, who leads the AGI Safety & Alignment team at Google DeepMind, told Fortune in an interview.

The AI agent framework borrows heavily from traditional cybersecurity, especially insider-threat prevention. “We borrow a lot from security, which already deals with the threat of internal employees who might be malicious, and we can apply these to a new setting,” Shah said.

But, he noted, “AI is systematically different from humans.” For one thing, AI agents might be able to act far faster and at greater scale than an individual rogue employee ever could. So there need to be systems that can control what tools and data an AI agent has access to, as well as systems that can monitor AI agent behavior and spot potentially aberrant patterns in real time.

There are other differences too. For instance, many access controls and permissions systems for human employees are based around a particular employee’s role within the company. A systems administr