We Need Breadth-First AI Safety Plans
Cross-posted from my website.

Depth-first plans lay out a path from here to aligned superintelligent AI. We need those kinds of plans. But depth-first plans depend on many assumptions: "We will make AI safe by doing step 1, then step 2, then step 3." Step 1 only works under condition A, step 2 requires condition B, step 3 requires condition C. If A or B or C is false, the whole plan fails (and there's a good chance we all die).

Consider Google's safety plan from April 2025. To my knowledge, this is the best among the frontier AI companies' plans. [1] Google's plan depends on a series of conditions:

- For the most part, the plan does not consider concrete details of how significantly-more-capable AI systems will behave, instead proposing that Google will figure out how to handle those systems once it understands them better. This only works given (at least) two conditions:
  - AI capability improvements occur at a relatively predictable pace, with no unexpectedly large jumps. The plan explicitly assumes no "discontinuous" improvements, which is roughly the same thing. It's good that they're being explicit about this.
  - Once stronger capabilities emerge, there will be enough time to figure out mitigations.
- The plan entails putting stricter measures in place once AI systems become sufficiently capable. This depends on at least two conditions:
  - Google (or somebody) can accurately determine what capability level is dangerous.
  - Google's evals (or third-party evals) can elicit dangerous capabilities if they exist.
- The plan requires using AI to bootstrap AI alignment. This depends on several conditions:
  - We can successfully align the AI that we use for bootstrapping, or misalignment will be easy (enough) to spot, or alignment isn't necessary (e.g. because humans can use amplified oversight to monitor smarter-than-human systems).
  - Future Google can be trusted to use enough of its compute to differentially accelerate alignment research, rather than doing something more profitable (for ex