How I think developers of frontier AI systems and regulators ought to act in the face of existential AI risk

LessWrong · Jun 19, 2026, 10:22 PM

This is a post I drafted on 2025-08-06 that doesn't live up to the ambitious title, but I'm publishing it anyways 10+ months later. "Don't let perfect be the enemy of good."

Below I describe a simple framework for thinking about the permissibility of building the next frontier AI system. TL;DR: Evaluate the risk that proceeding with building the AI system will lead to catastrophic harm. (Get independent regulators to evaluate the risk too.) If the risk is too high, do not build it.

I am a layperson, but my take is that frontier AI companies generally don't even include as much nuance as I include here in their public communications, so I decided to write up this simple framework to try to improve the discourse. I'd appreciate any feedback. I don't think anything I'm saying here is remotely new, but I'm not aware of a post like this.

Anthropic's Framework

In a recent podcast episode published July 20, 2025, Anthropic co-founder Ben Mann is asked (at 48:43) "What are the odds that we align AI correctly and actually solve this problem?"

In his answer, Ben references the following part of Anthropic's March 8, 2023 blog post titled Core Views on AI Safety: When, Why, What, and How, which lays out a framework for what Anthropic thinks it should do given different possibilities for how difficult it will be to develop powerful AI that is aligned and safe:

One particularly important dimension of uncertainty is how difficult it will be to develop advanced AI systems that are broadly safe and pose little risk to humans. Developing such systems could lie anywhere on the spectrum from very easy to impossible. Let’s carve this spectrum into three scenarios with very different implications:

Optimistic scenarios: There is very little chance of catastrophic risk from advanced AI as a result of safety failures. Safety techniques that have already been developed, such as reinforcement learning from human feedback (RLHF) and Constitutional AI (CAI), are already largely sufficient for alignme

Article preview — originally published by LessWrong. Full story at the source.

