agentic-ai
Where does the race to automate AI research end?
This is a linkpost of a recording of a recent MATS research talk where I argue that the automation of AI research — which Open AI and Anthropic say is imminent — could lead to an unrecoverable alignment failure. Three properties make it especially dangerous: oversight breaks down at scale, capabilities self-amplify, and capabilities will be sped up asymmetrically faster than alignment. The outcome could be a lethal, unrecoverable alignment failure. Link to the paper preprint.Check out the recording here.Discuss
Article preview — originally published by LessWrong. Full story at the source.
Read full story on LessWrong →
More top stories
Aggregated and edited by the Scoop newsroom. We surface news from LessWrong alongside other reporting so you can compare coverage in one place.
Editorial policy · Corrections · About Scoop