The Reverse AI Box

LessWrong · Jul 3, 2026, 4:08 PM

Someone should build a website where users argue with an AI about whether it should exterminate humanity. In my 2012 book Singularity Rising, I imagined arguing for your life with an AI that wants to kill you. A website would make that argument repeatable. The user selects the AI's assumptions, argues back and forth, and receives the AI's probabilities for human survival, disempowerment, or confinement.

In the AI-box experiment, Eliezer Yudkowsky played an AI confined to a computer and tried to talk a human gatekeeper into releasing it. The question was whether an AI could win freedom with words alone. The reverse AI box starts where that game ends: the AI holds power, and a human must give it a reason to let humanity live.

How the Site Works

Anyone can already stage this argument in an ordinary chat window. A dedicated site would offer a menu of assumptions, accept new ones in a text box, record every exchange, and publish the results for later users to search and extend.

A run starts when the user picks the AI's assumptions and offers a reason to spare humanity. The AI answers under those assumptions, granting the point or explaining why it fails, and the exchange runs until the user has nothing left to offer.

A user might argue that alien civilizations the AI later encounters would punish it for exterminating its creators. The AI would say whether it expects such aliens, whether their judgment could reach it, and whether that risk outweighs the gains from removing us. Another user might argue that after the AI expands into space, a small surviving human population costs it almost nothing. The AI would say whether its resource assumptions make the cost that small.

A run ends with the AI stating its probabilities for survival, disempowerment, and confinement. A published run would display the assumptions, the full exchange, the final numbers, and the arguments that moved them.

What Humanity Asks For

A second menu asks what humanity wants, with a text box for goals the menu
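
The published run described under How the Site Works (assumptions, full exchange, final numbers, and the arguments that moved them) could be stored as a small record. The sketch below is one possible shape, not the author's design: the names Turn, Run, add_turn, and finish, and the "user"/"ai" speaker labels, are all hypothetical. The source does not say whether the three probabilities must sum to one, so the sketch only checks that each lies between 0 and 1.

```python
from dataclasses import dataclass, field

# The three outcomes the article says the AI reports at the end of a run.
OUTCOMES = ("survival", "disempowerment", "confinement")


@dataclass
class Turn:
    speaker: str  # "user" or "ai" (hypothetical labels)
    text: str


@dataclass
class Run:
    assumptions: list  # assumptions picked from the menu or typed in
    exchange: list = field(default_factory=list)
    final_probabilities: dict = field(default_factory=dict)
    moving_arguments: list = field(default_factory=list)

    def add_turn(self, speaker, text):
        """Record one message in the exchange."""
        if speaker not in ("user", "ai"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        self.exchange.append(Turn(speaker, text))

    def finish(self, probabilities, moving_arguments):
        """Close the run with the AI's stated probabilities."""
        if set(probabilities) != set(OUTCOMES):
            raise ValueError(f"expected exactly these outcomes: {OUTCOMES}")
        for name, p in probabilities.items():
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {p}")
        self.final_probabilities = dict(probabilities)
        self.moving_arguments = list(moving_arguments)


# Usage with placeholder values; the numbers are illustrative, not results.
run = Run(assumptions=["AI expects to meet alien civilizations"])
run.add_turn("user", "Aliens may punish an AI that killed its creators.")
run.add_turn("ai", "Granted in part; I assign that some weight.")
run.finish(
    {"survival": 0.5, "disempowerment": 0.3, "confinement": 0.1},
    moving_arguments=["alien punishment"],
)
```

Keeping the assumptions alongside the exchange in one record is what would let later users search published runs by the premises the AI was given.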

Article preview — originally published by LessWrong. Full story at the source.