Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
But that last one—manipulation—points to a challenge for all these desiderata: a human's goals are themselves under-determined and manipulable, and it's awfully hard to pin down a principled distinction between changing people's goals in a good way ("providing counsel", "providing information", "sharing ideas") versus a bad way ("manipulating", "brainwashing").

The manipulability of human desires is hardly a new observation in the alignment literature, but the problem remains unsolved (see lit review in §3 below).

In this post I will propose an explanation of how we humans intuitively conceptualize the distinction between guidance (good) vs manipulation (bad), in case it helps us brainstorm how we might put that distinction into AI. …But (spoiler alert) it turns out not to really help, because I'll argue that we humans think about this distinction in a deeply incoherent way, one intimately tied to our scientifically inaccurate intuitions around free will.

From there, I jump into a broader review of every approach I can think of for writing a "True Name" for manipulation or for related notions (empowerment, agency, corrigibility, culpability, etc.), or indeed for any other method of robustly getting future AGIs to talk to people without trying to manipulate those people's desires. I argue that none of them provides much of a path forward on the particular technical alignment problem I'm working on. Indeed, my current guess is that none of these things has a "True Name" at all, or at least not one that's useful for this technical alignment problem.

1.2 Bigger-picture context: why is this issue so important to me?

I've been investigating brain-like-AGI safety plans that would involve making AGI with a motivation system loosely inspired by the prosocial aspects of human social instincts.