Alignment to What?

LessWrong · Jun 1, 2026, 9:27 PM

Part One

The Signature of a Framing Problem

In 1903, G.E. Moore opened Principia Ethica with a diagnosis that has aged well: “the difficulties and disagreements, of which its history is full, are mainly due to a very simple cause: namely to the attempt to answer questions, without first discovering precisely what question it is which you desire to answer.” Einstein made the same observation about science: the formulation of a problem is often more essential than its solution. The situation Moore and Einstein are describing is otherwise called a framing problem. The signature of a framing problem is that a field, even if technically productive, is foundationally stuck: its practitioners agree on methods while disagreeing irreconcilably about fundamentals.

From the point of view of someone newly introduced to the subject, the field of AI alignment appears to have this signature. The engineering has advanced remarkably while answers to the foundational questions have not. Researchers can fine-tune, constrain, and steer systems with increasing precision, yet there is no agreement on what alignment is ultimately to. This essay argues that the impasse is not a hard technical problem waiting for a technical breakthrough. It is a philosophical problem that the dominant framing creates and, therefore, cannot address.

The standard framing of AI alignment asks: how do we get AI systems to do what humans want? Upon analysis, most researchers who pose that question are really trying to answer a different one: how do we build AI systems that are safe and useful for the people who use them? The hidden assumption is that what humans want is a stable and sufficient guide to what is safe and beneficial. That assumption seems to be the unstated premise of the entire field, and it is worth examining, because it is not true.

Why Human Preference is Insufficient

The assumption fails in three distinct ways. It is unstable.
Human preferences are inconsistent, context-dependent, and manipulabl

Article preview — originally published by LessWrong. Full story at the source.
