Aligning Superintelligent Humans

LessWrong · Jun 3, 2026, 8:39 PM

Last post was an example of how intelligence-correlated tools can stabilize reflection. Here I'll discuss how native-cyborgism attenuates the hard parts of getting to a pivotal act.

Probable ASI is grown end-to-end and evaluated by much less capable humans. Alignment is hard because we're weaker optimizers than what we're trying to steer:

- Gradient descent searches a vastly larger space than humans can conceptualize, so:
  - Internals become incomprehensible, particularly if you train for what look like good circuits; I'm going to call this ontological mismatch
  - Outputs start exploring human-unwanted parts of the optimization space, call this power mismatch
- The ASI itself has plenty of available strategies for power-grabbing

This power asymmetry means, in general, that the processes we're trying to control simply route around our measures. A mathematically robust specification of properties like corrigibility, if translated to PyTorch code, removes the gaps an ASI would otherwise flow through. I'm glad some people are working on this, but perhaps there are other approaches.

What if we instead kept the optimizer at a manageable capacity[1]?

Introspection

Humans, I think, can have much more control over where their values drift as an individual than labs will over future AIs. We can actively guide our internals because they're apparent to us; value stability scales with introspection.

People don't typically try to do this. Most who do try don't have particularly strong models of cognitive neuroscience, and rarer still[2] are those who correlate evolutionary psychology (and other relevant sciences) to their introspection.

To rephrase, some form of self-understanding scales with introspection and intelligence, and so boosting both intelligence and introspection would buffer ontological drift (in the precise sense of preventing incomprehensibility) and "optimizer power mismatch" (which in this case is more like adversarial outputs from the BCI).

"Rolling your own metaethics"

What's a value

Article preview — originally published by LessWrong. Full story at the source.
