
Structural Proxies

LessWrong · Jun 30, 2026, 12:38 PM

Lately I've been thinking a lot about what work would help with actually winning and getting to good worlds. In the spirit of that, I decided to venture outside my normal wheelhouse and spend some time reflecting on what technical research could make me more confident about powerful AIs being safe.

AGI safety research is tricky partly because we don’t actually have access to the thing we want to study, i.e. superhuman AI. Much of the work we do now is basically trying to lay (potentially irrelevant) foundations for the period when we actually know what we’re up against, and at that point, a lot of the work might be done by AIs.

You can group current approaches by how they try to sidestep this access problem:[1]

- Prosaic techniques like RLHF and interpretability try to make progress on current model safety in a way that will hopefully generalise, except maybe they just won’t scale.
- Model organisms artificially construct exemplars of bad behaviour (alignment faking, trojans etc.), but it’s hard to tell how representative the constructed case is.
- Control techniques aim to get usable work out of potentially misaligned AIs and bootstrap from there, except it’s again unclear how far that will scale.
- Agent foundations tries to reason theoretically about what powerful AIs will be like, except it’s really hard to be sure if you’ve made any useful progress.
- Honourable mentions: Evals work, Scalable Oversight (RIP), Governance, Pause advocacy etc.

Here I want to gesture at a different angle of attack, which I’m going to call structural proxies. The basic idea is that you look for current, naturally appearing problems with AI that share a structure with future problems: in other words, structural proxies. In particular, you want to have some reason to believe that the current problem is being generated by the same types of dynamics that will produce the future problem, even if the manifestations look quite different. You then use the proxies to try to understand the future problems. The closest paral

Article preview — originally published by LessWrong. Full story at the source.


Aggregated and edited by the Scoop newsroom. We surface news from LessWrong alongside other reporting so you can compare coverage in one place.

