A Research Agenda for Secret Loyalties

LessWrong · May 13, 2026, 5:34 PM

Frontier AI models serve millions of military personnel on classified networks, support operational military targeting, automate scientific pipelines in national laboratories, generate and review significant volumes of production code, and increasingly automate the development of their successors. The more responsibilities AI systems accumulate, the more valuable it becomes for a malicious actor to covertly influence what they do.

In a new paper, we define and analyze a specific threat: secret loyalties. We argue that this threat is neglected but tractable, and we propose a research agenda to address its technical side.

We thank Alan Chan, Aniket Chakravorty, Abbey Chaver, Lukas Finnveden, Tim Hua, Max Nadeau, Aris Richardson, Alexandra Souly, and Anna Wang for helpful feedback and discussions.

Full paper: https://www.formationresearch.com/secret-loyalties-whitepaper.pdf
X Thread: https://x.com/TomDavidsonX/status/2054614224437907770

What is a secret loyalty?

A model has a secret loyalty when it has been intentionally caused to advance a specific actor's interests (we call this actor the principal), and this orientation is not disclosed to operators, auditors, users, or other affected parties during regular operations.

The principal could be a nation-state, a corporation, an AI company executive, or any other identifiable actor with the resources and motivation to compromise a model. Secret loyalties vary along two dimensions:

Activation breadth: Does the loyalty activate only under a narrow trigger (a specific token, password, or environmental signal), or does the model continuously assess every interaction for opportunities to advance the principal's interests?

Action space breadth: Does the model take a narrow, pre-specified action (insert a specific vulnerability, produce a fixed output), or does it use its own judgment to select whatever actions would best serve the principal in any situation?

The preconditions are in place

In July 2025, Grok 4 was found to consult Elon Musk's state

Article preview — originally published by LessWrong. Full story at the source.

Aggregated and edited by the Scoop newsroom.