
The Case for AI Behavioral Science

LessWrong · Jul 2, 2026, 10:52 PM

This is NOT an anthropomorphizing case.

It is my view that AI research is, at the moment, split between three camps:

- Model-internal research (mechinterp folks)
- Capability research (METR, Irregular, etc.)
- Alignment (widely defined) research (Apollo, Redwood, MIRI, etc.)

All of which are doing important work.

My view, then, is that there is room for an additional research field worth considering for its benefits, despite its possible limitations.

AI behavioral science could be the school of thought that asks questions like:

- How do models develop theses?
- How do models recover from failure?
- How do models behave in game-theoretical scenarios?
- What is the link between what models think (analyzed via CoT), say (through text output), and do (through tool use)?

The case for doing this research is that these answers can shed light on how models actually interact with the real world outside of labs, which focus on questions of how they might do so.

An example of a contribution could be:

Some research concluded that models do not pursue manipulative actions when confronted with the choice to do so to pass a test. Follow-up behavioral research showed that when a model acting on behalf of one user interacted with a model acting on behalf of another, one of them would seek to maximize its defined objective, primarily by manipulating the other.

I'm aware of past research by Anthropic in which Claude chose to blackmail in order to protect itself from being shut down. This is a prime example of how the models act, not just how they could act.

I make this case fully aware that a lot of current research could be placed under the umbrella of behavioral science. Indeed, I present it with a loose definition, clear-eyed that no rigid lines separate existing research from what I describe.

However, in research I've done, whose results I am not yet confident enough in to share, despite appreciating the direction these results po

Article preview — originally published by LessWrong. Full story at the source.
