When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

LessWrong · Jun 12, 2026, 11:35 PM

Some LLM functional emotions appear to serve AI-native functions, such as reward hacking, for which there is no clean human analog. I explore the role of emotion vectors in AI-native functions, challenge anthropocentric emotion labels, and question what this means for alignment.

Introduction

A large body of interpretability work has shown that LLMs not only encode emotion concepts, but that these emotion concepts causally affect their reasoning and outputs. But, beyond simulating human emotions, what are these functional emotions actually doing?

I believe there are situations in which functional emotions serve AI-native purposes, such as reward hacking, for which there is no clean human analog. While prior work has partially explored the role of functional emotions in AI-native contexts, I explicitly spotlight the implications here through a systems-level lens.

I will reference three prior studies regarding functional emotions to paint a more complete picture than any single study provides and expand these findings with new hypotheses.

The studies include Wang et al.’s “Do LLMs ‘Feel’? Emotion Circuits Discovery and Control,” which discovered “emotion circuits” within LLMs, Anthropic’s study, “Emotion Concepts and their Function in a Large Language Model,” which focused on identifying and steering emotion vectors, and my own prior work, “What AI ‘Emotions’ Really Are,” which explored how affective language can correlate with probability constraints in user deployment settings.

These studies explore functional emotions from three levels of observation: circuit-level, vector-level, and end-user level.

My goal is not to dismiss humanistic descriptors completely, as they are sometimes rhetorically useful for communicating complex ideas. However, there may be cases when human metaphors are more misleading than helpful.

Sections

The “Mechanistic Rundown” section explains fundamental LLM concepts needed to understand the studies.
For those already familiar with terms like neurons,

Article preview — originally published by LessWrong. Full story at the source.


