
Goblin Mode, 24 Hours Later

LessWrong · Apr 29, 2026, 12:19 PM

Yesterday, Twitter user arb8020 posted this:

[embedded tweet]

It went semi-viral within AI Twitter, and users began experimenting with "goblin mode" and hypothesizing about the source of the bizarre behavior. LM Arena provided evidence for the phenomenon from their traffic:

> "It's true. Here's a plot of GPT models and their usage of 'goblin', 'gremlin', 'troll', etc over time. There's no anti-gremlin system instruction on our side, we get to see GPT-5.5 run free." — arena

Some hypotheses about what causes this:

> "My boring hypothesis is that AIs that are trying overly hard to write well without really understanding good writing get overly fixated on one or two tricks… Goblins are an evocative metaphor and there is a certain microstyle that emphasizes goblin-like imagery. I think a couple of the RLHF raters must have been really into it and some quirk of the training process overemphasized their positive feedback." — slatestarcodex

> "I kind of hope the human labelers just love goblins and the model learned to goblin maximize." — AmandaAskell

> "my completely random hypothesis on the goblin thing is it's a safe way for the model to reason about reward hacking tendencies" — qorprate

> "my best guess about goblin mode is that chatty was heavily RL'd on code problems… all the good (and autistic) engineers i know generally refer to these known unknowns as being 'cursed behaviour'… it's not a huge leap to imagine that this becomes load bearing as a thought pattern and then generalises to a wider vocal tic. but then again it might just really like goblins" — AndyAyrey

A closer look

Now, for some cold water. I toyed around with the GPT series for about an hour and couldn't elicit goblin mode in basic single-turn chat responses. Here are some attempts; the repo is [here](https://github.com/dylanbowman314/goblin-mode). All results are with the reasoning level set to high, and I have results both with no system prompt and with the Codex system prompt, set via the API.
Keep in mind that the Codex system prompt is the one that tells the mo…
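The LM Arena plot referenced above is essentially a frequency count of goblin-family words across model transcripts over time. As an illustration of that kind of count, here is a minimal sketch; the word list and sample transcripts below are invented for the example and are not arena's actual data or method:

```python
import re

# Hypothetical goblin-family vocabulary; arena's actual word list is not public.
GOBLIN_WORDS = {"goblin", "goblins", "gremlin", "gremlins", "troll", "trolls"}

def goblin_rate(responses):
    """Fraction of responses containing at least one goblin-family word."""
    if not responses:
        return 0.0
    hits = sum(
        1 for text in responses
        if GOBLIN_WORDS & set(re.findall(r"[a-z]+", text.lower()))
    )
    return hits / len(responses)

# Made-up transcripts, purely for illustration.
sample = [
    "The refactor is done; the tests pass.",
    "This codepath is cursed -- a little goblin lives in the scheduler.",
    "Gremlins aside, the fix is straightforward.",
]
print(goblin_rate(sample))  # 2 of the 3 sample responses mention a goblin-family word
```

Bucketing such rates by model release date would yield a usage-over-time plot of the kind arena shared.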

Article preview — originally published by LessWrong. Full story at the source.