A Year Late, Claude Finally Beats Pokémon
Credit: Claude Plays Pokemon Elevator Shanty by Kurukkoo

Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however.

Claude Plays Pokemon feat. Opus 4.7 has finally beaten Pokémon Red. This fulfills the challenge set over a year ago, when LLMs playing Pokémon went briefly, slightly viral.

Victory Screen!

Let's get the throat-clearing out of the way: this doesn't make 4.7 a clear breakthrough in intelligence over 4.6 or 4.5. It's smarter, yes, as we'll discuss below, but not by enough that one could honestly call it a big leap. Rather, incremental gains have finally accumulated to the point of victory.

And to give other models their fair shake: after criticism over its elaborate harness,[1] GeminiPlaysPokemon has beaten Pokémon with progressively weaker harnesses, including a run about two months ago with a harness comparable to the one Claude uses.[2]

As such, this is a bit of a valedictory post. It closes out the cycle of Claude playing Pokémon Red, relates anecdotes for the fun of it, discusses improvements in Opus 4.7, and speculates a bit on what this has all meant.

Retrospective Anecdotes on Claude 4.5 and 4.6

Our last post, on Opus 4.5, went up after it had been stuck in Silph Co. for a few days, though the post didn't mention this. In fact, it ended up stuck there for weeks, longer than at any previous obstacle, though it did eventually make it through. At just over 50k reasoning steps, the adventure in Silph Co. took longer than the entire 29k-step run leading up to it.

The exact issue was that Opus 4.5, uniquely among all the Claudes, was convinced that items on the ground were worthless, and moreover consistently failed to even see them (even though the harness showed a small text label marking each object as an NPC or an item). This is a problem when the Card Key necessary for progression is an item ON THE GROUND in an out-of-the-way part of the facility. Silph Co.
also happens to