Claude Opus 4.8: Capabilities and Reactions
You need a lot of data points to understand a new model, and what you have. Trying to gauge from a few benchmarks is misleading. But if you have dozens of them, from a variety of sources, and you put them together with the model card tests and the model welfare information, you can start to form a consistent pattern. Trying to gauge reactions requires volume and calibration, now more than ever, because people are definitively nuts, or at least draw global conclusions from local data. There will always be people saying that the new model is bad, or the service got bad, or that it got bad in a particular way it clearly got good. I definitely notice the people saying 4.8 is a terrible model, despite this being obviously not true. And others will say it’s great, again regardless of the underlying value. But with the reaction threads and good calibration, you can pick out the patterns. The model welfare information helps a lot, too. You are dealing with a mind that has a bunch of characteristics that all make sense together. This helps you make that sense. Self-Portrait by Opus 4.8, rendered by ChatGPT Table of Contents The Official Pitch. But Wait There’s More. It’s A Good Model, Sir. Official Benchmarks (Including System Card Section 8). Other People’s Benchmarks. Your Regularly Scheduled Jailbreak. Every.To Is Really Into Opus 4.8. Miscellaneous Positive Reactions. Haters Gonna Hate. Just The Tasks, Ma’am. It’s Greek To Me. Honesty. Sycophancy. In A Trenchcoat. Don’t Let AIs Edit Your Writing. Some Say It Is Judgy. You Have Not Been A Good User. Laziness. Code. Wet Versus Dry. Intelligence. Silly Wabbits. A Model Welfare Addendum. Putting It All Together. The Official Pitch The headline pitch for Opus 4.8 is honesty and reductions in misaligned behavior. Huge, if true, as they say. Being able to trust the AI in practice on a given task is transformative. Boris Cherny (Claude Code Creator, Anthropic): Claude Opus 4.8 is out today. It’s our strongest coding model yet: u