
Construct validity of Claude Opus 4.8's System Card – A commentary

LessWrong · Jun 12, 2026, 12:52 AM

TL;DR: A read of the Claude Opus 4.8 system card with a focus on the alignment assessment and the construct validity of its evaluation methods. Three main concerns: 1) chain-of-thought monitoring misses reasoning that never surfaces in the text; 2) evaluation awareness is underestimated; 3) the evaluators come largely from the same model family, so agreement may reflect shared assumptions. None of this shows that Opus 4.8 is unsafe, only that some verdicts are more confident than the methods warrant.

Introduction

Claude Opus 4.8’s system card is particularly thorough and honest about its own limits. I go through its main results and stop where I think an implicit assumption does more work than stated. I find a recurring pattern: a reassuring conclusion grounded in a comparison or an evaluation that the card’s own findings put in doubt. It is most evident in the alignment assessment section, where I focus on three main concerns.

I do not believe that any of this individually indicates that Opus 4.8 is unsafe. My claim is more specific: in some places the card’s reassurance outruns the evidence it can actually give. The alignment verdict (‘very low’) must be read in the context of behavioural metrics with questionable construct validity, of evaluation awareness, and of alignment assessments carried out by very similar model judges. The agentic safety section reports a regression in adversarial robustness on computer use that is under-addressed. The remaining observations that I make are softer.

Responsible scaling policy evaluations

Anthropic ran no new Risk Report for Opus 4.8 on the grounds that it ‘does not advance the capability frontier beyond Mythos Preview’ (§2.1.3), and so inherits Mythos’s profile. The premise is that the non-advancement makes RSP evaluations mere duplicates. It is doubtful whether this premise can justify the decision. It seems that this equivalence would require Opus 4.8 to be at most as capable as Mythos on every re

