Third parties should focus on scrutinising system cards
By default, I expect system cards will get worse, which would be bad. Some mechanisms could improve system cards, but I expect they will be outweighed. In any case, I think third parties should focus on scrutinising system cards: this seems like a great activity for outsiders in the current strategic landscape. I'll sketch what that could look like, and offer some recommendations.

It would be bad if system cards degraded.

- It's good for the outside community to have an accurate sense of the risks, so they can respond appropriately. For example: investing more resources into cyber-hardening, or other activities for making things go well.
- If labs felt pressure to evaluate the risks accurately, they'd be better incentivised to reduce them.
- If the risks were high enough, and a lab communicated that, then this might prompt drastic government action.
- It's very plausible that, if labs build misaligned AIs that take over, most of the employees will have had a genuine but incorrect belief that the AIs wouldn't take over, based on evidence that was actually flimsy and misleading. So it's important that third parties provide epistemic checks on the labs, and scrutinising system cards seems like a great mechanism for that.

By default, I expect system cards to get worse, because…

- The system will get genuinely more complicated.
  - There will be more kinds of model, trained with more kinds of technique, interacting in more kinds of ways, plus dozens of ad-hoc patches for the problems that arise.
  - Already no one person holds the whole system in their head, and the fraction that fits in one head will keep shrinking.
  - Any overall safety judgment will come from loosely aggregating many lines of evidence, rather than from a clean structured safety case.
- I expect the cards themselves to get sloppier.
  - They'll be rushed, written in less wall-clock time, because the pace of AI progress will accelerate.
  - They'll be increasingly AI-generated: AIs will automate more of the experiment design, coding, analysis, and wr