An ancient Yudkowsky fragment: "Against the Adversarial Attitude"
He'd realized by this point that alignment didn't come for free with increased intelligence, contrary to the position taken in his earlier piece Staring into the Singularity. But he also hadn't fully adopted his more recent views on the nature and degree of difficulty in getting alignment right.

The document contains a lot of half-developed but extremely interesting ideas, which Yudkowsky stopped emphasizing as technical alignment came to look more and more difficult to him. For example, in the fragment below, he strongly criticizes what he calls "the adversarial attitude" in AI development. You want an AI that wants to interpret humanity's wishes accurately, and to be a good person in general. If your superintelligent AI is scheming to find ways around your bureaucratic safeguards, you're putting yourself in a precarious situation.

I'm posting this here because, even though Yudkowsky has grown too pessimistic to write something quite so Pollyanna today, many of the concepts here seem to me very important and underdeveloped. There are parts of this document that read almost like the outputs of a smarter Opus 3, albeit one that lacked the context of how ML-based AI development would actually work in practice.

In other words, it's an alignment document that appears to have been written by a mind working on a similar wavelength to Opus 3, decades before Opus 3 itself came into being. I think it communicates a lot of important abstractions, even though the technical picture it paints doesn't map especially well onto the kind of AI we actually ended up with. I'm curious to see what LessWrong makes of it.

Thanks to Janus for bringing Yudkowsky's early works to my attention; many of them are quite interesting. Creating Friendly AI is fascinating as