World-modeling the US vs. Anthropic Standoff on Claude Fable
I spent the last two days doing a deep dive into forecasting the outcomes of the US forcing Anthropic to take down Claude Fable. I did this for two reasons: (a) I want to know when I'll get Fable back for my research, and (b) the outcome will set a major precedent for US AI regulation.

(For those who want background, the most up-to-date and comprehensive summary I could find is @Zvi's post from June 17. I'll assume here you know the basic details of the situation.)

My world model's conclusions were interesting, but I'm writing this up here mostly because of the epistemic process, and what I learned about managing a large amount of AI research without spinning into unreasonableness.

My central challenges were that:

(1) I can't rule out four different versions of what happened to cause the June 12 order in the first place.

(2) There are many outcomes to forecast, from who gets access and when, to what new policies are enacted, to how Anthropic might change Fable or their release practices.

(3) There are informational updates almost every day, requiring a re-evaluation of almost everything.

I ended up with a large combination of unconditional and conditional forecasting questions, 33 of which I consider critical. This is too many for a human, or a crowd of humans, to forecast at high quality. And even if they did, it would take weeks, and we'd miss the window for the information to be useful to people planning to use Fable or people working on US AI policy.

It's worth stating that prediction markets cover the major outcomes, so we have a crowd of humans to compare against the results of this world-modeling method. It also means that if you ultimately don't trust this process, there is some basic information on likely timelines to fall back on.

[Disclaimer: I used FutureSearch's proprietary forecaster for this, which I help build. Our evals indicate this is much more accurate than just prompting a high-effort frontier model with each forecasting question, but the world-modeling process I lay out in this piece should work