Thoughts on Likelihood of Existential Risks by Misaligned AIs
TLDR:
- AI safety is confusing to navigate because it is a pre-paradigmatic field, composed of people making different theoretical arguments for why x-risk is likely (or unlikely).
- Arguments that x-risk is likely are unfalsifiable and have little empirical evidence. This does not mean they’re wrong.
- Much of your probability of x-risk boils down to your priors, and to whether you weight theory or empiricism more heavily.
- I think AI safety is important to work on, but I’m optimistic that alignment will be solved through iterative development of technology.

In this post, the cofounders of Mechanize, Tamay Besiroglu, Matthew Barnett, and Ege Erdil, wrote a rebuttal of the case that current trends in AI progress will likely produce a misaligned superintelligence that causes extinction. I will call people who hold this view pessimists (people with p(doom) > 50%). The post does not argue that safety research is unnecessary; rather, its authors expect safety to work.

Their first claim: “...there is no standard argument to respond to, no single text that unifies the AI safety community.”

This claim is explained further in this article, written by the blogger a3orn. For example, many cite Yudkowsky’s arguments as their main source of concern. His argument roughly goes: under sufficient optimization pressure, we should expect an AI to act as an “optimizer” for certain values. These values are likely to differ from ours because of goal misgeneralization, and even small differences in values result in the AI optimizing for goals that kill everyone. Alex Turner, meanwhile, finds neither the “reward optimizer” hypothesis nor the “inner/outer misalignment” distinction plausible. Central figures in AI safety have been unable to reach a consensus despite 100+ hours of debate. Richard Ngo identifies five clusters of alignment researchers.
Some are focused on LLM safety, while others, like Steve Byrnes, think the big risks lie not with LLMs, but with future architectures that will likely be de