Counting Arguments in AI Safety
A counting argument is a style of argument that looks something like this:

We are drawing from a space where there are many more Xs than Ys.

Therefore, absent any strong reason to expect Ys, we are much more likely to get Xs.

For example, when trying to answer the question “what is the probability that superintelligent systems will want to kill us” we might use an argument like:

A superintelligent AI will land somewhere in a vast space of possible goals.

The goals compatible with our survival occupy only a tiny corner of that space.

Absent a good reason to believe our training process selects strongly for human-friendly goals, it is much more likely that the goals of the AI end up somewhere else, in the region where everyone dies.

I wonder what you think of this argument. Does it sound familiar to you? Does it seem reasonable?

Bertrand's Paradox

Consider an equilateral triangle inscribed in a circle. Suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle?

Method 1: random endpoints. Pick two points at random on the circumference and draw the chord between them. By rotational symmetry, fix one endpoint at a vertex of the triangle. The chord is longer than a side iff the other endpoint lands on the arc between the two opposite vertices, which is one third of the circumference. The probability is 1/3.

Method 2: random radial point. Pick a radius at random, then pick a point on that radius uniformly, then draw the chord perpendicular to the radius at that point. The chord is longer than a side iff the point lies within half the radius of the centre. The probability is 1/2.

Method 3: random midpoint. Pick a point at random inside the disk; it is the midpoint of a unique chord. The chord is longer than a side iff the midpoint l