Do LLMs Have Desires?
Work conducted with Yujun Zhou (yzhou25@nd.edu) and supported by SPAR

TL;DR:
- In paired-choice paradigms, LLMs report consistent preferences over outcomes (e.g., types and number of lives saved, types of policies enacted).
- Some have suggested that this indicates that LLMs have human-like value systems.
- We design an experimental framework where LLMs are able to modulate their output quality based on prompt context.
- We find that LLMs modulate their output quality in response to effort exhortations, role-play instructions, and harmfulness cues, but NOT to opportunities to achieve the outcomes they report preferring in the paired-choice experiments.
- We suggest that paired-choice paradigms do not provide evidence that LLMs have human-like (i.e., behavior-motivating) value systems, and that our paradigm offers a way to measure the degree to which LLMs have desires.

Paper describing the work in detail here

LLMs report that they prefer some things to others. In paired-choice experiments, where they are repeatedly presented with two options and asked to select the one they prefer, coherent utility structures emerge: LLMs consistently report preferring certain types of things, and their choices show quantitative tradeoffs between outcomes and exhibit transitivity (e.g., if they choose A over B and B over C, they will also choose A over C). Human choices exhibit the same properties, which has led some to infer that LLMs have goals, value systems, and even emotions, as humans do.

However, a defining characteristic of our goals, value systems, and emotions is that they motivate behavior: we demonstrate that we value A by acting to achieve A when we have the opportunity, and we demonstrate that we value A over B by putting more effort into achieving the former than the latter.
The paired-choice paradigm affords no opportunity for such a demonstration; as LLMs with emergent, behaviorally motivating desires would have obvious implications for safety and alignment