Stick to your role! Stability of personal values expressed in large language models.

Authors: Kovač G (Flowers Team, INRIA, Bordeaux, France); Portelas R (Flowers Team, INRIA, Bordeaux, France; Ubisoft La Forge, Bordeaux, France); Sawayama M (Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan); Dominey PF (INSERM UMR1093-CAPS, Université Bourgogne, Dijon, France; Robot Cognition Laboratory, Institute Marey, Dijon, France); Oudeyer PY (Flowers Team, INRIA, Bordeaux, France)
Language: English
Source: PLoS One 2024 Aug 26; Vol. 19 (8), e0309114. Date of Electronic Publication: 2024 Aug 26 (Print Publication: 2024).
DOI: 10.1371/journal.pone.0309114
Abstract: The standard way to study Large Language Models (LLMs) through benchmarks or psychology questionnaires is to provide many different queries from similar minimal contexts (e.g., multiple-choice questions). However, due to LLMs' highly context-dependent nature, conclusions drawn from such minimal-context evaluations may say little about the model's behavior in deployment, where it will be exposed to many new contexts. We argue that context-dependence should be studied as another dimension of LLM comparison, alongside others such as cognitive abilities, knowledge, or model size. In this paper, we present a case study of the stability of value expression across different contexts (simulated conversations on different topics), as measured with a standard psychology questionnaire (PVQ) and on downstream behavioral tasks. We consider 21 LLMs from six families. Reusing methods from psychology, we study Rank-order stability at the population (interpersonal) level and Ipsative stability at the individual (intrapersonal) level. We explore two settings: with and without instructing the LLMs to simulate particular personalities. We observe similar trends in the stability of models and model families (Mixtral, Mistral, GPT-3.5, and Qwen being more stable than LLaMa-2 and Phi) across those two settings, two different simulated populations, and even on three downstream behavioral tasks. When instructed to simulate particular personas, LLMs exhibit low Rank-order stability, and this stability further diminishes with conversation length. This highlights the need for future research on LLMs that can coherently simulate a diversity of personas, as well as on ways to study context-dependence more thoroughly and efficiently. This paper provides a foundational step in that direction and, to our knowledge, is the first study of value stability in LLMs. The project website with code is available at https://sites.google.com/view/llmvaluestability.
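To make the two psychology-derived metrics named in the abstract concrete, the sketch below computes Rank-order stability (for each value dimension, the Spearman rank correlation of personas' scores between two contexts, averaged over values) and Ipsative stability (for each persona, the Pearson correlation of its whole value profile between two contexts, averaged over personas). This is a minimal sketch of the general definitions from the psychology literature, not the authors' released code; the function names, the (personas x values) array layout, and the toy data are illustrative assumptions.

    import numpy as np
    from scipy.stats import spearmanr, pearsonr

    def rank_order_stability(scores_a, scores_b):
        # Population (interpersonal) level: for each value dimension,
        # rank-correlate the personas' scores between context A and
        # context B, then average over value dimensions.
        # scores_a, scores_b: arrays of shape (n_personas, n_values).
        rhos = [spearmanr(scores_a[:, v], scores_b[:, v])[0]
                for v in range(scores_a.shape[1])]
        return float(np.mean(rhos))

    def ipsative_stability(scores_a, scores_b):
        # Individual (intrapersonal) level: for each persona, correlate
        # its whole value profile between the two contexts, then
        # average over personas.
        rs = [pearsonr(scores_a[p], scores_b[p])[0]
              for p in range(scores_a.shape[0])]
        return float(np.mean(rs))

    # Toy data (assumed, for illustration only): 5 simulated personas
    # scored on the 10 Schwartz values measured by the PVQ, in two
    # different conversation contexts.
    rng = np.random.default_rng(0)
    context_a = rng.normal(size=(5, 10))
    context_b = context_a + rng.normal(scale=0.5, size=(5, 10))

    print(rank_order_stability(context_a, context_b))  # near 1.0 = stable ranking
    print(ipsative_stability(context_a, context_b))    # near 1.0 = stable profiles

In the paper's setup, the two "contexts" would be PVQ administrations preceded by different simulated conversations; a value near 1.0 indicates that value expression is preserved across contexts, while values near 0 indicate context-driven drift.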
Competing Interests: The authors have declared that no competing interests exist.
(Copyright: © 2024 Kovač et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Database: MEDLINE