The actions of others act as a pseudo-reward to drive imitation in the context of social reinforcement learning

Author: Stefano Palminteri, Bahador Bahrami, Emmanuelle Bonnet, Anis Najar
Language: English
Year of publication: 2020
Subject:
Reinforcement learning
Social Learning
Imitation
Imitative Behavior
Reward
Reinforcement
Decision Making
Action selection
Learning
Human Learning
Learning and Memory
Behavior
Cognition
Cognitive Psychology
Cognitive Science
Social Psychology
Psychology
Neuroscience
Models, Theoretical
Simulation and Modeling
Experimental Design
Research Design
Mathematical and Statistical Techniques
Statistical Methods
Statistics
Mathematics
Signal Processing
Autocorrelation
Research and Analysis Methods
Social Sciences
Physical Sciences
Engineering and Technology
Biology and Life Sciences
Biology (General)
General Neuroscience
General Biochemistry, Genetics and Molecular Biology
General Immunology and Microbiology
General Agricultural and Biological Sciences
Humans
Male
Female
Adult
Young Adult
QH301-705.5
Research Article
Source: PLoS Biology, Vol 18, Iss 12, p e3001028 (2020)
ISSN: 1545-7885; 1544-9173
Description: While there is no doubt that social signals affect human reinforcement learning, there is still no consensus about how this process is computationally implemented. To address this issue, we compared three psychologically plausible hypotheses about the algorithmic implementation of imitation in reinforcement learning. The first hypothesis, decision biasing (DB), postulates that imitation consists of transiently biasing the learner's action selection without affecting their value function. According to the second hypothesis, model-based imitation (MB), the learner infers the demonstrator's value function through inverse reinforcement learning and uses it to bias action selection. Finally, according to the third hypothesis, value shaping (VS), the demonstrator's actions directly affect the learner's value function. We tested these three hypotheses in 2 experiments (N = 24 and N = 44) featuring a new variant of a social reinforcement learning task. Through model comparison and model simulation, we show that VS provides the best explanation of learners' behavior. These results were replicated in a third independent experiment featuring a larger cohort and a different design (N = 302). In our experiments, we also manipulated the quality of the demonstrators' choices and found that learners adapted their imitation rate so that only skilled demonstrators were imitated. We propose and test an efficient meta-learning process to account for this effect, in which imitation is regulated by the agreement between the learner and the demonstrator. In sum, our findings provide new insights into the computational mechanisms underlying adaptive imitation in human reinforcement learning.
This study investigates imitation from a computational perspective; three experiments show that, in the context of reinforcement learning, imitation operates via a durable modification of the learner's values, shedding new light on how imitation is computationally implemented and how it shapes learning and decision-making.
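The contrast between the DB and VS hypotheses described in the abstract can be sketched in a few lines of Python. This is an illustrative toy model under assumed parameterizations, not the authors' fitted model: the parameter names `alpha` (learning rate), `omega` (pseudo-reward magnitude), `beta` (inverse temperature), and `bias` are hypothetical labels introduced here for the sketch.

```python
def value_shaping_update(q, action, reward, demo_action, alpha, omega):
    """One trial of a VS-style update (sketch): besides the usual reward
    prediction error on the chosen action, the demonstrator's observed
    action delivers a pseudo-reward `omega`, durably shifting the
    learner's Q-values."""
    q = list(q)
    # standard delta-rule update on the action the learner chose
    q[action] += alpha * (reward - q[action])
    # pseudo-reward credited to the action the demonstrator was seen to take
    q[demo_action] += alpha * (omega - q[demo_action])
    return q

def decision_bias_logits(q, demo_action, beta, bias):
    """DB-style choice (sketch): the demonstrated action receives a
    transient boost at action-selection time only; the Q-values
    themselves are left untouched."""
    return [beta * v + (bias if a == demo_action else 0.0)
            for a, v in enumerate(q)]
```

Under VS, observing a skilled demonstrator permanently raises the value of the demonstrated action, so the influence persists even when the demonstrator later disappears; under DB, removing the `bias` term restores the learner's original preferences, which is the behavioral signature the model comparison in the paper exploits.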
Database: OpenAIRE
Full text is not displayed to unauthenticated users.