A distributional code for value in dopamine-based reinforcement learning
Author: Will Dabney, Zeb Kurth-Nelson, Naoshige Uchida, Clara Kwon Starkweather, Demis Hassabis, Rémi Munos, Matthew Botvinick
Language: English
Year of publication: 2020
Subject: Basic medicine; multidisciplinary; computer science; cognitive neuroscience; medical and health sciences; developmental biology; clinical medicine; canonical model; reinforcement learning; probability distribution; artificial intelligence; representation (mathematics); reinforcement; set (psychology); neurology & neurosurgery; realization (probability)
Source: Nature
Description: Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain1–3. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning4–6. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from the mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning. Analyses of single-cell recordings from the mouse ventral tegmental area are consistent with a model of reinforcement learning in which the brain represents possible future rewards not as a single mean of stochastic outcomes, as in the canonical model, but instead as a probability distribution. (An illustrative sketch of this kind of distributional value learning follows the record below.)
Database: OpenAIRE
External link:
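The distributional hypothesis summarized in the description can be made concrete with a small numerical sketch. The Python snippet below is not code from the paper; the learning rates, number of predictors, and the bimodal reward distribution are illustrative assumptions. It implements expectile-style value learning, one standard form of distributional reinforcement learning: each predictor weights positive and negative prediction errors by a different ratio, so different predictors converge to different expectiles of the reward distribution, and the population as a whole encodes the distribution rather than only its mean.

```python
import numpy as np

# Minimal sketch of distributional value learning with asymmetric updates
# (expectile-style). Each "cell" has its own asymmetry tau, i.e. the relative
# weight it gives to positive versus negative prediction errors. Cells with
# large tau learn optimistic predictions, cells with small tau pessimistic
# ones, so the set of learned values spans the reward distribution.

rng = np.random.default_rng(0)

n_cells = 20                                      # number of value predictors
asymmetries = np.linspace(0.05, 0.95, n_cells)    # tau for each predictor
values = np.zeros(n_cells)                        # learned reward predictions
base_lr = 0.02                                    # overall learning rate

def sample_reward():
    # Illustrative bimodal reward distribution (assumed, not the paper's task).
    return rng.choice([1.0, 8.0], p=[0.7, 0.3])

for _ in range(50_000):
    r = sample_reward()
    errors = r - values                           # per-cell prediction errors
    # Positive errors are scaled by tau, negative errors by (1 - tau).
    lr = np.where(errors > 0, asymmetries, 1.0 - asymmetries) * base_lr
    values += lr * errors                         # asymmetric update

# After learning, the spread of values across the population reflects the
# spread of the reward distribution, not just its mean.
print(np.round(values, 2))
```

Run as-is, the printed values range from near the low reward outcome up toward the high one rather than collapsing onto the single mean, which is the behavior the distributional account attributes to a diverse population of dopamine-driven value predictors.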