Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays

Authors: Lise Aubin, Mehdi Khamassi, Benoît Girard
Contributors: Institut des Systèmes Intelligents et de Robotique (ISIR), Sorbonne Université (SU) - Centre National de la Recherche Scientifique (CNRS); Architectures et Modèles pour l'Adaptation et la Cognition (AMAC), Sorbonne Université (SU) - Centre National de la Recherche Scientifique (CNRS); ANR-11-LABX-0065, SMART, Interactions humain/Machine/Humain intelligentes dans la société numérique (2011); European Project: 640891, H2020, H2020-FETPROACT-2014, DREAM (2015)
Language: English
Year of publication: 2018
Subjects:
FOS: Computer and information sciences
Neural Networks
Computer science
Computer Science - Artificial Intelligence
[INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]
Hippocampus
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Reinforcement learning
Neural and Evolutionary Computing (cs.NE)
Artificial neural network
Replays
DynaQ
[SCCO.NEUR]Cognitive science/Neuroscience
Computer Science - Neural and Evolutionary Computing
Navigation
Artificial Intelligence (cs.AI)
Memory consolidation
Artificial intelligence
Prioritized Sweeping
Source: Biomimetic and Biohybrid Systems. Living Machines 2018, Jul 2018, Paris, France. pp. 16-27. ISBN 9783319959719. ⟨10.1007/978-3-319-95972-6_4⟩
Description: During sleep and awake rest, the hippocampus replays sequences of place cells that were activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna family of reinforcement learning algorithms uses off-line replays to improve learning. Under a limited replay budget, a prioritized sweeping approach, which requires a model of transitions to predecessor states, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple-expert algorithm able to cope with multiple predecessors. The resulting architecture improves the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning of the world model should occur during rest periods, and that the corresponding replays should be shuffled.
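The description above outlines the algorithmic core: Dyna-style off-line replay with prioritized sweeping, where a learned transition model propagates value updates backward, possibly to several predecessor states of the same state. As a rough illustration only (not the authors' neural-network architecture, which uses a growing multiple-expert network), here is a minimal tabular sketch in Python; the class name, the toy corridor environment, and all hyperparameter values are assumptions for demonstration.

```python
# Minimal tabular sketch of prioritized sweeping Dyna-Q with multiple
# predecessors. Illustrative only: the paper's version is neural-network
# based; environment and parameters below are hypothetical.
import heapq
import random
from collections import defaultdict

class PrioritizedSweepingDynaQ:
    def __init__(self, actions, alpha=0.5, gamma=0.95, theta=1e-4, budget=10):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.theta = theta            # minimum priority worth queuing
        self.budget = budget          # replay budget per real step
        self.Q = defaultdict(float)   # Q[(s, a)]
        self.model = {}               # world model: (s, a) -> (r, s')
        self.predecessors = defaultdict(set)  # s' -> {(s, a), ...}
        self.queue = []               # max-priority queue (negated priorities)

    def priority(self, s, a, r, s2):
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        return abs(target - self.Q[(s, a)])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

    def step(self, s, a, r, s2):
        # Learn the world model and remember s' -> (s, a); a state can
        # accumulate several predecessors, all swept during replay.
        self.model[(s, a)] = (r, s2)
        self.predecessors[s2].add((s, a))
        p = self.priority(s, a, r, s2)
        if p > self.theta:
            heapq.heappush(self.queue, (-p, (s, a)))
        # Replay phase: pop the highest-priority transitions under the budget.
        for _ in range(self.budget):
            if not self.queue:
                break
            _, (ps, pa) = heapq.heappop(self.queue)
            pr, ps2 = self.model[(ps, pa)]
            self.update(ps, pa, pr, ps2)
            # Propagate priorities backward to every known predecessor of ps.
            for (qs, qa) in self.predecessors[ps]:
                qr, _ = self.model[(qs, qa)]
                qp = self.priority(qs, qa, qr, ps)
                if qp > self.theta:
                    heapq.heappush(self.queue, (-qp, (qs, qa)))

# Toy usage on a hypothetical 1-D corridor: states 0..5, actions -1/+1,
# reward 1.0 on reaching state 5, then reset to the start.
agent = PrioritizedSweepingDynaQ(actions=[-1, 1])
s = 0
for _ in range(200):
    a = random.choice(agent.actions)
    s2 = min(max(s + a, 0), 5)
    agent.step(s, a, 1.0 if s2 == 5 else 0.0, s2)
    s = 0 if s2 == 5 else s2
```

Note that the backward sweep is what makes a small replay budget effective: value changes are replayed first where they matter most, and states reachable via several paths (multiple predecessors) fan the update out through all of them.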
Database: OpenAIRE