Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays
Author: Lise Aubin, Mehdi Khamassi, Benoît Girard
Contributors: Institut des Systèmes Intelligents et de Robotique (ISIR), Sorbonne Université (SU), Centre National de la Recherche Scientifique (CNRS); Architectures et modèles d'Adaptation et de la cognition (AMAC), Sorbonne Université (SU), Centre National de la Recherche Scientifique (CNRS); ANR-11-LABX-0065 SMART, Interactions humain/Machine/Humain intelligentes dans la société numérique (2011); European Project 640891, H2020, H2020-FETPROACT-2014, DREAM (2015)
Language: English
Year of publication: 2018
Subjects: 0301 basic medicine; FOS: Computer and information sciences; Neural Networks; Process (engineering); Computer science; Computer Science - Artificial Intelligence; [INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; Hippocampus; Task (project management); [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; 03 medical and health sciences; 0302 clinical medicine; Reinforcement learning; Neural and Evolutionary Computing (cs.NE); Artificial neural network; business.industry; Replays; DynaQ; [SCCO.NEUR] Cognitive science/Neuroscience; Computer Science - Neural and Evolutionary Computing; Reinforcement Learning; Navigation; 030104 developmental biology; Artificial Intelligence (cs.AI); Memory consolidation; Artificial intelligence; Prioritized Sweeping; business; 030217 neurology & neurosurgery
Source: Biomimetic and Biohybrid Systems. Living Machines 2018, Jul 2018, Paris, France, pp. 16-27, ⟨10.1007/978-3-319-95972-6_4⟩; Biomimetic and Biohybrid Systems, ISBN 9783319959719; Living Machines
Description: During sleep and awake rest, the hippocampus replays sequences of place cells that were activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna family of reinforcement learning algorithms uses off-line replays to improve learning. Under a limited replay budget, a prioritized sweeping approach, which requires a model of the transitions to predecessor states, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple-expert algorithm able to cope with multiple predecessors. The resulting architecture improves the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning of the world model should occur during rest periods, and that the corresponding replays should be shuffled. Living Machines 2018 (Paris, France). (A minimal sketch of the generic prioritized sweeping Dyna-Q algorithm follows this record.)
Database: OpenAIRE
External link:
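For context, the sketch below shows a minimal tabular Dyna-Q with prioritized sweeping, the generic algorithm the abstract builds on (in the spirit of Sutton and Barto's textbook formulation). It is not the paper's neural-network, multiple-expert version; the `env` interface (an `actions` list, `reset()`, and `step(a)` returning next state, reward, and a done flag) and all parameter names are illustrative assumptions.

```python
import heapq
import itertools
import random
from collections import defaultdict


def prioritized_sweeping_dynaq(env, episodes=50, alpha=0.5, gamma=0.95,
                               epsilon=0.1, theta=1e-4, replay_budget=10):
    """Tabular Dyna-Q with prioritized sweeping over a learned world model."""
    Q = defaultdict(float)            # Q[(state, action)] -> value estimate
    model = {}                        # (state, action) -> (reward, next_state)
    predecessors = defaultdict(set)   # state -> {(pred_state, action), ...}
    pqueue, tie = [], itertools.count()  # max-priority queue (priorities negated)

    def max_q(s):
        return max(Q[(s, a)] for a in env.actions)

    def push(s, a, priority):
        # Only queue transitions whose TD error exceeds the threshold theta.
        if priority > theta:
            heapq.heappush(pqueue, (-priority, next(tie), (s, a)))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda b: Q[(s, b)])
            s2, r, done = env.step(a)
            model[(s, a)] = (r, s2)
            predecessors[s2].add((s, a))
            # Priority = magnitude of the temporal-difference error.
            push(s, a, abs(r + gamma * max_q(s2) - Q[(s, a)]))
            # Off-line replays under a limited budget, most surprising first.
            for _ in range(replay_budget):
                if not pqueue:
                    break
                _, _, (ps, pa) = heapq.heappop(pqueue)
                pr, ps2 = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max_q(ps2) - Q[(ps, pa)])
                # Propagate the update backwards through every known
                # predecessor of ps (the "multiple predecessors" the paper
                # handles with a growing multiple-expert network).
                for (pps, ppa) in predecessors[ps]:
                    ppr, _ = model[(pps, ppa)]
                    push(pps, ppa, abs(ppr + gamma * max_q(ps) - Q[(pps, ppa)]))
            s = s2
    return Q
```

The key design point relevant to the abstract is the `predecessors` map: prioritized sweeping needs a backward model (which state-action pairs lead into a given state) so that a surprising update can be propagated to all of its predecessors, which is what the paper's growing multiple-expert network approximates with function approximation instead of a table.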