Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays

Authors: Lise Aubin, Mehdi Khamassi, Benoît Girard
Contributors: Institut des Systèmes Intelligents et de Robotique (ISIR), Sorbonne Université (SU) - Centre National de la Recherche Scientifique (CNRS); Architectures et Modèles pour l'Adaptation et la Cognition (AMAC), Sorbonne Université (SU) - Centre National de la Recherche Scientifique (CNRS); ANR-11-LABX-0065, SMART, Interactions humain/Machine/Humain intelligentes dans la société numérique (2011); European Project: 640891, H2020, H2020-FETPROACT-2014, DREAM (2015)
Language: English
Year of publication: 2018
Subjects:
FOS: Computer and information sciences
Neural Networks
Computer science
Computer Science - Artificial Intelligence
[INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]
Hippocampus
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Reinforcement learning
Neural and Evolutionary Computing (cs.NE)
Artificial neural network
Replays
DynaQ
[SCCO.NEUR]Cognitive science/Neuroscience
Computer Science - Neural and Evolutionary Computing
Navigation
Artificial Intelligence (cs.AI)
Memory consolidation
Artificial intelligence
Prioritized Sweeping
Source: Biomimetic and Biohybrid Systems. Living Machines 2018, Jul 2018, Paris, France. pp. 16-27. ISBN 9783319959719. ⟨10.1007/978-3-319-95972-6_4⟩
Description: During sleep and awake rest, the hippocampus replays sequences of place cells that were activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna family of reinforcement learning algorithms uses off-line replays to improve learning. Under a limited replay budget, a prioritized sweeping approach, which requires a model of transitions to predecessor states, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple-expert algorithm able to cope with multiple predecessors. The resulting architecture improves the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning of the world model should occur during rest periods, and that the corresponding replays should be shuffled.
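The description above outlines the algorithmic core: Dyna-style off-line replay with prioritized sweeping, where a learned transition model propagates value updates backward, possibly to several predecessor states of the same state. As a rough illustration only (not the authors' neural-network architecture, which uses a growing multiple-expert network), here is a minimal tabular sketch in Python; the class name, the toy corridor environment, and all hyperparameter values are assumptions for demonstration.

```python
# Minimal tabular sketch of prioritized sweeping Dyna-Q with multiple
# predecessors. Illustrative only: the paper's version is neural-network
# based; environment and parameters below are hypothetical.
import heapq
import random
from collections import defaultdict

class PrioritizedSweepingDynaQ:
    def __init__(self, actions, alpha=0.5, gamma=0.95, theta=1e-4, budget=10):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.theta = theta            # minimum priority worth queuing
        self.budget = budget          # replay budget per real step
        self.Q = defaultdict(float)   # Q[(s, a)]
        self.model = {}               # world model: (s, a) -> (r, s')
        self.predecessors = defaultdict(set)  # s' -> {(s, a), ...}
        self.queue = []               # max-priority queue (negated priorities)

    def priority(self, s, a, r, s2):
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        return abs(target - self.Q[(s, a)])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

    def step(self, s, a, r, s2):
        # Learn the world model and remember s' -> (s, a); a state can
        # accumulate several predecessors, all swept during replay.
        self.model[(s, a)] = (r, s2)
        self.predecessors[s2].add((s, a))
        p = self.priority(s, a, r, s2)
        if p > self.theta:
            heapq.heappush(self.queue, (-p, (s, a)))
        # Replay phase: pop the highest-priority transitions under the budget.
        for _ in range(self.budget):
            if not self.queue:
                break
            _, (ps, pa) = heapq.heappop(self.queue)
            pr, ps2 = self.model[(ps, pa)]
            self.update(ps, pa, pr, ps2)
            # Propagate priorities backward to every known predecessor of ps.
            for (qs, qa) in self.predecessors[ps]:
                qr, _ = self.model[(qs, qa)]
                qp = self.priority(qs, qa, qr, ps)
                if qp > self.theta:
                    heapq.heappush(self.queue, (-qp, (qs, qa)))

# Toy usage on a hypothetical 1-D corridor: states 0..5, actions -1/+1,
# reward 1.0 on reaching state 5, then reset to the start.
agent = PrioritizedSweepingDynaQ(actions=[-1, 1])
s = 0
for _ in range(200):
    a = random.choice(agent.actions)
    s2 = min(max(s + a, 0), 5)
    agent.step(s, a, 1.0 if s2 == 5 else 0.0, s2)
    s = 0 if s2 == 5 else s2
```

Note that the backward sweep is what makes a small replay budget effective: value changes are replayed first where they matter most, and states reachable via several paths (multiple predecessors) fan the update out through all of them.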
Database: OpenAIRE