Few-shot Quality-Diversity Optimization

Author:	Salehi, Achkan; Coninx, Alexandre; Doncieux, Stephane
Contributors:	Institut des Systèmes Intelligents et de Robotique (ISIR), Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université (SU), Institut Pascal (IP), SIGMA Clermont (SIGMA Clermont)-Université Clermont Auvergne [2017-2020] (UCA [2017-2020])-Centre National de la Recherche Scientifique (CNRS), Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)
Language:	English
Publication year:	2022
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Control and Optimization; Computer Science - Artificial Intelligence; Mechanical Engineering; Biomedical Engineering; Computer Science - Neural and Evolutionary Computing; Computer Science Applications; Machine Learning (cs.LG); [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Human-Computer Interaction; Artificial Intelligence (cs.AI); [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; Artificial Intelligence; Control and Systems Engineering; Computer Vision and Pattern Recognition; Neural and Evolutionary Computing (cs.NE); ComputingMilieux_MISCELLANEOUS
Source:	IEEE Robotics and Automation Letters, IEEE, In press, pp.1-10. ⟨10.1109/LRA.2022.3148438⟩
ISSN:	2377-3766
DOI:	10.1109/LRA.2022.3148438
Description:	In the past few years, a considerable amount of research has been dedicated to the exploitation of previous learning experiences and the design of Few-shot and Meta Learning approaches, in problem domains ranging from Computer Vision to Reinforcement Learning-based control. A notable exception, where, to the best of our knowledge, little to no effort has been made in this direction, is Quality-Diversity (QD) optimization. QD methods have been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning. However, they remain costly due to their reliance on inherently sample-inefficient evolutionary processes. We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation. Our proposed method does not require backpropagation. It is simple to implement and scale, and furthermore, it is agnostic to the underlying models that are being trained. Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments. Comment: Accepted for publication in the IEEE Robotics and Automation Letters (RA-L) journal
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::22d7e88c537dcb60819afaccf8280541 https://hal.archives-ouvertes.fr/hal-03569179