A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials
Author: | Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Freek Stulp, Sylvain Calinon, Jean-Baptiste Mouret |
---|---|
Contributors: | Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment (LARSEN), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria), Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS), Ecole Polytechnique Fédérale de Lausanne (EPFL), Research Centre on Interactive Media, Smart Systems and Emerging Technologies (RISE), German Aerospace Center (DLR), IDIAP Research Institute, ANR-18-CHR3-0001, HEAP, Human-Guided Learning and Benchmarking of Robotic Heap Sorting (2018), European Project: 637972, H2020 ERC, ERC-2014-STG, ResiBots (2015), European Project: 731540, H2020, An.Dy (2017), European Project: 780684, H2020, MEMMO (2018) |
Language: | English |
Subject: |
FOS: Computer and information sciences
robotics; Computer Science - Machine Learning; reinforcement learning; Computer Science - Artificial Intelligence; Robot Learning; Machine Learning (stat.ML); Micro-Data Policy Search; Data-efficiency; Autonomous Agents; Machine Learning (cs.LG); [SPI.AUTO]Engineering Sciences [physics]/Automatic; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Computer Science - Robotics; Artificial Intelligence (cs.AI); Statistics - Machine Learning; Machine learning; [INFO.INFO-RB]Computer Science [cs]/Robotics [cs.RO]; Robotics (cs.RO) |
Source: | IEEE Transactions on Robotics, IEEE, 2020, 36 (2), pp.328-347. ⟨10.1109/TRO.2019.2958211⟩ |
ISSN: | 1941-0468 1552-3098 2374-958X |
DOI: | 10.1109/tro.2019.2958211 |
Description: | Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time. Comment: 21 pages, 3 figures, 4 algorithms, accepted at IEEE Transactions on Robotics |
Database: | OpenAIRE |
External link: |