A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials

Authors: Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Freek Stulp, Sylvain Calinon, Jean-Baptiste Mouret
Contributors: Lifelong Autonomy and interaction skills for Robots in a Sensing ENvironment (LARSEN); Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria); Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Ecole Polytechnique Fédérale de Lausanne (EPFL); Research Centre on Interactive Media, Smart Systems and Emerging Technologies (RISE); German Aerospace Center (DLR); IDIAP Research Institute; ANR-18-CHR3-0001, HEAP, Human-Guided Learning and Benchmarking of Robotic Heap Sorting (2018); European Project: 637972, H2020 ERC, ERC-2014-STG, ResiBots (2015); European Project: 731540, H2020, An.Dy (2017); European Project: 780684, H2020, MEMMO (2018)
Language: English
Source: IEEE Transactions on Robotics
IEEE Transactions on Robotics, IEEE, 2020, 36 (2), pp.328-347. ⟨10.1109/TRO.2019.2958211⟩
ISSN: 1941-0468, 1552-3098, 2374-958X
DOI: 10.1109/tro.2019.2958211
Description: Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.
Comment: 21 pages, 3 figures, 4 algorithms, accepted at IEEE Transactions on Robotics
Database: OpenAIRE