Ordinal inverse reinforcement learning applied to robot learning with small data
Author: | Colomé Figueras, Adrià, Torras, Carme |
Contributors: | European Commission, Universitat Politècnica de Catalunya. Departament de Matemàtiques, Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. ROBiri - Grup de Percepció i Manipulació Robotitzada de l'IRI |
Year of publication: | 2022 |
Source: | IEEE International Conference on Intelligent Robots and Systems 1: 2490-2496 (2022). |
ISSN: | 2153-0858 |
Description: | Paper presented at the International Conference on Intelligent Robots and Systems (IROS), held in Kyoto (Japan), October 23-27, 2022. Over the last decade, the ability to teach actions to robots in a user-friendly way has gained relevance, and a practical way of teaching robots a new task is to use Inverse Reinforcement Learning (IRL). In IRL, an expert teacher shows the robot a desired behaviour and an agent builds a model of the reward. The agent can also infer a policy that performs optimally within the limitations of the knowledge provided to it. However, most IRL approaches assume an (almost) optimal performance of the teaching agent, which may become impractical if the teacher is not actually an expert. In addition, most IRL methods focus on discrete state-action spaces, which limits their applicability to certain real-world problems, such as those arising in the context of direct Policy Search (PS) reinforcement learning. Therefore, in this paper we introduce Ordinal Inverse Reinforcement Learning (OrdIRL) for continuous state variables, in which the teacher can qualitatively evaluate robot performance by selecting one among predefined performance levels (e.g., bad, medium, good for three tiers of performance). Once OrdIRL has fit an ordinal distribution to the data, we propose to use Bayesian Optimization (BO) to either gain knowledge of the inferred model (exploration) or find a policy or action that maximizes the expected reward given the prior knowledge of the reward (exploitation). In the case of high-dimensional state-action spaces, we use Dimensionality Reduction (DR) techniques and perform the BO in the latent space. Experimental results in simulation and with a robot arm show how this approach allows learning the reward function with small data. CLOTHILDE - CLOTH manIpulation Learning from DEmonstrations (EC-H2020-741930) |
Database: | OpenAIRE |
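The workflow sketched in the abstract (qualitative teacher ratings → a surrogate reward model → Bayesian Optimization balancing exploration and exploitation) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-level scale, the mapping of ordinal labels to numeric scores, the Gaussian-process surrogate, and the UCB acquisition are all illustrative assumptions standing in for the paper's ordinal distribution fit.

```python
import numpy as np

# Hypothetical sketch: ordinal teacher feedback drives Bayesian Optimization
# over a 1-D action space. Labels are mapped to numeric scores, a GP surrogate
# models the reward, and a UCB acquisition picks the next action to try.

LEVELS = {"bad": 0.0, "medium": 1.0, "good": 2.0}  # assumed 3-tier scale

def rbf_kernel(a, b, lengthscale=0.3):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-2):
    """GP posterior mean and per-point variance at the query points."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_query, x_train)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    v = np.linalg.solve(K, Ks.T)
    var = np.diag(rbf_kernel(x_query, x_query) - Ks @ v)
    return mean, np.maximum(var, 0.0)

def next_action(x_tried, labels, candidates, beta=2.0):
    """UCB acquisition: high posterior mean (exploitation) plus high
    posterior uncertainty (exploration), traded off by beta."""
    y = np.array([LEVELS[lab] for lab in labels])
    mean, var = gp_posterior(np.asarray(x_tried), y, candidates)
    ucb = mean + beta * np.sqrt(var)
    return candidates[int(np.argmax(ucb))]

# Usage: three teacher-rated rollouts, then query the next action to test.
tried = [0.1, 0.5, 0.9]
ratings = ["bad", "good", "medium"]
grid = np.linspace(0.0, 1.0, 101)
a_next = next_action(tried, ratings, grid)
```

A proper ordinal treatment would replace the fixed label-to-score mapping with a cumulative-link (ordinal) likelihood fit to the data, as the paper proposes; the GP-plus-UCB loop above only conveys the exploration/exploitation structure.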