Meta-inverse Reinforcement Learning Method Based on Relative Entropy

Author:	WU Shao-bo, FU Qi-ming, CHEN Jian-ping, WU Hong-jie, LU You
Language:	Chinese
Year of publication:	2021
Subject:	inverse reinforcement learning; meta-learning; reward function; relative entropy; gradient descent; Computer software; QA76.75-76.765; Technology (General); T1-995
Source:	Jisuanji kexue, Vol 48, Iss 9, Pp 257-263 (2021)
Document type:	article
ISSN:	1002-137X 20070004
DOI:	10.11896/jsjkx.200700044
Description:	Traditional inverse reinforcement learning algorithms are slow, imprecise, or even unable to solve for the reward function when expert demonstration samples are insufficient and state transition probabilities are unknown. To address this problem, a meta-inverse reinforcement learning method based on relative entropy is proposed. Using meta-learning, a learning prior for the target task is constructed by integrating a set of meta-training sets that follow the same distribution as the target task. In the model-free reinforcement learning setting, the relative entropy probability model is used to model the reward function and is combined with the prior, so that the reward function of the target task can be solved quickly from a small number of target-task samples. The proposed algorithm and the RE IRL algorithm are applied to the classic Gridworld and Object World problems. Experiments show that the proposed algorithm still solves the reward function better when the target task lacks sufficient expert demonstration samples and state transition probability information.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/aad678e5c42e4a1f89ce8a94a9fa9a2a
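
The description above outlines a relative-entropy reward model combined with a meta-learned prior. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a linear reward over trajectory feature vectors, a uniform sampling policy (so the importance weights reduce to exp(theta . f)), no regularization term, and that the prior enters only as the starting point of gradient ascent; the record does not say how the prior is combined. The names `reirl_gradient`, `fit_reward`, and `prior_theta` are hypothetical.

```python
import math


def reirl_gradient(theta, expert_features, sampled_features):
    """Importance-sampled gradient for a relative-entropy IRL objective.

    Assumption: trajectories were sampled uniformly, so each weight is
    proportional to exp(theta . f_tau). Regularization is omitted.
    """
    scores = [sum(t * f for t, f in zip(theta, feats)) for feats in sampled_features]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    expected = [
        sum(w * feats[k] for w, feats in zip(weights, sampled_features)) / total
        for k in range(len(theta))
    ]
    # Expert feature expectation minus reweighted sample expectation.
    return [e - x for e, x in zip(expert_features, expected)]


def fit_reward(prior_theta, expert_features, sampled_features, lr=0.1, steps=200):
    """Gradient ascent on the reward weights, started from a prior.

    Assumption: prior_theta stands in for the meta-learned prior and is
    used only as the initial point.
    """
    theta = list(prior_theta)
    for _ in range(steps):
        grad = reirl_gradient(theta, expert_features, sampled_features)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta
```

With two sampled trajectories whose feature vectors are `[1.0, 0.0]` and `[0.0, 1.0]` and expert feature expectations `[0.8, 0.2]`, calling `fit_reward([0.0, 0.0], [0.8, 0.2], [[1.0, 0.0], [0.0, 1.0]])` moves the weights so that the first trajectory is favored. A starting point closer to the solution would need fewer steps, which is the role the prior is meant to play here.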