SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning

Authors:	Junchen Jin, Yanhao Huang, Fei-Yue Wang, Xiaoshuang Li, Xiao Wang, Xinhu Zheng, Jun Jason Zhang
Publication year:	2022
Subject:	Cloning (programming); Computer science; Cognitive Neuroscience; Supervised learning; Computer Science Applications; Artificial Intelligence; Convergence (routing); Reinforcement learning; Leverage (statistics); Baseline (configuration management); Function (engineering)
Source:	Neurocomputing. 467:300-309
ISSN:	0925-2312
DOI:	10.1016/j.neucom.2021.09.064
Description:	Deep Reinforcement Learning (DRL) has proven its capability to learn optimal policies in decision-making problems by directly interacting with environments. Supervised learning methods, meanwhile, are highly effective at learning from data. However, combining DRL with supervised learning so that additional knowledge and data can assist the DRL agent remains difficult. This study proposes a novel Supervised Assisted Deep Reinforcement Learning (SADRL) framework integrating deep Q-learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC). Specifically, the proposed DQfDD-BC method leverages historical demonstrations to pre-train a behavioral cloning (BC) model and keeps updating it by learning from the dynamically updated demonstrations. A supervised expert loss function compares the actions generated by the DRL model with those obtained from the BC model to guide policy improvement. Experimental results in several OpenAI Gym environments show that the proposed approach accelerates learning and adapts to demonstrations of different performance levels. As illustrated in an ablation study, the dynamic demonstration and expert loss mechanisms using a BC model improve learning convergence compared with the baseline models. We believe that SADRL provides an elegant framework and that the proposed method can promote the integration of human experience and machine intelligence.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::255cad3f1b884016a8b38bd1a0ddcf40 https://doi.org/10.1016/j.neucom.2021.09.064
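
Illustrative note: the description mentions a supervised expert loss that compares the DRL model's actions with those of the BC model, but this record does not give its form. The sketch below is one plausible reading, assuming a large-margin loss of the kind used in deep Q-learning from demonstrations, with the BC model's action standing in for the expert action. The function names, the margin value, and the weighting term are hypothetical and are not taken from the paper.

```python
import numpy as np


def expert_margin_loss(q_values, bc_actions, margin=0.8):
    """Per-sample large-margin loss (assumed form, not the paper's definition).

    q_values:   (batch, n_actions) Q-value estimates from the DRL model.
    bc_actions: (batch,) integer actions proposed by the BC model.
    margin:     illustrative penalty for every action that differs
                from the BC action.
    """
    q_values = np.asarray(q_values, dtype=float)
    bc_actions = np.asarray(bc_actions, dtype=int)
    batch = np.arange(q_values.shape[0])

    # Margin is 0 for the BC action and `margin` for all other actions.
    margins = np.full_like(q_values, margin)
    margins[batch, bc_actions] = 0.0

    # max_a [Q(s, a) + l(a_BC, a)] - Q(s, a_BC); this is always >= 0 and
    # is 0 only when the BC action beats every other action by the margin.
    return (q_values + margins).max(axis=1) - q_values[batch, bc_actions]


def combined_loss(td_errors, q_values, bc_actions, expert_weight=1.0, margin=0.8):
    """Mean squared TD error plus a weighted expert loss (hypothetical weighting)."""
    td_loss = np.mean(np.square(np.asarray(td_errors, dtype=float)))
    expert_loss = np.mean(expert_margin_loss(q_values, bc_actions, margin))
    return td_loss + expert_weight * expert_loss
```

Under this assumed form, the loss is zero for a state when the DRL model already prefers the BC action by at least the margin, and grows as the model's Q-values favor a different action.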