SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning
Authors: | Junchen Jin, Yanhao Huang, Fei-Yue Wang, Xiaoshuang Li, Xiao Wang, Xinhu Zheng, Jun Jason Zhang |
---|---|
Year of publication: | 2022 |
Subject: | Cloning (programming); Computer science; Cognitive Neuroscience; Supervised learning; Computer Science Applications; Artificial intelligence; Convergence (routing); Reinforcement learning; Leverage (statistics); Baseline (configuration management); Function (engineering) |
Source: | Neurocomputing. 467:300-309 |
ISSN: | 0925-2312 |
DOI: | 10.1016/j.neucom.2021.09.064 |
Description: | Deep reinforcement learning (DRL) has proven capable of learning optimal policies in decision-making problems by interacting directly with environments, and supervised learning methods are likewise highly effective at learning from data. However, combining DRL with supervised learning so that additional knowledge and data can assist the DRL agent remains difficult. This study proposes a novel Supervised Assisted Deep Reinforcement Learning (SADRL) framework that integrates deep Q-learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC). Specifically, DQfDD-BC uses historical demonstrations to pre-train a behavioral cloning (BC) model and then continually updates that model by learning from the dynamically updated demonstrations. A supervised expert loss function compares the actions generated by the DRL model with those obtained from the BC model, providing guidance for policy improvement. Experimental results in several OpenAI Gym environments show that the proposed approach accelerates learning and adapts to demonstrations of different performance levels. An ablation study shows that the dynamic demonstration and expert loss mechanisms, both of which rely on the BC model, improve learning convergence compared with the baseline models. We believe that SADRL provides an elegant framework and that the proposed method can promote the integration of human experience and machine intelligence. |
Database: | OpenAIRE |
External link: |
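
The description mentions a supervised expert loss that compares the DRL model's actions with those of the BC model, but the record does not give its form. The following is a minimal sketch only, assuming a large-margin loss of the kind used in deep Q-learning from demonstrations, with the BC model's action standing in for the expert action. The function name `expert_margin_loss`, the `margin` value, and the sample Q-values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def expert_margin_loss(q_values: Sequence[float], bc_action: int, margin: float = 0.8) -> float:
    """Large-margin loss between the agent's Q-values and a BC-suggested action.

    Assumed form (not confirmed by the record):
        max_a [Q(s, a) + l(a_bc, a)] - Q(s, a_bc),
    where l(a_bc, a) is `margin` when a differs from a_bc and 0 otherwise.
    The loss is zero when the BC action's Q-value exceeds every other
    action's Q-value by at least `margin`.
    """
    if not 0 <= bc_action < len(q_values):
        raise ValueError("bc_action must index into q_values")
    # Add the margin to every action except the one suggested by the BC model.
    augmented = [q + (0.0 if a == bc_action else margin) for a, q in enumerate(q_values)]
    return max(augmented) - q_values[bc_action]


# Hypothetical Q-values for a three-action state; the BC model suggests action 2.
loss_disagree = expert_margin_loss([1.0, 0.2, 0.5], bc_action=2)  # agent prefers action 0
loss_agree = expert_margin_loss([0.1, 0.2, 1.5], bc_action=2)     # agent clearly prefers action 2
```

Under this assumed form, the loss is positive whenever the agent's greedy action differs from the BC suggestion and vanishes once the agent prefers the BC action by a sufficient margin, which is one way such a term could steer policy improvement.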