Intelligent trainer for Dyna-style model-based deep reinforcement learning
Author: Kyle Guan, Yonggang Wen, Xin Zhou, Yuanlong Li, Linsen Dong
Contributors: School of Computer Science and Engineering
Language: English
Year of publication: 2020
Subject: Hyperparameter; Reinforcement learning; Markov decision process; Data modeling; Process control; Ensemble; Algorithm; Process (engineering); Artificial intelligence; Software; Computer Networks and Communications; Computer Science Applications; Computer science and engineering [Engineering]
Description: Model-based reinforcement learning (MBRL) has been proposed as a promising alternative to canonical RL that tackles its high sampling cost by leveraging a system dynamics model to generate synthetic data for policy training. The MBRL framework, nevertheless, is inherently limited by the convoluted process of jointly optimizing the control policy, learning the system dynamics, and sampling data from two sources controlled by complicated hyperparameters. As such, the training process involves overwhelming manual tuning and is prohibitively costly. In this research, we propose a "reinforcement on reinforcement" (RoR) architecture that decomposes these convoluted tasks into two decoupled layers of RL. The inner layer is the canonical MBRL training process, formulated as a Markov decision process called the training process environment (TPE). The outer layer serves as an RL agent, called the intelligent trainer, that learns an optimal hyperparameter configuration for the inner TPE. This decomposition provides much-needed flexibility to implement different trainer designs, referred to as "train the trainer." In our research, we propose and optimize two alternative trainer designs: 1) a unihead trainer and 2) a multihead trainer. Our proposed RoR framework is evaluated on five tasks in the OpenAI Gym. Compared with three baseline methods, our intelligent trainer methods show competitive autotuning capability, saving up to 56% of the expected sampling cost without knowing the best parameter configuration in advance. The proposed trainer framework can be easily extended to tasks that require costly hyperparameter tuning.

Funding: This work was supported in part by the Energy Program, National Research Foundation (NRF), Prime Minister's Office, Singapore, administered by the Energy Market Authority (EMA) of Singapore, under Award NRF2017EWT-EP003-023; in part by the Green Data Centre Research administered by the Info-communications Media Development Authority (IMDA), under Award NRF2015ENC-GDCR01001-003; and in part by the Behavioral Studies in the Energy, Water, Waste and Transportation Sector under Award BSEWWT2017_2_06.
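To make the two-layer structure concrete, below is a minimal sketch of the RoR idea as described in the abstract: an outer trainer agent repeatedly selects a hyperparameter configuration (here, a real-vs-synthetic data sampling ratio), and each inner step is one iteration of the training process environment (TPE). The epsilon-greedy trainer, the candidate action set, and the `tpe_step` stub are illustrative assumptions, not the paper's exact algorithm.

```python
import random

# Candidate hyperparameter actions for the outer trainer:
# the fraction of training data drawn from the real environment
# (the rest would come from the learned dynamics model).
ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]

def tpe_step(ratio):
    """One inner MBRL iteration of the TPE (hypothetical stub).

    In the real system this would train the dynamics model and the
    policy with the given sampling ratio and return a reward signal,
    e.g., policy improvement per real sample consumed. Here it is
    stubbed with noise for a self-contained, runnable example.
    """
    return random.gauss(1.0 - abs(ratio - 0.5), 0.1)

def intelligent_trainer(episodes=50, eps=0.2):
    """Outer-layer RL agent: epsilon-greedy bandit over configurations."""
    q = {a: 0.0 for a in ACTIONS}  # action-value estimates
    n = {a: 0 for a in ACTIONS}    # visit counts
    for _ in range(episodes):
        # Explore a random configuration or exploit the best one so far.
        a = random.choice(ACTIONS) if random.random() < eps else max(q, key=q.get)
        r = tpe_step(a)            # reward observed from the inner TPE
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental mean update
    return max(q, key=q.get)

if __name__ == "__main__":
    print("Selected sampling ratio:", intelligent_trainer())
```

A multihead trainer, as named in the abstract, would maintain several such heads over different hyperparameters and update them from the same TPE feedback; this sketch shows only a single-head (unihead-style) loop.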
Database: OpenAIRE
External link: