Discrete-to-deep reinforcement learning methods

Author:	Richard Dazeley, Budi Kurniawan, Michael Papasimeon, Peter Vamplew, Cameron Foale
Year of publication:	2021
Subject:	State variable; Artificial neural network; Computer science; Supervised learning; Context (language use); Function (mathematics); Artificial intelligence; Control theory; Classifier (linguistics); Reinforcement learning; Software
Source:	Neural Computing and Applications. 34:1713-1733
ISSN:	1433-3058 0941-0643
Description:	Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::d40302fa0487118da231bfcd77c8d8ce https://doi.org/10.1007/s00521-021-06270-6
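
The D2D-SPL idea in the description (a fast tabular learner generates data, and a classifier trained on that data becomes the controller) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional toy task, the bin count, the hyperparameters, the use of greedy rollouts to collect data, and the logistic-regression classifier standing in for a neural network.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BINS, N_ACTIONS, STEP = 10, 2, 0.1   # hypothetical toy sizes

def env_step(x, a):
    """Toy 1-D task (assumed, not from the paper): action 1 moves right, 0 left."""
    x_next = float(np.clip(x + (STEP if a == 1 else -STEP), -1.0, 1.0))
    return x_next, -abs(x_next)        # reward is highest at the origin

def discretise(x):
    """Map a continuous state in [-1, 1] to a table row."""
    return min(int((x + 1.0) / 2.0 * N_BINS), N_BINS - 1)

# Phase 1: tabular Q-learning on the discretised state.
Q = np.zeros((N_BINS, N_ACTIONS))
alpha, gamma, eps = 0.2, 0.9, 0.2
for episode in range(500):
    x = rng.uniform(-1.0, 1.0)
    for t in range(20):
        s = discretise(x)
        explore = rng.random() < eps
        a = int(rng.integers(N_ACTIONS)) if explore else int(np.argmax(Q[s]))
        x_next, r = env_step(x, a)
        target = r + gamma * Q[discretise(x_next)].max()
        Q[s, a] += alpha * (target - Q[s, a])
        x = x_next

# Phase 2: roll out the greedy tabular policy and record
# (continuous state, action) pairs as a supervised dataset.
states, actions = [], []
for episode in range(200):
    x = rng.uniform(-1.0, 1.0)
    for t in range(20):
        a = int(np.argmax(Q[discretise(x)]))
        states.append(x)
        actions.append(a)
        x, _ = env_step(x, a)

# Phase 3: fit a classifier on the dataset. Logistic regression trained by
# gradient descent stands in for the neural network classifier.
X = np.array(states)
y = np.array(actions, dtype=float)
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))   # predicted P(action = 1 | x)
    w -= 0.5 * np.mean((p - y) * X)
    b -= 0.5 * np.mean(p - y)

def controller(x):
    """Classifier used directly as the policy on the continuous state."""
    return int(w * x + b > 0.0)
```

The classifier sees the raw continuous state rather than the bin index, which is where the generalisation beyond the table would come from in this sketch. D2D-SQL would differ in the last phase: instead of fitting a classifier to actions, the tabular data would be used to initialise a value network that then continues learning with another RL method.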