Asynchronous framework with Reptile+ algorithm to meta learn partially observable Markov decision process

Authors:	Viet-Hung Dang, Dang Quang Nguyen, Ngo Anh Vien, TaeChoong Chung
Publication year:	2020
Subject:	Artificial neural network; Computer science; Computation; Partially observable Markov decision process; Engineering and technology; Machine learning; Task (computing); Artificial intelligence; Control theory; Asynchronous communication; Electrical engineering, electronic engineering, information engineering; Reinforcement learning; Artificial intelligence & image processing; Transfer of learning
Source:	Applied Intelligence. 50:4050-4062
ISSN:	1573-7497 0924-669X
DOI:	10.1007/s10489-020-01748-7
Description:	Meta-learning has recently received much attention across a wide variety of deep reinforcement learning (DRL) settings. Without meta-learning, a deep neural network must be trained as a controller to learn each specific control task from scratch, using a large amount of data. This way of training has shown many limitations in handling different but related tasks. Meta-learning on control domains therefore becomes a powerful tool for transfer learning across related tasks. However, it is widely known that meta-learning requires massive computation and training time. This paper proposes a novel DRL framework called HCGF-R2-DDPG (Hybrid CPU/GPU Framework for Reptile+ and Recurrent Deep Deterministic Policy Gradient). HCGF-R2-DDPG integrates meta-learning into a general asynchronous training architecture. The proposed framework allows both CPU and GPU to be utilised to speed up training of the meta network initialisation. We evaluate HCGF-R2-DDPG on various Partially Observable Markov Decision Process (POMDP) domains.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::15135302af7265d98107e38187b0c79a https://doi.org/10.1007/s10489-020-01748-7