Analysis of hyper-parameters for AlphaZero-like deep reinforcement learning

Authors:	Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat
Language:	English
Year of publication:	2022
Subject:	Computer Science (miscellaneous)
Source:	International Journal of Information Technology & Decision Making. WORLD SCIENTIFIC PUBL CO PTE LTD
Description:	The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search (MCTS) is used to train a deep neural network, which is then itself used in tree searches. The training is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. Through multi-objective analysis, we identify four important hyper-parameters for further assessment. First, we find the surprising result that too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game episodes and training epochs. As a consequence of our experiments, we provide recommendations on setting hyper-parameter values in self-play. The outer loop of self-play iterations should be emphasized over the inner loop. This means that hyper-parameters for the inner loop should be set to lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0439a4710c3abcf1b9a66a73c2fb8989 http://hdl.handle.net/1887/3502407
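
The abstract distinguishes an outer loop (self-play iterations) from an inner loop (game episodes, MCTS simulations, training epochs). The following is a minimal sketch of that nesting, not the authors' implementation: the names `SelfPlayConfig` and `count_work` and the fixed `moves_per_game` value are assumptions made for illustration. It only counts how much work each level contributes, which may help when comparing settings that favor the outer loop with settings that favor the inner loop.

```python
from dataclasses import dataclass


@dataclass
class SelfPlayConfig:
    # Hypothetical field names; the record does not give the paper's identifiers.
    iterations: int        # outer loop: self-play iterations
    episodes: int          # inner loop: games played per iteration
    mcts_simulations: int  # inner loop: MCTS simulations per move
    epochs: int            # inner loop: training epochs per iteration


def count_work(cfg: SelfPlayConfig, moves_per_game: int = 10) -> tuple[int, int]:
    """Return (total MCTS simulations, total training epochs) for one run.

    moves_per_game is an assumed constant; real games vary in length.
    """
    total_simulations = 0
    total_epochs = 0
    for _ in range(cfg.iterations):          # outer loop
        for _ in range(cfg.episodes):        # self-play games
            for _ in range(moves_per_game):  # one tree search per move
                total_simulations += cfg.mcts_simulations
        total_epochs += cfg.epochs           # network training after self-play
    return total_simulations, total_epochs
```

For example, under these assumptions `count_work(SelfPlayConfig(iterations=2, episodes=3, mcts_simulations=5, epochs=4))` gives 300 simulations and 8 epochs. Doubling `iterations` while halving an inner-loop value keeps the simulation count fixed but shifts the budget toward the outer loop.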