Analysis of hyper-parameters for AlphaZero-like deep reinforcement learning

Authors: Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat
Language: English
Year of publication: 2022
Subject:
Source: International Journal of Information Technology & Decision Making. WORLD SCIENTIFIC PUBL CO PTE LTD
Description: The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search (MCTS) is used to train a deep neural network, which is then itself used in tree searches. The training is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. Through multi-objective analysis, we identify four important hyper-parameters for further assessment. First, we find the surprising result that too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS search simulations, game episodes, and training epochs. As a consequence of our experiments, we provide recommendations on setting hyper-parameter values in self-play. The outer loop of self-play iterations should be emphasized over the inner loop. This means that hyper-parameters for the inner loop should be set to lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations.
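The outer/inner loop structure named in the description can be illustrated with a minimal sketch. It only counts the work done at each level and does not implement MCTS or network training. All field names, default values, and the fixed game length are assumptions made for illustration, not settings taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SelfPlayConfig:
    # Hypothetical names and values, chosen only to illustrate the loop nesting.
    iterations: int = 100        # outer loop: self-play iterations
    episodes: int = 10           # inner loop: game episodes per iteration
    mcts_simulations: int = 25   # inner loop: MCTS simulations per move
    epochs: int = 5              # inner loop: training epochs per iteration
    moves_per_episode: int = 30  # assumed fixed game length


def count_work(cfg: SelfPlayConfig) -> dict:
    """Walk the nested loops and count simulations and training epochs."""
    simulations = 0
    epochs_run = 0
    for _ in range(cfg.iterations):                  # outer loop
        for _ in range(cfg.episodes):                # self-play games
            for _ in range(cfg.moves_per_episode):   # moves within one game
                simulations += cfg.mcts_simulations  # tree search before each move
        epochs_run += cfg.epochs                     # network training on the collected games
    return {"simulations": simulations, "epochs": epochs_run}


if __name__ == "__main__":
    print(count_work(SelfPlayConfig()))
```

Under these assumptions the simulation count is the product of all four loop sizes, so lowering any inner-loop value frees budget that can be spent on additional outer iterations.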
Database: OpenAIRE