Analysis of hyper-parameters for AlphaZero-like deep reinforcement learning
Author: | Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat |
---|---|
Language: | English |
Year of publication: | 2022 |
Subject: | |
Source: | International Journal of Information Technology & Decision Making. WORLD SCIENTIFIC PUBL CO PTE LTD |
Description: | The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search (MCTS) is used to train a deep neural network, which is then itself used in tree searches. The training is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss functions, presumably because of the prohibitive computational cost of exploring the parameter space. In this paper, we investigate 12 hyper-parameters in an AlphaZero-like self-play algorithm and evaluate how these parameters contribute to training. Through multi-objective analysis, we identify four important hyper-parameters to assess further. To start, we find, surprisingly, that too much training can sometimes lead to lower performance. Our main result is that the number of self-play iterations subsumes MCTS-search simulations, game episodes and training epochs. As a consequence of our experiments, we provide recommendations on setting hyper-parameter values in self-play. The outer loop of self-play iterations should be emphasized over the inner loop. This means hyper-parameters for the inner loop should be set to lower values. A secondary result of our experiments concerns the choice of optimization goals, for which we also provide recommendations. |
Database: | OpenAIRE |
External link: |
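
The outer-loop and inner-loop structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. All names (`LoopConfig`, `count_work`, the field names) and the fixed `moves_per_game` are assumptions made for illustration, and the sketch only counts work units instead of playing games or training a network. It shows how the same budget of MCTS simulations and training passes can be spent on few iterations with a heavy inner loop, or on many iterations with a light inner loop.

```python
from dataclasses import dataclass


@dataclass
class LoopConfig:
    # All field names are hypothetical; the record does not list the paper's names.
    iterations: int           # outer loop: self-play iterations
    episodes: int             # inner loop: game episodes per iteration
    mcts_simulations: int     # inner loop: MCTS simulations per move
    epochs: int               # inner loop: training epochs per iteration
    moves_per_game: int = 10  # assumed fixed game length, for illustration only


def count_work(cfg: LoopConfig) -> dict:
    """Walk the nested loop structure and count the work units it would perform."""
    simulations = 0
    training_passes = 0
    for _ in range(cfg.iterations):              # outer loop
        for _ in range(cfg.episodes):            # self-play games
            for _ in range(cfg.moves_per_game):  # each move runs one tree search
                simulations += cfg.mcts_simulations
        for _ in range(cfg.epochs):              # retrain on the collected games
            training_passes += 1
    return {"simulations": simulations, "training_passes": training_passes}


# Two allocations of the same budget: heavy inner loop vs. many outer iterations.
heavy_inner = LoopConfig(iterations=10, episodes=100, mcts_simulations=200, epochs=20)
light_inner = LoopConfig(iterations=100, episodes=10, mcts_simulations=200, epochs=2)
print(count_work(heavy_inner))
print(count_work(light_inner))
```

Both configurations perform the same number of simulations and training passes; they differ only in how often the network is refreshed between rounds of self-play, which is the trade-off the abstract's recommendation addresses.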