AlphaZero with Real-Time Opponent Skill Adaptation

Authors: Peter Tarabek, Marek Balaz
Year of publication: 2021
Subject:
Source: IDT
DOI: 10.1109/idt52577.2021.9497522
Description: Reinforcement learning methods have achieved superhuman performance in many complex games. Superhuman play can be impractical against casual players, because the skill gap may be too large for the game to remain enjoyable and challenging. In this paper, we propose a modification of the AlphaZero method that allows the agent to adapt to a weaker opponent's skill level during a single game. We add another output head to the neural network that predicts the remaining game length. Based on this prediction, we add a new action selection mechanism to Monte Carlo Tree Search. This mechanism allows a trade-off between the original and the new action selection strategy. Experimental results show that the proposed modifications reduce the gap between strong and weak agents by increasing the number of draws, which is our primary measure of adaptability.
Database: OpenAIRE
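
The trade-off between the original and the new action selection strategy mentioned in the description can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the names `ChildStats`, `select_action`, and `tradeoff`, the linear blend of the two terms, and the choice to favor moves with a longer predicted remaining game. It should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChildStats:
    """Statistics for one candidate move at the MCTS root (hypothetical structure)."""
    action: int
    visit_count: int         # visits accumulated by the search
    predicted_length: float  # assumed output of the extra game-length head


def select_action(children: Sequence[ChildStats], tradeoff: float) -> int:
    """Pick a move by blending visit counts with predicted remaining game length.

    tradeoff = 0.0 reproduces the usual greedy choice by visit count;
    tradeoff = 1.0 picks the move with the longest predicted remaining game.
    The linear blend is an illustrative assumption, not the paper's rule.
    """
    if not children:
        raise ValueError("children must not be empty")
    if not 0.0 <= tradeoff <= 1.0:
        raise ValueError("tradeoff must lie in [0, 1]")

    total_visits = sum(c.visit_count for c in children)
    max_length = max(c.predicted_length for c in children)

    def score(child: ChildStats) -> float:
        # Normalize both terms to [0, 1] so the blend weights are comparable.
        visit_term = child.visit_count / total_visits if total_visits > 0 else 0.0
        length_term = child.predicted_length / max_length if max_length > 0 else 0.0
        return (1.0 - tradeoff) * visit_term + tradeoff * length_term

    return max(children, key=score).action


# Example with made-up numbers: move 0 is the search favorite,
# move 1 is predicted to keep the game going longest.
candidates = [
    ChildStats(action=0, visit_count=80, predicted_length=5.0),
    ChildStats(action=1, visit_count=15, predicted_length=30.0),
    ChildStats(action=2, visit_count=5, predicted_length=12.0),
]
strong_move = select_action(candidates, tradeoff=0.0)
adaptive_move = select_action(candidates, tradeoff=1.0)
```

With a single `tradeoff` parameter in this sketch, the agent can be moved continuously between full-strength play and play that prioritizes the secondary criterion, which matches the trade-off the description refers to only at a conceptual level.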