AlphaZero with Real-Time Opponent Skill Adaptation
Author: | Peter Tarabek, Marek Balaz |
---|---|
Year of publication: | 2021 |
Subject: | Artificial neural network; Mechanism (biology); Computer science; Monte Carlo tree search; Adversary; Action selection; Adaptability; Reinforcement learning; Artificial intelligence; Adaptation (computer science) |
Source: | IDT |
DOI: | 10.1109/idt52577.2021.9497522 |
Description: | Reinforcement-learning-based methods have achieved superhuman performance in many complex games. Playing at a superhuman level can be impractical against casual players, because the skill gap can be too large for the game to be enjoyable and challenging. In this paper, we propose a modification of the AlphaZero method that allows the agent to adapt to a weaker opponent's skill level during a single game. We add another output head to the neural network that predicts the remaining game length. Based on this prediction, we add a new action selection mechanism to Monte Carlo Tree Search. This mechanism allows a trade-off between the original and the new action selection strategy. Experimental results show that the proposed modifications reduce the gap between strong and weak agents by increasing the number of draws, which is our primary measure of adaptability. |
Database: | OpenAIRE |
External link: |
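
The abstract describes a trade-off between the original AlphaZero action selection and a new strategy driven by the predicted remaining game length, but it does not give the exact rule. The following is only a minimal sketch of how such a blend could look, not the authors' method. The names `ChildStats`, `blended_scores`, `select_action`, and `trade_off` are hypothetical, and the choice to favor moves with a longer predicted game is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChildStats:
    """Statistics for one root move after search (hypothetical structure)."""
    visits: int              # visit count, the usual AlphaZero selection signal
    predicted_length: float  # assumed output of the extra game-length head


def blended_scores(children, trade_off):
    """Blend visit share and predicted-length share for each root move.

    trade_off = 0.0 reproduces plain visit-count selection;
    trade_off = 1.0 uses only the predicted remaining game length.
    Assumption: a longer predicted game means a less decisive move.
    """
    if not 0.0 <= trade_off <= 1.0:
        raise ValueError("trade_off must lie in [0, 1]")
    total_visits = sum(c.visits for c in children.values())
    total_length = sum(c.predicted_length for c in children.values())
    scores = {}
    for action, c in children.items():
        visit_share = c.visits / total_visits if total_visits else 0.0
        length_share = c.predicted_length / total_length if total_length else 0.0
        scores[action] = (1.0 - trade_off) * visit_share + trade_off * length_share
    return scores


def select_action(children, trade_off):
    """Return the move with the highest blended score."""
    scores = blended_scores(children, trade_off)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    root = {
        "strong_move": ChildStats(visits=80, predicted_length=4.0),
        "gentle_move": ChildStats(visits=20, predicted_length=16.0),
    }
    print(select_action(root, trade_off=0.0))
    print(select_action(root, trade_off=1.0))
```

In this sketch a single weight controls how far the agent departs from its strongest play. How that weight would be set during a game against a weaker opponent is not stated in the abstract and is left open here.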