AlphaZero with Real-Time Opponent Skill Adaptation

Authors:	Peter Tarabek, Marek Balaz
Year of publication:	2021
Subject:	Artificial neural network, business.industry, Mechanism (biology), Computer science, media_common.quotation_subject, Monte Carlo tree search, ComputingMilieux_PERSONALCOMPUTING, Adversary, Action selection, Adaptability, Reinforcement learning, Artificial intelligence, business, Adaptation (computer science), media_common
Source:	IDT
DOI:	10.1109/idt52577.2021.9497522
Description:	Reinforcement learning based methods have achieved super-human scores in many complex games. Playing at a super-human level can be impractical against casual players, because the skill gap may be too large for the game to be enjoyable and challenging. In this paper, we propose a modification of the AlphaZero method that allows the agent to adapt to a weaker opponent's skill level during a single game. We add another output head to the neural network that predicts the remaining game length. Based on this prediction, we add a new action selection mechanism to Monte Carlo Tree Search (MCTS). This mechanism allows a trade-off between the original and the new action selection strategy. The experimental results show that the proposed modifications reduce the gap between strong and weak agents by increasing the number of draws, which is our primary measure of adaptability.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::ceff603a53a85b3cf222a32ef441b484 https://doi.org/10.1109/idt52577.2021.9497522
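The description does not say how the game-length prediction enters action selection. The Python sketch below illustrates one possible reading, under explicit assumptions that are not taken from the paper: the length head is trained on the number of moves remaining from each self-play position, the "new" strategy prefers actions predicted to prolong the game (which could make draws more likely), and the trade-off is a linear blend controlled by a hypothetical weight `alpha`. The names `ChildStats`, `length_targets`, and `select_action` are illustrative only, not the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChildStats:
    """Root-level statistics for one candidate action (hypothetical structure)."""
    action: int
    visits: int              # MCTS visit count N(s, a)
    predicted_length: float  # assumed output of the remaining-game-length head after action a


def length_targets(num_moves: int) -> List[int]:
    """Assumed training target for the length head: moves remaining from each position.

    Position t (before move t is played) of a game with num_moves moves
    gets the target num_moves - t.
    """
    return [num_moves - t for t in range(num_moves)]


def select_action(children: List[ChildStats], alpha: float) -> int:
    """Blend visit-count selection with a hypothetical length-based preference.

    alpha = 0.0 reproduces AlphaZero-style selection of the most-visited action;
    alpha = 1.0 picks the action with the longest predicted remaining game.
    """
    if not children:
        raise ValueError("children must be non-empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    total_visits = sum(c.visits for c in children)
    max_length = max(c.predicted_length for c in children)

    def score(c: ChildStats) -> float:
        # Scale both criteria to at most 1 so alpha weighs comparable quantities.
        visit_share = c.visits / total_visits if total_visits > 0 else 0.0
        length_share = c.predicted_length / max_length if max_length > 0 else 0.0
        return (1.0 - alpha) * visit_share + alpha * length_share

    # max() returns the first action among ties.
    return max(children, key=score).action


if __name__ == "__main__":
    root = [
        ChildStats(action=0, visits=80, predicted_length=10.0),
        ChildStats(action=1, visits=15, predicted_length=30.0),
        ChildStats(action=2, visits=5, predicted_length=20.0),
    ]
    print(length_targets(4))         # [4, 3, 2, 1]
    print(select_action(root, 0.0))  # 0 (most visited)
    print(select_action(root, 1.0))  # 1 (longest predicted game)
```

Intermediate values of `alpha` move the choice between the two extremes. How the trade-off is actually parameterized, and how it is adjusted during a game, are not stated in the description; the full paper would be the authority on both.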