Temporal difference learning for the game Tic-Tac-Toe 3D: applying structure to neural networks

Authors:	Michiel Van De Steeg, Marco A. Wiering, Madalina M. Drugan
Contributors:	Artificial Intelligence, Security
Language:	English
Year of publication:	2015
Subject:	Computer Science::Machine Learning; Computer Science::Computer Science and Game Theory; Computer Science::Neural and Evolutionary Computation; Computer science; Artificial intelligence; Machine learning; Learning (artificial intelligence); Reinforcement learning (RL); Temporal difference learning; Deep learning; Artificial neural network; Neural networks; Perceptron; Multilayer perceptrons; Deep structured neural network; Integrated pattern detectors; Training; Benchmark testing; Function (mathematics); Layer (object-oriented design); Tic-Tac-Toe 3D game; Computer games; Games; Three-dimensional displays; Row
Source:	Proceedings - 2015 IEEE Symposium Series on Computational Intelligence, SSCI 2015, 7-10 December 2015, Cape Town, South Africa, pp. 564-570
Description:	When reinforcement learning is applied to large state spaces, such as those that occur in board games, a good function approximator for the value function is very important. In previous research, multilayer perceptrons have often been used quite successfully as function approximators for learning to play particular games with temporal difference learning. With the recent developments in deep learning, it is important to study whether multiple hidden layers or particular network structures can improve learning of the value function. In this paper, we compare five different multilayer perceptron structures for learning to play the game Tic-Tac-Toe 3D, both when training through self-play and when training against the same fixed opponent they are tested against. We compare three fully connected multilayer perceptrons with different numbers of hidden layers and/or hidden units, as well as two structured ones. The structured multilayer perceptrons have a first hidden layer that is only sparsely connected to the input layer and whose units correspond to the rows in Tic-Tac-Toe 3D. This allows them to learn more easily the contribution of specific patterns on the corresponding rows. One of the two structured multilayer perceptrons has a second hidden layer that is fully connected to the first, which allows the network to learn to integrate the information in these detected patterns non-linearly. The results on Tic-Tac-Toe 3D show that the deep structured neural network with integrated pattern detectors performs best among the compared multilayer perceptrons against a fixed opponent, both when trained through self-play and when trained against this fixed opponent.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::964d70c6b4e01c70bcd94415c16a62de https://doi.org/10.1109/ssci.2015.89
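
Note:	The sparsely connected first hidden layer mentioned in the description can be illustrated with a connectivity mask that lets each hidden unit see only the cells of one row. The following is a minimal sketch, not the authors' implementation. It assumes a 4x4x4 board (the record does not state the board size), a single hidden unit per row, and tanh activations; the names `winning_lines`, `row_mask`, and `forward` are hypothetical.

```python
import itertools

import numpy as np


def winning_lines(n=4):
    """Return every straight line of n cells in an n x n x n cube.

    Each line is a list of (x, y, z) tuples. Only one of each pair of
    opposite directions is used, so no line is listed twice.
    """
    # Keep directions whose first nonzero component is positive (13 in total).
    directions = [d for d in itertools.product((-1, 0, 1), repeat=3)
                  if d > (0, 0, 0)]
    lines = []
    for start in itertools.product(range(n), repeat=3):
        for d in directions:
            cells = [tuple(s + k * step for s, step in zip(start, d))
                     for k in range(n)]
            # A line of length n fits only if every cell is on the board.
            if all(0 <= c < n for cell in cells for c in cell):
                lines.append(cells)
    return lines


def row_mask(n=4):
    """Binary matrix of shape (number of rows, n**3); 1 where a cell lies on a row."""
    lines = winning_lines(n)
    mask = np.zeros((len(lines), n ** 3))
    for i, cells in enumerate(lines):
        for x, y, z in cells:
            mask[i, (x * n + y) * n + z] = 1.0
    return mask


def forward(board, w1, b1, w2, b2, mask):
    """Value estimate for a flattened board (assumed encoding: +1, -1, 0).

    Multiplying w1 by the mask zeroes every connection that does not link
    a hidden unit to a cell on its own row (assumption: one unit per row).
    """
    hidden = np.tanh((w1 * mask) @ board + b1)
    return np.tanh(w2 @ hidden + b2)
```

Under the 4x4x4 assumption the enumeration yields 76 rows, so the mask has shape (76, 64) with exactly four ones per row. Applying the same mask to the weight gradient during training would keep the masked connections at zero; a fully connected second hidden layer, as in the deeper structured network, would take `hidden` as its input.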