Improving reinforcement learning with context detection
Authors: | Filipo Studzinski Perotto, Bruno da Silva, Ana L. C. Bazzan, Paulo Martins Engel, Eduardo W. Basso |
---|---|
Contributors: | Instituto de Informática da UFRGS (UFRGS), Universidade Federal do Rio Grande do Sul [Porto Alegre] (UFRGS), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées, Hideyuki Nakashima and Michael P. Wellman and Gerhard Weiss and Peter Stone |
Year of publication: | 2006 |
Subject: |
reinforcement learning; Learning classifier system; Error-driven learning; Computer science; Mechanism (biology); Context (language use); 02 engineering and technology; Machine learning; Robot learning; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; multimodel learning; non-stationary environments; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Unsupervised learning; 020201 artificial intelligence & image processing; Artificial intelligence; Temporal difference learning |
Source: | 5th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2006), Hakodate, Japan, May 8-12, 2006, New York, NY, USA, pp. 810-812, ⟨10.1145/1160633.1160779⟩ |
DOI: | 10.1145/1160633.1160779 |
Description: | In this paper we propose a method for solving reinforcement learning problems in non-stationary environments. The basic idea is to create and simultaneously update multiple partial models of the environment dynamics. The learning mechanism is based on the detection of context changes, that is, on the detection of significant changes in the dynamics of the environment. Based on this motivation, we propose, formalize and show the efficiency of a method for detecting the current context and the associated model of prediction, as well as a method for updating each of the incrementally built models. |
Database: | OpenAIRE |
External link: |
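
The idea summarized in the description (several partial models of the environment dynamics, with the active one chosen by detecting context changes) can be illustrated with a minimal sketch. This is not the authors' formulation: the class names `PartialModel` and `ContextDetector`, the quality score `2p - 1`, and the parameters `rate` and `min_quality` are all assumptions made for illustration. Each model keeps tabular transition counts and a running quality score; when even the best model predicts poorly, a new model is created.

```python
from collections import defaultdict


class PartialModel:
    """Tabular transition model for one context (hypothetical helper)."""

    def __init__(self):
        # counts[(state, action)][next_state] -> number of times observed
        self.counts = defaultdict(lambda: defaultdict(int))
        self.quality = 0.0  # running estimate of prediction quality in [-1, 1]

    def prob(self, s, a, s_next):
        """Estimated P(s_next | s, a), or None if (s, a) was never seen."""
        seen = self.counts.get((s, a))
        if not seen:
            return None
        return seen.get(s_next, 0) / sum(seen.values())

    def update(self, s, a, s_next):
        self.counts[(s, a)][s_next] += 1


class ContextDetector:
    """Keeps several partial models and picks the one that predicts best.

    Assumed rule (not taken from the paper): every model's quality is an
    exponential moving average of 2p - 1, where p is the probability the
    model assigned to the observed transition. If the best quality drops
    below `min_quality`, a fresh model is created for the new context.
    """

    def __init__(self, rate=0.6, min_quality=0.0):
        self.rate = rate
        self.min_quality = min_quality
        self.models = [PartialModel()]
        self.current = 0

    def observe(self, s, a, s_next):
        # Score every model on the observed transition.
        for m in self.models:
            p = m.prob(s, a, s_next)
            score = 0.0 if p is None else 2.0 * p - 1.0
            m.quality += self.rate * (score - m.quality)

        # Select the best model, or create a new one if all predict badly.
        best = max(range(len(self.models)), key=lambda i: self.models[i].quality)
        if self.models[best].quality < self.min_quality:
            self.models.append(PartialModel())
            best = len(self.models) - 1

        # Only the selected model learns from this transition.
        self.current = best
        self.models[best].update(s, a, s_next)
        return best


# Toy non-stationary environment: the outcome of action "go" in state 0
# switches from state 1 to state 2 and back again.
detector = ContextDetector()
trace = [detector.observe(0, "go", s_next) for s_next in [1] * 10 + [2] * 10 + [1] * 10]
```

In this toy run the detector creates a second model when the dynamics change and returns to the first model when the original dynamics reappear, so `trace` holds index 0, then 1, then 0 again. A full agent would additionally learn a value function or policy per model, which the sketch leaves out.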