A competitive strategy for function approximation in Q-learning

Author:	Agostini, Alejandro Gabriel; Celaya Llover, Enric (ORCID: 0000-0001-8480-7706)
Contributors:	Institut de Robòtica i Informàtica Industrial; Universitat Politècnica de Catalunya. ROBiri - Grup de Robòtica de l'IRI; Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel·ligents
Subject:	Reinforcement learning; generalisation (artificial intelligence); learning (artificial intelligence); Informàtica::Intel·ligència artificial [Àrees temàtiques de la UPC]; Aprenentatge -- Tècniques; Q-Learning
Source:	Recercat. Dipósit de la Recerca de Catalunya; UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC)
Description:	In this work we propose an approach for generalization in continuous-domain reinforcement learning that, instead of using a single function approximator, tries many different function approximators in parallel, each one defined in a different region of the domain. Associated with each approximator is a relevance function that locally quantifies the quality of its approximation, so that, at each input point, the approximator with the highest relevance can be selected. The relevance function is defined using parametric estimates of the variance of the q-values and of the density of samples in the input space, which quantify the accuracy of the approximation and the confidence in it, respectively. These parametric estimates are obtained from a probability density represented as a Gaussian Mixture Model embedded in the input-output space of each approximator. In our experiments, the proposed approach required fewer experiences for learning and produced more stable convergence profiles than a single function approximator.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::1d64da0948781b37bb134d1c38cf7b26 http://hdl.handle.net/2117/14123
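
The selection scheme summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional input, replaces each Gaussian Mixture Model with a single joint Gaussian over (x, q), and uses a hypothetical relevance score (sample density divided by q-value variance), since the record does not give the actual formula. The names `GaussianApproximator` and `select` are illustrative only.

```python
import numpy as np


class GaussianApproximator:
    """One competing approximator (hypothetical simplification).

    Holds a single joint Gaussian over (x, q) instead of a full
    Gaussian Mixture Model, to keep the sketch short.
    """

    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=float)  # [mu_x, mu_q]
        self.cov = np.asarray(cov, dtype=float)    # 2x2 joint covariance

    def density(self, x):
        # Marginal density of samples at input x (confidence term).
        mu_x, var_x = self.mean[0], self.cov[0, 0]
        return np.exp(-0.5 * (x - mu_x) ** 2 / var_x) / np.sqrt(2 * np.pi * var_x)

    def predict(self, x):
        # Conditional mean of q given x (the approximated q-value).
        mu_x, mu_q = self.mean
        return mu_q + self.cov[1, 0] / self.cov[0, 0] * (x - mu_x)

    def variance(self):
        # Conditional variance of q given x (accuracy term).
        # For a single Gaussian it does not depend on x.
        return self.cov[1, 1] - self.cov[1, 0] ** 2 / self.cov[0, 0]

    def relevance(self, x):
        # Assumed form: many nearby samples and low q-variance
        # give a high relevance. The real definition may differ.
        return self.density(x) / (self.variance() + 1e-9)


def select(approximators, x):
    """Return (index, q-value) of the most relevant approximator at x."""
    scores = [a.relevance(x) for a in approximators]
    best = int(np.argmax(scores))
    return best, approximators[best].predict(x)
```

With two such approximators centered at different inputs, `select` returns the one whose samples are dense around the query point, which mirrors the competitive choice described above.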