Improved Learning Complexity in Combinatorial Pure Exploration Bandits

Authors: Gabillon, V., Lazaric, A., Ghavamzadeh, M., Ortner, R., Bartlett, P.
Contributors: Queensland University of Technology [Brisbane] (QUT), Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Adobe Systems Inc. (Adobe Advanced Technology Labs), Montanuniversität Leoben (MUL), ANR-14-CE24-0010, ExTra-Learn, Extraction et transfert de connaissances dans l'apprentissage par renforcement (2014)
Language: English
Year of publication: 2016
Subject:
Source: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), May 2016, Cadiz, Spain
Scopus-Elsevier
Description: We study the problem of combinatorial pure exploration in the stochastic multi-armed bandit setting. We first construct a new measure of complexity that provably characterizes the learning performance of the algorithms we propose for both the fixed-confidence and the fixed-budget setting. We show that this complexity is never higher than that of existing work and illustrate a number of configurations in which it can be significantly smaller. While in general this improvement comes at the cost of increased computational complexity, we provide a series of examples, including a planning problem, where this extra cost is not significant.
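
To make the setting concrete, the sketch below illustrates combinatorial pure exploration in a stochastic bandit under a fixed budget, using a plain uniform-allocation baseline over a top-m decision class. This is an illustrative assumption for intuition only; it is not the algorithm or the complexity measure proposed in the paper, and all names and parameters (n_arms, m, budget, true_means) are made up.

    # Toy sketch: fixed-budget combinatorial pure exploration with a top-m
    # decision class and uniform allocation (illustrative only, not the
    # paper's algorithm).
    import numpy as np

    rng = np.random.default_rng(0)

    n_arms = 6                    # number of base arms
    m = 3                         # decision class: all subsets of size m (top-m)
    budget = 600                  # total number of pulls (fixed budget)
    true_means = np.array([0.9, 0.8, 0.7, 0.5, 0.4, 0.3])  # unknown to the learner

    # Uniform allocation: pull each arm budget // n_arms times (Bernoulli rewards).
    pulls_per_arm = budget // n_arms
    estimates = np.array([
        rng.binomial(pulls_per_arm, p) / pulls_per_arm
        for p in true_means
    ])

    # Recommend the set in the decision class with the largest estimated value;
    # for the top-m class this is simply the m arms with the highest estimates.
    recommended = set(np.argsort(estimates)[-m:].tolist())
    optimal = set(np.argsort(true_means)[-m:].tolist())
    print("recommended set:", sorted(recommended))
    print("correct identification:", recommended == optimal)

The gap-based complexity measures discussed in the paper govern how large the budget must be (or how many samples a fixed-confidence algorithm needs) before such an identification succeeds with high probability; the paper's contribution is a provably tighter such measure, not the naive uniform strategy above.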
Database: OpenAIRE