A PAC algorithm in relative precision for bandit problem with costly sampling
Autor: | Marie Billaud Friess, Arthur Macherey, Anthony Nouy, Clémentine Prieur |
---|---|
Přispěvatelé: | Laboratoire de Mathématiques Jean Leray (LMJL), Centre National de la Recherche Scientifique (CNRS)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN), École Centrale de Nantes (ECN), Méthodes d'Analyse Stochastique des Codes et Traitements Numériques (GdR MASCOT-NUM), Centre National de la Recherche Scientifique (CNRS), Mathematics and computing applied to oceanic and atmospheric flows (AIRSEA), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Grenoble Alpes (UGA)-Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP ), Université Grenoble Alpes (UGA), Centre National de la Recherche Scientifique (CNRS)-Nantes université - UFR des Sciences et des Techniques (Nantes univ - UFR ST), Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ)-Nantes Université - pôle Sciences et technologie, Nantes Université (Nantes Univ)-Nantes Université (Nantes Univ), École Centrale de Nantes (Nantes Univ - ECN), Nantes Université (Nantes Univ), ANR-11-LABX-0020,LEBESGUE,Centre de Mathématiques Henri Lebesgue : fondements, interactions, applications et Formation(2011) |
Rok vydání: | 2022 |
Předmět: |
FOS: Computer and information sciences
Computer Science - Machine Learning probably approximately correct algorithm General Mathematics Machine Learning (stat.ML) Management Science and Operations Research Machine Learning (cs.LG) bandit algorithm Monte-Carlo estimates Statistics - Machine Learning Optimization and Control (math.OC) relative precision FOS: Mathematics bounded random variables [MATH]Mathematics [math] Mathematics - Optimization and Control Software |
Zdroj: | Mathematical Methods of Operations Research Mathematical Methods of Operations Research, 2022, 96, pp.161-185. ⟨10.1007/s00186-022-00769-x⟩ |
ISSN: | 1432-5217 1432-2994 |
DOI: | 10.1007/s00186-022-00769-x |
Popis: | International audience; This paper considers the problem of maximizing an expectation function over a finite set, or finite-arm bandit problem. We first propose a naive stochastic bandit algorithm for obtaining a probably approximately correct (PAC) solution to this discrete optimization problem in relative precision, that is a solution which solves the optimization problem up to a relative error smaller than a prescribed tolerance, with high probability. We also propose an adaptive stochastic bandit algorithm which provides a PAC-solution with the same guarantees. The adaptive algorithm outperforms the mean complexity of the naive algorithm in terms of number of generated samples and is particularly well suited for applications with high sampling cost. |
Databáze: | OpenAIRE |
Externí odkaz: | |
Nepřihlášeným uživatelům se plný text nezobrazuje | K zobrazení výsledku je třeba se přihlásit. |