Sur quelques questions d'adaptation dans des problèmes de bandits stochastiques

Author:	Hadiji, Hédi
Contributors:	Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Université Paris-Saclay, Gilles Stoltz, Pascal Massart
Language:	English
Publication year:	2020
Subject:	Upper confidence bound (UCB); Statistiques adaptatives; Minimax optimality; Asymptotic optimality; Algorithme Upper Confidence Bound (UCB); Stochastic multi-armed bandits; Bandits stochastiques à plusieurs bras; Bandits à continuum de bras; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]; Optimalité minimax; Adaptive statistics; Optimalité asymptotique; Continuum-armed bandits
Source:	Statistics [math.ST]. Université Paris-Saclay, 2020. English. ⟨NNT : 2020UPASM021⟩
Description:	The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into four chapters and an introduction. In the first part of the main body of the thesis, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits. We provide a procedure that (almost) obtains the same guarantees as if it were given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a speed that is quite different from the traditional bandit setting. In particular, we observe that bounded regret is possible under some specific hypotheses.; This thesis belongs to the field of sequential statistics. The main framework studied is that of stochastic multi-armed bandits, an idealized framework that models the exploration-exploitation dilemma faced when making repeated choices. The thesis consists of four chapters, preceded by an introduction. In the first part of the body of the thesis, we present a new algorithm that achieves optimal guarantees from both the distribution-dependent and the distribution-free points of view. The next two chapters are devoted to so-called adaptation questions. First, we propose an algorithm that adapts to the unknown regularity in continuum-armed bandit problems, highlighting the polynomial cost of adaptation in continuum-armed bandits. Next, we consider a problem of adaptation to the supports in K-armed bandit problems whose payoff distributions are bounded in unknown intervals. Finally, in a last chapter that stands somewhat apart, we study a slightly different setting of diversity-preserving bandits. We show that the optimal regret in this setting grows at rates different from the classical ones, with in particular the possibility of achieving constant regret under certain hypotheses.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=od_______212::5fd4861be3e79005608d891c5644a38b https://tel.archives-ouvertes.fr/tel-03117852/document