Bandits with many optimal arms

Autor:	de Heide, Rianne, Cheshire, James, Ménard, Pierre, Carpentier, Alexandra
Rok vydání:	2021
Předmět:	Computer Science - Machine Learning Statistics - Machine Learning
Druh dokumentu:	Working Paper
Popis:	We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting in terms of the problem parameters $T$ (the budget), $p^$ and $\Delta$. For the objective of minimizing the cumulative regret, we provide a lower bound of order $\Omega(\log(T)/(p^\Delta))$ and a UCB-style algorithm with matching upper bound up to a factor of $\log(1/\Delta)$. Our algorithm needs $p^$ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to $p^$ in this setting is impossible. For best-arm identification we also provide a lower bound of order $\Omega(\exp(-cT\Delta^2 p^))$ on the probability of outputting a sub-optimal arm where $c>0$ is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order $\log(T)$ in the exponential, and that does not need $p^*$ or $\Delta$ as parameter. Our results apply directly to the three related problems of competing against the $j$-th best arm, identifying an $\epsilon$ good arm, and finding an arm with mean larger than a quantile of a known order. Comment: Substantial rewrite and added experiments. Accepted for NeurIPS 2021
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2103.12452 Zobrazit plný text záznamu View this record from Arxiv