A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Autor:	Gajane, Pratik, Urvoy, Tanguy, Clérot, Fabrice
Rok vydání:	2016
Předmět:	Computer Science - Learning
Zdroj:	The 32nd International Conference on Machine Learning, Jul 2015, Lille, France. 37, pp.218-227, Proceedings of The 32nd International Conference on Machine Learning
Druh dokumentu:	Working Paper
Popis:	We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/1601.03855 Zobrazit plný text záznamu View this record from Arxiv