Showing 1 - 3 of 3 for search: '"Julian Ulf Zimmert"'
Authors:
Julian Ulf Zimmert, Yevgeny Seldin
Published in:
University of Copenhagen
Zimmert, J U & Seldin, Y 2020, An optimal algorithm for adversarial bandits with arbitrary delays. in Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020. PMLR, Proceedings of Machine Learning Research, vol. 108. <https://proceedings.mlr.press/v108/>
We propose a new algorithm for adversarial multi-armed bandits with unrestricted delays. The algorithm is based on a novel hybrid regularizer applied in the Follow the Regularized Leader (FTRL) framework. It achieves $\mathcal{O}(\sqrt{kn}+\sqrt{D\lo
External links:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::500139715e96edfb2d08daf20e9e8e43
http://arxiv.org/abs/1910.06054
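The abstract above is cut off before it describes the hybrid regularizer, so the following is only a minimal sketch of the delayed-feedback bookkeeping inside an FTRL bandit loop. It deliberately swaps in the plain negative-entropy (Exp3-style) regularizer, which is NOT the authors' hybrid regularizer. It assumes that the loss of round t is revealed at the end of round t + d_t and that D denotes the total delay, the sum of all d_t. The names `exp_weights`, `run_delayed_bandit`, `loss_fn`, `delays` and the fixed learning rate `eta` are hypothetical.

```python
import math
import random
from collections import defaultdict


def exp_weights(cum_loss_est, eta):
    """FTRL with a negative-entropy regularizer (Exp3-style weights).

    Swapped in for illustration; NOT the paper's hybrid regularizer.
    """
    m = min(cum_loss_est)  # shift by the minimum for numerical stability
    z = [math.exp(-eta * (L - m)) for L in cum_loss_est]
    s = sum(z)
    return [zi / s for zi in z]


def run_delayed_bandit(loss_fn, delays, k, eta, seed=0):
    """Play len(delays) rounds of a k-armed bandit with delayed feedback.

    Assumed convention: the loss of round t is revealed at the end of round
    t + delays[t], so it first influences the arm choice in round
    t + delays[t] + 1. Returns (total incurred loss, total delay D).
    """
    rng = random.Random(seed)
    cum_est = [0.0] * k          # cumulative importance-weighted loss estimates
    pending = defaultdict(list)  # arrival round -> [(arm, loss, prob)]
    total_loss = 0.0
    for t in range(len(delays)):
        # Fold in every observation that has arrived by the start of round t.
        for arm, loss, prob in pending.pop(t, []):
            cum_est[arm] += loss / prob
        w = exp_weights(cum_est, eta)
        arm = rng.choices(range(k), weights=w)[0]
        loss = loss_fn(t, arm)   # only the played arm's loss is observed
        total_loss += loss
        # Keep the probability used at play time for the importance weight.
        pending[t + delays[t] + 1].append((arm, loss, w[arm]))
    return total_loss, sum(delays)
```

Storing the sampling probability together with the delayed observation matters: the importance weight must use the distribution the arm was actually drawn from, not the (different) distribution in force when the feedback finally arrives.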
Authors:
Julian Ulf Zimmert, Yevgeny Seldin
Published in:
Zimmert, J U & Seldin, Y 2021, 'Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits', Journal of Machine Learning Research, vol. 22, 28.
University of Copenhagen
We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent (OMD) wit
External links:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::82186c275c190d04b0b20e118faedec3
http://arxiv.org/abs/1807.07623
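The abstract is truncated before it names the regularizer. As a minimal sketch, assuming the 1/2-Tsallis entropy regularizer $\Psi_t(w) = -\frac{4}{\eta_t}\sum_i \sqrt{w_i}$ suggested by the title, plain importance-weighted loss estimates, and a learning-rate schedule $\eta_t = 2/\sqrt{t}$ chosen for illustration, the code below writes the update in follow-the-regularized-leader form. The first-order conditions give $w_i = 4\,(\eta_t(\hat L_i - x))^{-2}$ for a normalizer $x < \min_i \hat L_i$, which is found by Newton's method. The paper's exact constants and its reduced-variance loss estimator may differ, and the function names and `loss_fn` are hypothetical.

```python
import math
import random


def tsallis_inf_weights(cum_loss_est, eta, tol=1e-12, max_iter=100):
    """Arm probabilities for FTRL with the 1/2-Tsallis regularizer
    Psi(w) = -(4 / eta) * sum_i sqrt(w_i) over the probability simplex.

    First-order conditions give w_i = 4 / (eta * (L_i - x))**2 with a
    normalizer x < min_i L_i, found by Newton's method on sum_i w_i = 1.
    """
    # Start where the best arm alone has weight 1, so sum(w) >= 1; since
    # sum(w) is increasing and convex in x, Newton then decreases x
    # monotonically toward the root without overshooting it.
    x = min(cum_loss_est) - 2.0 / eta
    for _ in range(max_iter):
        w = [4.0 / (eta * (L - x)) ** 2 for L in cum_loss_est]
        excess = sum(w) - 1.0
        if abs(excess) < tol:
            break
        # d/dx sum_i w_i(x) = eta * sum_i w_i ** 1.5
        x -= excess / (eta * sum(wi ** 1.5 for wi in w))
    total = sum(w)
    return [wi / total for wi in w]  # renormalize away residual error


def run_tsallis_inf(loss_fn, k, n, seed=0):
    """Play n rounds; eta_t = 2 / sqrt(t) is an assumed schedule."""
    rng = random.Random(seed)
    cum_est = [0.0] * k
    total_loss = 0.0
    for t in range(1, n + 1):
        eta = 2.0 / math.sqrt(t)
        w = tsallis_inf_weights(cum_est, eta)
        arm = rng.choices(range(k), weights=w)[0]
        loss = loss_fn(t, arm)          # only the played arm's loss is observed
        total_loss += loss
        cum_est[arm] += loss / w[arm]   # importance-weighted loss estimate
    return total_loss
```

Starting Newton's method at the point where the leading arm alone has weight 1 keeps every iterate strictly below $\min_i \hat L_i$, so the closed-form weights never hit the singularity.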
Authors:
Julian Ulf Zimmert, Yevgeny Seldin
Published in:
University of Copenhagen
We introduce the factored bandits model, which is a framework for learning with limited (bandit) feedback, where actions can be decomposed into a Cartesian product of atomic actions. Factored bandits incorporate rank-1 bandits as a special case, but
External links:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1ea41fdbe8f847ce62dae469c6b01a9e
https://curis.ku.dk/portal/en/publications/factored-bandits(593f14d7-b200-4a89-83b9-4c83b5c50b7d).html
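The abstract describes actions as a Cartesian product of atomic actions and names rank-1 bandits as a special case. As a small illustration of the model only (not the paper's learning algorithm), the sketch below assumes the usual rank-1 formulation, in which the expected reward of a joint action (i, j) is u_i * v_j with nonnegative factors. The parameter values and function names are hypothetical.

```python
from itertools import product


def rank1_mean(u, v, action):
    """Expected reward of joint action (i, j) in a rank-1 bandit: u[i] * v[j]."""
    i, j = action
    return u[i] * v[j]


def best_joint_action(u, v):
    """Brute force over the Cartesian product of the two atomic action sets."""
    actions = product(range(len(u)), range(len(v)))
    return max(actions, key=lambda a: rank1_mean(u, v, a))


# Hypothetical nonnegative factor parameters.
u = [0.2, 0.9, 0.5]
v = [0.7, 0.1, 0.4, 0.3]
joint = list(product(range(len(u)), range(len(v))))
print(len(joint))               # 12 joint actions built from 3 + 4 atomic actions
print(best_joint_action(u, v))  # (1, 0): the best atomic action of each factor
```

Treating every combination as a separate arm means facing |A_1| x |A_2| arms (12 here), whereas the factored view exposes that, with nonnegative factors, the best joint action is assembled from the best atomic action in each factor.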