Showing 1 - 1 of 1 for search: '"Ozdemir, Mehmet Ufuk"'
Multi-armed bandits (MAB) are extensively studied in various settings where the objective is to maximize the actions' outcomes (i.e., rewards) over time. Since safety is crucial in many real-world problems, safe versions of MAB algorithms have […]
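For context on the MAB objective described in the abstract, here is a minimal sketch of the classical UCB1 strategy in Python. It is a generic baseline for reward maximization, not the safe-MAB method proposed in the paper; the `pull` callback and the arm success probabilities are hypothetical.

```python
# Minimal UCB1 sketch: pull each arm once, then pick the arm maximizing the
# empirical mean plus an exploration bonus. Illustrative only, not the
# paper's safe-MAB algorithm.
import math
import random

def ucb1(pull, n_arms, horizon):
    """pull(arm) -> stochastic reward in [0, 1]; returns total reward."""
    counts = [0] * n_arms          # number of pulls per arm
    sums = [0.0] * n_arms          # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:            # initialization: try each arm once
            arm = t - 1
        else:                      # empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

# Example: three Bernoulli arms with hypothetical success probabilities.
probs = [0.2, 0.5, 0.8]
print(ucb1(lambda a: float(random.random() < probs[a]), len(probs), 1000))
```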
External link:
http://arxiv.org/abs/2112.06728