Showing 1 - 2 of 2 results for search: '"Saux, P. (Patrick)"'
In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such
External link:
https://explore.openaire.eu/search/publication?articleId=od______4198::e7c1bdd2f695b248ccf6b7e237ba0e21
http://hdl.handle.net/20.500.12210/79797
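The abstract above is cut off before it names the risk-averse measure it studies. The sketch below uses Conditional Value-at-Risk (CVaR) as one common example of such a measure; this choice is an assumption. It shows how a mean-reward criterion and a risk-averse criterion can prefer different arms. The helper `empirical_cvar`, the arm names, and the reward samples are hypothetical and only illustrate the idea. This is not the method described in the paper.

```python
import math
import statistics

def empirical_cvar(rewards, alpha):
    """Simple plug-in lower-tail CVaR: mean of the worst ceil(alpha * n) observed rewards.

    Hypothetical helper for illustration; CVaR is assumed as the risk measure.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    worst = sorted(rewards)[: math.ceil(alpha * len(rewards))]
    return sum(worst) / len(worst)

# Hypothetical reward samples for two arms (illustrative data, not from the paper).
arms = {
    "safe": [0.5] * 10,              # always pays 0.5
    "risky": [0.0] * 2 + [0.8] * 8,  # higher mean (0.64), but sometimes pays 0
}

# A mean-reward criterion and a risk-averse criterion can disagree on the best arm.
best_by_mean = max(arms, key=lambda name: statistics.mean(arms[name]))
best_by_cvar = max(arms, key=lambda name: empirical_cvar(arms[name], alpha=0.2))
print(best_by_mean, best_by_cvar)  # risky safe
```

With `alpha=0.2`, the risky arm's worst 20% of outcomes average 0.0. The CVaR criterion therefore favours the safe arm, even though the risky arm has the higher mean.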
The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.). These assumptions are suitable for many real-world problems but somet
External link:
https://explore.openaire.eu/search/publication?articleId=od______4198::b5171ea11ffe4ca0d076a9bff57668c7
http://hdl.handle.net/20.500.12210/57852
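To illustrate why a "bounded with known support" assumption matters, the sketch below computes a UCB1-style index. It uses the classic Hoeffding-based index, not the strategy studied in the paper. The exploration bonus scales with the width `high - low` of the assumed reward interval, so the index is only calibrated when that support is actually known. The function name `ucb1_index` and its parameters are hypothetical.

```python
import math

def ucb1_index(mean_reward, pulls, t, low=0.0, high=1.0):
    """UCB1-style index, assuming rewards lie in a KNOWN interval [low, high].

    Hypothetical helper: the bonus comes from Hoeffding's inequality, whose
    width scales with (high - low). It is only valid if that support assumption holds.
    """
    if pulls == 0:
        return math.inf  # force each arm to be pulled at least once
    return mean_reward + (high - low) * math.sqrt(2 * math.log(t) / pulls)

# Illustrative numbers: the less-pulled arm receives the larger exploration bonus.
print(ucb1_index(0.6, pulls=50, t=55), ucb1_index(0.5, pulls=5, t=55))
```

If the true rewards could fall outside `[low, high]`, or if the range is unknown, this bonus may be too small or too large. This is the kind of limitation of standard assumptions that the abstract refers to.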