Author:	Gimelfarb, Michael; Kim, Michael Jong
Year of publication:	2023
Subject:	Electrical Engineering and Systems Science - Systems and Control; Computer Science - Machine Learning
Document type:	Working Paper
Description:	We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. A defining feature of such models is the presence of "uninformative" actions that provide no information about the unknown parameters. We contribute a set of assumptions for PMDPs under which Thompson sampling guarantees an asymptotically optimal expected regret bound of $O(T^{-1})$; these assumptions are easily verified for many classes of problems such as queuing, inventory control, and dynamic pricing.
Database:	arXiv
External link:	http://arxiv.org/abs/2305.07844