Some Upper Bounds on the Running Time of Policy Iteration on Deterministic MDPs

Authors:	Goenka, Ritesh; Gupta, Eashan; Khyalia, Sushil; Agarwal, Pratyush; Wajid, Mulinti Shaik; Kalyanakrishnan, Shivaram
Language:	English
Year of publication:	2022
Subject:	FOS: Computer and information sciences; Computer Science - Computational Complexity; Discrete Mathematics (cs.DM); FOS: Mathematics; Mathematics - Combinatorics; Combinatorics (math.CO); Computational Complexity (cs.CC); 90C40 (Primary); 68Q25, 05C35, 05C38 (Secondary); Computer Science - Discrete Mathematics
Description:	Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms, and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d08b0b01a6c763079d03469646236e38, http://arxiv.org/abs/2211.15602
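The description defines DMDPs as MDPs in which every state-action pair has a unique next state, so a fixed policy induces a graph in which each state has exactly one outgoing edge and every walk eventually enters a cycle. The following is a minimal Python sketch of Howard's PI (the common greedy variant, which switches every improvable state at once) on such a DMDP. It assumes a discounted-reward objective with discount factor `gamma`, which the record does not specify; the `(next_state, reward)` encoding and the names `evaluate`, `howard_pi` and `EXAMPLE_MDP` are hypothetical illustrations, not taken from the paper.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

# Hypothetical encoding: mdp[s] lists the actions of state s, each as a
# (next_state, reward) pair. Determinism means one next state per action.
DMDP = List[List[Tuple[int, int]]]


def evaluate(mdp: DMDP, policy: List[int], gamma: Fraction) -> List[Fraction]:
    """Exact discounted values of a deterministic policy.

    Under a fixed policy every state has exactly one successor, so following
    successors from any state eventually reaches a cycle.
    """
    n = len(mdp)
    value: List[Optional[Fraction]] = [None] * n
    for start in range(n):
        path: List[int] = []
        position = {}
        s = start
        # Walk until we reach an already-valued state or revisit a state on this path.
        while value[s] is None and s not in position:
            position[s] = len(path)
            path.append(s)
            s = mdp[s][policy[s]][0]
        if value[s] is None:
            # s heads a newly found cycle: V(s) = sum_k gamma^k r_k / (1 - gamma^L).
            cycle = path[position[s]:]
            total = sum(gamma ** k * mdp[c][policy[c]][1] for k, c in enumerate(cycle))
            value[s] = total / (1 - gamma ** len(cycle))
        # Back-substitute V(p) = r(p) + gamma * V(next(p)) along the walk, last state first.
        for p in reversed(path):
            if value[p] is None:
                nxt, r = mdp[p][policy[p]]
                value[p] = r + gamma * value[nxt]
    return value


def howard_pi(mdp: DMDP, gamma: Fraction, policy: Optional[List[int]] = None):
    """Howard's PI, greedy variant: every improvable state switches to its best action."""
    policy = list(policy) if policy is not None else [0] * len(mdp)
    evaluations = 0
    while True:
        value = evaluate(mdp, policy, gamma)
        evaluations += 1
        new_policy = list(policy)
        for s, actions in enumerate(mdp):
            best_q = value[s]  # Q(s, policy[s]) equals V(s); ties keep the current action
            for a, (nxt, r) in enumerate(actions):
                q = r + gamma * value[nxt]
                if q > best_q:
                    new_policy[s], best_q = a, q
        if new_policy == policy:  # no improvable state: policy is optimal
            return policy, value, evaluations
        policy = new_policy


EXAMPLE_MDP: DMDP = [
    [(1, 0), (0, 1)],  # state 0: move to 1 (reward 0) or self-loop (reward 1)
    [(2, 0)],          # state 1: only move to 2 (reward 0)
    [(2, 2), (0, 0)],  # state 2: self-loop (reward 2) or move to 0 (reward 0)
]

if __name__ == "__main__":
    policy, value, evals = howard_pi(EXAMPLE_MDP, Fraction(1, 2), [0, 0, 0])
    print(policy, [str(v) for v in value], evals)
    # → [1, 0, 0] ['2', '2', '4'] 2
```

Exact `Fraction` arithmetic is used so that the strict-improvement test is not disturbed by floating-point round-off. The sketch only counts policy evaluations; it makes no claim about the running-time bounds derived in the paper.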