Author: |
Goenka, Ritesh; Gupta, Eashan; Khyalia, Sushil; Agarwal, Pratyush; Wajid, Mulinti Shaik; Kalyanakrishnan, Shivaram |
Language: |
English |
Year of publication: |
2022 |
Description: |
Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms, and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest. |
Database: |
OpenAIRE |