Showing 1 - 2 of 2 for search: '"Wajid, Mulinti Shaik"'
Authors:
Goenka, Ritesh; Gupta, Eashan; Khyalia, Sushil; Agarwal, Pratyush; Wajid, Mulinti Shaik; Kalyanakrishnan, Shivaram
Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action
External link:
http://arxiv.org/abs/2211.15602
Authors:
Goenka, Ritesh; Gupta, Eashan; Khyalia, Sushil; Agarwal, Pratyush; Wajid, Mulinti Shaik; Kalyanakrishnan, Shivaram
Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action
External links:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d08b0b01a6c763079d03469646236e38
http://arxiv.org/abs/2211.15602