Search results - "Wajid, Mulinti Shaik"

Report

Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

Authors: Goenka, Ritesh; Gupta, Eashan; Khyalia, Sushil; Agarwal, Pratyush; Wajid, Mulinti Shaik; Kalyanakrishnan, Shivaram

Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action

External link: http://arxiv.org/abs/2211.15602

Some Upper Bounds on the Running Time of Policy Iteration on Deterministic MDPs

Authors: Goenka, Ritesh; Gupta, Eashan; Khyalia, Sushil; Agarwal, Pratyush; Wajid, Mulinti Shaik; Kalyanakrishnan, Shivaram

External links: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d08b0b01a6c763079d03469646236e38
http://arxiv.org/abs/2211.15602
