Order-Optimal Global Convergence for Average Reward Reinforcement Learning via Actor-Critic Approach
Author: | Ganesh, Swetha; Mondal, Washim Uddin; Aggarwal, Vaneet |
---|---|
Publication year: | 2024 |
Document type: | Working Paper |
Description: | This work analyzes average-reward reinforcement learning with general parametrization. Current state-of-the-art (SOTA) guarantees for this problem are either suboptimal or demand prior knowledge of the mixing time of the underlying Markov process, which is unavailable in most practical scenarios. We introduce a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm to address these issues. Our approach is the first to achieve a global convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, where $T$ is the horizon length, without requiring knowledge of the mixing time, significantly improving on the SOTA bound of $\tilde{\mathcal{O}}(T^{-1/4})$. Comment: 23 pages, 1 table |
Database: | arXiv |
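The record carries only the abstract, so the following is a minimal, hypothetical sketch of the multi-level Monte Carlo (MLMC) gradient estimator that this line of work builds on, not the paper's exact construction: the function name `step_grads`, the geometric level distribution, and the truncation level `j_max` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlmc_gradient(step_grads, j_max=10):
    """One MLMC gradient estimate from a Markovian trajectory (sketch).

    `step_grads(n)` is a hypothetical callable that advances the
    trajectory n steps and returns the n per-step gradient estimates
    as an array of shape (n, d).
    """
    # Draw a random level J with P(J = j) = 2^{-j}; long trajectories
    # are rare, which is what removes the need to know the mixing time
    # of the Markov process up front.
    j = int(rng.geometric(0.5))
    if j > j_max:
        return step_grads(1)[0]  # truncate: fall back to a single sample
    grads = step_grads(2 ** j)
    coarse = grads[: 2 ** (j - 1)].mean(axis=0)  # average of first half
    fine = grads.mean(axis=0)                    # average of all 2^J
    # Telescoping correction: keeps the estimator (nearly) unbiased
    # while the expected trajectory length stays O(log T).
    return grads[0] + (2 ** j) * (fine - coarse)
```

In an actor-critic method such as the one the abstract describes, the same randomized-trajectory-length idea would be applied to both the critic's evaluation targets and the actor's natural-gradient estimates; the sketch above shows only the generic estimator.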