Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning
Author: | Yuanzhi Li, Ruosong Wang, Lin F. Yang |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: |
Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML); Optimization and Control (math.OC); FOS: Computer and information sciences; FOS: Mathematics |
Description: | Recently there has been a surge of interest in understanding the horizon-dependence of the sample complexity in reinforcement learning (RL). Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed. It remained unknown whether the $\mathrm{polylog}(H)$ dependence is necessary. In this work, we resolve this question by developing an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions, completely settling the horizon-dependence of the sample complexity in RL. We achieve this bound by (i) establishing a connection between value functions in discounted and finite-horizon Markov decision processes (MDPs) and (ii) a novel perturbation analysis in MDPs. We believe our new techniques are of independent interest and could be applied to related questions in RL. |
Database: | OpenAIRE |
External link: |
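The abstract alludes to a connection between value functions in discounted and finite-horizon MDPs but does not spell it out. The paper's actual construction is not reproduced here; the following is only the standard "effective horizon" relation that motivates such connections, stated under the common assumption of rewards bounded in $[0, 1]$.

```latex
% Standard truncation bound (illustrative; not the paper's specific argument).
% For any policy \pi in a discounted MDP with discount \gamma \in (0,1)
% and per-step rewards in [0,1], let V^{\pi}_{\gamma} denote the
% infinite-horizon discounted value and V^{\pi}_{\gamma,H} its
% truncation to the first H steps. Then
\[
  \left| V^{\pi}_{\gamma}(s) - V^{\pi}_{\gamma,H}(s) \right|
  = \left| \mathbb{E}\!\left[ \sum_{t=H}^{\infty} \gamma^{t} r_t \right] \right|
  \le \sum_{t=H}^{\infty} \gamma^{t}
  = \frac{\gamma^{H}}{1-\gamma}.
\]
% Consequently, choosing
\[
  H \;\ge\; \frac{1}{1-\gamma}\,\ln\!\frac{1}{\varepsilon(1-\gamma)}
\]
% makes the truncation error at most \varepsilon, so a finite-horizon
% MDP with this H approximates the discounted one to accuracy \varepsilon.
```

This bound explains why horizon length and discounting are interchangeable up to logarithmic factors; the paper's contribution is to go further and remove even the $\mathrm{polylog}(H)$ dependence for $O(1)$-optimal policies.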