Instance-Dependent ℓ∞-Bounds for Policy Evaluation in Tabular Reinforcement Learning
Authors: | Martin J. Wainwright, Ashwin Pananjady |
---|---|
Publication year: | 2021 |
Subject: |
Mathematical optimization; Material requirements planning; Markov chain; Computer science; Population; Markov process; Approximation algorithm; Networking & telecommunications; Engineering and technology; Library and Information Sciences; Computer Science Applications; Bellman equation; Electrical engineering, electronic engineering, information engineering; Reinforcement learning; Constant (mathematics); Information Systems |
Source: | IEEE Transactions on Information Theory. 67:566-585 |
ISSN: | 1557-9654 0018-9448 |
DOI: | 10.1109/tit.2020.3027316 |
Description: | Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon discounted MRP with finite state space in the $\ell_\infty$-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated based on the observations of state transitions and rewards. We show that these approaches are minimax-optimal up to constant factors over natural sub-classes of MRPs. Our analysis makes use of a leave-one-out decoupling argument tailored to the policy evaluation problem, one which may be of independent interest. |
Database: | OpenAIRE |
External link: |
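
The plug-in approach named in the description can be illustrated with a small sketch. It is not taken from the paper: the function names, the three-state example MRP, the Gaussian reward noise, and the use of fixed-point iteration to solve the Bellman equation are all assumptions made for illustration. The idea shown is to draw the same number of samples from every state (the synchronous model), build the empirical transition matrix and mean rewards, solve V = r + γPV with those estimates plugged in, and measure the error in the sup norm.

```python
import random


def sample_synchronous(P, r_mean, n, rng, noise=0.1):
    """Draw n next-state samples and n noisy rewards for every state.

    Hypothetical helper: mimics a synchronous sampling model in which
    each state is sampled the same number of times.
    """
    states = range(len(P))
    next_states = [rng.choices(states, weights=P[s], k=n) for s in states]
    rewards = [[r_mean[s] + rng.gauss(0.0, noise) for _ in range(n)]
               for s in states]
    return next_states, rewards


def empirical_model(next_states, rewards):
    """Empirical transition matrix and empirical mean reward per state."""
    S = len(next_states)
    P_hat = [[0.0] * S for _ in range(S)]
    for s in range(S):
        weight = 1.0 / len(next_states[s])
        for s_next in next_states[s]:
            P_hat[s][s_next] += weight
    r_hat = [sum(rs) / len(rs) for rs in rewards]
    return P_hat, r_hat


def solve_bellman(P, r, gamma, iters=2000):
    """Solve V = r + gamma * P V by fixed-point iteration.

    The update is a gamma-contraction in the sup norm, so the iterates
    converge to the unique solution for gamma < 1.
    """
    S = len(r)
    V = [0.0] * S
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(S))
             for s in range(S)]
    return V


def sup_norm_error(V_a, V_b):
    """Largest absolute difference across states (the sup-norm error)."""
    return max(abs(a - b) for a, b in zip(V_a, V_b))


if __name__ == "__main__":
    # Illustrative three-state MRP; the numbers are made up.
    P = [[0.9, 0.1, 0.0], [0.0, 0.5, 0.5], [0.3, 0.0, 0.7]]
    r = [1.0, 0.0, 0.5]
    gamma = 0.9
    rng = random.Random(0)

    V_true = solve_bellman(P, r, gamma)
    for n in (100, 1000, 10000):
        P_hat, r_hat = empirical_model(*sample_synchronous(P, r, n, rng))
        V_hat = solve_bellman(P_hat, r_hat, gamma)
        print(n, sup_norm_error(V_hat, V_true))
```

Running the script prints the sup-norm error of the plug-in estimate for increasing per-state sample sizes. The more robust variant mentioned in the description is not sketched here.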