Authors:	Amortila, Philip; Jiang, Nan; Xie, Tengyang
Publication year:	2020
Subject:	Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Statistics - Machine Learning
Document type:	Working Paper
Description:	Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case. In this note we show that once adapted to the discounted setting, the construction can be simplified to a 2-state MDP with 1-dimensional features, such that learning is impossible even with an infinite amount of data.
Database:	arXiv
External link:	http://arxiv.org/abs/2011.01075