A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
Author: | Amortila, Philip; Jiang, Nan; Xie, Tengyang |
---|---|
Year of publication: | 2020 |
Subject: | |
Document type: | Working Paper |
Description: | Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with a linearly realizable value function and good feature coverage in the finite-horizon case. In this note, we show that, once adapted to the discounted setting, the construction can be simplified to a 2-state MDP with 1-dimensional features, such that learning is impossible even with an infinite amount of data. |
Database: | arXiv |
External link: |