Kernelized Reinforcement Learning with Order Optimal Regret Bounds
Author: Vakili, Sattar; Olkhovskaya, Julia
Publication year: 2023
Subject:
Document type: Working Paper
Description: Reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or on simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose π-KRVI, an optimistic modification of least-squares value iteration, for the setting where the state-action value function is represented by a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show an improvement over the state of the art that is polynomial in the number of episodes. In particular, with highly non-smooth kernels (such as the Neural Tangent kernel or some Matérn kernels), the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order-optimal in the case of Matérn kernels, where a lower bound on regret is known.
Comment: Advances in Neural Information Processing Systems (NeurIPS), 2023. In the previous version, we utilized Lemma C.1 from Yang et al., 2020a to bound the RKHS norm of the kernel ridge predictor. In the current version, this is proven in Lemma 5.
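The core computational ingredient the abstract describes (a kernel ridge regression estimate of the state-action value function plus an optimism bonus, in the spirit of optimistic least-squares value iteration) can be sketched briefly. The snippet below is a minimal illustration, not the paper's actual π-KRVI procedure or its analysis; the Matérn-5/2 kernel and the hyperparameters `lengthscale`, `lam`, `beta`, and `H`, as well as the toy data, are all assumptions made for the example.

```python
import numpy as np

def matern52(X, Y, lengthscale=1.0):
    """Matérn-5/2 kernel matrix between the rows of X and Y."""
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    r = np.sqrt(5.0) * d / lengthscale
    return (1.0 + r + r**2 / 3.0) * np.exp(-r)

def optimistic_q(Z, y, Zq, lam=1.0, beta=2.0, H=1.0):
    """Kernel ridge estimate of Q with a UCB-style exploration bonus.

    Z  : (n, d) visited state-action pairs; y: (n,) regression targets
         (e.g. reward plus estimated next-state value in value iteration).
    Zq : (m, d) state-action pairs to evaluate.
    Returns mean + beta * stddev, clipped to [0, H] since value
    functions are bounded by the episode length.
    """
    n = len(Z)
    G = matern52(Z, Z) + lam * np.eye(n)   # regularized Gram matrix
    kq = matern52(Zq, Z)                   # (m, n) cross-kernel
    mean = kq @ np.linalg.solve(G, y)      # kernel ridge predictor
    # Posterior-style variance of the kernel ridge predictor.
    var = matern52(Zq, Zq).diagonal() - np.einsum(
        "ij,ji->i", kq, np.linalg.solve(G, kq.T))
    bonus = beta * np.sqrt(np.maximum(var, 0.0))
    return np.clip(mean + bonus, 0.0, H)

# Toy usage: 1-D state-action features with noisy targets.
rng = np.random.default_rng(0)
Z = rng.uniform(0, 1, size=(20, 1))
y = np.sin(4 * Z[:, 0]) + 0.1 * rng.standard_normal(20)
print(optimistic_q(Z, y, np.linspace(0, 1, 5)[:, None], H=2.0))
```

The bonus term shrinks where data is dense and grows where it is sparse, which is what drives directed exploration in optimistic value iteration; the regret analysis in the paper controls how fast this uncertainty shrinks for non-smooth kernels.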
Database: arXiv
External link: