Showing 1 - 5 of 5 for search: '"Hong, Kihyuk"'
This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman optimality condition. Our algorithm for linear mixture MDPs achieves a nearly min…
External link:
http://arxiv.org/abs/2410.14992
We study infinite-horizon average-reward reinforcement learning with linear MDPs. Previous approaches either suffer from computational inefficiency or require strong assumptions on the dynamics, such as ergodicity, for achieving a regret bound of …
External link:
http://arxiv.org/abs/2405.15050
Authors:
Hong, Kihyuk; Tewari, Ambuj
We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon discounted setting, which aims to learn a policy that maximizes the expected discounted cumulative reward using a pre-collected dataset. Existing algorithms for t…
External link:
http://arxiv.org/abs/2402.04493
Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on the expected cumulative cost using an existing dataset. In this paper, we propose the Primal-Dual-Critic Algorithm…
External link:
http://arxiv.org/abs/2306.07818
We propose an algorithm for non-stationary kernel bandits that does not require prior knowledge of the degree of non-stationarity. The algorithm follows randomized strategies obtained by solving optimization problems that balance exploration and exploitation…
External link:
http://arxiv.org/abs/2205.14775