Showing 1 - 6 of 6 for search: '"Gummadi, Ramki"'
A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, a…
External link:
http://arxiv.org/abs/2407.12185
Author:
Che, Fengdi, Xiao, Chenjun, Mei, Jincheng, Dai, Bo, Gummadi, Ramki, Ramirez, Oscar A, Harris, Christopher K, Mahmood, A. Rupam, Schuurmans, Dale
Published in:
Proceedings of the 41st International Conference on Machine Learning, 2024
We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally…
External link:
http://arxiv.org/abs/2405.21043
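As a rough illustration of the setup this abstract studies: a minimal sketch of bootstrapped (TD-style) value estimation with over-parameterized linear function approximation and a periodically synced target network. The toy MDP, feature matrix, off-policy sampling scheme, and sync schedule are all illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, gamma, lr = 5, 8, 0.9, 0.05  # over-parameterized: more features than states
Phi = rng.normal(size=(n_states, n_features))      # assumed state-feature matrix

w = np.zeros(n_features)   # online value weights
w_target = w.copy()        # target-network weights, synced periodically

for step in range(2000):
    s = rng.integers(n_states)          # off-policy state sample (assumed uniform)
    s_next = (s + 1) % n_states         # toy deterministic transition
    r = 1.0 if s_next == 0 else 0.0     # toy reward
    # Bootstrap against the frozen target weights rather than the online ones.
    td_error = r + gamma * Phi[s_next] @ w_target - Phi[s] @ w
    w += lr * td_error * Phi[s]
    if step % 100 == 0:                 # periodic target sync (assumed schedule)
        w_target = w.copy()
```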
Published in:
ICML 2022
Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted (e.g. value versus policy representation) or how the learning objective is formulated, yet they share a common goal of maximizing…
External link:
http://arxiv.org/abs/2206.08499
Actor-critic (AC) methods are ubiquitous in reinforcement learning. Although it is understood that AC methods are closely related to policy gradient (PG), their precise connection has not been fully characterized previously. In this paper, we explain…
External link:
http://arxiv.org/abs/2106.06932
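As a rough illustration of the AC-PG connection this abstract refers to: a minimal tabular actor-critic on a toy chain MDP, where the critic's TD error plays the role of the return in a REINFORCE-style policy-gradient update. The environment and hyperparameters are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.95
theta = np.zeros((n_states, n_actions))  # actor: softmax policy logits
V = np.zeros(n_states)                   # critic: state-value estimates
alpha_actor, alpha_critic = 0.1, 0.2

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

s = 0
for _ in range(5000):
    p = policy(s)
    a = rng.choice(n_actions, p=p)
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    td_error = r + gamma * V[s_next] - V[s]   # critic's TD error
    V[s] += alpha_critic * td_error           # critic update
    grad_logp = -p                            # grad of log softmax: onehot(a) - p
    grad_logp[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_logp  # PG-style actor update
    s = 0 if s_next == n_states - 1 else s_next     # reset episode at the goal
```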
Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates. We propose a novel rejection sampling step that discards…
External link:
http://arxiv.org/abs/1804.01712
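As a loose illustration only: a sketch of a rejection-style filter inside stochastic variational inference for a one-dimensional Gaussian model. The discard rule used here (dropping the lowest-importance-weight quartile of samples) is an assumed stand-in, not the criterion proposed in the paper, and the filter as written introduces bias that a principled method would need to account for.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 2.0                        # single observed data point (assumed)
mu_q, log_sigma_q = 0.0, 0.0   # parameters of the approximate posterior q(z) = N(mu_q, sigma_q^2)

def dlogp_dz(z):
    # d/dz log p(x, z) for a standard-normal prior and unit-variance Gaussian likelihood
    return -z + (x - z)

for _ in range(1000):
    sigma_q = np.exp(log_sigma_q)
    eps = rng.normal(size=8)             # K = 8 reparameterized samples
    z = mu_q + sigma_q * eps
    log_q = -0.5 * eps**2 - log_sigma_q  # log q(z) up to a constant
    log_w = (-0.5 * z**2 - 0.5 * (x - z)**2) - log_q  # per-sample importance weights
    keep = log_w >= np.quantile(log_w, 0.25)          # assumed rule: reject the worst quartile
    # Pathwise ELBO gradient estimated on the surviving samples only.
    g = dlogp_dz(z[keep])
    mu_q += 1e-2 * g.mean()
    log_sigma_q += 1e-2 * ((g * sigma_q * eps[keep]).mean() + 1.0)  # +1 is q's entropy gradient
```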
Published in:
2012 50th Annual Allerton Conference on Communication, Control & Computing (Allerton), 2012, p. 1110, 1p.