Showing 1 - 10 of 67 for search: '"Gottesman, Omer"'
Author:
Allen, Cameron, Kirtland, Aaron, Tao, Ruo Yu, Lobel, Sam, Scott, Daniel, Petrocelli, Nicholas, Gottesman, Omer, Parr, Ronald, Littman, Michael L., Konidaris, George
Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can a…
External link:
http://arxiv.org/abs/2407.07333
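A textbook fact that frames the question raised in this snippet: in a partially observable MDP, the raw observations are generally not Markov, but the belief state

\[
b_t(s) \;=\; \Pr\big(s_t = s \mid o_1, a_1, \ldots, a_{t-1}, o_t\big)
\]

is, so a Markovian representation always exists in principle; the practical difficulty is learning or detecting one from data. This is standard POMDP background, not a claim about the linked paper's method.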
We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the…
External link:
http://arxiv.org/abs/2306.17750
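As context for the snippet above, a minimal tabular TD(0) sketch in Python. The Gym-style environment interface (env.reset, env.step) is a hypothetical stand-in for illustration, not code from the paper:

import numpy as np

def td0_evaluation(env, policy, n_states, alpha=0.1, gamma=0.99, episodes=500):
    # Tabular TD(0) policy evaluation: V(s) += alpha * (target - V(s)).
    V = np.zeros(n_states)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            s_next, r, done = env.step(policy(s))
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])  # move V(s) toward the bootstrapped target
            s = s_next
    return V

Note that each update chases a bootstrapped target that depends on the current estimate, which is one way to read the abstract's optimization framing.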
Decision-focused (DF) model-based reinforcement learning has recently been introduced as a powerful algorithm that can focus on learning the MDP dynamics that are most relevant for obtaining high returns. While this approach increases the agent's…
External link:
http://arxiv.org/abs/2304.03365
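To illustrate only the general idea of return-relevant model learning, a hedged Python sketch contrasting a maximum-likelihood dynamics loss with a value-weighted one. The function names and the specific value-weighted loss are illustrative assumptions, not the paper's algorithm:

import numpy as np

def mle_model_loss(pred_next, true_next):
    # Conventional model learning: fit the dynamics equally well everywhere.
    return np.mean((pred_next - true_next) ** 2)

def decision_focused_loss(pred_next, true_next, value_fn):
    # Hypothetical decision-focused variant: penalize dynamics errors by how
    # much they distort predicted values, so model capacity concentrates on
    # transitions that matter for returns.
    return np.mean((value_fn(pred_next) - value_fn(true_next)) ** 2)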
Advances in reinforcement learning have led to its successful application in complex tasks with continuous state and action spaces. Despite these advances in practice, most theoretical work pertains to finite state and action spaces. We propose…
External link:
http://arxiv.org/abs/2301.00009
In the reinforcement learning literature, there are many algorithms developed for either Contextual Bandit (CB) or Markov Decision Process (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with…
External link:
http://arxiv.org/abs/2208.00250
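One way to see the relationship between the two settings this snippet contrasts: a contextual bandit is effectively a one-step MDP, so its action values reduce to immediate expected rewards, while MDP values bootstrap:

\[
Q_{\text{CB}}(s, a) = \mathbb{E}[\,r \mid s, a\,], \qquad
Q_{\text{MDP}}(s, a) = \mathbb{E}\big[\,r + \gamma \max_{a'} Q_{\text{MDP}}(s', a') \mid s, a\,\big].
\]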
Author:
Asadi, Kavosh, Fakoor, Rasool, Gottesman, Omer, Kim, Taesup, Littman, Michael L., Smola, Alexander J.
Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against…
External link:
http://arxiv.org/abs/2112.05848
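As a minimal sketch of the online/target pattern described above, in PyTorch; the Polyak coefficient is an illustrative default, not a value from the paper:

import copy
import torch

online = torch.nn.Linear(4, 2)   # stand-in for the online Q-network
target = copy.deepcopy(online)   # target network starts as an exact copy

def polyak_update(target_net, online_net, tau=0.005):
    # Target slowly tracks the online network:
    # theta_target <- tau * theta_online + (1 - tau) * theta_target
    with torch.no_grad():
        for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p_o)

# TD targets are then computed with the lagged target network, damping the
# feedback loop of bootstrapping a network against its own latest weights:
#   y = r + gamma * target(s_next).max(dim=-1).values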
Off-policy policy evaluation methods for sequential decision making can be used to help identify if a proposed decision policy is better than a current baseline policy. However, a new decision policy may be better than a baseline policy for some…
External link:
http://arxiv.org/abs/2111.14272
Author:
Gottesman, Omer, Asadi, Kavosh, Allen, Cameron, Lobel, Sam, Konidaris, George, Littman, Michael
Principled decision-making in continuous state-action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical…
External link:
http://arxiv.org/abs/2110.12276
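For reference, the Lipschitz assumption this snippet questions says that Q cannot change faster than the metric on the joint state-action space allows:

\[
|Q(s, a) - Q(s', a')| \;\le\; L \, d\big((s, a), (s', a')\big)
\quad \text{for all } (s, a), (s', a'),
\]

for some constant L and metric d; the abstract's point is that even common domains violate this.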
Published in:
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9537-9546, 2021
Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are…
External link:
http://arxiv.org/abs/2109.06310
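As context, a minimal sketch of the ordinary per-trajectory importance-sampling estimator this snippet refers to; the trajectory data layout is an assumption for illustration:

import numpy as np

def is_ope(trajectories, pi_e, pi_b, gamma=0.99):
    # Each trajectory is a list of (s, a, r) tuples collected under the
    # behavior policy pi_b; pi_e(a, s) and pi_b(a, s) return action probabilities.
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += (gamma ** t) * r
        estimates.append(weight * ret)  # unbiased, but variance can blow up
    return float(np.mean(estimates))

Because the weight is a product of per-step ratios, its variance grows with trajectory length, which is the high-variance regime the abstract highlights.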
A fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state…
External link:
http://arxiv.org/abs/2106.04379
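One standard way to formalize the requirement this snippet raises: an abstraction \phi over observations is Markov when conditioning on the abstract history adds nothing beyond the current abstract state and action,

\[
\Pr\big(\phi(s_{t+1}) \mid \phi(s_t), a_t\big)
= \Pr\big(\phi(s_{t+1}) \mid \phi(s_t), a_t, \phi(s_{t-1}), a_{t-1}, \ldots\big).
\]

This is a generic formalization, not necessarily the exact definition used in the linked paper.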