Showing 1 - 10 of 2,003 results for the search: '"SUTTON, RICHARD"'
This paper studies asynchronous stochastic approximation (SA) algorithms and their application to reinforcement learning in semi-Markov decision processes (SMDPs) with an average-reward criterion. We first extend Borkar and Meyn's stability proof method…
External link: http://arxiv.org/abs/2409.03915
This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes (MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on relative value iteration (RVI), which are model-free stochastic analogues of… (a hedged sketch of an RVI-style update follows this entry)
External link: http://arxiv.org/abs/2408.16262
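The entry above mentions Q-learning based on relative value iteration. As a rough illustration only, the sketch below shows a tabular RVI-style Q-learning update in Python; the environment interface (env.reset, env.step), the epsilon-greedy behavior policy, and the choice of reference offset max_a Q[ref_state, a] are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def rvi_q_learning(env, num_states, num_actions, steps=100_000,
                   alpha=0.05, epsilon=0.1, ref_state=0, seed=0):
    """Tabular RVI-style Q-learning sketch for average-reward problems.

    The offset f(Q) is taken here as max_a Q[ref_state, a]; other choices
    (a fixed state-action pair, a mean over Q, etc.) also appear in the
    literature.  The environment interface is assumed, not prescribed.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_states, num_actions))
    s = env.reset()                          # assumed: returns an integer state index
    for _ in range(steps):
        # epsilon-greedy behavior policy (an assumption for this sketch)
        a = rng.integers(num_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r = env.step(a)              # assumed: returns (next state, reward)
        f_Q = np.max(Q[ref_state])           # reference offset standing in for the gain
        Q[s, a] += alpha * (r - f_Q + np.max(Q[s_next]) - Q[s, a])
        s = s_next
    return Q
```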
Author: De Asis, Kris, Sutton, Richard S.
Many reinforcement learning algorithms are built on an assumption that an agent interacts with an environment over fixed-duration, discrete time steps. However, physical systems are continuous in time, requiring a choice of time-discretization granularity…
External link: http://arxiv.org/abs/2406.14951
We show that discounted methods for solving continuing reinforcement learning problems can perform significantly better if they center their rewards by subtracting out the rewards' empirical average. The improvement is substantial at commonly used discount factors… (a minimal reward-centering sketch follows below)
External link: http://arxiv.org/abs/2405.09999
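The entry above describes reward centering: subtracting the rewards' empirical average before applying a discounted method. The sketch below shows one minimal way to do this in tabular Q-learning; the exponential running average used for r_bar and the environment interface are assumptions for illustration, and the paper's exact estimators may differ.

```python
import numpy as np

def centered_q_learning(env, num_states, num_actions, steps=100_000,
                        alpha=0.1, beta=0.01, gamma=0.99, epsilon=0.1, seed=0):
    """Discounted Q-learning with reward centering (illustrative sketch).

    A running average r_bar of observed rewards is subtracted from each
    reward before the usual discounted update.  r_bar is kept here as a
    simple exponential average with step size beta.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((num_states, num_actions))
    r_bar = 0.0                              # running estimate of the average reward
    s = env.reset()                          # assumed environment interface
    for _ in range(steps):
        a = rng.integers(num_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next, r = env.step(a)
        # centered discounted TD error: the reward is shifted by r_bar
        Q[s, a] += alpha * ((r - r_bar) + gamma * np.max(Q[s_next]) - Q[s, a])
        r_bar += beta * (r - r_bar)          # update the empirical average of rewards
        s = s_next
    return Q, r_bar
```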
This paper addresses the challenge of optimizing meta-parameters (i.e., hyperparameters) in machine learning algorithms, a critical factor influencing training efficiency and model performance. Moving away from the computationally expensive traditional…
External link: http://arxiv.org/abs/2402.02342
In continual learning, a learner has to keep learning from the data over its whole lifetime. A key issue is deciding what knowledge to keep and what knowledge to let go. In a neural network, this can be implemented by using a step-size vector to scale… (a per-weight step-size sketch follows below)
External link: http://arxiv.org/abs/2401.17401
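The entry above mentions scaling a network's updates with a step-size vector. The fragment below is a generic illustration of that idea for plain SGD, not the paper's specific algorithm; the variable names and numbers are made up for the example.

```python
import numpy as np

def sgd_step_with_vector(w, alpha, grad):
    """One SGD update with a per-weight step-size vector (illustration).

    w, alpha, and grad are arrays of the same shape.  A large alpha[i]
    lets weight i keep adapting, while a near-zero alpha[i] effectively
    freezes it, protecting knowledge the learner should retain.
    """
    return w - alpha * grad

# illustrative usage with made-up values
w     = np.array([0.5, -1.2, 3.0])
alpha = np.array([0.1,  0.001, 0.0])   # learn, barely learn, frozen
grad  = np.array([0.4,  0.4,   0.4])
w_new = sgd_step_with_vector(w, alpha, grad)
```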
In this paper, we study asynchronous stochastic approximation algorithms without communication delays. Our main contribution is a stability proof for these algorithms that extends a method of Borkar and Meyn by accommodating more general noise conditions…
External link: http://arxiv.org/abs/2312.15091
Author: Young, Kenny, Sutton, Richard S.
Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy…
External link: http://arxiv.org/abs/2310.01569
Importance sampling is a central idea underlying off-policy prediction in reinforcement learning. It provides a strategy for re-weighting samples from a distribution to obtain unbiased estimates under another distribution. However, importance sampling… (a minimal re-weighting sketch follows below)
External link: http://arxiv.org/abs/2306.15625
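The entry above describes importance sampling as re-weighting samples drawn under one distribution to estimate an expectation under another. The sketch below shows the basic ordinary importance-sampling estimator for a single decision; the function name and the example numbers are illustrative only.

```python
import numpy as np

def importance_sampling_estimate(rewards, target_probs, behavior_probs):
    """Ordinary importance-sampling estimate (illustrative sketch).

    Each reward was obtained by an action sampled from the behavior
    policy; re-weighting it by target_probs[i] / behavior_probs[i]
    yields an unbiased estimate of the expected reward under the
    target policy (assuming the behavior policy covers the target).
    """
    rho = np.asarray(target_probs) / np.asarray(behavior_probs)
    return float(np.mean(rho * np.asarray(rewards)))

# illustrative usage: behavior policy acts uniformly, target policy prefers action 0
rewards        = [1.0, 0.0, 1.0, 1.0]      # observed rewards
behavior_probs = [0.5, 0.5, 0.5, 0.5]      # b(a|s) of the actions actually taken
target_probs   = [0.9, 0.1, 0.9, 0.9]      # pi(a|s) of the same actions
print(importance_sampling_estimate(rewards, target_probs, behavior_probs))
```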