Showing 1 - 2 of 2 for search: '"Bourel, Hippolyte"'
The upper confidence reinforcement learning (UCRL2) algorithm introduced in (Jaksch et al., 2010) is a popular method to perform regret minimization in unknown discrete Markov Decision Processes under the average-reward criterion. Despite its nice an
External link:
http://arxiv.org/abs/2004.09656
Leveraging an equivalence property in the state-space of a Markov Decision Process (MDP) has been investigated in several studies. This paper studies equivalence structure in the reinforcement learning (RL) setup, where transition distributions are n
External link:
http://arxiv.org/abs/1910.04077