Showing 1 - 1 of 1 for search: '"Ma, Jason Yecheng"'
A key challenge to deploying reinforcement learning in practice is avoiding excessive (harmful) exploration in individual episodes. We propose a natural constraint on exploration -- \textit{uniformly} outperforming a conservative policy (adaptively e
External link:
http://arxiv.org/abs/2110.13060