Showing 1 - 1 of 1 for search: '"Brita, Catalin E."'
In offline reinforcement learning, deriving an effective policy from a pre-collected set of experiences is challenging due to the distribution mismatch between the target policy and the behavioral policy used to collect the data, as well as the limit
External link:
http://arxiv.org/abs/2412.06486