Search results - "Brita, Catalin E."

Report

SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation

Authors: Brita, Catalin E.; Bongers, Stephan; Oliehoek, Frans A.

In offline reinforcement learning, deriving an effective policy from a pre-collected set of experiences is challenging due to the distribution mismatch between the target policy and the behavioral policy used to collect the data, as well as the limit

External link: http://arxiv.org/abs/2412.06486
