Showing 1 - 1 of 1 for search: '"Hong, Sungee"'
We study distributional off-policy evaluation (OPE), whose goal is to learn the distribution of the return for a target policy using offline data generated by a different policy. The theoretical foundation of many existing works relies on the s
External link:
http://arxiv.org/abs/2402.01900