Off-policy evaluation (OPE) provides safety guarantees by estimating the performance of a policy before deployment. Recent work introduced IS+, an importance sampling (IS) estimator that uses expert-annotated counterfactual samples to improve behavio
External link:
http://arxiv.org/abs/2412.08052
Published in:
Reinforcement Learning Journal 3 (2024) 1138-1167
In this work, we study an inverse reinforcement learning (IRL) problem where the experts are planning under a shared reward function but with different, unknown planning horizons. Without the knowledge of discount factors, the reward function has a l
External link:
http://arxiv.org/abs/2409.18051
Contrastive dimension reduction methods have been developed for case-control study data to identify variation that is enriched in the foreground (case) data X relative to the background (control) data Y. Here, we develop contrastive regression for th
External link:
http://arxiv.org/abs/2401.03106