Showing 1 - 1 of 1 for search: '"Choi, Heewoong"'
In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedbac
External link:
http://arxiv.org/abs/2408.04190