Showing 1 - 1 of 1 for search: '"Pokorny, Rai Michael"'
Authors:
McAleese, Nat; Pokorny, Rai Michael; Uribe, Juan Felipe Ceron; Nitishinskaya, Evgenia; Trebacz, Maja; Leike, Jan
Reinforcement learning from human feedback (RLHF) is fundamentally limited by the capacity of humans to correctly evaluate model output. To improve human evaluation ability and overcome that limitation, this work trains "critic" models that help human
External link:
http://arxiv.org/abs/2407.00215