Authors:
Papadatos, Henry; Freedman, Rachel
Large language models (LLMs) are often sycophantic, prioritizing agreement with their users over accurate or objective statements. This problematic behavior becomes more pronounced during reinforcement learning from human feedback (RLHF), an LLM fine
External link:
http://arxiv.org/abs/2412.00967