Search results - "Papadatos, Henry"

Report

Linear Probe Penalties Reduce LLM Sycophancy

Authors: Papadatos, Henry; Freedman, Rachel

Large language models (LLMs) are often sycophantic, prioritizing agreement with their users over accurate or objective statements. This problematic behavior becomes more pronounced during reinforcement learning from human feedback (RLHF), an LLM fine

External link: http://arxiv.org/abs/2412.00967

