Authors:
Arditi, Andy, Obeso, Oscar, Syed, Aaquib, Paleka, Daniel, Panickssery, Nina, Gurnee, Wes, Nanda, Neel
Conversational large language models are fine-tuned for both instruction-following and safety, resulting in models that obey benign requests but refuse harmful ones. While this refusal behavior is widespread across chat models, its underlying mechani
External link:
http://arxiv.org/abs/2406.11717