Zobrazeno 1 - 1
of 1
pro vyhledávání: '"McLean, Brayden"'
Autor:
Kundu, Sandipan, Bai, Yuntao, Kadavath, Saurav, Askell, Amanda, Callahan, Andrew, Chen, Anna, Goldie, Anna, Balwit, Avital, Mirhoseini, Azalia, McLean, Brayden, Olsson, Catherine, Evraets, Cassie, Tran-Johnson, Eli, Durmus, Esin, Perez, Ethan, Kernion, Jackson, Kerr, Jamie, Ndousse, Kamal, Nguyen, Karina, Elhage, Nelson, Cheng, Newton, Schiefer, Nicholas, DasSarma, Nova, Rausch, Oliver, Larson, Robin, Yang, Shannon, Kravec, Shauna, Telleen-Lawton, Timothy, Liao, Thomas I., Henighan, Tom, Hume, Tristan, Hatfield-Dodds, Zac, Mindermann, Sören, Joseph, Nicholas, McCandlish, Sam, Kaplan, Jared
Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing
Externí odkaz:
http://arxiv.org/abs/2310.13798