Showing 1 - 4 of 4 for search: '"Sleight, Henry"'
Author:
Sheshadri, Abhay; Ewart, Aidan; Guo, Phillip; Lynch, Aengus; Wu, Cindy; Hebbar, Vivek; Sleight, Henry; Stickland, Asa Cooper; Perez, Ethan; Hadfield-Menell, Dylan; Casper, Stephen
Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of 'jailbreaking' techniques to elicit harmful text from
External link:
http://arxiv.org/abs/2407.15549
Author:
Schaeffer, Rylan; Valentine, Dan; Bailey, Luke; Chua, James; Eyzaguirre, Cristóbal; Durante, Zane; Benton, Joe; Miranda, Brando; Sleight, Henry; Hughes, John; Agrawal, Rajashree; Sharma, Mrinank; Emmons, Scott; Koyejo, Sanmi; Perez, Ethan
The integration of new modalities into frontier AI systems offers exciting capabilities, but also increases the possibility such systems can be adversarially manipulated in undesirable ways. In this work, we focus on a popular class of vision-languag
External link:
http://arxiv.org/abs/2407.15211
Author:
Gerstgrasser, Matthias; Schaeffer, Rylan; Dey, Apratim; Rafailov, Rafael; Sleight, Henry; Hughes, John; Korbak, Tomasz; Agrawal, Rajashree; Pai, Dhruv; Gromov, Andrey; Roberts, Daniel A.; Yang, Diyi; Donoho, David L.; Koyejo, Sanmi
The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed th
External link:
http://arxiv.org/abs/2404.01413
Author:
SLEIGHT, HENRY C.
Published in:
Country Gentleman; 02/09/1860, Vol. 15 Issue 6, p94-94, 1/3p, 2 Illustrations