Showing 1 - 10 of 3,031 for search: '"SLEIGHT, P."'
Author:
Wen, Jiaxin, Hebbar, Vivek, Larson, Caleb, Bhatt, Aryan, Radhakrishnan, Ansh, Sharma, Mrinank, Sleight, Henry, Feng, Shi, He, He, Perez, Ethan, Shlegeris, Buck, Khan, Akbir
As large language models (LLMs) become increasingly capable, it is prudent to assess whether safety measures remain effective even if LLMs intentionally try to bypass them. Previous work introduced control evaluations, an adversarial framework for te
External link:
http://arxiv.org/abs/2411.17693
As large language models (LLMs) grow more powerful, ensuring their safety against misuse becomes crucial. While researchers have focused on developing robust defenses, no method has yet achieved complete invulnerability to attacks. We propose an alte
External link:
http://arxiv.org/abs/2411.07494
Author:
Binder, Felix J, Chua, James, Korbak, Tomek, Sleight, Henry, Hughes, John, Long, Robert, Perez, Ethan, Turpin, Miles, Evans, Owain
Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind (e.g., thoughts and feelings) that is not accessible to external observers. Can LLMs in
External link:
http://arxiv.org/abs/2410.13787
We consider late-time correlators in de Sitter (dS) space for initial states related to the Bunch-Davies vacuum by a Bogoliubov transformation. We propose to study such late-time correlators by reformulating them in the familiar language of Witten di
External link:
http://arxiv.org/abs/2407.16652
Author:
Sheshadri, Abhay, Ewart, Aidan, Guo, Phillip, Lynch, Aengus, Wu, Cindy, Hebbar, Vivek, Sleight, Henry, Stickland, Asa Cooper, Perez, Ethan, Hadfield-Menell, Dylan, Casper, Stephen
Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of 'jailbreaking' techniques to elicit harmful text from
External link:
http://arxiv.org/abs/2407.15549
Author:
Schaeffer, Rylan, Valentine, Dan, Bailey, Luke, Chua, James, Eyzaguirre, Cristóbal, Durante, Zane, Benton, Joe, Miranda, Brando, Sleight, Henry, Hughes, John, Agrawal, Rajashree, Sharma, Mrinank, Emmons, Scott, Koyejo, Sanmi, Perez, Ethan
The integration of new modalities into frontier AI systems offers exciting capabilities, but also increases the possibility such systems can be adversarially manipulated in undesirable ways. In this work, we focus on a popular class of vision-languag
External link:
http://arxiv.org/abs/2407.15211
Author:
Gerstgrasser, Matthias, Schaeffer, Rylan, Dey, Apratim, Rafailov, Rafael, Sleight, Henry, Hughes, John, Korbak, Tomasz, Agrawal, Rajashree, Pai, Dhruv, Gromov, Andrey, Roberts, Daniel A., Yang, Diyi, Donoho, David L., Koyejo, Sanmi
The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed th
External link:
http://arxiv.org/abs/2404.01413
In this work we continue the investigation of the extrapolate dictionary for celestial holography recently proposed in [2301.01810], at both the perturbative and non-perturbative level. Focusing on scalar field theories, we give a complete set of Fey
External link:
http://arxiv.org/abs/2401.16591
Published in:
Journal of High Energy Physics, Vol 2024, Iss 9, Pp 1-49 (2024)
Abstract: We consider late-time correlators in de Sitter (dS) space for initial states related to the Bunch-Davies vacuum by a Bogoliubov transformation. We propose to study such late-time correlators by reformulating them in the familiar language of
External link:
https://doaj.org/article/7684db9471fe4e33879912015b5dbf8f
Published in:
Journal of High Energy Physics, Vol 2024, Iss 8, Pp 1-38 (2024)
Abstract: In this work we continue the investigation of the extrapolate dictionary for celestial holography recently proposed in [1], at both the perturbative and non-perturbative level. Focusing on scalar field theories, we give a complete set of Fey
External link:
https://doaj.org/article/8d6b09035aec479b870b26dad6e27636