Search results

Report

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats

Authors: Wen, Jiaxin; Hebbar, Vivek; Larson, Caleb; Bhatt, Aryan; Radhakrishnan, Ansh; Sharma, Mrinank; Sleight, Henry; Feng, Shi; He, He; Perez, Ethan; Shlegeris, Buck; Khan, Akbir

As large language models (LLMs) become increasingly capable, it is prudent to assess whether safety measures remain effective even if LLMs intentionally try to bypass them. Previous work introduced control evaluations, an adversarial framework for te

External link: http://arxiv.org/abs/2411.17693

Report

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples

Authors: Peng, Alwin; Michael, Julian; Sleight, Henry; Perez, Ethan; Sharma, Mrinank

As large language models (LLMs) grow more powerful, ensuring their safety against misuse becomes crucial. While researchers have focused on developing robust defenses, no method has yet achieved complete invulnerability to attacks. We propose an alte

External link: http://arxiv.org/abs/2411.07494

Report

Looking Inward: Language Models Can Learn About Themselves by Introspection

Authors: Binder, Felix J; Chua, James; Korbak, Tomek; Sleight, Henry; Hughes, John; Long, Robert; Perez, Ethan; Turpin, Miles; Evans, Owain

Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind (e.g., thoughts and feelings) that is not accessible to external observers. Can LLMs in

External link: http://arxiv.org/abs/2410.13787

Report

Cosmological Correlators for Bogoliubov Initial States

Authors: Chopping, Alistair J.; Sleight, Charlotte; Taronna, Massimo

We consider late-time correlators in de Sitter (dS) space for initial states related to the Bunch-Davies vacuum by a Bogoliubov transformation. We propose to study such late-time correlators by reformulating them in the familiar language of Witten di

External link: http://arxiv.org/abs/2407.16652

Report

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Authors: Sheshadri, Abhay; Ewart, Aidan; Guo, Phillip; Lynch, Aengus; Wu, Cindy; Hebbar, Vivek; Sleight, Henry; Stickland, Asa Cooper; Perez, Ethan; Hadfield-Menell, Dylan; Casper, Stephen

Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of 'jailbreaking' techniques to elicit harmful text from

External link: http://arxiv.org/abs/2407.15549

Report

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

Authors: Schaeffer, Rylan; Valentine, Dan; Bailey, Luke; Chua, James; Eyzaguirre, Cristóbal; Durante, Zane; Benton, Joe; Miranda, Brando; Sleight, Henry; Hughes, John; Agrawal, Rajashree; Sharma, Mrinank; Emmons, Scott; Koyejo, Sanmi; Perez, Ethan

The integration of new modalities into frontier AI systems offers exciting capabilities, but also increases the possibility such systems can be adversarially manipulated in undesirable ways. In this work, we focus on a popular class of vision-languag

External link: http://arxiv.org/abs/2407.15211

Report

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Authors: Gerstgrasser, Matthias; Schaeffer, Rylan; Dey, Apratim; Rafailov, Rafael; Sleight, Henry; Hughes, John; Korbak, Tomasz; Agrawal, Rajashree; Pai, Dhruv; Gromov, Andrey; Roberts, Daniel A.; Yang, Diyi; Donoho, David L.; Koyejo, Sanmi

The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed th

External link: http://arxiv.org/abs/2404.01413

Report

Celestial Holography Revisited II: Correlators and Källén-Lehmann

Authors: Iacobacci, Lorenzo; Sleight, Charlotte; Taronna, Massimo

In this work we continue the investigation of the extrapolate dictionary for celestial holography recently proposed in [2301.01810], at both the perturbative and non-perturbative level. Focusing on scalar field theories, we give a complete set of Fey

External link: http://arxiv.org/abs/2401.16591

Academic article

Cosmological correlators for Bogoliubov initial states

Authors: Alistair J. Chopping, Charlotte Sleight, Massimo Taronna

Published in: Journal of High Energy Physics, Vol 2024, Iss 9, Pp 1-49 (2024)

We consider late-time correlators in de Sitter (dS) space for initial states related to the Bunch-Davies vacuum by a Bogoliubov transformation. We propose to study such late-time correlators by reformulating them in the familiar language of

External link: https://doaj.org/article/7684db9471fe4e33879912015b5dbf8f

Academic article

Celestial holography revisited. Part II. Correlators and Källén-Lehmann

Authors: Lorenzo Iacobacci, Charlotte Sleight, Massimo Taronna

Published in: Journal of High Energy Physics, Vol 2024, Iss 8, Pp 1-38 (2024)

In this work we continue the investigation of the extrapolate dictionary for celestial holography recently proposed in [1], at both the perturbative and non-perturbative level. Focusing on scalar field theories, we give a complete set of Fey

External link: https://doaj.org/article/8d6b09035aec479b870b26dad6e27636
