Výsledky vyhledávání - "WITT, CHRISTIAN"

Report

MALT: Improving Reasoning with Multi-Agent LLM Training

Autor: Motwani, Sumeet Ramesh, Smith, Chandler, Das, Rocktim Jyoti, Rybchuk, Markian, Torr, Philip H. S., Laptev, Ivan, Pizzati, Fabio, Clark, Ronald, de Witt, Christian Schroeder

Enabling effective collaboration among LLMs is a crucial step toward developing autonomous systems capable of solving complex problems. While LLMs are typically used as single-model generators, where humans critique and refine their outputs, the pote

Externí odkaz: http://arxiv.org/abs/2412.01928

Zobrazit plný text záznamu

Report

Delta-Influence: Unlearning Poisons via Influence Functions

Autor: Li, Wenjie, Li, Jiawei, de Witt, Christian Schroeder, Prabhu, Ameya, Sanyal, Amartya

Addressing data integrity challenges, such as unlearning the effects of data poisoning after model training, is necessary for the reliable deployment of machine learning models. State-of-the-art influence functions, such as EK-FAC, often fail to accu

Externí odkaz: http://arxiv.org/abs/2411.13731

Zobrazit plný text záznamu

Report

MAD-Sherlock: Multi-Agent Debates for Out-of-Context Misinformation Detection

Autor: Lakara, Kumud, Sock, Juil, Rupprecht, Christian, Torr, Philip, Collomosse, John, de Witt, Christian Schroeder

One of the most challenging forms of misinformation involves the out-of-context (OOC) use of images paired with misleading text, creating false narratives. Existing AI-driven detection systems lack explainability and require expensive fine-tuning. We

Externí odkaz: http://arxiv.org/abs/2410.20140

Zobrazit plný text záznamu

Report

Efficient Dictionary Learning with Switch Sparse Autoencoders

Autor: Mudide, Anish, Engels, Joshua, Michaud, Eric J., Tegmark, Max, de Witt, Christian Schroeder

Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in frontier models, it will be necessary to scale them up

Externí odkaz: http://arxiv.org/abs/2410.08201

Zobrazit plný text záznamu

Report

SAGE: Scalable Ground Truth Evaluations for Large Sparse Autoencoders

Autor: Venhoff, Constantin, Calinescu, Anisoara, Torr, Philip, de Witt, Christian Schroeder

A key challenge in interpretability is to decompose model activations into meaningful features. Sparse autoencoders (SAEs) have emerged as a promising tool for this task. However, a central problem in evaluating the quality of SAEs is the absence of

Externí odkaz: http://arxiv.org/abs/2410.07456

Zobrazit plný text záznamu

Report

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Autor: Channing, Georgia, Sock, Juil, Clark, Ronald, Torr, Philip, de Witt, Christian Schroeder

The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper

Externí odkaz: http://arxiv.org/abs/2410.07436

Zobrazit plný text záznamu

Report

Comparative Global AI Regulation: Policy Perspectives from the EU, China, and the US

Autor: Chun, Jon, de Witt, Christian Schroeder, Elkins, Katherine

As a powerful and rapidly advancing dual-use technology, AI offers both immense benefits and worrisome risks. In response, governing bodies around the world are developing a range of regulatory AI laws and policies. This paper compares three distinct

Externí odkaz: http://arxiv.org/abs/2410.21279

Zobrazit plný text záznamu

Report

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

Autor: Mathew, Yohan, Matthews, Ollie, McCarthy, Robert, Velja, Joan, de Witt, Christian Schroeder, Cope, Dylan, Schoots, Nandi

The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions. Collusion to the disadvantage of others has been identified as a central form of u

Externí odkaz: http://arxiv.org/abs/2410.03768

Zobrazit plný text záznamu

Report

IDs for AI Systems

Autor: Chan, Alan, Kolt, Noam, Wills, Peter, Anwar, Usman, de Witt, Christian Schroeder, Rajkumar, Nitarshan, Hammond, Lewis, Krueger, David, Heim, Lennart, Anderljung, Markus

AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system has certain safety certifications. An investigator may not k

Externí odkaz: http://arxiv.org/abs/2406.12137

Zobrazit plný text záznamu

Report

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

Autor: Draguns, Andis, Gritsevskiy, Andrew, Motwani, Sumeet Ramesh, Rogers-Smith, Charlie, Ladish, Jeffrey, de Witt, Christian Schroeder

The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity

Externí odkaz: http://arxiv.org/abs/2406.02619

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání