Showing 1 - 10 of 644 for search: '"Witt Christian"'
Addressing data integrity challenges, such as unlearning the effects of data poisoning after model training, is necessary for the reliable deployment of machine learning models. State-of-the-art influence functions, such as EK-FAC, often fail to accu
External link:
http://arxiv.org/abs/2411.13731
Author:
Lakara, Kumud; Sock, Juil; Rupprecht, Christian; Torr, Philip; Collomosse, John; de Witt, Christian Schroeder
One of the most challenging forms of misinformation involves the out-of-context (OOC) use of images paired with misleading text, creating false narratives. Existing AI-driven detection systems lack explainability and require expensive fine-tuning. We
External link:
http://arxiv.org/abs/2410.20140
Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in frontier models, it will be necessary to scale them up
External link:
http://arxiv.org/abs/2410.08201
A key challenge in interpretability is to decompose model activations into meaningful features. Sparse autoencoders (SAEs) have emerged as a promising tool for this task. However, a central problem in evaluating the quality of SAEs is the absence of
External link:
http://arxiv.org/abs/2410.07456
The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper
External link:
http://arxiv.org/abs/2410.07436
As a powerful and rapidly advancing dual-use technology, AI offers both immense benefits and worrisome risks. In response, governing bodies around the world are developing a range of regulatory AI laws and policies. This paper compares three distinct
External link:
http://arxiv.org/abs/2410.21279
Author:
Mathew, Yohan; Matthews, Ollie; McCarthy, Robert; Velja, Joan; de Witt, Christian Schroeder; Cope, Dylan; Schoots, Nandi
The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions. Collusion to the disadvantage of others has been identified as a central form of u
External link:
http://arxiv.org/abs/2410.03768
Author:
Chan, Alan; Kolt, Noam; Wills, Peter; Anwar, Usman; de Witt, Christian Schroeder; Rajkumar, Nitarshan; Hammond, Lewis; Krueger, David; Heim, Lennart; Anderljung, Markus
AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system has certain safety certifications. An investigator may not k
External link:
http://arxiv.org/abs/2406.12137
Author:
Draguns, Andis; Gritsevskiy, Andrew; Motwani, Sumeet Ramesh; Rogers-Smith, Charlie; Ladish, Jeffrey; de Witt, Christian Schroeder
The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity
External link:
http://arxiv.org/abs/2406.02619
Author:
Sokota, Samuel; Sam, Dylan; de Witt, Christian Schroeder; Compton, Spencer; Foerster, Jakob; Kolter, J. Zico
Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractab
External link:
http://arxiv.org/abs/2405.19540