Showing 1 - 10 of 15 for search: '"Schoots, Nandi"'
Published in:
ECAI 2024
Does the process of training a neural network to solve a task tend to use all of the available weights even when the task could be solved with fewer weights? To address this question we study the effects of pruning fully connected, convolutional and…
External link:
http://arxiv.org/abs/2410.14461
Authors:
Mathew, Yohan, Matthews, Ollie, McCarthy, Robert, Velja, Joan, de Witt, Christian Schroeder, Cope, Dylan, Schoots, Nandi
The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions. Collusion to the disadvantage of others has been identified as a central form of u…
External link:
http://arxiv.org/abs/2410.03768
An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We find pretrained models to be highly unclusterable and thus train models to be more modular usi…
External link:
http://arxiv.org/abs/2409.15747
Current large language models have dangerous capabilities, which are likely to become more problematic in the future. Activation steering techniques can be used to reduce risks from these capabilities. In this paper, we investigate the efficacy of ac…
External link:
http://arxiv.org/abs/2403.05767
Authors:
Pochinkov, Nicholas, Schoots, Nandi
Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more frequently adopted. This paper introduces a machine unlearning method specifically designed for LLMs. We i…
External link:
http://arxiv.org/abs/2403.01267
Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it involves finding steering vectors. This is difficult because engineers do not typically know how features are repr…
External link:
http://arxiv.org/abs/2312.03813
We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover the internal representations of truth of a large language model. We present a new loss function that we call the Midpoint-Displacement (MD) loss functio…
External link:
http://arxiv.org/abs/2311.00488
Authors:
Villani, Mattia Jacopo, Schoots, Nandi
We constructively prove that every deep ReLU network can be rewritten as a functionally identical three-layer network with weights valued in the extended reals. Based on this proof, we provide an algorithm that, given a deep ReLU network, finds the e…
External link:
http://arxiv.org/abs/2306.11827
Authors:
Schoots, Nandi, Cope, Dylan
We study the relationship between the entropy of intermediate representations and a model's robustness to distributional shift. We train models consisting of two feed-forward networks end-to-end separated by a discrete $n$-bit channel on an unsupervi…
External link:
http://arxiv.org/abs/2305.12238
Authors:
Yang, Adam X., Robeyns, Maxime, Milsom, Edward, Anson, Ben, Schoots, Nandi, Aitchison, Laurence
The successes of modern deep machine learning methods are founded on their ability to transform inputs across multiple layers to build good high-level representations. It is therefore critical to understand this process of representation learning. Ho…
External link:
http://arxiv.org/abs/2108.13097