Showing 1 - 10 of 15 for search: '"Schoots, Nandi"'
Published in:
ECAI 2024
Does the process of training a neural network to solve a task tend to use all of the available weights even when the task could be solved with fewer weights? To address this question we study the effects of pruning fully connected, convolutional and…
External link:
http://arxiv.org/abs/2410.14461
Authors:
Mathew, Yohan, Matthews, Ollie, McCarthy, Robert, Velja, Joan, de Witt, Christian Schroeder, Cope, Dylan, Schoots, Nandi
The rapid proliferation of frontier model agents promises significant societal advances but also raises concerns about systemic risks arising from unsafe interactions. Collusion to the disadvantage of others has been identified as a central form of u…
External link:
http://arxiv.org/abs/2410.03768
An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We find pretrained models to be highly unclusterable and thus train models to be more modular usi…
External link:
http://arxiv.org/abs/2409.15747
Current large language models have dangerous capabilities, which are likely to become more problematic in the future. Activation steering techniques can be used to reduce risks from these capabilities. In this paper, we investigate the efficacy of ac…
External link:
http://arxiv.org/abs/2403.05767
Authors:
Pochinkov, Nicholas, Schoots, Nandi
Understanding and shaping the behaviour of Large Language Models (LLMs) is increasingly important as applications become more powerful and more frequently adopted. This paper introduces a machine unlearning method specifically designed for LLMs. We i…
External link:
http://arxiv.org/abs/2403.01267
Recent work in activation steering has demonstrated the potential to better control the outputs of Large Language Models (LLMs), but it involves finding steering vectors. This is difficult because engineers do not typically know how features are repr…
External link:
http://arxiv.org/abs/2312.03813
We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover the internal representations of truth of a large language model. We present a new loss function that we call the Midpoint-Displacement (MD) loss functio…
External link:
http://arxiv.org/abs/2311.00488
Authors:
Villani, Mattia Jacopo, Schoots, Nandi
We constructively prove that every deep ReLU network can be rewritten as a functionally identical three-layer network with weights valued in the extended reals. Based on this proof, we provide an algorithm that, given a deep ReLU network, finds the e…
External link:
http://arxiv.org/abs/2306.11827
Authors:
Schoots, Nandi, Cope, Dylan
We study the relationship between the entropy of intermediate representations and a model's robustness to distributional shift. We train models consisting of two feed-forward networks end-to-end separated by a discrete $n$-bit channel on an unsupervi…
External link:
http://arxiv.org/abs/2305.12238
Authors:
Yang, Adam X., Robeyns, Maxime, Milsom, Edward, Anson, Ben, Schoots, Nandi, Aitchison, Laurence
The successes of modern deep machine learning methods are founded on their ability to transform inputs across multiple layers to build good high-level representations. It is therefore critical to understand this process of representation learning. Ho…
External link:
http://arxiv.org/abs/2108.13097