Showing 1 - 10 of 1,172 results for search: '"Kramář P"'
Cells use signalling pathways as windows into the environment to gather information, transduce it into their interior, and use it to drive behaviours. MAPK (ERK) is a highly conserved signalling pathway in eukaryotes, directing multiple fundamental c
External link:
http://arxiv.org/abs/2410.22571
Published in:
PRX Life (2024) 2(3), 033005
Network-forming organisms, like fungi and slime molds, dynamically reorganize their networks during foraging. The resulting re-routing of resource flows within the organism's network can significantly impact local ecosystems. In current analysis limi
External link:
http://arxiv.org/abs/2408.17134
Authors:
Lieberum, Tom; Rajamanoharan, Senthooran; Conmy, Arthur; Smith, Lewis; Sonnerat, Nicolas; Varma, Vikrant; Kramár, János; Dragan, Anca; Shah, Rohin; Nanda, Neel
Sparse autoencoders (SAEs) are an unsupervised method for learning a sparse decomposition of a neural network's latent representations into seemingly interpretable features. Despite recent excitement about their potential, research applications outsi
External link:
http://arxiv.org/abs/2408.05147
Authors:
Rajamanoharan, Senthooran; Lieberum, Tom; Sonnerat, Nicolas; Conmy, Arthur; Varma, Vikrant; Kramár, János; Nanda, Neel
Sparse autoencoders (SAEs) are a promising unsupervised approach for identifying causally relevant and interpretable linear features in a language model's (LM) activations. To be useful for downstream tasks, SAEs need to decompose LM activations fait
External link:
http://arxiv.org/abs/2407.14435
Authors:
Kenton, Zachary; Siegel, Noah Y.; Kramár, János; Brown-Cohen, Jonah; Albanie, Samuel; Bulian, Jannis; Agarwal, Rishabh; Lindner, David; Tang, Yunhao; Goodman, Noah D.; Shah, Rohin
Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and comp
External link:
http://arxiv.org/abs/2407.04622
Authors:
Kramar, David; Krejcirik, David
We consider Dirac operators on the half-line, subject to generalised infinite-mass boundary conditions. We derive sufficient conditions which guarantee the stability of the spectrum against possibly non-self-adjoint potential perturbations and study
External link:
http://arxiv.org/abs/2405.10009
Authors:
Rajamanoharan, Senthooran; Conmy, Arthur; Smith, Lewis; Lieberum, Tom; Varma, Vikrant; Kramár, János; Shah, Rohin; Nanda, Neel
Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We introduce the
External link:
http://arxiv.org/abs/2404.16014
Activation Patching is a method of directly computing causal attributions of behavior to model components. However, applying it exhaustively requires a sweep with cost scaling linearly in the number of model components, which can be prohibitively exp
External link:
http://arxiv.org/abs/2403.00745
One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when
External link:
http://arxiv.org/abs/2309.02390
We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (wh
External link:
http://arxiv.org/abs/2307.15771