Showing 1 - 10 of 12,260 for search: '"Garriga A"'
In a Universe with nearly-Gaussian initial curvature perturbations, the abundance of primordial black holes can be derived from the curvature power spectrum. When the latter is enhanced within a narrow range around a characteristic scale, the resulti
External link:
http://arxiv.org/abs/2412.07709
Authors:
Shi, Claudia, Beltran-Velez, Nicolas, Nazaret, Achille, Zheng, Carolina, Garriga-Alonso, Adrià, Jesson, Andrew, Makar, Maggie, Blei, David M.
Large language models (LLMs) demonstrate surprising capabilities, but we do not understand how they are implemented. One hypothesis suggests that these capabilities are primarily executed by small subnetworks within the LLM, known as circuits. But ho
External link:
http://arxiv.org/abs/2410.13032
Authors:
Taufeeque, Mohammad, Quirke, Philip, Li, Maximilian, Cundy, Chris, Tucker, Aaron David, Gleave, Adam, Garriga-Alonso, Adrià
How a neural network (NN) generalizes to novel situations depends on whether it has learned to select actions heuristically or via a planning process. "An investigation of model-free planning" (Guez et al. 2019) found that a recurrent NN (RNN) traine
External link:
http://arxiv.org/abs/2407.15421
Circuits are supposed to accurately describe how a neural network performs a specific task, but do they really? We evaluate three circuits found in the literature (IOI, greater-than, and docstring) in an adversarial manner, considering inputs where t
External link:
http://arxiv.org/abs/2407.15166
When applying reinforcement learning from human feedback (RLHF), the reward is learned from data and, therefore, always has some error. It is common to mitigate this by regularizing the policy with KL divergence from a base model, with the hope that
External link:
http://arxiv.org/abs/2407.14503
Mechanistic interpretability methods aim to identify the algorithm a neural network implements, but it is difficult to validate such methods when the true algorithm is unknown. This work presents InterpBench, a collection of semi-synthetic yet realis
External link:
http://arxiv.org/abs/2407.14494
How well will current interpretability techniques generalize to future models? A relevant case study is Mamba, a recent recurrent architecture with scaling comparable to Transformers. We adapt pre-Mamba techniques to Mamba and partially reverse-engin
External link:
http://arxiv.org/abs/2407.14008
Authors:
Tan, Daniel, Chanin, David, Lynch, Aengus, Kanoulas, Dimitrios, Paige, Brooks, Garriga-Alonso, Adria, Kirk, Robert
Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However,
External link:
http://arxiv.org/abs/2407.12404
Authors:
Skartados, Evangelos, Yucel, Mehmet Kerim, Manganelli, Bruno, Drosou, Anastasios, Saà-Garriga, Albert
Neural Radiance Fields (NeRF) have quickly become the primary approach for 3D reconstruction and novel view synthesis in recent years due to their remarkable performance. Despite the huge interest in NeRF methods, a practical use case of NeRFs has la
External link:
http://arxiv.org/abs/2403.04508
This study provides evidence that personality can be reliably predicted from activity data collected through mobile phone sensors. Employing a set of well informed indicators calculable from accelerometer records and movement patterns, we were able t
External link:
http://arxiv.org/abs/2401.10305