Search results - "Noci, Lorenzo"

Report

Understanding and Minimising Outlier Features in Neural Network Training

Authors: He, Bobby; Noci, Lorenzo; Paliotta, Daniele; Schlag, Imanol; Hofmann, Thomas

Outlier Features (OFs) are neurons whose activation magnitudes significantly exceed the average over a neural network's (NN) width. They are well known to emerge during standard transformer training and have the undesirable effect of hindering quanti

External link: http://arxiv.org/abs/2405.19279

Report

Super Consistency of Neural Network Landscapes and Learning Rate Transfer

Authors: Noci, Lorenzo; Meterez, Alexandru; Hofmann, Thomas; Orvieto, Antonio

Recently, there has been growing evidence that if the width and depth of a neural network are scaled toward the so-called rich feature learning limit ($\mu$P and its depth extension), then some hyperparameters -- such as the learning rate -- exhibit tr

External link: http://arxiv.org/abs/2402.17457

Report

How Good is a Single Basin?

Authors: Lion, Kai; Noci, Lorenzo; Hofmann, Thomas; Bachmann, Gregor

The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles. In this work, we probe this belief by constructing various "connected" ensembles which are restricted to lie in

External link: http://arxiv.org/abs/2402.03187

Report

Disentangling Linear Mode-Connectivity

Authors: Altintas, Gul Sena; Bachmann, Gregor; Noci, Lorenzo; Hofmann, Thomas

Linear mode-connectivity (LMC) (or lack thereof) is one of the intriguing characteristics of neural network loss landscapes. While empirically well established, it unfortunately still lacks a proper theoretical understanding. Even worse, although emp

External link: http://arxiv.org/abs/2312.09832

Report

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Authors: Bordelon, Blake; Noci, Lorenzo; Li, Mufan Bill; Hanin, Boris; Pehlevan, Cengiz

The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperpa

External link: http://arxiv.org/abs/2309.16620

Report

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

Authors: Noci, Lorenzo; Li, Chuning; Li, Mufan Bill; He, Bobby; Hofmann, Thomas; Maddison, Chris; Roy, Daniel M.

In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with s

External link: http://arxiv.org/abs/2306.17759

Report

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Authors: Anagnostidis, Sotiris; Pavllo, Dario; Biggio, Luca; Noci, Lorenzo; Lucchi, Aurelien; Hofmann, Thomas

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most of LLMs still adopt attention layers between all pairs of tokens in the seq

External link: http://arxiv.org/abs/2305.15805

Report

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

Authors: Kim, Sanghwan; Noci, Lorenzo; Orvieto, Antonio; Hofmann, Thomas

In contrast to the natural capabilities of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performances on old tasks drop dramatically after being optimized for a

External link: http://arxiv.org/abs/2303.09483

Report

The Curious Case of Benign Memorization

Authors: Anagnostidis, Sotiris; Bachmann, Gregor; Noci, Lorenzo; Hofmann, Thomas

Despite the empirical advances of deep learning across a variety of learning tasks, our theoretical understanding of its success is still very restricted. One of the key challenges is the overparametrized nature of modern models, enabling complete ov

External link: http://arxiv.org/abs/2210.14019

Report

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

Authors: Noci, Lorenzo; Anagnostidis, Sotiris; Biggio, Luca; Orvieto, Antonio; Singh, Sidak Pal; Lucchi, Aurelien

Transformers have achieved remarkable success in several domains, ranging from natural language processing to computer vision. Nevertheless, it has been recently shown that stacking self-attention layers - the distinctive architectural component of T

External link: http://arxiv.org/abs/2206.03126
