Showing 1 - 10 of 44 for search: '"Adserà, P."'
Author:
Boix-Adsera, Enric
Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06,HVD15]. Despite many practical applications, basic questions about the extent to which models can be distilled, and …
External link:
http://arxiv.org/abs/2403.09053
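As a minimal sketch of the distillation setup described above (illustrative assumptions throughout: the synthetic dataset, the forest-to-tree teacher/student pairing, and training the student on hard teacher labels are mine, not the paper's method):

    # Distillation sketch: fit a small "student" tree to the predictions
    # of a large "teacher" ensemble, then measure how well the two agree.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    teacher_labels = teacher.predict(X)  # soft probabilities also work

    student = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, teacher_labels)

    agreement = (student.predict(X) == teacher.predict(X)).mean()
    print(f"student-teacher agreement: {agreement:.3f}")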
Author:
Melamed, Rimon, McCabe, Lucas H., Wakhare, Tanay, Kim, Yejin, Huang, H. Howie, Boix-Adsera, Enric
We discover that many natural-language prompts can be replaced by corresponding prompts that are unintelligible to humans but that provably elicit similar behavior in language models. We call these prompts "evil twins" because they are obfuscated and …
External link:
http://arxiv.org/abs/2311.07064
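A heavily hedged sketch of the idea (not the paper's algorithm; the model choice, loop sizes, and the naive greedy token-swap search are assumptions for illustration): look for a gibberish prompt under which the model still assigns high probability to the continuation that the original natural-language prompt produces.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def continuation_loss(prompt_ids, cont_ids):
        """Negative log-likelihood of cont_ids given prompt_ids."""
        ids = torch.cat([prompt_ids, cont_ids])
        with torch.no_grad():
            logp = model(ids.unsqueeze(0)).logits[0].log_softmax(-1)
        # each continuation token is predicted from the previous position
        idx = torch.arange(len(prompt_ids) - 1, len(ids) - 1)
        return -logp[idx, cont_ids].sum()

    original = tok("Write a short poem about the sea.", return_tensors="pt").input_ids[0]
    cont = model.generate(original.unsqueeze(0), max_new_tokens=20, do_sample=False)[0][len(original):]

    twin = torch.randint(0, tok.vocab_size, (len(original),))  # gibberish init
    for step in range(50):  # greedy coordinate descent over token swaps
        pos = step % len(twin)
        best, best_loss = twin[pos].item(), continuation_loss(twin, cont)
        for cand in torch.randint(0, tok.vocab_size, (20,)):
            twin[pos] = cand
            loss = continuation_loss(twin, cont)
            if loss < best_loss:
                best, best_loss = cand.item(), loss
        twin[pos] = best
    print(tok.decode(twin))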
Author:
Boix-Adsera, Enric, Saremi, Omid, Abbe, Emmanuel, Bengio, Samy, Littwin, Etai, Susskind, Joshua
We investigate the capabilities of transformer models on relational reasoning tasks. In these tasks, models are trained on a set of strings encoding abstract relations, and are then tested out-of-distribution on data that contains symbols that did not appear at training time. …
External link:
http://arxiv.org/abs/2310.09753
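To make the task setup concrete, a tiny hedged sketch (the "same/different" relation, symbol counts, and split are assumed for illustration, not the paper's benchmark):

    import random

    train_symbols = list(range(0, 50))    # symbols available at training time
    test_symbols = list(range(50, 100))   # held-out symbols for OOD testing

    def sample(symbols):
        a = random.choice(symbols)
        b = a if random.random() < 0.5 else random.choice([s for s in symbols if s != a])
        return (a, b), int(a == b)  # input pair, "same?" label

    train_set = [sample(train_symbols) for _ in range(1000)]
    ood_test_set = [sample(test_symbols) for _ in range(200)]
    # A model that has learned the abstract relation "same" should generalize
    # to ood_test_set even though every symbol there is new.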
We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. …
External link:
http://arxiv.org/abs/2306.07042
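A hedged sketch of the diagnostic behind this claim (the toy network, data, learning rate, and rank threshold are illustrative assumptions): track the numerical rank of the difference between current and initial weights during training.

    import torch

    torch.manual_seed(0)
    X = torch.randn(512, 64)
    Y = torch.relu(X @ torch.randn(64, 1))               # toy teacher targets

    W1 = (0.01 * torch.randn(64, 64)).requires_grad_()   # small initialization
    W2 = (0.01 * torch.randn(64, 1)).requires_grad_()
    W1_init = W1.detach().clone()
    opt = torch.optim.SGD([W1, W2], lr=0.1)

    def effective_rank(M, tol=1e-3):
        s = torch.linalg.svdvals(M)
        return int((s > tol * s[0]).sum())

    for step in range(2001):
        loss = ((torch.relu(X @ W1) @ W2 - Y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 500 == 0:
            print(step, round(loss.item(), 4), effective_rank(W1.detach() - W1_init))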
Author:
Boix-Adsera, Enric, Littwin, Etai
We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss. In the lazy training setting of Chizat et al. 2019, we show that rescaling the model by a factor of $\alpha = O(T)$ suffices for the NTK approximation …
External link:
http://arxiv.org/abs/2305.13141
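For orientation, the lazy-training rescaling in question (after Chizat et al. 2019; the notation here is mine) centers and scales the model and asks when it tracks its linearization at initialization:

    \[
    f_\alpha(x;\theta) = \alpha\bigl(f(x;\theta) - f(x;\theta_0)\bigr),
    \qquad
    f_{\mathrm{lin}}(x;\theta) = \alpha\,\nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0).
    \]

The NTK approximation is valid on a horizon $[0,T]$ when gradient flow on the square loss keeps $f_\alpha$ uniformly close to $f_{\mathrm{lin}}$; the quoted result says that a rescaling $\alpha = O(T)$ suffices.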
We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure -- the leap -- which measures how "hierarchical" target functions are. For $d$-dimensional uniform Boolean …
External link:
http://arxiv.org/abs/2302.11055
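For intuition about the leap (the formal definition is in the paper; this contrast is only indicative):

    \[
    f_{\text{easy}}(x) = x_1 + x_1 x_2 + x_1 x_2 x_3,
    \qquad
    f_{\text{hard}}(x) = x_1 + x_1 x_2 x_3 x_4 .
    \]

The first target extends its support one coordinate at a time (leap 1), while the second must acquire three new coordinates at once (leap 3); the leap is intended to capture how long SGD needs to reach such high-order terms.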
Comparing the representations learned by different neural networks has recently emerged as a key tool to understand various architectures and ultimately optimize them. In this work, we introduce GULP, a family of distance measures between representations …
External link:
http://arxiv.org/abs/2210.06545
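A conceptual, hedged stand-in for the "downstream predictive tasks" motivation (not the paper's GULP estimator: the random targets, single ridge regularizer, and averaging instead of a supremum are all simplifications of mine):

    import numpy as np

    def ridge_predictions(Z, y, lam):
        # Fitted values of ridge regression of y on representation Z.
        d = Z.shape[1]
        w = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
        return Z @ w

    def prediction_distance(A, B, lam=1.0, n_tasks=100, seed=0):
        # Mean squared disagreement of ridge predictors trained on A vs. B,
        # averaged across random downstream tasks.
        rng = np.random.default_rng(seed)
        gaps = []
        for _ in range(n_tasks):
            y = rng.standard_normal(A.shape[0])
            gaps.append(np.mean((ridge_predictions(A, y, lam)
                                 - ridge_predictions(B, y, lam)) ** 2))
        return float(np.mean(gaps))

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 30))
    B = A @ rng.standard_normal((30, 30))   # a linear re-embedding of A
    print(prediction_distance(A, B))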
Author:
Abbe, Emmanuel, Boix-Adsera, Enric
We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we …
External link:
http://arxiv.org/abs/2208.03113
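Roughly, the equivariance hypothesis can be pictured as follows (an informal paraphrase in my own notation, not the paper's statement): a transformation $\pi$ of the inputs can be absorbed into a transformation of the parameters that preserves both the architecture and the initialization law,

    \[
    f(\pi \cdot \theta;\, x) = f(\theta;\, \pi^{-1} x),
    \qquad
    \pi \cdot \theta_0 \overset{d}{=} \theta_0,
    \]

so that noisy GD on $\pi$-transformed data produces the $\pi$-transformed trajectory in distribution, which is the symmetry the lower bounds exploit.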
It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization …
External link:
http://arxiv.org/abs/2202.08658
This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically. We define the "staircase" property for functions over the Boolean hypercube, which posits that high-order Fourier coefficients …
External link:
http://arxiv.org/abs/2108.10573
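The prototypical staircase function in this line of work, stated informally:

    \[
    f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3 + \cdots + x_1 x_2 \cdots x_k,
    \qquad x \in \{-1,+1\}^d,
    \]

where each monomial adds one new coordinate to an already-present lower-order term, so the high-order Fourier coefficients can be reached by climbing the chain step by step.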