Showing 1 - 10 of 44 for search: '"Adserà, P."'
Author:
Boix-Adsera, Enric
Distillation is the task of replacing a complicated machine learning model with a simpler model that approximates the original [BCNM06,HVD15]. Despite many practical applications, basic questions about the extent to which models can be distilled, and …
External link:
http://arxiv.org/abs/2403.09053
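As a minimal sketch of the distillation setup described above (illustrative assumptions throughout: the synthetic dataset, the forest-to-tree teacher/student pairing, and training the student on hard teacher labels are mine, not the paper's method):

    # Distillation sketch: fit a small "student" tree to the predictions
    # of a large "teacher" ensemble, then measure how well the two agree.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    teacher_labels = teacher.predict(X)  # soft probabilities also work

    student = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, teacher_labels)

    agreement = (student.predict(X) == teacher.predict(X)).mean()
    print(f"student-teacher agreement: {agreement:.3f}")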
Author:
Melamed, Rimon, McCabe, Lucas H., Wakhare, Tanay, Kim, Yejin, Huang, H. Howie, Boix-Adsera, Enric
We discover that many natural-language prompts can be replaced by corresponding prompts that are unintelligible to humans but that provably elicit similar behavior in language models. We call these prompts "evil twins" because they are obfuscated and …
External link:
http://arxiv.org/abs/2311.07064
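A heavily hedged sketch of the idea (not the paper's algorithm; the model choice, loop sizes, and the naive greedy token-swap search are assumptions for illustration): look for a gibberish prompt under which the model still assigns high probability to the continuation that the original natural-language prompt produces.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def continuation_loss(prompt_ids, cont_ids):
        """Negative log-likelihood of cont_ids given prompt_ids."""
        ids = torch.cat([prompt_ids, cont_ids])
        with torch.no_grad():
            logp = model(ids.unsqueeze(0)).logits[0].log_softmax(-1)
        # each continuation token is predicted from the previous position
        idx = torch.arange(len(prompt_ids) - 1, len(ids) - 1)
        return -logp[idx, cont_ids].sum()

    original = tok("Write a short poem about the sea.", return_tensors="pt").input_ids[0]
    cont = model.generate(original.unsqueeze(0), max_new_tokens=20, do_sample=False)[0][len(original):]

    twin = torch.randint(0, tok.vocab_size, (len(original),))  # gibberish init
    for step in range(50):  # greedy coordinate descent over token swaps
        pos = step % len(twin)
        best, best_loss = twin[pos].item(), continuation_loss(twin, cont)
        for cand in torch.randint(0, tok.vocab_size, (20,)):
            twin[pos] = cand
            loss = continuation_loss(twin, cont)
            if loss < best_loss:
                best, best_loss = cand.item(), loss
        twin[pos] = best
    print(tok.decode(twin))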
Author:
Boix-Adsera, Enric, Saremi, Omid, Abbe, Emmanuel, Bengio, Samy, Littwin, Etai, Susskind, Joshua
We investigate the capabilities of transformer models on relational reasoning tasks. In these tasks, models are trained on a set of strings encoding abstract relations, and are then tested out-of-distribution on data that contains symbols that did not appear at training time. …
External link:
http://arxiv.org/abs/2310.09753
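To make the task setup concrete, a tiny hedged sketch (the "same/different" relation, symbol counts, and split are assumed for illustration, not the paper's benchmark):

    import random

    train_symbols = list(range(0, 50))    # symbols available at training time
    test_symbols = list(range(50, 100))   # held-out symbols for OOD testing

    def sample(symbols):
        a = random.choice(symbols)
        b = a if random.random() < 0.5 else random.choice([s for s in symbols if s != a])
        return (a, b), int(a == b)  # input pair, "same?" label

    train_set = [sample(train_symbols) for _ in range(1000)]
    ood_test_set = [sample(test_symbols) for _ in range(200)]
    # A model that has learned the abstract relation "same" should generalize
    # to ood_test_set even though every symbol there is new.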
We identify incremental learning dynamics in transformers, where the difference between trained and initial weights progressively increases in rank. We rigorously prove this occurs under the simplifying assumptions of diagonal weight matrices and small initialization. …
External link:
http://arxiv.org/abs/2306.07042
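A hedged sketch of the diagnostic behind this claim (the toy network, data, learning rate, and rank threshold are illustrative assumptions): track the numerical rank of the difference between current and initial weights during training.

    import torch

    torch.manual_seed(0)
    X = torch.randn(512, 64)
    Y = torch.relu(X @ torch.randn(64, 1))               # toy teacher targets

    W1 = (0.01 * torch.randn(64, 64)).requires_grad_()   # small initialization
    W2 = (0.01 * torch.randn(64, 1)).requires_grad_()
    W1_init = W1.detach().clone()
    opt = torch.optim.SGD([W1, W2], lr=0.1)

    def effective_rank(M, tol=1e-3):
        s = torch.linalg.svdvals(M)
        return int((s > tol * s[0]).sum())

    for step in range(2001):
        loss = ((torch.relu(X @ W1) @ W2 - Y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 500 == 0:
            print(step, round(loss.item(), 4), effective_rank(W1.detach() - W1_init))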
Author:
Boix-Adsera, Enric, Littwin, Etai
We study when the neural tangent kernel (NTK) approximation is valid for training a model with the square loss. In the lazy training setting of Chizat et al. 2019, we show that rescaling the model by a factor of $\alpha = O(T)$ suffices for the NTK approximation …
External link:
http://arxiv.org/abs/2305.13141
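For orientation, the lazy-training rescaling in question (after Chizat et al. 2019; the notation here is mine) centers and scales the model and asks when it tracks its linearization at initialization:

    \[
    f_\alpha(x;\theta) = \alpha\bigl(f(x;\theta) - f(x;\theta_0)\bigr),
    \qquad
    f_{\mathrm{lin}}(x;\theta) = \alpha\,\nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0).
    \]

The NTK approximation is valid on a horizon $[0,T]$ when gradient flow on the square loss keeps $f_\alpha$ uniformly close to $f_{\mathrm{lin}}$; the quoted result says that a rescaling $\alpha = O(T)$ suffices.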
We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure -- the leap -- which measures how "hierarchical" target functions are. For $d$-dimensional uniform Boolean …
External link:
http://arxiv.org/abs/2302.11055
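For intuition about the leap (the formal definition is in the paper; this contrast is only indicative):

    \[
    f_{\text{easy}}(x) = x_1 + x_1 x_2 + x_1 x_2 x_3,
    \qquad
    f_{\text{hard}}(x) = x_1 + x_1 x_2 x_3 x_4 .
    \]

The first target extends its support one coordinate at a time (leap 1), while the second must acquire three new coordinates at once (leap 3); the leap is intended to capture how long SGD needs to reach such high-order terms.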
Comparing the representations learned by different neural networks has recently emerged as a key tool to understand various architectures and ultimately optimize them. In this work, we introduce GULP, a family of distance measures between representations …
External link:
http://arxiv.org/abs/2210.06545
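A conceptual, hedged stand-in for the "downstream predictive tasks" motivation (not the paper's GULP estimator: the random targets, single ridge regularizer, and averaging instead of a supremum are all simplifications of mine):

    import numpy as np

    def ridge_predictions(Z, y, lam):
        # Fitted values of ridge regression of y on representation Z.
        d = Z.shape[1]
        w = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
        return Z @ w

    def prediction_distance(A, B, lam=1.0, n_tasks=100, seed=0):
        # Mean squared disagreement of ridge predictors trained on A vs. B,
        # averaged across random downstream tasks.
        rng = np.random.default_rng(seed)
        gaps = []
        for _ in range(n_tasks):
            y = rng.standard_normal(A.shape[0])
            gaps.append(np.mean((ridge_predictions(A, y, lam)
                                 - ridge_predictions(B, y, lam)) ** 2))
        return float(np.mean(gaps))

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 30))
    B = A @ rng.standard_normal((30, 30))   # a linear re-embedding of A
    print(prediction_distance(A, B))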
Author:
Abbe, Emmanuel, Boix-Adsera, Enric
We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we …
External link:
http://arxiv.org/abs/2208.03113
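Roughly, the equivariance hypothesis can be pictured as follows (an informal paraphrase in my own notation, not the paper's statement): a transformation $\pi$ of the inputs can be absorbed into a transformation of the parameters that preserves both the architecture and the initialization law,

    \[
    f(\pi \cdot \theta;\, x) = f(\theta;\, \pi^{-1} x),
    \qquad
    \pi \cdot \theta_0 \overset{d}{=} \theta_0,
    \]

so that noisy GD on $\pi$-transformed data produces the $\pi$-transformed trajectory in distribution, which is the symmetry the lower bounds exploit.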
It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization …
External link:
http://arxiv.org/abs/2202.08658
This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically. We define the "staircase" property for functions over the Boolean hypercube, which posits that high-order Fourier coefficients …
External link:
http://arxiv.org/abs/2108.10573
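The prototypical staircase function in this line of work, stated informally:

    \[
    f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3 + \cdots + x_1 x_2 \cdots x_k,
    \qquad x \in \{-1,+1\}^d,
    \]

where each monomial adds one new coordinate to an already-present lower-order term, so the high-order Fourier coefficients can be reached by climbing the chain step by step.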