Showing 1 - 10 of 19 for search: '"Simsek, Berfin"'
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons …
External link:
http://arxiv.org/abs/2411.08798
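To make the setup concrete, here is a minimal, hypothetical sketch of Euler-discretized gradient flow on a correlation loss, with a student matching a multi-index target on standard Gaussian inputs. The tanh activation, the target directions V, the batch-based gradient estimate, and the column renormalization are all illustrative assumptions, not details from the paper.

    # Hypothetical sketch: Euler-discretized gradient flow on the
    # correlation loss L(W) = -E_x[ f*(x) * f(x; W) ] for a student
    # approximating a multi-index target on Gaussian inputs.
    # All names (V, W, tanh) are illustrative, not from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 50, 3                                  # input dim, index directions

    V = np.linalg.qr(rng.normal(size=(d, k)))[0]  # orthonormal target directions

    def f_star(x):                                # target: a sum of neurons
        return np.tanh(x @ V).sum(axis=1)

    W = rng.normal(size=(d, k)) / np.sqrt(d)      # student directions

    def neg_grad(W, n=4096):
        x = rng.normal(size=(n, d))               # fresh Gaussian batch
        y = f_star(x)
        h = np.tanh(x @ W)
        # gradient of -E[ y * sum_j tanh(<w_j, x>) ] w.r.t. W
        g = -(x.T @ (y[:, None] * (1.0 - h**2))) / n
        return -g

    eta, steps = 0.5, 2000                        # Euler step size, iterations
    for _ in range(steps):
        W += eta * neg_grad(W)
        W /= np.linalg.norm(W, axis=0)            # crude stand-in for spherical flow

    # directional convergence: alignment of the student with span(V)
    print(np.linalg.norm(V.T @ W, ord=2))

The final print reports the top singular value of V^T W, a simple proxy for how well the learned directions align with the target subspace.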
This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and …
External link:
http://arxiv.org/abs/2402.18724
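As a rough illustration of the object being studied, the following assumes a linear, Hopfield-style associative memory built from outer products; the key/value naming and the Gaussian embeddings are assumptions for the sketch, not details from the paper.

    # Assumed form for illustration: a memory W storing pairs of token
    # embeddings as a sum of outer products, W = sum_i v_i k_i^T,
    # retrieved by a matrix-vector product.
    import numpy as np

    rng = np.random.default_rng(1)
    d, n_pairs = 64, 10
    K = rng.normal(size=(n_pairs, d)) / np.sqrt(d)   # key embeddings
    V = rng.normal(size=(n_pairs, d)) / np.sqrt(d)   # value embeddings

    W = V.T @ K                        # equals sum_i outer(v_i, k_i)

    retrieved = W @ K[3]               # ~ V[3] plus cross-talk from other pairs
    print(np.corrcoef(retrieved, V[3])[0, 1])

The cross-talk between stored pairs is what makes the training dynamics of such a module depend on the data distribution.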
In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely …
External link:
http://arxiv.org/abs/2402.05626
Any continuous function $f^*$ can be approximated arbitrarily well by a neural network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a neural network with one hidden layer and $k$ neurons. Approximating $f^*$ with a neural network …
External link:
http://arxiv.org/abs/2311.01644
Lecture notes from the course given by Professor Sara A. Solla at the Les Houches summer school on "Statistical physics of Machine Learning". The notes discuss neural information processing through the lens of Statistical Physics. Contents include Bayesian inference …
External link:
http://arxiv.org/abs/2309.17006
Can we identify the weights of a neural network by probing its input-output mapping? At first glance, this problem seems to have many solutions because of permutation, overparameterisation and activation function symmetries. Yet, we show that the incoming weight vector of each neuron is identifiable …
External link:
http://arxiv.org/abs/2304.12794
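The symmetries mentioned here are easy to see concretely. The following small check (illustrative, not the paper's method) shows that permuting hidden neurons, and flipping signs when the activation is odd like tanh, leaves the input-output mapping unchanged.

    # Illustrative check: permutation and sign symmetries of a
    # one-hidden-layer tanh network leave its outputs unchanged.
    import numpy as np

    rng = np.random.default_rng(2)
    d, k = 5, 4
    W = rng.normal(size=(k, d))        # incoming weight vectors, one per neuron
    a = rng.normal(size=k)             # output weights

    def net(x, W, a):
        return np.tanh(x @ W.T) @ a

    x = rng.normal(size=(100, d))
    perm = rng.permutation(k)
    sign = rng.choice([-1.0, 1.0], size=k)

    # tanh(-z) = -tanh(z), so a sign flip of w_j can be absorbed into a_j
    y1 = net(x, W, a)
    y2 = net(x, sign[:, None] * W[perm], sign * a[perm])
    print(np.allclose(y1, y2))         # True: same function, different weights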
MLPGradientFlow is a software package to solve numerically the gradient flow differential equation $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal D$ is some data set, and $\mathcal L$ is a loss function …
External link:
http://arxiv.org/abs/2301.10638
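The package has its own interface; purely as an illustration of the equation above, one can hand the same ODE $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$ to a generic adaptive ODE solver, here for a tiny one-hidden-layer network. All choices below (tanh activation, squared loss, toy data) are assumptions for the sketch.

    # Toy illustration, independent of the package's own interface:
    # integrate d(theta)/dt = -grad L(theta; D) with an adaptive solver.
    import numpy as np
    from scipy.integrate import solve_ivp

    rng = np.random.default_rng(3)
    d, k, n = 2, 3, 64
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0])               # toy regression target

    def unpack(theta):
        return theta[: k * d].reshape(k, d), theta[k * d :]

    def neg_grad_loss(t, theta):      # right-hand side: -grad of squared loss
        W, a = unpack(theta)
        h = np.tanh(X @ W.T)          # (n, k) hidden activations
        r = h @ a - y                 # residuals
        ga = h.T @ r / n
        gW = ((r[:, None] * a) * (1 - h**2)).T @ X / n
        return -np.concatenate([gW.ravel(), ga])

    theta0 = 0.1 * rng.normal(size=k * d + k)
    sol = solve_ivp(neg_grad_loss, (0.0, 100.0), theta0, rtol=1e-8, atol=1e-10)
    W_end, a_end = unpack(sol.y[:, -1])
    print("final loss:", 0.5 * np.mean((np.tanh(X @ W_end.T) @ a_end - y) ** 2))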
Existing works show that although modern neural networks achieve remarkable generalization performance on the in-distribution (ID) dataset, the accuracy drops significantly on the out-of-distribution (OOD) datasets \cite{recht2018cifar, recht2019imagenet} …
External link:
http://arxiv.org/abs/2203.15100
The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$. For DLNs of width $w$, we show a phase transition w.r.t. the scaling $\gamma$ of the variance $\sigma^2 = w^{-\gamma}$ …
External link:
http://arxiv.org/abs/2106.15933
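A quick numerical illustration of the initialization scale in question (an assumed setup, not the paper's experiments): draw the weights of a depth-$L$, width-$w$ linear network with i.i.d. $\mathcal N(0, w^{-\gamma})$ entries and look at the scale of the end-to-end product matrix as $\gamma$ varies.

    # Assumed setup: deep linear network with i.i.d. N(0, w^{-gamma})
    # entries; the norm of the end-to-end product changes sharply with gamma.
    import numpy as np

    rng = np.random.default_rng(4)
    w, L = 100, 4                                 # width, number of layers

    def end_to_end_norm(gamma):
        sigma = w ** (-gamma / 2)                 # std, so variance = w^{-gamma}
        Ws = [rng.normal(scale=sigma, size=(w, w)) for _ in range(L)]
        return np.linalg.norm(np.linalg.multi_dot(Ws))

    for gamma in (0.5, 1.0, 1.5):
        print(gamma, end_to_end_norm(gamma))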
Author:
Şimşek, Berfin, Ged, François, Jacot, Arthur, Spadaro, Francesco, Hongler, Clément, Gerstner, Wulfram, Brea, Johanni
We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points …
External link:
http://arxiv.org/abs/2105.12221