Showing 1 - 10 of 19 for search: '"Simsek, Berfin"'
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons …
External link:
http://arxiv.org/abs/2411.08798
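To make the setup concrete, here is a minimal, hypothetical sketch of Euler-discretized gradient flow on a correlation loss, with a student matching a multi-index target on standard Gaussian inputs. The tanh activation, the target directions V, the batch-based gradient estimate, and the column renormalization are all illustrative assumptions, not details from the paper.

    # Hypothetical sketch: Euler-discretized gradient flow on the
    # correlation loss L(W) = -E_x[ f*(x) * f(x; W) ] for a student
    # approximating a multi-index target on Gaussian inputs.
    # All names (V, W, tanh) are illustrative, not from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 50, 3                                  # input dim, index directions

    V = np.linalg.qr(rng.normal(size=(d, k)))[0]  # orthonormal target directions

    def f_star(x):                                # target: a sum of neurons
        return np.tanh(x @ V).sum(axis=1)

    W = rng.normal(size=(d, k)) / np.sqrt(d)      # student directions

    def neg_grad(W, n=4096):
        x = rng.normal(size=(n, d))               # fresh Gaussian batch
        y = f_star(x)
        h = np.tanh(x @ W)
        # gradient of -E[ y * sum_j tanh(<w_j, x>) ] w.r.t. W
        g = -(x.T @ (y[:, None] * (1.0 - h**2))) / n
        return -g

    eta, steps = 0.5, 2000                        # Euler step size, iterations
    for _ in range(steps):
        W += eta * neg_grad(W)
        W /= np.linalg.norm(W, axis=0)            # crude stand-in for spherical flow

    # directional convergence: alignment of the student with span(V)
    print(np.linalg.norm(V.T @ W, ord=2))

The final print reports the top singular value of V^T W, a simple proxy for how well the learned directions align with the target subspace.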
This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and …
External link:
http://arxiv.org/abs/2402.18724
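As a rough illustration of the object being studied, the following assumes a linear, Hopfield-style associative memory built from outer products; the key/value naming and the Gaussian embeddings are assumptions for the sketch, not details from the paper.

    # Assumed form for illustration: a memory W storing pairs of token
    # embeddings as a sum of outer products, W = sum_i v_i k_i^T,
    # retrieved by a matrix-vector product.
    import numpy as np

    rng = np.random.default_rng(1)
    d, n_pairs = 64, 10
    K = rng.normal(size=(n_pairs, d)) / np.sqrt(d)   # key embeddings
    V = rng.normal(size=(n_pairs, d)) / np.sqrt(d)   # value embeddings

    W = V.T @ K                        # equals sum_i outer(v_i, k_i)

    retrieved = W @ K[3]               # ~ V[3] plus cross-talk from other pairs
    print(np.corrcoef(retrieved, V[3])[0, 1])

The cross-talk between stored pairs is what makes the training dynamics of such a module depend on the data distribution.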
In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely …
External link:
http://arxiv.org/abs/2402.05626
Any continuous function $f^*$ can be approximated arbitrarily well by a neural network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a neural network with one hidden layer and $k$ neurons. Approximating $f^*$ with a neural network …
External link:
http://arxiv.org/abs/2311.01644
Lecture notes from the course given by Professor Sara A. Solla at the Les Houches summer school on "Statistical physics of Machine Learning". The notes discuss neural information processing through the lens of Statistical Physics. Contents include Bayesian inference …
External link:
http://arxiv.org/abs/2309.17006
Can we identify the weights of a neural network by probing its input-output mapping? At first glance, this problem seems to have many solutions because of permutation, overparameterisation and activation function symmetries. Yet, we show that the incoming weight vector of each neuron is identifiable …
External link:
http://arxiv.org/abs/2304.12794
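The symmetries mentioned here are easy to see concretely. The following small check (illustrative, not the paper's method) shows that permuting hidden neurons, and flipping signs when the activation is odd like tanh, leaves the input-output mapping unchanged.

    # Illustrative check: permutation and sign symmetries of a
    # one-hidden-layer tanh network leave its outputs unchanged.
    import numpy as np

    rng = np.random.default_rng(2)
    d, k = 5, 4
    W = rng.normal(size=(k, d))        # incoming weight vectors, one per neuron
    a = rng.normal(size=k)             # output weights

    def net(x, W, a):
        return np.tanh(x @ W.T) @ a

    x = rng.normal(size=(100, d))
    perm = rng.permutation(k)
    sign = rng.choice([-1.0, 1.0], size=k)

    # tanh(-z) = -tanh(z), so a sign flip of w_j can be absorbed into a_j
    y1 = net(x, W, a)
    y2 = net(x, sign[:, None] * W[perm], sign * a[perm])
    print(np.allclose(y1, y2))         # True: same function, different weights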
MLPGradientFlow is a software package to solve numerically the gradient flow differential equation $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal D$ is some data set, and $\mathcal L$ is a loss function …
External link:
http://arxiv.org/abs/2301.10638
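The package has its own interface; purely as an illustration of the equation above, one can hand the same ODE $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$ to a generic adaptive ODE solver, here for a tiny one-hidden-layer network. All choices below (tanh activation, squared loss, toy data) are assumptions for the sketch.

    # Toy illustration, independent of the package's own interface:
    # integrate d(theta)/dt = -grad L(theta; D) with an adaptive solver.
    import numpy as np
    from scipy.integrate import solve_ivp

    rng = np.random.default_rng(3)
    d, k, n = 2, 3, 64
    X = rng.normal(size=(n, d))
    y = np.sin(X[:, 0])               # toy regression target

    def unpack(theta):
        return theta[: k * d].reshape(k, d), theta[k * d :]

    def neg_grad_loss(t, theta):      # right-hand side: -grad of squared loss
        W, a = unpack(theta)
        h = np.tanh(X @ W.T)          # (n, k) hidden activations
        r = h @ a - y                 # residuals
        ga = h.T @ r / n
        gW = ((r[:, None] * a) * (1 - h**2)).T @ X / n
        return -np.concatenate([gW.ravel(), ga])

    theta0 = 0.1 * rng.normal(size=k * d + k)
    sol = solve_ivp(neg_grad_loss, (0.0, 100.0), theta0, rtol=1e-8, atol=1e-10)
    W_end, a_end = unpack(sol.y[:, -1])
    print("final loss:", 0.5 * np.mean((np.tanh(X @ W_end.T) @ a_end - y) ** 2))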
Existing works show that although modern neural networks achieve remarkable generalization performance on the in-distribution (ID) dataset, the accuracy drops significantly on the out-of-distribution (OOD) datasets \cite{recht2018cifar, recht2019imagenet} …
External link:
http://arxiv.org/abs/2203.15100
The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $\sigma^2$ of the parameters at initialization $\theta_0$. For DLNs of width $w$, we show a phase transition w.r.t. the scaling $\gamma$ of the variance $\sigma^2 = w^{-\gamma}$ …
External link:
http://arxiv.org/abs/2106.15933
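A quick numerical illustration of the initialization scale in question (an assumed setup, not the paper's experiments): draw the weights of a depth-$L$, width-$w$ linear network with i.i.d. $\mathcal N(0, w^{-\gamma})$ entries and look at the scale of the end-to-end product matrix as $\gamma$ varies.

    # Assumed setup: deep linear network with i.i.d. N(0, w^{-gamma})
    # entries; the norm of the end-to-end product changes sharply with gamma.
    import numpy as np

    rng = np.random.default_rng(4)
    w, L = 100, 4                                 # width, number of layers

    def end_to_end_norm(gamma):
        sigma = w ** (-gamma / 2)                 # std, so variance = w^{-gamma}
        Ws = [rng.normal(scale=sigma, size=(w, w)) for _ in range(L)]
        return np.linalg.norm(np.linalg.multi_dot(Ws))

    for gamma in (0.5, 1.0, 1.5):
        print(gamma, end_to_end_norm(gamma))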
Author:
Şimşek, Berfin, Ged, François, Jacot, Arthur, Spadaro, Francesco, Hongler, Clément, Gerstner, Wulfram, Brea, Johanni
We study how permutation symmetries in overparameterized multi-layer neural networks generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal widths $r_1^*, \ldots, r_{L-1}^*$ reaches a zero-loss minimum at $r_1^*! \cdots r_{L-1}^*!$ isolated points …
External link:
http://arxiv.org/abs/2105.12221