Showing 1 - 10 of 461 for the search: '"Krzakala, Florent"'
A key property of neural networks is their capacity to adapt to data during training. Yet our current mathematical understanding of feature learning and its relationship to generalization remains limited. In this work, we provide a random matrix…
External link:
http://arxiv.org/abs/2410.18938
Optical training of large-scale Transformers and deep neural networks with direct feedback alignment
Author:
Wang, Ziao, Müller, Kilian, Filipovich, Matthew, Launay, Julien, Ohana, Ruben, Pariente, Gustave, Mokaadi, Safa, Brossollet, Charles, Moreau, Fabien, Cappelli, Alessandro, Poli, Iacopo, Carron, Igor, Daudet, Laurent, Krzakala, Florent, Gigan, Sylvain
Modern machine learning relies almost exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low power consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to…
External link:
http://arxiv.org/abs/2409.12965
We consider the problem of learning a target function corresponding to a single hidden layer neural network, with a quadratic activation function after the first layer, and random weights. We consider the asymptotic limit where the input dimension…
External link:
http://arxiv.org/abs/2408.03733
Noiseless compressive sensing is a two-step setting that allows for undersampling a sparse signal and then reconstructing it without loss of information. The LASSO algorithm, based on $\ell_1$ regularization, provides an efficient and robust…
External link:
http://arxiv.org/abs/2408.08319
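The LASSO reconstruction step mentioned in this abstract can be sketched in a toy setting. This is an illustration only, not the paper's analysis; the problem sizes, the Gaussian sensing matrix, and the use of scikit-learn's `Lasso` solver (with a small `alpha` to approximate basis pursuit) are all assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, k = 80, 200, 5                        # measurements, signal dimension, sparsity

# Ground-truth k-sparse signal
x_true = np.zeros(d)
x_true[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)

# Random Gaussian sensing matrix and noiseless undersampled measurements
A = rng.normal(size=(n, d)) / np.sqrt(n)
y = A @ x_true

# A very small alpha approximates basis pursuit: min ||x||_1  s.t.  Ax = y
lasso = Lasso(alpha=1e-4, max_iter=100_000, tol=1e-10)
lasso.fit(A, y)

err = np.linalg.norm(lasso.coef_ - x_true) / np.linalg.norm(x_true)
print(f"relative reconstruction error: {err:.2e}")
```

With n = 80 measurements of a 5-sparse signal in dimension 200, the $\ell_1$ penalty typically recovers the signal to small relative error, illustrating the undersampling-then-reconstruction setting.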
Author:
Arnaboldi, Luca, Dandi, Yatin, Krzakala, Florent, Loureiro, Bruno, Pesce, Luca, Stephan, Ludovic
Published in:
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:1730-1762, 2024
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size…
External link:
http://arxiv.org/abs/2406.02157
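The training protocol in this abstract (one-pass mini-batch SGD on a two-layer network, fresh isotropic covariates at every step) can be sketched as follows. This is a minimal toy, not the paper's setting: the single-index tanh target, the network width, and all hyperparameters (learning rate, batch size $n_b$, step count) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, p = 50, 20                                # input dimension, hidden width
w_star = rng.normal(size=d) / np.sqrt(d)     # hidden direction of the target

def target(X):
    # Single-index target: simplest instance of a multi-index function
    return np.tanh(X @ w_star)

# Two-layer network: tanh hidden layer, linear readout
W = rng.normal(size=(p, d)) / np.sqrt(d)
a = rng.normal(size=p) / np.sqrt(p)

def forward(X):
    return np.tanh(X @ W.T) @ a

lr, n_b, steps = 0.1, 8, 5000                # batch size n_b; each sample seen once
for _ in range(steps):
    X = rng.normal(size=(n_b, d))            # fresh isotropic Gaussian covariates
    y = target(X)
    h = np.tanh(X @ W.T)
    g = h @ a - y                            # squared-loss residual
    grad_a = h.T @ g / n_b
    grad_W = ((g[:, None] * a) * (1 - h**2)).T @ X / n_b
    a -= lr * grad_a
    W -= lr * grad_W

X_test = rng.normal(size=(2000, d))
mse = np.mean((forward(X_test) - target(X_test)) ** 2)
print(f"test MSE after one-pass SGD: {mse:.3f}")
```

"One pass" here means every mini-batch is drawn fresh and never reused, so the number of SGD steps directly controls the total sample budget $n_b \times T$.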
Author:
Troiani, Emanuele, Dandi, Yatin, Defilippis, Leonardo, Zdeborová, Lenka, Loureiro, Bruno, Krzakala, Florent
Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical…
External link:
http://arxiv.org/abs/2405.15480
Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks…
External link:
http://arxiv.org/abs/2405.15459
Published in:
J. Stat. Mech. (2024) 083302
The Sherrington-Kirkpatrick (SK) model is a prototype of a complex non-convex energy landscape. Dynamical processes evolving on such landscapes and locally aiming to reach minima are generally poorly understood. Here, we study quenches, i.e. dynamics…
External link:
http://arxiv.org/abs/2405.04267
We consider the task of estimating a low-rank matrix from non-linear and noisy observations. We prove a strong universality result showing that Bayes-optimal performances are characterized by an equivalent Gaussian model with an effective prior, whose…
External link:
http://arxiv.org/abs/2403.04234
Published in:
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:35470-35491, 2024
We discuss the inhomogeneous spiked Wigner model, a theoretical framework recently introduced to study structured noise in various learning scenarios, through the prism of random matrix theory, with a specific focus on its spectral properties. Our…
External link:
http://arxiv.org/abs/2403.03695