Showing 1 - 10
of 62
for query: '"Stephan, Ludovic"'
Author:
Stephan, Ludovic, Zhu, Yizhe
The Bethe-Hessian matrix, introduced by Saade, Krzakala, and Zdeborová (2014), is a Hermitian matrix designed for applying spectral clustering algorithms to sparse networks. Rather than employing a non-symmetric and high-dimensional non-backtracking …
External link:
http://arxiv.org/abs/2411.02835
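The Bethe-Hessian construction behind this entry can be sketched in a few lines. Everything beyond the formula H(r) = (r² − 1)I − rA + D — the toy graph, the heuristic choice r = √(mean degree), and reading a partition off the bottom eigenvector — is an illustrative assumption, not part of the paper.

```python
import numpy as np

def bethe_hessian(A, r):
    """H(r) = (r^2 - 1) I - r A + D, with A the adjacency matrix
    and D the diagonal degree matrix (Saade, Krzakala, Zdeborova, 2014)."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    return (r**2 - 1) * np.eye(n) - r * A + D

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by one edge (assumption).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

r = np.sqrt(A.sum() / A.shape[0])  # heuristic: sqrt of the mean degree
H = bethe_hessian(A, r)

# In the sparse regime, eigenvectors attached to the negative eigenvalues
# of H carry the community structure; on this toy graph we simply take the
# sign pattern of the bottom eigenvector as a candidate partition.
vals, vecs = np.linalg.eigh(H)
labels = (vecs[:, 0] > 0).astype(int)
```

Note that H is real symmetric by construction, so `eigh` applies directly — this is the practical advantage over the non-symmetric non-backtracking operator mentioned in the abstract.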
Author:
Arnaboldi, Luca, Dandi, Yatin, Krzakala, Florent, Loureiro, Bruno, Pesce, Luca, Stephan, Ludovic
Published in:
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:1730-1762, 2024
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size …
External link:
http://arxiv.org/abs/2406.02157
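A minimal sketch of the setting in this abstract: one-pass SGD on a two-layer network with batch size n_b, drawing fresh isotropic Gaussian covariates at every step. The specific multi-index target g(a, b) = a·b, the width, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n_b, lr, steps = 20, 8, 32, 0.1, 500  # input dim, width, batch size, step size

# Illustrative multi-index target y = g(w1.x, w2.x) with g(a, b) = a*b
# (an assumption for this sketch; the abstract's target is generic).
W_star = rng.standard_normal((2, d)) / np.sqrt(d)
def target(X):
    Z = X @ W_star.T
    return Z[:, 0] * Z[:, 1]

# Two-layer network f(x) = a . tanh(W x), trained with one-pass SGD:
# each batch of n_b samples is drawn fresh and seen exactly once.
W = rng.standard_normal((p, d)) / np.sqrt(d)
a = rng.standard_normal(p) / np.sqrt(p)

for _ in range(steps):
    X = rng.standard_normal((n_b, d))             # fresh data each iteration
    y = target(X)
    H = np.tanh(X @ W.T)                          # (n_b, p) hidden activations
    err = H @ a - y                               # residuals of the squared loss
    grad_a = H.T @ err / n_b
    grad_W = ((err[:, None] * (1 - H**2)) * a).T @ X / n_b
    a -= lr * grad_a
    W -= lr * grad_W
```

The quantity the paper studies is how the number of iterations T needed to learn the target trades off against n_b; the loop above is only the dynamics being analyzed, not the analysis.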
Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks …
External link:
http://arxiv.org/abs/2405.15459
This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization …
External link:
http://arxiv.org/abs/2305.18502
Published in:
Journal of Machine Learning Research 25 (2024) 1-65
We investigate theoretically how the features of a two-layer neural network adapt to the structure of the target function through a few large batch gradient descent steps, leading to improvement in the approximation capacity with respect to the initialization …
External link:
http://arxiv.org/abs/2305.18270
Author:
Stephan, Ludovic, Zhu, Yizhe
We consider the problem of low-rank rectangular matrix completion in the regime where the matrix $M$ of size $n\times m$ is "long", i.e., the aspect ratio $m/n$ diverges to infinity. Such matrices are of particular interest in the study of tensor completion …
External link:
http://arxiv.org/abs/2304.02077
Published in:
Advances in Neural Information Processing Systems 36 (2023)
Let $(x_{i}, y_{i})_{i=1,\dots,n}$ denote independent samples from a general mixture distribution $\sum_{c\in\mathcal{C}}\rho_{c}P_{c}^{x}$, and consider the hypothesis class of generalized linear models $\hat{y} = F(\Theta^{\top}x)$. In this work, we …
External link:
http://arxiv.org/abs/2302.08933
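The data model in this abstract — samples from a mixture $\sum_c \rho_c P_c^x$ fitted with a generalized linear model $\hat{y} = F(\Theta^\top x)$ — can be sketched with a two-component Gaussian mixture and logistic-loss gradient descent. The mixture means, the loss, and the optimizer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-component Gaussian mixture with rho_c = 1/2 each and means +/- mu
# (an illustrative instance of the general mixture in the abstract).
d, n = 10, 400
mu = np.ones(d) / np.sqrt(d)
labels = rng.integers(0, 2, n) * 2 - 1           # y in {-1, +1}
X = labels[:, None] * mu + rng.standard_normal((n, d))

# Generalized linear model y_hat = sign(theta . x), fitted by plain
# gradient descent on the logistic loss (assumptions for this sketch).
theta = np.zeros(d)
lr = 0.5
for _ in range(200):
    z = labels * (X @ theta)                     # margins y_i * theta.x_i
    grad = -(labels * (1 / (1 + np.exp(z)))) @ X / n
    theta -= lr * grad

acc = np.mean(np.sign(X @ theta) == labels)      # training accuracy
```

The paper characterizes quantities like this accuracy asymptotically for general mixtures and link functions; the loop above just produces one finite-size instance.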
In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional …
External link:
http://arxiv.org/abs/2302.08923
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the …
External link:
http://arxiv.org/abs/2302.05882
Published in:
Physical Review E 109.3 (2024): 034305
While classical in many theoretical settings - and in particular in statistical physics-inspired works - the assumption of Gaussian i.i.d. input data is often perceived as a strong limitation in the context of statistics and machine learning. In this …
External link:
http://arxiv.org/abs/2205.13303