Showing 1 - 10 of 342 for search: "Lu, Yue M."
A key property of neural networks is their capacity to adapt to data during training. Yet, our current mathematical understanding of feature learning and its relationship to generalization remains limited. In this work, we provide a random matrix…
External link:
http://arxiv.org/abs/2410.18938
Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers…
External link:
http://arxiv.org/abs/2405.11751
Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data. It was shown, e.g., through the double descent phenomenon, that the number of parameters is a poor proxy for…
External link:
http://arxiv.org/abs/2403.08160
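Since the double descent phenomenon mentioned in the snippet above is easy to reproduce, here is a minimal illustration with random ReLU features and minimum-norm least squares; all sizes, the target, and the feature map are illustrative choices, not taken from the paper:

```python
import numpy as np

# Test error of the minimum-norm least-squares fit on random ReLU features,
# swept through the interpolation threshold p = n (illustrative settings).
rng = np.random.default_rng(0)
n, d = 100, 20
w = rng.normal(size=d)

def sample(m):
    X = rng.normal(size=(m, d))
    return X, X @ w + 0.5 * rng.normal(size=m)

X_train, y_train = sample(n)
X_test, y_test = sample(1000)

for p in [20, 50, 90, 100, 110, 200, 800]:
    V = rng.normal(size=(d, p)) / np.sqrt(d)          # random feature map
    F_train = np.maximum(X_train @ V, 0.0)            # ReLU features
    F_test = np.maximum(X_test @ V, 0.0)
    beta = np.linalg.pinv(F_train) @ y_train          # min-norm solution when p >= n
    print(p, np.mean((F_test @ beta - y_test) ** 2))  # error typically peaks near p = n
```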
Motivated by the recent application of approximate message passing (AMP) to the analysis of convex optimization in multi-class classification [Loureiro et al., 2021], we present a convergence analysis of AMP dynamics with non-separable multivariate…
External link:
http://arxiv.org/abs/2402.08676
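For orientation alongside the snippet above, a minimal AMP sketch for the simpler separable case with a soft-thresholding denoiser; the sensing model, threshold rule, and sizes are illustrative assumptions, not the paper's non-separable multivariate setting:

```python
import numpy as np

# Minimal AMP iteration for y = A @ x0 + noise with a sparse signal.
rng = np.random.default_rng(0)
n, d = 400, 800
A = rng.normal(size=(n, d)) / np.sqrt(n)                 # i.i.d. Gaussian sensing matrix
x0 = rng.binomial(1, 0.1, size=d) * rng.normal(size=d)   # sparse signal
y = A @ x0 + 0.01 * rng.normal(size=n)

def denoise(v, tau):
    # Soft thresholding: a simple separable denoiser; the paper treats
    # the harder non-separable, multivariate case.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

x, z = np.zeros(d), y.copy()
for t in range(30):
    tau = np.sqrt(np.mean(z ** 2))                       # crude noise-level estimate
    x_new = denoise(x + A.T @ z, tau)                    # denoise the pseudo-data
    onsager = (d / n) * np.mean(np.abs(x_new) > 0)       # average derivative of the denoiser
    z = y - A @ x_new + onsager * z                      # Onsager-corrected residual
    x = x_new

print("relative error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```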
Author:
Cui, Hugo; Pesce, Luca; Dandi, Yatin; Krzakala, Florent; Lu, Yue M.; Zdeborová, Lenka; Loureiro, Bruno
Published in:
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9662-9695, 2024
In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we…
External link:
http://arxiv.org/abs/2402.04980
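As a rough companion to the abstract above, a sketch of the one-gradient-step setup it describes: the first-layer weights of a two-layer network take a single full-batch step while the readout stays fixed. The target function, widths, and the large step size are illustrative assumptions:

```python
import numpy as np

# One full-batch gradient step on the first-layer weights W of a two-layer
# ReLU network with a fixed linear readout a (illustrative settings).
rng = np.random.default_rng(1)
n, d, p = 1000, 100, 200                      # samples, input dim, hidden width
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = np.tanh(X @ rng.normal(size=d))           # toy single-index target
W = rng.normal(size=(p, d)) / np.sqrt(d)      # first-layer weights
a = rng.normal(size=p) / np.sqrt(p)           # fixed second-layer weights

def forward(W):
    return np.maximum(X @ W.T, 0.0) @ a       # ReLU features, linear readout

# Squared-loss gradient w.r.t. W, then a single (large) step.
pre = X @ W.T                                 # pre-activations, shape (n, p)
err = forward(W) - y                          # residuals, shape (n,)
grad = ((err[:, None] * (pre > 0)) * a).T @ X / n
eta = 5.0                                     # a large step size promotes feature learning
W1 = W - eta * grad

print("loss before:", np.mean(err ** 2),
      "after:", np.mean((forward(W1) - y) ** 2))
```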
We consider certain large random matrices, called random inner-product kernel matrices, which are essentially given by a nonlinear function $f$ applied entrywise to a sample-covariance matrix, $f(X^TX)$, where $X \in \mathbb{R}^{d \times N}$ is random…
External link:
http://arxiv.org/abs/2310.18280
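A small sketch of the construction $f(X^TX)$ described above; the nonlinearity $f = \tanh$, the aspect ratio $d/N$, and the diagonal handling are illustrative choices:

```python
import numpy as np

# Build a random inner-product kernel matrix f(X^T X) and inspect its spectrum.
rng = np.random.default_rng(2)
d, N = 300, 600
X = rng.normal(size=(d, N)) / np.sqrt(d)      # columns are the data vectors

G = X.T @ X                                   # N x N sample-covariance-type Gram matrix
K = np.tanh(G)                                # entrywise nonlinearity f
np.fill_diagonal(K, 0.0)                      # diagonal is often treated separately

eigs = np.linalg.eigvalsh(K)                  # the global spectrum studied in this line of work
print("edge eigenvalues:", eigs[0], eigs[-1])
```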
It has been observed that the performance of many high-dimensional estimation problems is universal with respect to the underlying sensing (or design) matrices. Specifically, matrices with markedly different constructions seem to achieve identical performance…
External link:
http://arxiv.org/abs/2208.02753
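To make the universality claim above concrete, a toy experiment comparing one estimator under two differently constructed design matrices with matched first two moments; ridge regression here is an illustrative stand-in for the estimation problems the paper treats:

```python
import numpy as np

# Ridge regression error under Gaussian vs. random-sign designs; the two
# errors are typically close, illustrating design-matrix universality.
rng = np.random.default_rng(3)
n, d, lam = 500, 250, 0.1
x0 = rng.normal(size=d) / np.sqrt(d)

def ridge_error(A):
    y = A @ x0 + 0.1 * rng.normal(size=n)
    xh = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)
    return np.linalg.norm(xh - x0) ** 2

A_gauss = rng.normal(size=(n, d)) / np.sqrt(n)
A_sign = rng.choice([-1.0, 1.0], size=(n, d)) / np.sqrt(n)
print("Gaussian design:", ridge_error(A_gauss))
print("Random-sign design:", ridge_error(A_sign))
```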
As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical…
External link:
http://arxiv.org/abs/2205.14846
Author:
Hu, Hong; Lu, Yue M.
The generalization performance of kernel ridge regression (KRR) exhibits a multi-phased pattern that crucially depends on the scaling relationship between the sample size $n$ and the underlying dimension $d$. This phenomenon is due to the fact that KRR…
External link:
http://arxiv.org/abs/2205.06798
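A minimal KRR sketch for reference alongside the abstract above; the kernel, target function, and regularization strength are illustrative choices, and the $n$ vs. $d$ scaling is the knob the paper analyzes:

```python
import numpy as np

# Kernel ridge regression with an inner-product kernel k(x, x') = exp(<x, x'>).
rng = np.random.default_rng(4)
n, d, lam = 400, 100, 1e-3
w = rng.normal(size=d)
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = np.sin(X @ w)                             # toy nonlinear target

def kernel(A, B):
    return np.exp(A @ B.T)                    # entrywise exponential inner-product kernel

alpha = np.linalg.solve(kernel(X, X) + lam * np.eye(n), y)   # KRR dual coefficients

X_test = rng.normal(size=(200, d)) / np.sqrt(d)
pred = kernel(X_test, X) @ alpha
print("test MSE:", np.mean((pred - np.sin(X_test @ w)) ** 2))
```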
Author:
Lu, Yue M.; Yau, Horng-Tzer
We investigate random matrices whose entries are obtained by applying a nonlinear kernel function to pairwise inner products between $n$ independent data vectors, drawn uniformly from the unit sphere in $\mathbb{R}^d$. This study is motivated by applications…
External link:
http://arxiv.org/abs/2205.06308
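A sketch of the data model in the entry above: vectors drawn uniformly from the unit sphere, with a kernel applied entrywise to the pairwise inner products. The kernel function and the normalization are illustrative; conventions vary in this literature:

```python
import numpy as np

# Uniform samples on the unit sphere, then a kernel of pairwise inner products.
rng = np.random.default_rng(5)
d, n = 200, 400
X = rng.normal(size=(d, n))
X /= np.linalg.norm(X, axis=0)                 # columns now uniform on the unit sphere in R^d

G = X.T @ X                                    # pairwise inner products, O(1/sqrt(d)) off-diagonal
K = np.tanh(np.sqrt(d) * G) / np.sqrt(n)       # kernel at the natural 1/sqrt(d) argument scale,
                                               # with one common 1/sqrt(n) matrix normalization
np.fill_diagonal(K, 0.0)                       # diagonal is often treated separately

print("bulk spectrum edges:", np.linalg.eigvalsh(K)[[0, -1]])
```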