Showing 1 - 10 of 6,484 results for the search: "A, Mondelli"
A growing number of machine learning scenarios rely on knowledge distillation where one uses the output of a surrogate model as labels to supervise the training of a target model. In this work, we provide a sharp characterization of this process for
External link:
http://arxiv.org/abs/2410.18837
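A minimal sketch of the pipeline described above, in which a surrogate model's outputs serve as labels for a target model. It assumes one-dimensional linear models without intercept, fit by least squares. The function names `fit_least_squares` and `distill` are hypothetical, and this is not the model or analysis used in the paper.

```python
import random


def fit_least_squares(xs, ys):
    """Slope of the no-intercept least-squares fit y ~ w * x."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return num / den


def distill(labeled_x, labeled_y, unlabeled_x):
    """Train a surrogate on true labels, then a target on the surrogate's outputs."""
    # 1) Surrogate ("teacher") fit on ground-truth labels.
    w_surrogate = fit_least_squares(labeled_x, labeled_y)
    # 2) Surrogate outputs become the labels for fresh, unlabeled inputs.
    pseudo_labels = [w_surrogate * x for x in unlabeled_x]
    # 3) Target ("student") fit only on the surrogate-generated labels.
    w_target = fit_least_squares(unlabeled_x, pseudo_labels)
    return w_surrogate, w_target


rng = random.Random(0)
true_w = 2.0
labeled_x = [rng.gauss(0.0, 1.0) for _ in range(50)]
labeled_y = [true_w * x + rng.gauss(0.0, 0.5) for x in labeled_x]
unlabeled_x = [rng.gauss(0.0, 1.0) for _ in range(200)]

w_s, w_t = distill(labeled_x, labeled_y, unlabeled_x)
print(f"surrogate slope: {w_s:.3f}, target slope: {w_t:.3f}")
```

With noiseless surrogate labels and an identical model class, the target simply inherits the surrogate's slope, including the surrogate's estimation error. Settings where the two models differ are where the behavior becomes nontrivial.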
Authors:
Bombari, Simone, Mondelli, Marco
Differentially private gradient descent (DP-GD) is a popular algorithm to train deep learning models with provable guarantees on the privacy of the training data. In the last decade, the problem of understanding its performance cost with respect to s
External link:
http://arxiv.org/abs/2410.14787
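The abstract names DP-GD without spelling out its update. A common formulation clips each per-example gradient to norm `clip_norm`, sums the clipped gradients, adds Gaussian noise with standard deviation `noise_mult * clip_norm`, and takes an averaged step. The sketch below applies it to least-squares linear regression. The loss, parameter names, and hyperparameters are assumptions for illustration, and the sketch performs no privacy accounting, so it says nothing about the resulting (ε, δ) guarantee.

```python
import math
import random


def clip(vec, max_norm):
    """Rescale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def dp_gd_step(w, data, lr, clip_norm, noise_mult, rng):
    """One full-batch DP-GD step for the loss 0.5 * (<w, x> - y)**2."""
    total = [0.0] * len(w)
    for x, y in data:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        grad = [residual * xi for xi in x]  # per-example gradient
        for j, g in enumerate(clip(grad, clip_norm)):
            total[j] += g
    # Clipping bounds each example's contribution to the sum by clip_norm,
    # so the Gaussian noise is scaled relative to clip_norm.
    noisy = [t + rng.gauss(0.0, noise_mult * clip_norm) for t in total]
    return [wi - lr * g / len(data) for wi, g in zip(w, noisy)]


rng = random.Random(0)
true_w = [1.0, -0.5]
data = []
for _ in range(500):
    x = [rng.gauss(0.0, 1.0) for _ in true_w]
    y = sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0.0, 0.1)
    data.append((x, y))

w = [0.0, 0.0]
for _ in range(200):
    w = dp_gd_step(w, data, lr=0.5, clip_norm=1.0, noise_mult=1.0, rng=rng)
print([round(v, 3) for v in w])
```

Because the noise is added to a sum over all 500 examples and then divided by 500, its effect on each step is small here; with fewer examples or a larger `noise_mult`, the same noise would weigh more heavily on the final estimate.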
Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a highly symmetric geometric structure referred to as neural collapse. This empirical evidence has spurred a line of theoretical research aimed
External link:
http://arxiv.org/abs/2410.04887
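To make the "highly symmetric geometric structure" concrete, the sketch below computes two quantities often used to describe neural collapse on last-layer features. The first is a within-class to between-class scatter ratio, a simplified trace-ratio proxy for the usual NC1 metric (which uses a pseudo-inverse of the between-class covariance). The second is the set of pairwise cosines between centered class means, which for a simplex equiangular tight frame with K classes all equal -1/(K-1). The function names and the toy features are hypothetical, and balanced classes are assumed so that the average of the class means equals the global mean.

```python
import math


def class_means(features, labels):
    """Mean feature vector of each class, keyed by label."""
    sums, counts = {}, {}
    for h, c in zip(features, labels):
        acc = sums.setdefault(c, [0.0] * len(h))
        for j, value in enumerate(h):
            acc[j] += value
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def centered_means(features, labels):
    """Class means minus their average (the global mean for balanced classes)."""
    means = list(class_means(features, labels).values())
    dim = len(means[0])
    g = [sum(m[j] for m in means) / len(means) for j in range(dim)]
    return [[m[j] - g[j] for j in range(dim)] for m in means]


def within_between_ratio(features, labels):
    """tr(Sigma_W) / tr(Sigma_B); equals 0 when every feature sits at its class mean."""
    means = class_means(features, labels)
    within = sum(
        sum((a - b) ** 2 for a, b in zip(h, means[c])) for h, c in zip(features, labels)
    ) / len(features)
    between = sum(sum(v * v for v in m) for m in centered_means(features, labels)) / len(means)
    return within / between


def pairwise_cosines(features, labels):
    """Cosines between all pairs of centered class means."""
    cm = centered_means(features, labels)
    out = []
    for i in range(len(cm)):
        for k in range(i + 1, len(cm)):
            dot = sum(a * b for a, b in zip(cm[i], cm[k]))
            norms = math.sqrt(sum(a * a for a in cm[i]) * sum(b * b for b in cm[k]))
            out.append(dot / norms)
    return out


# Toy "collapsed" features: 3 classes on a 2-D simplex, two samples per class,
# each sample placed exactly at its class mean.
angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
features, labels = [], []
for c, a in enumerate(angles):
    for _ in range(2):
        features.append([math.cos(a), math.sin(a)])
        labels.append(c)

print(within_between_ratio(features, labels))  # 0.0
print(pairwise_cosines(features, labels))
```

Real last-layer features would give a small but nonzero ratio and cosines only approximately equal to -1/(K-1); the toy input shows the idealized endpoint.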
We consider a prototypical problem of Bayesian inference for a structured spiked model: a low-rank signal is corrupted by additive noise. While both information-theoretic and algorithmic limits are well understood when the noise is a Gaussian Wigner
External link:
http://arxiv.org/abs/2405.20993
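For reference, the Gaussian Wigner baseline mentioned above can be simulated as a rank-one spiked model Y = (λ/n) x xᵀ + W/√n, where W is a symmetric matrix with standard Gaussian entries. The sketch below estimates the spike with plain PCA (power iteration), a standard spectral baseline and not an implementation of the paper's estimators. With this normalization, the top eigenvector is known to correlate with x only when λ > 1, where the spike's eigenvalue separates from the noise bulk. The ±1 prior, the sizes, and the function names are assumptions for illustration.

```python
import math
import random


def spiked_wigner(n, snr, rng):
    """Return (x, Y) with Y = (snr / n) x x^T + W / sqrt(n), x in {-1, +1}^n."""
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            noise = rng.gauss(0.0, 1.0) / math.sqrt(n)  # symmetric Gaussian noise
            y[i][j] = y[j][i] = snr * x[i] * x[j] / n + noise
    return x, y


def top_eigenvector(matrix, iters, rng):
    """Power iteration: unit-norm estimate of the dominant (largest-magnitude) eigenvector."""
    n = len(matrix)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def overlap(v, x):
    """|<v, x>| / (||v|| ||x||): cosine similarity, ignoring the sign ambiguity."""
    dot = sum(a * b for a, b in zip(v, x))
    return abs(dot) / math.sqrt(sum(a * a for a in v) * sum(b * b for b in x))


rng = random.Random(1)
x, y = spiked_wigner(n=150, snr=3.0, rng=rng)
v = top_eigenvector(y, iters=60, rng=rng)
print(f"overlap with the spike: {overlap(v, x):.3f}")
```

Rerunning with `snr` below 1 should typically give a much smaller overlap, consistent with the spectral threshold.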
Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC), and a growing body of work has investigated the propagation of neural collapse to earlier layers of DNNs, a phenomenon called
External link:
http://arxiv.org/abs/2405.14468
Authors:
Zhang, Yihan, Mondelli, Marco
We study the matrix denoising problem of estimating the singular vectors of a rank-$1$ signal corrupted by noise with both column and row correlations. Existing works are either unable to pinpoint the exact asymptotic estimation error or, when they d
External link:
http://arxiv.org/abs/2405.13912
Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explain
External link:
http://arxiv.org/abs/2402.13728
We introduce a novel concept of convergence for Markovian processes within Orlicz spaces, extending beyond the conventional approach associated with $L_p$ spaces. After showing that Markovian operators are contractive in Orlicz spaces, our key techni
External link:
http://arxiv.org/abs/2402.11200
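For readers unfamiliar with Orlicz spaces: given a Young function $\Psi$ (convex and nondecreasing on $[0, \infty)$, with $\Psi(0) = 0$), the standard Luxemburg norm of a random variable $X$ is defined below. Choosing $\Psi(u) = u^p$ with $p \ge 1$ recovers the usual $L_p$ norm, which is the sense in which Orlicz spaces extend the $L_p$ setting mentioned in the abstract. This is the textbook definition and not necessarily the exact normalization used in the paper.

```latex
\|X\|_{\Psi} = \inf\left\{ t > 0 : \mathbb{E}\left[\Psi\!\left(\frac{|X|}{t}\right)\right] \le 1 \right\},
\qquad
\Psi(u) = u^{p},\ p \ge 1 \;\Longrightarrow\; \|X\|_{\Psi} = \left(\mathbb{E}|X|^{p}\right)^{1/p} = \|X\|_{L_p}.
```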
Autoencoders are a prominent model in many empirical branches of machine learning and lossy data compression. However, basic theoretical questions remain unanswered even in a shallow two-layer setting. In particular, to what degree does a shallow aut
External link:
http://arxiv.org/abs/2402.05013
Authors:
Bombari, Simone, Mondelli, Marco
Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depe
External link:
http://arxiv.org/abs/2402.02969