Showing 1 - 10 of 69 for search: '"Yarotsky, Dmitry"'
Author:
Yarotsky, Dmitry, Velikanov, Maksim
An important open problem is the theoretically feasible acceleration of mini-batch SGD-type algorithms on quadratic problems with power-law spectrum. In the non-stochastic setting, the optimal exponent $\xi$ in the loss convergence $L_t\sim C_L t^{-\xi}$ …
External link:
http://arxiv.org/abs/2410.04228
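The power-law loss decay mentioned in the snippet above is easy to reproduce numerically. Below is a minimal, hypothetical sketch (not taken from the paper): plain gradient descent on a quadratic whose Hessian eigenvalues follow a power law, with the empirical exponent of $L_t\sim C_L t^{-\xi}$ estimated from the tail of the loss curve. The dimension, spectrum exponent, and step size are arbitrary choices of mine.

```python
import numpy as np

d = 2000                                  # problem dimension (arbitrary)
lam = np.arange(1, d + 1) ** (-1.5)       # power-law Hessian spectrum (arbitrary exponent)
w = np.ones(d)                            # initial error vector, written in the eigenbasis
eta = 0.9 / lam.max()                     # stable gradient-descent step size

losses = []
for t in range(10000):
    w *= 1.0 - eta * lam                  # exact GD update in the eigenbasis
    losses.append(0.5 * np.sum(lam * w**2))

# Fit the tail of the loss curve in log-log coordinates to estimate the decay exponent xi.
ts = np.arange(1, len(losses) + 1)
xi_hat = -np.polyfit(np.log(ts[1000:]), np.log(losses[1000:]), 1)[0]
print(f"empirical exponent xi ~ {xi_hat:.2f}")
```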
The asymptotically precise estimation of the generalization of kernel methods has recently received attention due to the parallels between neural networks and their associated kernels. However, prior works derive such estimates for training by kernel …
External link:
http://arxiv.org/abs/2403.11696
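For context on the kind of generalization estimates the snippet above refers to, here is a hedged toy illustration (the setup is entirely mine): kernel ridge regression on synthetic data, with the test error measured as the number of training points grows. It shows only the quantity being estimated, not the paper's asymptotic analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X, Y, gamma=10.0):
    # Gaussian (RBF) kernel matrix between two sets of points
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def target(X):
    return np.sin(4 * np.pi * X[:, 0])

X_test = rng.uniform(size=(500, 1))
y_test = target(X_test)

for n in (50, 200, 800):
    X = rng.uniform(size=(n, 1))
    y = target(X) + 0.05 * rng.normal(size=n)                  # noisy observations
    alpha = np.linalg.solve(rbf(X, X) + 1e-3 * np.eye(n), y)   # KRR dual coefficients
    err = np.mean((rbf(X_test, X) @ alpha - y_test) ** 2)
    print(f"n={n:4d}  test MSE={err:.4f}")
```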
Author:
Yarotsky, Dmitry
We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$ …
External link:
http://arxiv.org/abs/2402.17089
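To make the setting of the entry above concrete, here is a small sketch under my own assumptions: a model with $W=2$ parameters mapping into $d=3$ dimensions, trained on a target that lies on the model manifold by gradient flow approximated with small Euler steps. It illustrates only the setup ($W<d$, training by GF), not the paper's results.

```python
import numpy as np

def f(w):
    # nonlinear 2-parameter model producing a 3-dimensional output
    return np.array([w[0], w[1], w[0] * w[1]])

def jac(w):
    # Jacobian of f, used to form the gradient of the squared error
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [w[1], w[0]]])

target = f(np.array([0.7, -1.3]))          # a target that lies on the model manifold
w = np.array([0.1, 0.1])
dt = 1e-2                                  # Euler step approximating gradient flow

for _ in range(20000):
    r = f(w) - target
    w -= dt * jac(w).T @ r                 # gradient of 0.5 * ||f(w) - target||^2

print("final squared error:", float(np.sum((f(w) - target) ** 2)))
```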
Author:
Yarotsky, Dmitry
By universal formulas we understand parameterized analytic expressions that have a fixed complexity, but nevertheless can approximate any continuous function on a compact set. There exist various examples of such formulas, including some in the form …
External link:
http://arxiv.org/abs/2311.03910
Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models. In this paper we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta …
External link:
http://arxiv.org/abs/2206.11124
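As a toy companion to the entry above (the setup below is mine, not the paper's analytic framework): mini-batch SGD with heavy-ball momentum on a linear regression problem at constant learning rate, momentum, and batch size, with the loss curve averaged over independent runs to approximate its noise-averaged behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 32
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
y = X @ w_star                             # noiseless linear targets (interpolation regime)

def run(lr=0.5, beta=0.9, batch=16, steps=300, seed=0):
    r = np.random.default_rng(seed)
    w, v = np.zeros(d), np.zeros(d)
    losses = []
    for _ in range(steps):
        idx = r.choice(n, size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        v = beta * v - lr * grad           # heavy-ball momentum buffer
        w = w + v
        losses.append(0.5 * np.mean((X @ w - y) ** 2))
    return np.array(losses)

avg_loss = np.mean([run(seed=s) for s in range(20)], axis=0)   # noise-averaged loss curve
print("noise-averaged loss at t=100, 200, 300:", avg_loss[[99, 199, 299]])
```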
Author:
Velikanov, Maksim, Kail, Roman, Anokhin, Ivan, Vashurin, Roman, Panov, Maxim, Zaytsev, Alexey, Yarotsky, Dmitry
A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); its particular examples are BatchEnsembles and …
External link:
http://arxiv.org/abs/2202.12297
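A rough sketch of the weight-sharing idea behind embedded ensembles such as BatchEnsemble, simplified by me to a single linear layer: every ensemble member reuses one shared weight matrix, modulated by cheap per-member rank-1 factors, so the memory cost of extra members is tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, members = 64, 32, 4

W_shared = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)   # single shared (reference) weight matrix
r = rng.normal(size=(members, d_in))                        # per-member input scaling vectors
s = rng.normal(size=(members, d_out))                       # per-member output scaling vectors

def member_forward(x, k):
    # Member k uses W_k = W_shared * (r_k s_k^T), applied implicitly as ((x * r_k) @ W) * s_k
    return ((x * r[k]) @ W_shared) * s[k]

x = rng.normal(size=(8, d_in))                               # a batch of 8 inputs
ensemble_out = np.mean([member_forward(x, k) for k in range(members)], axis=0)
print(ensemble_out.shape)   # (8, 32): averaged predictions of 4 members sharing one weight matrix
```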
Author:
Velikanov, Maksim, Yarotsky, Dmitry
Performance of optimization on quadratic problems sensitively depends on the low-lying part of the spectrum. For large (effectively infinite-dimensional) problems, this part of the spectrum can often be naturally represented or approximated by power laws …
External link:
http://arxiv.org/abs/2202.00992
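A hedged numerical illustration of why the low-lying spectrum matters (my own toy, not the paper's bounds): plain gradient descent versus heavy-ball momentum on a quadratic with a power-law spectrum; the exponent, step size, and momentum value are arbitrary.

```python
import numpy as np

d = 2000
lam = np.arange(1, d + 1) ** (-1.5)        # power-law spectrum (arbitrary exponent)

def quad_loss(w):
    return 0.5 * np.sum(lam * w**2)

def run(beta, eta, steps=5000):
    w, v = np.ones(d), np.zeros(d)
    for _ in range(steps):
        v = beta * v - eta * lam * w       # beta = 0 recovers plain gradient descent
        w = w + v
    return quad_loss(w)

print("GD        :", run(beta=0.0, eta=0.9))
print("heavy ball:", run(beta=0.99, eta=0.9))
```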
Author:
Velikanov, Maksim, Yarotsky, Dmitry
Current theoretical results on optimization trajectories of neural networks trained by gradient descent typically have the form of rigorous but potentially loose bounds on the loss values. In the present work we take a different approach and show that …
External link:
http://arxiv.org/abs/2105.00507
Author:
Yarotsky, Dmitry
We call a finite family of activation functions superexpressive if any multivariate continuous function can be approximated by a neural network that uses these activations and has a fixed architecture only depending on the number of input variables …
External link:
http://arxiv.org/abs/2102.10911
Author:
Anokhin, Ivan, Yarotsky, Dmitry
Recent research shows that sublevel sets of the loss surfaces of overparameterized networks are connected, exactly or approximately. We describe and compare experimentally a panel of methods used to connect two low-loss points by a low-loss curve on …
External link:
http://arxiv.org/abs/2008.00741
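In the spirit of the curve-finding methods the snippet above compares, here is a hedged sketch on a toy two-dimensional "loss surface" of my own choosing: two minima are joined by a quadratic Bezier curve whose middle control point is trained to keep the loss low along the path.

```python
import numpy as np

def loss(w):                                   # toy loss with a ring of minima at radius 1
    return (w[..., 0]**2 + w[..., 1]**2 - 1.0)**2

def grad_loss(w):
    g = 4.0 * (w[..., 0]**2 + w[..., 1]**2 - 1.0)
    return g[..., None] * w

a, b = np.array([1.0, 0.0]), np.array([-1.0, 0.0])   # two low-loss endpoints
theta = 0.5 * (a + b) + 1e-3                          # trainable middle control point

rng = np.random.default_rng(0)
for _ in range(2000):
    t = rng.uniform(size=(64, 1))                     # random points along the curve
    curve = (1 - t)**2 * a + 2 * t * (1 - t) * theta + t**2 * b
    # d(curve)/d(theta) = 2 t (1 - t), so the chain rule gives the gradient w.r.t. theta
    g = np.mean(2 * t * (1 - t) * grad_loss(curve), axis=0)
    theta -= 0.1 * g

ts = np.linspace(0, 1, 11)[:, None]
path = (1 - ts)**2 * a + 2 * ts * (1 - ts) * theta + ts**2 * b
print("max loss along straight segment:", loss((1 - ts) * a + ts * b).max())
print("max loss along trained curve   :", loss(path).max())
```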