Showing 1 - 10 of 24 for search: '"Golikov, Eugene"'
Author:
Golikov, Eugene
We consider nonlinear networks as perturbations of linear ones. Based on this approach, we present novel generalization bounds that become non-vacuous for networks that are close to being linear. The main advantage over the previous works which propo…
External link:
http://arxiv.org/abs/2407.06765
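As a rough sketch of the "perturbations of linear ones" viewpoint in the snippet above (the decomposition below is an illustrative assumption, not necessarily the paper's exact construction), one can split the network into its linearization in the parameters plus a remainder:

$$f_\theta(x) = f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top (\theta - \theta_0) + r_\theta(x),$$

so that a generalization bound can tighten, and become non-vacuous, as the remainder $r_\theta$ becomes small, i.e. as the network approaches a linear model.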
A seminal work [Jacot et al., 2018] demonstrated that training a neural network under a specific parameterization is equivalent to performing a particular kernel method as the width goes to infinity. This equivalence opened a promising direction for applyi…
External link:
http://arxiv.org/abs/2208.13614
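For context on the [Jacot et al., 2018] result cited above: under the NTK parameterization, training an infinitely wide network by gradient descent is equivalent to kernel regression with the neural tangent kernel

$$\Theta(x, x') = \nabla_\theta f_\theta(x)^\top \nabla_\theta f_\theta(x'),$$

which remains constant during training in the infinite-width limit.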
We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynam…
External link:
http://arxiv.org/abs/2205.15809
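For reference, the standard $L_{2}$-regularized objective the snippet starts from reads (the reformulation in terms of the layerwise activations $Z_{\ell}$ is the paper's contribution and is not reproduced here):

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i), y_i\big) + \lambda \lVert \theta \rVert_2^2.$$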
Author:
Golikov, Eugene A.
These are the notes for the lectures I gave during Fall 2020 at the Moscow Institute of Physics and Technology (MIPT) and at the Yandex School of Data Analysis (YSDA). The notes cover some aspects of initialization, loss landscape, general…
External link:
http://arxiv.org/abs/2012.05760
Author:
Golikov, Eugene A.
Recent research has been focused on two different approaches to studying neural network training in the limit of infinite width: (1) a mean-field (MF) and (2) a constant neural tangent kernel (NTK) approximation. These two approaches have different…
External link:
http://arxiv.org/abs/2006.06574
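A textbook way to see why the two limits above differ is the output scaling of a width-$n$ two-layer network (a standard illustration, not this paper's exact setup):

$$f_{\mathrm{NTK}}(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} a_i \sigma(w_i^\top x), \qquad f_{\mathrm{MF}}(x) = \frac{1}{n} \sum_{i=1}^{n} a_i \sigma(w_i^\top x).$$

The $1/\sqrt{n}$ scaling freezes the tangent kernel ("lazy" training), while the $1/n$ scaling preserves feature learning and yields mean-field dynamics.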
Author:
Golikov, Eugene A.
Obtaining theoretical guarantees for neural network training appears to be a hard problem in the general case. Recent research has been focused on studying this problem in the limit of infinite width, and two different theories have been developed: a m…
External link:
http://arxiv.org/abs/2003.05884
Author:
Das, Biswarup, Golikov, Eugene A.
We prove that if an activation function satisfies some mild conditions and the number of neurons in a two-layer fully connected neural network with this activation function is beyond a certain threshold, then gradient descent on the quadratic loss function…
External link:
http://arxiv.org/abs/1911.05402
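The setting in the snippet above, in generic notation (a sketch; the paper's precise conditions on the activation and the width threshold are not reproduced): a two-layer network $f(x) = \sum_{j=1}^{m} a_j \sigma(w_j^\top x)$ is trained by gradient descent on the quadratic loss

$$\mathcal{L} = \frac{1}{2} \sum_{i=1}^{N} \big(f(x_i) - y_i\big)^2, \qquad w_j \leftarrow w_j - \eta \nabla_{w_j} \mathcal{L},$$

with global convergence claimed once the width $m$ exceeds a data-dependent threshold.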
Author:
Golikov, Eugene
Despite the huge empirical success of deep learning, theoretical understanding of the neural network learning process is still lacking. This is the reason why some of its features seem "mysterious". We emphasize two mysteries of deep learning: generali…
External link:
http://arxiv.org/abs/1905.07187
Author:
Golikov, Eugene, Kretov, Maksim
The conventional prior for a Variational Auto-Encoder (VAE) is a Gaussian distribution. Recent works demonstrated that the choice of prior distribution affects the learning capacity of VAE models. We propose a general technique (embedding-reparameterization proced…
External link:
http://arxiv.org/abs/1812.02769
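For context on the role of the prior mentioned above: a VAE with encoder $q_\phi(z \mid x)$ and decoder $p_\theta(x \mid z)$ maximizes the ELBO

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big),$$

where the conventional choice is $p(z) = \mathcal{N}(0, I)$; replacing $p(z)$ changes the KL term and hence the latent structure the model can express.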
In natural language processing tasks, model performance is often measured with a non-differentiable metric, such as the BLEU score. To use efficient gradient-based methods for optimization, it is a common workaround to optimize some surrogate l…
External link:
http://arxiv.org/abs/1712.04708
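A sketch of the mismatch described above (standard notation, not necessarily the paper's formulation): the metric $R$ (e.g. BLEU) is non-differentiable in the parameters $\theta$, so instead of maximizing $\mathbb{E}_{y \sim p_\theta}[R(y)]$ directly, one typically minimizes a differentiable surrogate such as the token-level cross-entropy

$$\mathcal{L}_{\mathrm{CE}}(\theta) = -\sum_{t} \log p_\theta(y_t^\ast \mid y_{<t}^\ast, x),$$

even though lowering the cross-entropy does not guarantee improving the metric.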