Showing 1 - 10 of 24 for search: '"Golikov, Eugene"'
Author:
Golikov, Eugene
We consider nonlinear networks as perturbations of linear ones. Based on this approach, we present novel generalization bounds that become non-vacuous for networks that are close to being linear. The main advantage over the previous works which propo…
External link:
http://arxiv.org/abs/2407.06765
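As a rough sketch of the "perturbations of linear ones" viewpoint in the snippet above (the decomposition below is an illustrative assumption, not necessarily the paper's exact construction), one can split the network into its linearization in the parameters plus a remainder:

$$f_\theta(x) = f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top (\theta - \theta_0) + r_\theta(x),$$

so that a generalization bound can tighten, and become non-vacuous, as the remainder $r_\theta$ becomes small, i.e. as the network approaches a linear model.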
A seminal work [Jacot et al., 2018] demonstrated that training a neural network under a specific parameterization is equivalent to performing a particular kernel method as the width goes to infinity. This equivalence opened a promising direction for applyi…
External link:
http://arxiv.org/abs/2208.13614
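For context on the [Jacot et al., 2018] result cited above: under the NTK parameterization, training an infinitely wide network by gradient descent is equivalent to kernel regression with the neural tangent kernel

$$\Theta(x, x') = \nabla_\theta f_\theta(x)^\top \nabla_\theta f_\theta(x'),$$

which remains constant during training in the infinite-width limit.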
We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynam…
External link:
http://arxiv.org/abs/2205.15809
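For reference, the standard $L_{2}$-regularized objective the snippet starts from reads (the reformulation in terms of the layerwise activations $Z_{\ell}$ is the paper's contribution and is not reproduced here):

$$\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i), y_i\big) + \lambda \lVert \theta \rVert_2^2.$$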
Author:
Golikov, Eugene A.
These are the notes for the lectures I gave during Fall 2020 at the Moscow Institute of Physics and Technology (MIPT) and at the Yandex School of Data Analysis (YSDA). The notes cover some aspects of initialization, loss landscape, general…
External link:
http://arxiv.org/abs/2012.05760
Author:
Golikov, Eugene A.
Recent research has been focused on two different approaches to studying neural network training in the limit of infinite width: (1) a mean-field (MF) and (2) a constant neural tangent kernel (NTK) approximation. These two approaches have different…
External link:
http://arxiv.org/abs/2006.06574
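A textbook way to see why the two limits above differ is the output scaling of a width-$n$ two-layer network (a standard illustration, not this paper's exact setup):

$$f_{\mathrm{NTK}}(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} a_i \sigma(w_i^\top x), \qquad f_{\mathrm{MF}}(x) = \frac{1}{n} \sum_{i=1}^{n} a_i \sigma(w_i^\top x).$$

The $1/\sqrt{n}$ scaling freezes the tangent kernel ("lazy" training), while the $1/n$ scaling preserves feature learning and yields mean-field dynamics.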
Author:
Golikov, Eugene A.
Obtaining theoretical guarantees for neural network training appears to be a hard problem in the general case. Recent research has been focused on studying this problem in the limit of infinite width, and two different theories have been developed: a m…
External link:
http://arxiv.org/abs/2003.05884
Author:
Das, Biswarup, Golikov, Eugene A.
We prove that if an activation function satisfies some mild conditions and the number of neurons in a two-layer fully connected neural network with this activation function is beyond a certain threshold, then gradient descent on the quadratic loss function…
External link:
http://arxiv.org/abs/1911.05402
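The setting in the snippet above, in generic notation (a sketch; the paper's precise conditions on the activation and the width threshold are not reproduced): a two-layer network $f(x) = \sum_{j=1}^{m} a_j \sigma(w_j^\top x)$ is trained by gradient descent on the quadratic loss

$$\mathcal{L} = \frac{1}{2} \sum_{i=1}^{N} \big(f(x_i) - y_i\big)^2, \qquad w_j \leftarrow w_j - \eta \nabla_{w_j} \mathcal{L},$$

with global convergence claimed once the width $m$ exceeds a data-dependent threshold.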
Author:
Golikov, Eugene
Despite the huge empirical success of deep learning, theoretical understanding of the neural network learning process is still lacking. This is the reason why some of its features seem "mysterious". We emphasize two mysteries of deep learning: generali…
External link:
http://arxiv.org/abs/1905.07187
Author:
Golikov, Eugene, Kretov, Maksim
The conventional prior for a Variational Auto-Encoder (VAE) is a Gaussian distribution. Recent works demonstrated that the choice of prior distribution affects the learning capacity of VAE models. We propose a general technique (embedding-reparameterization proced…
External link:
http://arxiv.org/abs/1812.02769
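For context on the role of the prior mentioned above: a VAE with encoder $q_\phi(z \mid x)$ and decoder $p_\theta(x \mid z)$ maximizes the ELBO

$$\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big),$$

where the conventional choice is $p(z) = \mathcal{N}(0, I)$; replacing $p(z)$ changes the KL term and hence the latent structure the model can express.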
In natural language processing tasks, model performance is often measured with a non-differentiable metric, such as the BLEU score. To use efficient gradient-based methods for optimization, it is a common workaround to optimize some surrogate l…
External link:
http://arxiv.org/abs/1712.04708
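A sketch of the mismatch described above (standard notation, not necessarily the paper's formulation): the metric $R$ (e.g. BLEU) is non-differentiable in the parameters $\theta$, so instead of maximizing $\mathbb{E}_{y \sim p_\theta}[R(y)]$ directly, one typically minimizes a differentiable surrogate such as the token-level cross-entropy

$$\mathcal{L}_{\mathrm{CE}}(\theta) = -\sum_{t} \log p_\theta(y_t^\ast \mid y_{<t}^\ast, x),$$

even though lowering the cross-entropy does not guarantee improving the metric.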