Showing 1 - 10 of 34 for search: '"Garipov, Timur"'
Keyphrase selection is a challenging task in natural language processing that has a wide range of applications. Adapting existing supervised and unsupervised solutions for the Russian language faces several limitations due to the rich morphology of Russian…
External link:
http://arxiv.org/abs/2410.18040
High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models…
External link:
http://arxiv.org/abs/2309.16115
We study the problem of aligning the supports of distributions. Compared to the existing work on distribution alignment, support alignment does not require the densities to be matched. We propose symmetric support difference as a divergence measure to quantify the mismatch between supports…
External link:
http://arxiv.org/abs/2203.08908
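
The support-mismatch idea named in this abstract can be illustrated with a rough empirical proxy: a Chamfer-style nearest-neighbour distance computed in both directions between two sample sets. The sketch below is an illustration under that assumption, not the paper's exact divergence or estimator, and support_mismatch is a hypothetical helper name:

    import torch

    def support_mismatch(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """Chamfer-style proxy for support mismatch.

        x: (n, d) samples from P; y: (m, d) samples from Q.
        """
        d = torch.cdist(x, y)  # pairwise Euclidean distances, shape (n, m)
        # Mean distance from each x to its nearest y, plus the reverse direction;
        # the sum is small only when each support is covered by the other.
        return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

    x = torch.randn(256, 2)            # samples from P
    y = torch.randn(256, 2) + 3.0      # samples from a shifted Q
    print(support_mismatch(x, y))      # grows as the supports separate
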
Adversarial training methods typically align distributions by solving two-player games. However, in most current formulations, even if the generator aligns perfectly with data, a sub-optimal discriminator can still drive the two apart. Absent additional…
External link:
http://arxiv.org/abs/2002.08621
We present the MLRG Deep Curvature suite, a PyTorch-based, open-source package for the analysis and visualisation of neural network curvature and loss landscapes. Despite providing rich information about the properties of neural networks, and being useful for various…
External link:
http://arxiv.org/abs/1912.09656
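
Curvature analyses of this kind are typically built on Hessian-vector products, which PyTorch can compute by differentiating twice. A minimal generic sketch of that primitive, not the package's own API:

    import torch

    def hvp(loss: torch.Tensor, params, vec):
        """Return H @ vec, where H is the Hessian of `loss` w.r.t. `params`."""
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Differentiating (grad . vec) once more yields the Hessian-vector
        # product without ever materializing the full Hessian.
        dot = sum((g * v).sum() for g, v in zip(grads, vec))
        return torch.autograd.grad(dot, params)

    w = torch.randn(3, requires_grad=True)
    loss = (w ** 2).sum() + w.prod()
    v = [torch.randn(3)]
    print(hvp(loss, [w], v))  # equals H @ v for this toy loss
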
Author:
Izmailov, Pavel, Maddox, Wesley J., Kirichenko, Polina, Garipov, Timur, Vetrov, Dmitry, Wilson, Andrew Gordon
Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty. However, scaling Bayesian inference techniques to deep neural networks is challenging due to the high dimensionality of the parameter space…
External link:
http://arxiv.org/abs/1907.07504
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule…
External link:
http://arxiv.org/abs/1902.02476
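
The SWAG construction described above can be sketched in its diagonal form: accumulate the first and second moments of the SGD iterates, then sample weights from the implied Gaussian. A minimal illustration, not the authors' implementation (which additionally keeps a low-rank covariance term):

    import torch

    class DiagSWAG:
        def __init__(self, model: torch.nn.Module):
            self.model = model
            self.n = 0
            flat = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
            self.mean = torch.zeros_like(flat)
            self.sq_mean = torch.zeros_like(flat)

        def collect(self):
            """Call periodically during SGD to absorb the current iterate."""
            flat = torch.nn.utils.parameters_to_vector(self.model.parameters()).detach()
            self.mean = (self.n * self.mean + flat) / (self.n + 1)
            self.sq_mean = (self.n * self.sq_mean + flat ** 2) / (self.n + 1)
            self.n += 1

        def sample(self, scale: float = 1.0):
            """Load one weight sample from N(mean, scale * diag variance)."""
            var = (self.sq_mean - self.mean ** 2).clamp(min=1e-30)
            w = self.mean + (scale * var).sqrt() * torch.randn_like(var)
            torch.nn.utils.vector_to_parameters(w, self.model.parameters())
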
Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization…
External link:
http://arxiv.org/abs/1803.05407
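
The averaging scheme described above is available in PyTorch as torch.optim.swa_utils. A minimal sketch; the model, data, and schedule here are placeholders:

    import torch
    from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

    model = torch.nn.Linear(10, 2)                 # placeholder model
    loader = [(torch.randn(8, 10), torch.randint(2, (8,))) for _ in range(10)]
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    swa_model = AveragedModel(model)               # running average of weights
    swa_sched = SWALR(opt, swa_lr=0.05)            # constant averaging-phase LR

    for epoch in range(20):
        for x, y in loader:
            opt.zero_grad()
            torch.nn.functional.cross_entropy(model(x), y).backward()
            opt.step()
        if epoch >= 15:                            # start averaging late in training
            swa_model.update_parameters(model)
            swa_sched.step()

    update_bn(loader, swa_model)                   # recompute BatchNorm statistics
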
The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant…
External link:
http://arxiv.org/abs/1802.10026
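
Among the simple curves this paper considers are quadratic Bezier curves in weight space, phi_theta(t) = (1-t)^2 w1 + 2 t (1-t) theta + t^2 w2, with endpoints at two independently trained solutions and a learned middle control point theta. A minimal sketch of evaluating such a curve; training theta to minimize the expected loss along it is omitted:

    import torch

    def bezier_point(w1: torch.Tensor, theta: torch.Tensor,
                     w2: torch.Tensor, t: float) -> torch.Tensor:
        """Weights at position t in [0, 1] along the curve from w1 to w2."""
        return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2

    w1, w2 = torch.randn(100), torch.randn(100)  # two trained optima (flattened)
    theta = (w1 + w2) / 2                        # initialize bend at the midpoint
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        weights = bezier_point(w1, theta, w2, t) # load into a model, evaluate loss
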
Author:
Kochurov, Max, Garipov, Timur, Podoprikhin, Dmitry, Molchanov, Dmitry, Ashukha, Arsenii, Vetrov, Dmitry
In industrial machine learning pipelines, data often arrive in parts. Particularly in the case of deep neural networks, it may be too expensive to train the model from scratch each time, so one would rather use a previously learned model and the new data…
External link:
http://arxiv.org/abs/1802.07329