Showing 1 - 10 of 275 for search: '"Knyazev Boris A."'
Author:
Knyazev, Boris, Moudgil, Abhinav, Lajoie, Guillaume, Belilovsky, Eugene, Lacoste-Julien, Simon
Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. A simpler recently proposed approach to a…
External link:
http://arxiv.org/abs/2409.04434
Generating novel molecules is challenging, with most representations leading to generative models producing many invalid molecules. Spanning Tree-based Graph Generation (STGG) is a promising approach to ensure the generation of valid molecules, outpe…
External link:
http://arxiv.org/abs/2407.09357
Author:
Thérien, Benjamin, Joseph, Charles-Étienne, Knyazev, Boris, Oyallon, Edouard, Rish, Irina, Belilovsky, Eugene
Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks, substantially reducing training costs. However, they often suffer from poor meta-generalization, especially when training networks larger than those se…
External link:
http://arxiv.org/abs/2406.00153
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes a desired prediction for initial parameters more necessa…
External link:
http://arxiv.org/abs/2405.16287
Author:
Kofinas, Miltiadis, Knyazev, Boris, Zhang, Yan, Chen, Yunlu, Burghouts, Gertjan J., Gavves, Efstratios, Snoek, Cees G. M., Zhang, David W.
Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing ap…
External link:
http://arxiv.org/abs/2403.12143
Author:
Joseph, Charles-Étienne, Thérien, Benjamin, Moudgil, Abhinav, Knyazev, Boris, Belilovsky, Eugene
Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is, on each worker, before averaging model parameters, helping reli…
External link:
http://arxiv.org/abs/2312.02204
Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we tra…
External link:
http://arxiv.org/abs/2303.04143
Author:
Kukotenko, V. D., Choporova, Y. Y., Knyazev, Boris A., Gerasimov, V. V., Zhukavin, R. K., Kovalevsky, K. A.
Published in:
EPJ Web of Conferences, Vol 195, p 06007 (2018)
External link:
https://doaj.org/article/d29b092459ce47edaee64dac910506ba
In recent years, neural networks (NNs) have evolved from laboratory environments to the state of the art for many real-world problems. It was shown that NN models (i.e., their weights and biases) evolve on unique trajectories in weight space during…
External link:
http://arxiv.org/abs/2209.14764
Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications from model inspection, to neural architecture search or knowledge distillation. Recently, an autoencoder trained…
External link:
http://arxiv.org/abs/2209.14733