Showing 1 - 10 of 26 for search: '"Balles, Lukas"'
Author:
Blake, Charlie, Eichenberg, Constantin, Dean, Josef, Balles, Lukas, Prince, Luke Y., Deiseroth, Björn, Cruz-Salinas, Andres Felipe, Luschi, Carlo, Weinbach, Samuel, Orr, Douglas
The Maximal Update Parametrization ($\mu$P) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model. We present a new scheme, u-$\mu$P…
External link:
http://arxiv.org/abs/2407.17465
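To illustrate the basic mechanic behind $\mu$P-style hyperparameter transfer that the snippet refers to, a minimal sketch follows. It is not the authors' u-$\mu$P scheme; the toy MLP, the base width of 128, and the simplified per-layer learning-rate rule are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def make_mlp(width: int) -> nn.Sequential:
    """A toy MLP whose hidden width we want to scale up."""
    return nn.Sequential(
        nn.Linear(32, width),     # input layer (fixed fan-in)
        nn.ReLU(),
        nn.Linear(width, width),  # hidden layer (fan-in grows with width)
        nn.ReLU(),
        nn.Linear(width, 10),     # output layer
    )

def mup_style_param_groups(model, base_width, width, base_lr):
    """Illustrative, simplified muP-style rule (an assumption, not the paper's
    exact scheme): layers whose fan-in grows with width get their Adam learning
    rate scaled by base_width / width, so a rate tuned on a narrow proxy model
    can be reused at the target width. The input layer keeps the base rate."""
    groups = []
    for module in model.modules():
        if not isinstance(module, nn.Linear):
            continue
        lr = base_lr if module.in_features == 32 else base_lr * base_width / width
        groups.append({"params": module.parameters(), "lr": lr})
    return groups

# Tune base_lr on a cheap width-128 proxy, then reuse it at width 2048.
base_lr = 1e-3
target = make_mlp(width=2048)
optimizer = torch.optim.Adam(
    mup_style_param_groups(target, base_width=128, width=2048, base_lr=base_lr)
)
```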
Recent Continual Learning (CL) methods have combined pretrained Transformers with prompt tuning, a parameter-efficient fine-tuning (PEFT) technique. We argue that the choice of prompt tuning in prior works was an undefended and unablated decision…
External link:
http://arxiv.org/abs/2406.03216
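For context on the PEFT technique named in the snippet, here is a minimal, hedged sketch of prompt tuning: a small set of learnable prompt embeddings is prepended to the input of a frozen backbone, and only those embeddings are trained. The `PromptTunedEncoder` wrapper and its interface are hypothetical.

```python
import torch
import torch.nn as nn

class PromptTunedEncoder(nn.Module):
    """Minimal prompt-tuning sketch: only `prompt` receives gradients; the
    pretrained encoder stays frozen (a generic stand-in, not a specific model)."""

    def __init__(self, encoder: nn.Module, embed_dim: int, prompt_len: int = 10):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad_(False)                    # freeze the backbone
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, embed_dim)
        batch = token_embeddings.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prompt, token_embeddings], dim=1))
```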
With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. For this approach, we perform a forward pass to obtain…
External link:
http://arxiv.org/abs/2312.05021
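The snippet stops just after describing the scoring forward pass; the sketch below is one hedged, simplified reading of Selective Backprop (deterministic top-k selection rather than the paper's probabilistic rule), not the exact algorithm at the link.

```python
import torch
import torch.nn.functional as F

def selective_backprop_step(model, optimizer, x, y, keep_fraction=0.5):
    """Forward the full batch, then backpropagate only the hardest examples.
    (Simplified: the selection here is top-k by loss, purely for illustration.)"""
    with torch.no_grad():                                # cheap scoring pass
        losses = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_fraction * x.size(0)))
    idx = torch.topk(losses, k).indices                  # highest-loss examples

    optimizer.zero_grad()
    loss = F.cross_entropy(model(x[idx]), y[idx])        # backprop only the subset
    loss.backward()
    optimizer.step()
    return loss.item()
```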
Continual learning enables the incremental training of machine learning models on non-stationary data streams. While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practice…
External link:
http://arxiv.org/abs/2304.12067
Author:
Bohdal, Ondrej, Balles, Lukas, Wistuba, Martin, Ermis, Beyza, Archambeau, Cédric, Zappella, Giovanni
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO…
External link:
http://arxiv.org/abs/2207.06940
The goal of continual learning (CL) is to efficiently update a machine learning model with new data without forgetting previously-learned knowledge. Most widely-used CL methods rely on a rehearsal memory of data points to be reused while training on…
External link:
http://arxiv.org/abs/2203.14544
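To make the rehearsal-memory idea concrete, here is a hedged sketch of a generic experience-replay baseline (a reservoir buffer mixed into each new mini-batch); it is not the specific method proposed at the link, and the class and usage lines below are illustrative assumptions.

```python
import random
import torch

class RehearsalMemory:
    """Reservoir-style buffer of past (x, y) pairs (a common, generic choice)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, x: torch.Tensor, y: torch.Tensor) -> None:
        for xi, yi in zip(x, y):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((xi, yi))
            else:                                   # reservoir sampling
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (xi, yi)

    def sample(self, batch_size: int):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

# Training on a new task (assumed data source and loss):
# x_new, y_new = next(new_task_loader)
# x_mem, y_mem = memory.sample(len(x_new))
# loss = criterion(model(torch.cat([x_new, x_mem])), torch.cat([y_new, y_mem]))
```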
We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of continual learning…
External link:
http://arxiv.org/abs/2112.05025
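A hedged sketch of the gradient-matching idea follows: a greedy simplification (not the authors' algorithm) that grows a coreset by repeatedly adding the example whose inclusion brings the coreset's mean gradient closest to the full-dataset mean gradient. It assumes per-example gradients have already been flattened into a matrix.

```python
import torch

def greedy_gradient_matching_coreset(per_example_grads: torch.Tensor, k: int):
    """per_example_grads: (n, d) matrix of flattened per-example gradients
    (assumed precomputed). Greedily pick k indices whose mean gradient best
    matches the full-data mean gradient in Euclidean norm."""
    target = per_example_grads.mean(dim=0)            # gradient of the full set
    selected, running_sum = [], torch.zeros_like(target)

    for _ in range(k):
        best_idx, best_err = None, float("inf")
        for i in range(per_example_grads.size(0)):
            if i in selected:
                continue
            candidate_mean = (running_sum + per_example_grads[i]) / (len(selected) + 1)
            err = torch.norm(candidate_mean - target).item()
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        running_sum += per_example_grads[best_idx]
    return selected
```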
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on…
External link:
http://arxiv.org/abs/2011.04803
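As one hedged illustration of why tracking curvature can de-sensitize the learning rate (a generic Newton-style step length along the gradient direction, not the method of the linked paper), a single step can be scaled by $g^\top g / g^\top H g$ using a Hessian-vector product:

```python
import torch

def curvature_scaled_step(loss, params, damping=1e-3):
    """Illustrative step: move along -g with step length g.g / (g.Hg + damping),
    the minimizer of a local quadratic model (assumes positive curvature along g),
    which makes the raw learning rate far less critical."""
    params = list(params)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_g = torch.cat([g.reshape(-1) for g in grads])

    # Hessian-vector product H g via double backprop (the second factor is detached).
    hvps = torch.autograd.grad(flat_g @ flat_g.detach(), params)
    flat_hg = torch.cat([h.reshape(-1) for h in hvps])

    g_dot_g = (flat_g.detach() ** 2).sum()
    g_hg = flat_g.detach() @ flat_hg
    step = g_dot_g / (g_hg + damping)                 # Newton-like step along -g

    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g.detach(), alpha=-step.item())
```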
Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training. Furthermore, they are closely connected…
External link:
http://arxiv.org/abs/2002.08056
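For reference, the core sign-based update the snippet alludes to is the signSGD step below; this is a minimal, hedged sketch of vanilla sign descent, not any particular variant analysed in the paper.

```python
import torch

@torch.no_grad()
def sign_sgd_step(params, lr=1e-3):
    """signSGD: use only the sign of each gradient coordinate. In distributed
    settings each worker can then communicate one bit per coordinate, which is
    the favorable communication cost the abstract mentions."""
    for p in params:
        if p.grad is not None:
            p.add_(torch.sign(p.grad), alpha=-lr)

# Usage (assumed training loop):
# loss.backward()
# sign_sgd_step(model.parameters(), lr=1e-3)
```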
Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation…
External link:
http://arxiv.org/abs/1905.12558
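To make the preconditioning concrete, here is a hedged toy sketch of a natural-gradient-style step using the empirical Fisher approximation the abstract refers to (the average outer product of per-example gradients), with the matrix formed explicitly for a small problem; it is not a scalable or faithful implementation of any cited method.

```python
import torch

def empirical_fisher_step(per_example_grads: torch.Tensor, theta: torch.Tensor,
                          lr: float = 0.1, damping: float = 1e-3) -> torch.Tensor:
    """Natural-gradient-style update theta <- theta - lr * F^{-1} g, where F is
    the *empirical* Fisher: the average outer product of per-example gradients.
    (The linked paper examines when this approximation is, and is not, justified.)"""
    n, d = per_example_grads.shape
    g = per_example_grads.mean(dim=0)                      # mini-batch gradient
    fisher = per_example_grads.T @ per_example_grads / n   # empirical Fisher (d x d)
    precond_grad = torch.linalg.solve(fisher + damping * torch.eye(d), g)
    return theta - lr * precond_grad
```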