Showing 1 - 10 of 26 for search: '"Balles, Lukas"'
Author:
Blake, Charlie, Eichenberg, Constantin, Dean, Josef, Balles, Lukas, Prince, Luke Y., Deiseroth, Björn, Cruz-Salinas, Andres Felipe, Luschi, Carlo, Weinbach, Samuel, Orr, Douglas
The Maximal Update Parametrization ($\mu$P) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model. We present a new scheme, u-$\mu$P…
External link:
http://arxiv.org/abs/2407.17465
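To illustrate the basic mechanic behind $\mu$P-style hyperparameter transfer that the snippet refers to, a minimal sketch follows. It is not the authors' u-$\mu$P scheme; the toy MLP, the base width of 128, and the simplified per-layer learning-rate rule are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def make_mlp(width: int) -> nn.Sequential:
    """A toy MLP whose hidden width we want to scale up."""
    return nn.Sequential(
        nn.Linear(32, width),     # input layer (fixed fan-in)
        nn.ReLU(),
        nn.Linear(width, width),  # hidden layer (fan-in grows with width)
        nn.ReLU(),
        nn.Linear(width, 10),     # output layer
    )

def mup_style_param_groups(model, base_width, width, base_lr):
    """Illustrative, simplified muP-style rule (an assumption, not the paper's
    exact scheme): layers whose fan-in grows with width get their Adam learning
    rate scaled by base_width / width, so a rate tuned on a narrow proxy model
    can be reused at the target width. The input layer keeps the base rate."""
    groups = []
    for module in model.modules():
        if not isinstance(module, nn.Linear):
            continue
        lr = base_lr if module.in_features == 32 else base_lr * base_width / width
        groups.append({"params": module.parameters(), "lr": lr})
    return groups

# Tune base_lr on a cheap width-128 proxy, then reuse it at width 2048.
base_lr = 1e-3
target = make_mlp(width=2048)
optimizer = torch.optim.Adam(
    mup_style_param_groups(target, base_width=128, width=2048, base_lr=base_lr)
)
```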
Recent Continual Learning (CL) methods have combined pretrained Transformers with prompt tuning, a parameter-efficient fine-tuning (PEFT) technique. We argue that the choice of prompt tuning in prior works was an undefended and unablated decision…
External link:
http://arxiv.org/abs/2406.03216
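For context on the PEFT technique named in the snippet, here is a minimal, hedged sketch of prompt tuning: a small set of learnable prompt embeddings is prepended to the input of a frozen backbone, and only those embeddings are trained. The `PromptTunedEncoder` wrapper and its interface are hypothetical.

```python
import torch
import torch.nn as nn

class PromptTunedEncoder(nn.Module):
    """Minimal prompt-tuning sketch: only `prompt` receives gradients; the
    pretrained encoder stays frozen (a generic stand-in, not a specific model)."""

    def __init__(self, encoder: nn.Module, embed_dim: int, prompt_len: int = 10):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad_(False)                    # freeze the backbone
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, embed_dim)
        batch = token_embeddings.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prompt, token_embeddings], dim=1))
```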
With increasing scale in model and dataset size, the training of deep neural networks becomes a massive computational burden. One approach to speed up the training process is Selective Backprop. For this approach, we perform a forward pass to obtain…
External link:
http://arxiv.org/abs/2312.05021
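The snippet stops just after describing the scoring forward pass; the sketch below is one hedged, simplified reading of Selective Backprop (deterministic top-k selection rather than the paper's probabilistic rule), not the exact algorithm at the link.

```python
import torch
import torch.nn.functional as F

def selective_backprop_step(model, optimizer, x, y, keep_fraction=0.5):
    """Forward the full batch, then backpropagate only the hardest examples.
    (Simplified: the selection here is top-k by loss, purely for illustration.)"""
    with torch.no_grad():                                # cheap scoring pass
        losses = F.cross_entropy(model(x), y, reduction="none")
    k = max(1, int(keep_fraction * x.size(0)))
    idx = torch.topk(losses, k).indices                  # highest-loss examples

    optimizer.zero_grad()
    loss = F.cross_entropy(model(x[idx]), y[idx])        # backprop only the subset
    loss.backward()
    optimizer.step()
    return loss.item()
```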
Continual learning enables the incremental training of machine learning models on non-stationary data streams. While academic interest in the topic is high, there is little indication of the use of state-of-the-art continual learning algorithms in practice…
External link:
http://arxiv.org/abs/2304.12067
Author:
Bohdal, Ondrej, Balles, Lukas, Wistuba, Martin, Ermis, Beyza, Archambeau, Cédric, Zappella, Giovanni
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO…
External link:
http://arxiv.org/abs/2207.06940
The goal of continual learning (CL) is to efficiently update a machine learning model with new data without forgetting previously-learned knowledge. Most widely-used CL methods rely on a rehearsal memory of data points to be reused while training on…
External link:
http://arxiv.org/abs/2203.14544
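To make the rehearsal-memory idea concrete, here is a hedged sketch of a generic experience-replay baseline (a reservoir buffer mixed into each new mini-batch); it is not the specific method proposed at the link, and the class and usage lines below are illustrative assumptions.

```python
import random
import torch

class RehearsalMemory:
    """Reservoir-style buffer of past (x, y) pairs (a common, generic choice)."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, x: torch.Tensor, y: torch.Tensor) -> None:
        for xi, yi in zip(x, y):
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((xi, yi))
            else:                                   # reservoir sampling
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (xi, yi)

    def sample(self, batch_size: int):
        batch = random.sample(self.data, min(batch_size, len(self.data)))
        xs, ys = zip(*batch)
        return torch.stack(xs), torch.stack(ys)

# Training on a new task (assumed data source and loss):
# x_new, y_new = next(new_task_loader)
# x_mem, y_mem = memory.sample(len(x_new))
# loss = criterion(model(torch.cat([x_new, x_mem])), torch.cat([y_new, y_mem]))
```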
We devise a coreset selection method based on the idea of gradient matching: The gradients induced by the coreset should match, as closely as possible, those induced by the original training dataset. We evaluate the method in the context of continual learning…
External link:
http://arxiv.org/abs/2112.05025
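A hedged sketch of the gradient-matching idea follows: a greedy simplification (not the authors' algorithm) that grows a coreset by repeatedly adding the example whose inclusion brings the coreset's mean gradient closest to the full-dataset mean gradient. It assumes per-example gradients have already been flattened into a matrix.

```python
import torch

def greedy_gradient_matching_coreset(per_example_grads: torch.Tensor, k: int):
    """per_example_grads: (n, d) matrix of flattened per-example gradients
    (assumed precomputed). Greedily pick k indices whose mean gradient best
    matches the full-data mean gradient in Euclidean norm."""
    target = per_example_grads.mean(dim=0)            # gradient of the full set
    selected, running_sum = [], torch.zeros_like(target)

    for _ in range(k):
        best_idx, best_err = None, float("inf")
        for i in range(per_example_grads.size(0)):
            if i in selected:
                continue
            candidate_mean = (running_sum + per_example_grads[i]) / (len(selected) + 1)
            err = torch.norm(candidate_mean - target).item()
            if err < best_err:
                best_idx, best_err = i, err
        selected.append(best_idx)
        running_sum += per_example_grads[best_idx]
    return selected
```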
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on…
External link:
http://arxiv.org/abs/2011.04803
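As one hedged illustration of why tracking curvature can de-sensitize the learning rate (a generic Newton-style step length along the gradient direction, not the method of the linked paper), a single step can be scaled by $g^\top g / g^\top H g$ using a Hessian-vector product:

```python
import torch

def curvature_scaled_step(loss, params, damping=1e-3):
    """Illustrative step: move along -g with step length g.g / (g.Hg + damping),
    the minimizer of a local quadratic model (assumes positive curvature along g),
    which makes the raw learning rate far less critical."""
    params = list(params)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_g = torch.cat([g.reshape(-1) for g in grads])

    # Hessian-vector product H g via double backprop (the second factor is detached).
    hvps = torch.autograd.grad(flat_g @ flat_g.detach(), params)
    flat_hg = torch.cat([h.reshape(-1) for h in hvps])

    g_dot_g = (flat_g.detach() ** 2).sum()
    g_hg = flat_g.detach() @ flat_hg
    step = g_dot_g / (g_hg + damping)                 # Newton-like step along -g

    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g.detach(), alpha=-step.item())
```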
Sign-based optimization methods have become popular in machine learning due to their favorable communication cost in distributed optimization and their surprisingly good performance in neural network training. Furthermore, they are closely connected…
External link:
http://arxiv.org/abs/2002.08056
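For reference, the core sign-based update the snippet alludes to is the signSGD step below; this is a minimal, hedged sketch of vanilla sign descent, not any particular variant analysed in the paper.

```python
import torch

@torch.no_grad()
def sign_sgd_step(params, lr=1e-3):
    """signSGD: use only the sign of each gradient coordinate. In distributed
    settings each worker can then communicate one bit per coordinate, which is
    the favorable communication cost the abstract mentions."""
    for p in params:
        if p.grad is not None:
            p.add_(torch.sign(p.grad), alpha=-lr)

# Usage (assumed training loop):
# loss.backward()
# sign_sgd_step(model.parameters(), lr=1e-3)
```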
Natural gradient descent, which preconditions a gradient descent update with the Fisher information matrix of the underlying statistical model, is a way to capture partial second-order information. Several highly visible works have advocated an approximation…
External link:
http://arxiv.org/abs/1905.12558
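To make the preconditioning concrete, here is a hedged toy sketch of a natural-gradient-style step using the empirical Fisher approximation the abstract refers to (the average outer product of per-example gradients), with the matrix formed explicitly for a small problem; it is not a scalable or faithful implementation of any cited method.

```python
import torch

def empirical_fisher_step(per_example_grads: torch.Tensor, theta: torch.Tensor,
                          lr: float = 0.1, damping: float = 1e-3) -> torch.Tensor:
    """Natural-gradient-style update theta <- theta - lr * F^{-1} g, where F is
    the *empirical* Fisher: the average outer product of per-example gradients.
    (The linked paper examines when this approximation is, and is not, justified.)"""
    n, d = per_example_grads.shape
    g = per_example_grads.mean(dim=0)                      # mini-batch gradient
    fisher = per_example_grads.T @ per_example_grads / n   # empirical Fisher (d x d)
    precond_grad = torch.linalg.solve(fisher + damping * torch.eye(d), g)
    return theta - lr * precond_grad
```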