Showing 1 - 10 of 26 for search: '"Bae, Juhan"'
Author:
Mlodozeniec, Bruno, Eschenhagen, Runa, Bae, Juhan, Immer, Alexander, Krueger, David, Turner, Richard
Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in diffusion models…
External link:
http://arxiv.org/abs/2410.13850
Author:
Choe, Sang Keun, Ahn, Hwijeen, Bae, Juhan, Zhao, Kewen, Kang, Minsoo, Chung, Youngseog, Pratapa, Adithya, Neiswanger, Willie, Strubell, Emma, Mitamura, Teruko, Schneider, Jeff, Hovy, Eduard, Grosse, Roger, Xing, Eric
Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data point…
External link:
http://arxiv.org/abs/2405.13954
Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally…
External link:
http://arxiv.org/abs/2405.12186
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update…
External link:
http://arxiv.org/abs/2402.03496
This paper studies using foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO). Empirical evaluations demonstrate that in settings with constrained search budgets, LLMs can perform comparably or better than…
External link:
http://arxiv.org/abs/2312.04528
Author:
Grosse, Roger, Bae, Juhan, Anil, Cem, Elhage, Nelson, Tamkin, Alex, Tajdini, Amirhossein, Steiner, Benoit, Li, Dustin, Durmus, Esin, Perez, Ethan, Hubinger, Evan, Lukošiūtė, Kamilė, Nguyen, Karina, Joseph, Nicholas, McCandlish, Sam, Kaplan, Jared, Bowman, Samuel R.
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions…
External link:
http://arxiv.org/abs/2308.03296
Author:
Dahl, George E., Schneider, Frank, Nado, Zachary, Agarwal, Naman, Sastry, Chandramouli Shama, Hennig, Philipp, Medapati, Sourabh, Eschenhagen, Runa, Kasimbeg, Priya, Suo, Daniel, Bae, Juhan, Gilmer, Justin, Peirson, Abel L., Khan, Bilal, Anil, Rohan, Rabbat, Mike, Krishnan, Shankar, Snider, Daniel, Amid, Ehsan, Chen, Kongtao, Maddison, Chris J., Vasudev, Rakshith, Badura, Michal, Garg, Ankush, Mattson, Peter
Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules…
External link:
http://arxiv.org/abs/2306.07179
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space…
External link:
http://arxiv.org/abs/2302.03519
Author:
Bae, Juhan, Zhang, Michael R., Ruan, Michael, Wang, Eric, Hasegawa, So, Ba, Jimmy, Grosse, Roger
Variational autoencoders (VAEs) are powerful tools for learning latent representations of data used in a wide range of applications. In practice, VAEs usually require multiple training rounds to choose the amount of information the latent variable…
External link:
http://arxiv.org/abs/2212.03905
Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters. While influence estimates align well with leave-one-out retraining for linear models, recent works have shown this alignment…
External link:
http://arxiv.org/abs/2209.05364