Showing 1 - 8 of 8 for search: '"Lotfi, Sanae"'
Author:
Lotfi, Sanae, Kuang, Yilun, Amos, Brandon, Goldblum, Micah, Finzi, Marc, Wilson, Andrew Gordon
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale…
External link:
http://arxiv.org/abs/2407.18158
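For context, compression-based generalization bounds of the kind referenced above typically take an Occam form: a model that admits a short description cannot overfit much. A generic statement (not this paper's exact bound), assuming a loss in [0, 1], n i.i.d. samples, and a hypothesis h with a prefix-free description of K(h) bits, is that with probability at least 1 - δ,

$$R(h) \le \hat{R}(h) + \sqrt{\frac{K(h)\ln 2 + \ln(1/\delta)}{2n}},$$

where R(h) is the population risk and $\hat{R}(h)$ the empirical risk. Such a bound is non-vacuous only when K(h) is small relative to n, which is why the compressibility of billion-parameter models is the central obstacle.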
Author:
Lotfi, Sanae, Finzi, Marc, Kuang, Yilun, Rudner, Tim G. J., Goldblum, Micah, Wilson, Andrew Gordon
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models…
External link:
http://arxiv.org/abs/2312.17173
Author:
Lotfi, Sanae, Finzi, Marc, Kapoor, Sanyam, Potapczynski, Andres, Goldblum, Micah, Wilson, Andrew Gordon
While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters…
External link:
http://arxiv.org/abs/2211.13609
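The quantization idea can be illustrated with a minimal sketch: map each weight onto a small codebook, so storing the network costs roughly log2(levels) bits per weight. This is a generic 1-D k-means quantizer for illustration only, not the authors' exact procedure; the function name and interface are hypothetical.

```python
import numpy as np

def quantize_parameters(weights: np.ndarray, num_levels: int = 16, iters: int = 25):
    """Quantize a flat weight vector to num_levels codebook values via 1-D k-means.

    The quantized network needs about len(weights) * log2(num_levels) bits
    (plus the codebook), which is what tightens compression-based bounds.
    """
    # Initialize codebook entries at evenly spaced quantiles of the weights.
    centers = np.quantile(weights, np.linspace(0.0, 1.0, num_levels))
    for _ in range(iters):
        # Assign every weight to its nearest codebook entry.
        assign = np.abs(weights[:, None] - centers[None, :]).argmin(axis=1)
        # Move each entry to the mean of the weights assigned to it.
        for k in range(num_levels):
            if np.any(assign == k):
                centers[k] = weights[assign == k].mean()
    return centers[assign], centers

# Example: 10,000 weights at 16 levels cost ~4 bits each instead of 32.
w = np.random.default_rng(0).normal(size=10_000)
w_quantized, codebook = quantize_parameters(w)
```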
How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this question…
External link:
http://arxiv.org/abs/2202.11678
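The marginal likelihood named above is the probability of the observed data under a model once its parameters are integrated out:

$$p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, d\theta.$$

Two hypotheses can fit the observations equally well yet differ in evidence, because a model whose prior spreads probability over many datasets it never has to explain is penalized automatically.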
In this paper, we consider both first- and second-order techniques to address continuous optimization problems arising in machine learning. In the first-order case, we propose a framework of transition from deterministic or semi-deterministic to stochastic…
External link:
http://arxiv.org/abs/2111.14761
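One generic way such a transition can be realized, sketched here purely as an illustration and not as this paper's algorithm, is to shrink the fraction of data used for each gradient over training; the grad_fn interface below is hypothetical.

```python
import numpy as np

def progressive_subsampling_gd(grad_fn, x0, n_data, steps=100, lr=0.1,
                               start_frac=1.0, end_frac=0.05, seed=0):
    """Gradient descent that begins (near-)deterministically with full-batch
    gradients and ends as a stochastic mini-batch method.

    grad_fn(x, idx) should return the gradient of the loss restricted to the
    data points indexed by idx (a hypothetical interface for this sketch).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        # Linearly interpolate the sampled fraction: deterministic -> stochastic.
        frac = start_frac + (end_frac - start_frac) * t / max(steps - 1, 1)
        batch = rng.choice(n_data, size=max(1, int(frac * n_data)), replace=False)
        x = x - lr * grad_fn(x, batch)
    return x
```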
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference…
External link:
http://arxiv.org/abs/2106.11905
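As background, a Bayesian neural network predicts by averaging over the parameter posterior rather than using a single weight setting; below is a minimal Monte Carlo sketch of that model average (generic, with a hypothetical predict_fn interface, not this paper's inference setup).

```python
import numpy as np

def bayesian_model_average(predict_fn, posterior_samples, x):
    """Estimate the posterior predictive p(y | x, D) by averaging the
    predictive distributions of sampled parameter vectors.

    predict_fn(theta, x) returns class probabilities for input x under
    parameters theta (a hypothetical interface for this sketch).
    """
    probs = np.stack([predict_fn(theta, x) for theta in posterior_samples])
    return probs.mean(axis=0)  # Monte Carlo estimate over posterior samples
```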
With a better understanding of the loss surfaces for multilayer networks, we can build more robust and accurate training procedures. Recently it was discovered that independently trained SGD solutions can be connected along one-dimensional paths of near-constant training loss…
External link:
http://arxiv.org/abs/2102.13042
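The one-dimensional paths mentioned above live in parameter space; the simplest diagnostic, sketched below with a generic loss_fn, evaluates the loss along the straight line between two trained weight vectors. Independently trained solutions usually show a barrier on this segment, and mode-connectivity methods search for curved paths that avoid it.

```python
import numpy as np

def loss_along_segment(loss_fn, theta_a, theta_b, num_points=21):
    """Evaluate loss_fn at evenly spaced points on the line segment
    between two parameter vectors theta_a and theta_b."""
    ts = np.linspace(0.0, 1.0, num_points)
    losses = np.array([loss_fn((1 - t) * theta_a + t * theta_b) for t in ts])
    return ts, losses
```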
We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and conditioning. Our algorithm, VARCHEN, draws from…
External link:
http://arxiv.org/abs/2012.05783
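As background on the damping in a damped L-BFGS method: the curvature pair (s, y) is modified so the quasi-Newton update stays positive definite. The sketch below shows classical Powell damping, a standard ingredient of such methods rather than VARCHEN itself.

```python
import numpy as np

def powell_damped_curvature(s, y, B_s, mu=0.2):
    """Return a damped gradient-difference vector y_bar with s @ y_bar > 0.

    s:   parameter step, y: raw gradient difference,
    B_s: current Hessian approximation applied to s (assumed s @ B_s > 0).
    If the raw curvature s @ y is too small relative to s @ B_s, y is
    blended with B_s to keep the BFGS update positive definite.
    """
    sBs, sy = float(s @ B_s), float(s @ y)
    theta = 1.0 if sy >= mu * sBs else (1.0 - mu) * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * B_s
```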