Showing 1 - 8 of 8 for search: '"Lotfi, Sanae"'
Author:
Lotfi, Sanae, Kuang, Yilun, Amos, Brandon, Goldblum, Micah, Finzi, Marc, Wilson, Andrew Gordon
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale…
External link:
http://arxiv.org/abs/2407.18158
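For context, compression-based generalization bounds of the kind referenced above typically take an Occam form: a model that admits a short description cannot overfit much. A generic statement (not this paper's exact bound), assuming a loss in [0, 1], n i.i.d. samples, and a hypothesis h with a prefix-free description of K(h) bits, is that with probability at least 1 - δ,

$$R(h) \le \hat{R}(h) + \sqrt{\frac{K(h)\ln 2 + \ln(1/\delta)}{2n}},$$

where R(h) is the population risk and $\hat{R}(h)$ the empirical risk. Such a bound is non-vacuous only when K(h) is small relative to n, which is why the compressibility of billion-parameter models is the central obstacle.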
Author:
Lotfi, Sanae, Finzi, Marc, Kuang, Yilun, Rudner, Tim G. J., Goldblum, Micah, Wilson, Andrew Gordon
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models…
External link:
http://arxiv.org/abs/2312.17173
Author:
Lotfi, Sanae, Finzi, Marc, Kapoor, Sanyam, Potapczynski, Andres, Goldblum, Micah, Wilson, Andrew Gordon
While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters…
External link:
http://arxiv.org/abs/2211.13609
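The quantization idea can be illustrated with a minimal sketch: map each weight onto a small codebook, so storing the network costs roughly log2(levels) bits per weight. This is a generic 1-D k-means quantizer for illustration only, not the authors' exact procedure; the function name and interface are hypothetical.

```python
import numpy as np

def quantize_parameters(weights: np.ndarray, num_levels: int = 16, iters: int = 25):
    """Quantize a flat weight vector to num_levels codebook values via 1-D k-means.

    The quantized network needs about len(weights) * log2(num_levels) bits
    (plus the codebook), which is what tightens compression-based bounds.
    """
    # Initialize codebook entries at evenly spaced quantiles of the weights.
    centers = np.quantile(weights, np.linspace(0.0, 1.0, num_levels))
    for _ in range(iters):
        # Assign every weight to its nearest codebook entry.
        assign = np.abs(weights[:, None] - centers[None, :]).argmin(axis=1)
        # Move each entry to the mean of the weights assigned to it.
        for k in range(num_levels):
            if np.any(assign == k):
                centers[k] = weights[assign == k].mean()
    return centers[assign], centers

# Example: 10,000 weights at 16 levels cost ~4 bits each instead of 32.
w = np.random.default_rng(0).normal(size=10_000)
w_quantized, codebook = quantize_parameters(w)
```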
How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this question…
External link:
http://arxiv.org/abs/2202.11678
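The marginal likelihood named above is the probability of the observed data under a model once its parameters are integrated out:

$$p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, d\theta.$$

Two hypotheses can fit the observations equally well yet differ in evidence, because a model whose prior spreads probability over many datasets it never has to explain is penalized automatically.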
In this paper, we consider both first- and second-order techniques to address continuous optimization problems arising in machine learning. In the first-order case, we propose a framework of transition from deterministic or semi-deterministic to stochastic…
External link:
http://arxiv.org/abs/2111.14761
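One generic way such a transition can be realized, sketched here purely as an illustration and not as this paper's algorithm, is to shrink the fraction of data used for each gradient over training; the grad_fn interface below is hypothetical.

```python
import numpy as np

def progressive_subsampling_gd(grad_fn, x0, n_data, steps=100, lr=0.1,
                               start_frac=1.0, end_frac=0.05, seed=0):
    """Gradient descent that begins (near-)deterministically with full-batch
    gradients and ends as a stochastic mini-batch method.

    grad_fn(x, idx) should return the gradient of the loss restricted to the
    data points indexed by idx (a hypothetical interface for this sketch).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(steps):
        # Linearly interpolate the sampled fraction: deterministic -> stochastic.
        frac = start_frac + (end_frac - start_frac) * t / max(steps - 1, 1)
        batch = rng.choice(n_data, size=max(1, int(frac * n_data)), replace=False)
        x = x - lr * grad_fn(x, batch)
    return x
```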
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference…
External link:
http://arxiv.org/abs/2106.11905
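As background, a Bayesian neural network predicts by averaging over the parameter posterior rather than using a single weight setting; below is a minimal Monte Carlo sketch of that model average (generic, with a hypothetical predict_fn interface, not this paper's inference setup).

```python
import numpy as np

def bayesian_model_average(predict_fn, posterior_samples, x):
    """Estimate the posterior predictive p(y | x, D) by averaging the
    predictive distributions of sampled parameter vectors.

    predict_fn(theta, x) returns class probabilities for input x under
    parameters theta (a hypothetical interface for this sketch).
    """
    probs = np.stack([predict_fn(theta, x) for theta in posterior_samples])
    return probs.mean(axis=0)  # Monte Carlo estimate over posterior samples
```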
With a better understanding of the loss surfaces for multilayer networks, we can build more robust and accurate training procedures. Recently it was discovered that independently trained SGD solutions can be connected along one-dimensional paths of near-constant training loss…
External link:
http://arxiv.org/abs/2102.13042
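The one-dimensional paths mentioned above live in parameter space; the simplest diagnostic, sketched below with a generic loss_fn, evaluates the loss along the straight line between two trained weight vectors. Independently trained solutions usually show a barrier on this segment, and mode-connectivity methods search for curved paths that avoid it.

```python
import numpy as np

def loss_along_segment(loss_fn, theta_a, theta_b, num_points=21):
    """Evaluate loss_fn at evenly spaced points on the line segment
    between two parameter vectors theta_a and theta_b."""
    ts = np.linspace(0.0, 1.0, num_points)
    losses = np.array([loss_fn((1 - t) * theta_a + t * theta_b) for t in ts])
    return ts, losses
```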
We propose a new stochastic variance-reduced damped L-BFGS algorithm, where we leverage estimates of bounds on the largest and smallest eigenvalues of the Hessian approximation to balance its quality and conditioning. Our algorithm, VARCHEN, draws from…
External link:
http://arxiv.org/abs/2012.05783
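As background on the damping in a damped L-BFGS method: the curvature pair (s, y) is modified so the quasi-Newton update stays positive definite. The sketch below shows classical Powell damping, a standard ingredient of such methods rather than VARCHEN itself.

```python
import numpy as np

def powell_damped_curvature(s, y, B_s, mu=0.2):
    """Return a damped gradient-difference vector y_bar with s @ y_bar > 0.

    s:   parameter step, y: raw gradient difference,
    B_s: current Hessian approximation applied to s (assumed s @ B_s > 0).
    If the raw curvature s @ y is too small relative to s @ B_s, y is
    blended with B_s to keep the BFGS update positive definite.
    """
    sBs, sy = float(s @ B_s), float(s @ y)
    theta = 1.0 if sy >= mu * sBs else (1.0 - mu) * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * B_s
```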