Showing 1 - 10 of 8,454 for search: '"Finzi A."'
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. While a great deal of work in the field uses internal representations to interpret model behavior, these representations are inac… (a generic probing sketch follows the link below)
External link: http://arxiv.org/abs/2501.01558
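The abstract above is cut off, so the block below is only a generic illustration of the broad idea it gestures at: probing an LLM's hidden states with a linear classifier to predict whether an answer will be wrong. The data and names (hidden_states, correct) are synthetic placeholders, not the paper's method or code.

    # Hypothetical illustration: probing hidden states to predict model errors.
    # The arrays below are synthetic stand-ins; a real study would use activations
    # extracted from an LLM and labels for whether its answers were correct.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_examples, hidden_dim = 1000, 64

    # Synthetic "hidden states" and correctness labels (1 = model answered correctly).
    hidden_states = rng.normal(size=(n_examples, hidden_dim))
    correct = (hidden_states[:, 0] + 0.5 * rng.normal(size=n_examples) > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, correct, test_size=0.2, random_state=0
    )

    # A linear probe: logistic regression from activations to "will the model be right?"
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("probe accuracy:", probe.score(X_test, y_test))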
We introduce a novel, training-free method for sampling differentiable representations (diffreps) using pretrained diffusion models. Rather than merely mode-seeking, our method achieves sampling by "pulling back" the dynamics of the reverse-time proc…
External link: http://arxiv.org/abs/2412.06981
Author: Potapczynski, Andres; Qiu, Shikai; Finzi, Marc; Ferri, Christopher; Chen, Zixi; Goldblum, Micah; Bruss, Bayan; De Sa, Christopher; Wilson, Andrew Gordon
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to invest… (an illustrative low-rank layer sketch follows the link below)
External link: http://arxiv.org/abs/2410.02117
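As one concrete (and entirely generic) example of a structured alternative to a dense layer, the sketch below factorizes the weight matrix as a product of two thin matrices. The class name LowRankLinear and all sizes are illustrative assumptions; the paper itself searches over a much richer space of structures.

    # A minimal sketch of one structured alternative to a dense layer: a low-rank
    # factorization W ~ U V, using d*r + r*d parameters instead of d*d.
    import torch
    import torch.nn as nn

    class LowRankLinear(nn.Module):
        def __init__(self, d_in, d_out, rank):
            super().__init__()
            self.V = nn.Linear(d_in, rank, bias=False)   # d_in -> rank
            self.U = nn.Linear(rank, d_out, bias=True)   # rank -> d_out

        def forward(self, x):
            return self.U(self.V(x))

    layer = LowRankLinear(1024, 1024, rank=64)
    x = torch.randn(8, 1024)
    print(layer(x).shape)                                # torch.Size([8, 1024])
    dense_params = 1024 * 1024
    lowrank_params = sum(p.numel() for p in layer.parameters())
    print(dense_params, lowrank_params)                  # the factorized layer is far smaller

The rank controls the accuracy/compute trade-off: a larger rank recovers more of the dense layer's expressivity at the cost of more parameters and FLOPs.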
Author: Lotfi, Sanae; Kuang, Yilun; Amos, Brandon; Goldblum, Micah; Finzi, Marc; Wilson, Andrew Gordon
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion… (a textbook bound calculation is sketched after the link below)
External link: http://arxiv.org/abs/2407.18158
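To make the "compression-based bound" idea concrete, the sketch below evaluates a textbook Occam-style bound: for a loss bounded in [0, 1] and a model describable in C bits under a prefix-free code, the generalization gap is at most sqrt((C·ln 2 + ln(1/δ)) / (2n)) with probability at least 1 - δ. This is a simplified stand-in for intuition only, not the refined bound developed in the paper.

    # A textbook Occam/compression bound, shown only to illustrate why compressed
    # model size matters for generalization; the paper uses more refined machinery,
    # so treat this as a sketch.
    import math

    def occam_bound_gap(compressed_bits, n_samples, delta=0.05):
        """Upper bound on (expected risk - empirical risk) for a loss in [0, 1],
        holding with probability >= 1 - delta, for a model describable in
        `compressed_bits` bits under a prefix-free code."""
        return math.sqrt((compressed_bits * math.log(2) + math.log(1.0 / delta))
                         / (2 * n_samples))

    # The bound only becomes non-vacuous when the training set is large relative
    # to the compressed description length of the model.
    print(occam_bound_gap(compressed_bits=1e6, n_samples=1e7))  # ~0.19
    print(occam_bound_gap(compressed_bits=1e9, n_samples=1e7))  # > 1, vacuous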
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolut… (a block-diagonal layer sketch follows the link below)
External link: http://arxiv.org/abs/2406.06248
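For contrast with the low-rank example further up, here is another generic structured family sometimes used in place of dense layers: a block-diagonal weight matrix, which applies an independent small dense matrix to each chunk of the input. Again the class and sizes are illustrative assumptions, not the specific structures studied in the paper.

    # Block-diagonal linear layer: n_blocks independent (block_size x block_size)
    # matrices instead of one dense (d x d) matrix.
    import torch
    import torch.nn as nn

    class BlockDiagonalLinear(nn.Module):
        def __init__(self, d, n_blocks):
            super().__init__()
            assert d % n_blocks == 0
            self.n_blocks = n_blocks
            self.block_size = d // n_blocks
            self.weight = nn.Parameter(
                torch.randn(n_blocks, self.block_size, self.block_size) / self.block_size ** 0.5
            )

        def forward(self, x):                       # x: (batch, d)
            b = x.shape[0]
            x = x.view(b, self.n_blocks, self.block_size)
            y = torch.einsum("nio,bni->bno", self.weight, x)
            return y.reshape(b, -1)

    layer = BlockDiagonalLinear(1024, n_blocks=16)
    print(layer(torch.randn(8, 1024)).shape)        # torch.Size([8, 1024])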
Author: Tuckute, Greta; Finzi, Dawn; Margalit, Eshed; Zylberberg, Joel; Chung, SueYeon; Fyshe, Alona; Fedorenko, Evelina; Kriegeskorte, Nikolaus; Yates, Jacob; Grill-Spector, Kalanit; Kar, Kohitij
In recent years, neuroscience has made significant progress in building large-scale artificial neural network (ANN) models of brain activity and behavior. However, there is no consensus on the most efficient ways to collect data and design experiment…
External link: http://arxiv.org/abs/2401.03376
Author: Lotfi, Sanae; Finzi, Marc; Kuang, Yilun; Rudner, Tim G. J.; Goldblum, Micah; Wilson, Andrew Gordon
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained lar…
External link: http://arxiv.org/abs/2312.17173
By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot e… (a digit-encoding sketch follows the link below)
External link: http://arxiv.org/abs/2310.07820
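A minimal sketch of the encoding idea described above: render each value as space-separated digits with a fixed number of decimals, join the values with a separator, and let ordinary next-token prediction do the forecasting. The exact formatting choices here (two decimals, " , " separator) are assumptions for illustration, not necessarily the paper's scheme.

    # Encode a numeric series as a digit string an autoregressive LM can continue.
    def encode_series(values, decimals=2):
        """Turn [1.23, 4.5, ...] into a string like '1 2 3 , 4 5 0'."""
        tokens = []
        for v in values:
            digits = f"{v:.{decimals}f}".replace(".", "")  # fixed-point, drop the dot
            tokens.append(" ".join(digits))
        return " , ".join(tokens)

    def decode_series(text, decimals=2):
        """Invert encode_series back into floats."""
        return [int(chunk.replace(" ", "")) / 10 ** decimals
                for chunk in text.split(" , ")]

    series = [1.23, 4.50, 7.89]
    encoded = encode_series(series)
    print(encoded)                 # 1 2 3 , 4 5 0 , 7 8 9
    print(decode_series(encoded))  # [1.23, 4.5, 7.89]
    # A forecast would append the encoded history to a prompt, sample a
    # continuation from the LLM, and decode it the same way.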
Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, blo… (a Kronecker matvec example follows the link below)
External link: http://arxiv.org/abs/2309.03060
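As a small, self-contained example of why such structure matters, the sketch below multiplies by a Kronecker product without ever materializing it, using the standard identity kron(A, B) @ vec(X) = vec(A @ X @ B.T) under NumPy's row-major flattening. This is a generic linear-algebra identity, not the API of the library described above.

    # Exploiting Kronecker structure: the matrix-vector product never needs the
    # full dense Kronecker factorized matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 20))
    B = rng.normal(size=(40, 50))
    X = rng.normal(size=(20, 50))           # vec(X) has length 20 * 50 = 1000

    dense = np.kron(A, B) @ X.reshape(-1)   # materializes a 1200 x 1000 matrix
    structured = (A @ X @ B.T).reshape(-1)  # never forms the Kronecker product

    print(np.allclose(dense, structured))   # True

The structured route costs two small matrix multiplies instead of one huge one, which is the kind of saving that compounds in large linear algebra workloads.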
Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and… (a toy guided-sampling sketch follows the link below)
External link: http://arxiv.org/abs/2306.07526
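The abstract above is truncated, so the following is only a toy illustration of the general pattern of conditioning a generative prior on an observation: Langevin-style sampling that adds a measurement-likelihood gradient to the prior's score. The Gaussian prior, observation model, and step sizes are all synthetic assumptions, not the paper's models or method.

    # Toy guided sampling: draw from p(x | y) by combining the prior's score with
    # the gradient of the observation log-likelihood at every update.
    import numpy as np

    rng = np.random.default_rng(0)

    def prior_score(x):
        # Score of a standard normal prior, standing in for a learned generative prior.
        return -x

    def likelihood_score(x, y, noise_std):
        # Observation model: y = x[0] + Gaussian noise; gradient of log p(y | x).
        g = np.zeros_like(x)
        g[0] = (y - x[0]) / noise_std ** 2
        return g

    y_obs, noise_std, step = 1.5, 0.2, 1e-3
    x = rng.normal(size=2)
    for _ in range(20000):  # Langevin updates toward the posterior p(x | y)
        score = prior_score(x) + likelihood_score(x, y_obs, noise_std)
        x = x + step * score + np.sqrt(2 * step) * rng.normal(size=2)

    # The first coordinate of a sample should land near the posterior mean
    # y_obs / (1 + noise_std**2) ~ 1.44 (plus posterior noise); the second stays near 0.
    print(x)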