Showing 1 - 10 of 8,454 for search: '"Finzi A."'
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. While a great deal of work in the field uses internal representations to interpret model behavior, these representations are inac… (a generic probing sketch follows the link below)
External link: http://arxiv.org/abs/2501.01558
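The abstract above is cut off, so the block below is only a generic illustration of the broad idea it gestures at: probing an LLM's hidden states with a linear classifier to predict whether an answer will be wrong. The data and names (hidden_states, correct) are synthetic placeholders, not the paper's method or code.

    # Hypothetical illustration: probing hidden states to predict model errors.
    # The arrays below are synthetic stand-ins; a real study would use activations
    # extracted from an LLM and labels for whether its answers were correct.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_examples, hidden_dim = 1000, 64

    # Synthetic "hidden states" and correctness labels (1 = model answered correctly).
    hidden_states = rng.normal(size=(n_examples, hidden_dim))
    correct = (hidden_states[:, 0] + 0.5 * rng.normal(size=n_examples) > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, correct, test_size=0.2, random_state=0
    )

    # A linear probe: logistic regression from activations to "will the model be right?"
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("probe accuracy:", probe.score(X_test, y_test))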
We introduce a novel, training-free method for sampling differentiable representations (diffreps) using pretrained diffusion models. Rather than merely mode-seeking, our method achieves sampling by "pulling back" the dynamics of the reverse-time proc…
External link: http://arxiv.org/abs/2412.06981
Author: Potapczynski, Andres; Qiu, Shikai; Finzi, Marc; Ferri, Christopher; Chen, Zixi; Goldblum, Micah; Bruss, Bayan; De Sa, Christopher; Wilson, Andrew Gordon
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to invest… (an illustrative low-rank layer sketch follows the link below)
External link: http://arxiv.org/abs/2410.02117
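As one concrete (and entirely generic) example of a structured alternative to a dense layer, the sketch below factorizes the weight matrix as a product of two thin matrices. The class name LowRankLinear and all sizes are illustrative assumptions; the paper itself searches over a much richer space of structures.

    # A minimal sketch of one structured alternative to a dense layer: a low-rank
    # factorization W ~ U V, using d*r + r*d parameters instead of d*d.
    import torch
    import torch.nn as nn

    class LowRankLinear(nn.Module):
        def __init__(self, d_in, d_out, rank):
            super().__init__()
            self.V = nn.Linear(d_in, rank, bias=False)   # d_in -> rank
            self.U = nn.Linear(rank, d_out, bias=True)   # rank -> d_out

        def forward(self, x):
            return self.U(self.V(x))

    layer = LowRankLinear(1024, 1024, rank=64)
    x = torch.randn(8, 1024)
    print(layer(x).shape)                                # torch.Size([8, 1024])
    dense_params = 1024 * 1024
    lowrank_params = sum(p.numel() for p in layer.parameters())
    print(dense_params, lowrank_params)                  # the factorized layer is far smaller

The rank controls the accuracy/compute trade-off: a larger rank recovers more of the dense layer's expressivity at the cost of more parameters and FLOPs.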
Author: Lotfi, Sanae; Kuang, Yilun; Amos, Brandon; Goldblum, Micah; Finzi, Marc; Wilson, Andrew Gordon
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion… (a textbook bound calculation is sketched after the link below)
External link: http://arxiv.org/abs/2407.18158
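To make the "compression-based bound" idea concrete, the sketch below evaluates a textbook Occam-style bound: for a loss bounded in [0, 1] and a model describable in C bits under a prefix-free code, the generalization gap is at most sqrt((C·ln 2 + ln(1/δ)) / (2n)) with probability at least 1 - δ. This is a simplified stand-in for intuition only, not the refined bound developed in the paper.

    # A textbook Occam/compression bound, shown only to illustrate why compressed
    # model size matters for generalization; the paper uses more refined machinery,
    # so treat this as a sketch.
    import math

    def occam_bound_gap(compressed_bits, n_samples, delta=0.05):
        """Upper bound on (expected risk - empirical risk) for a loss in [0, 1],
        holding with probability >= 1 - delta, for a model describable in
        `compressed_bits` bits under a prefix-free code."""
        return math.sqrt((compressed_bits * math.log(2) + math.log(1.0 / delta))
                         / (2 * n_samples))

    # The bound only becomes non-vacuous when the training set is large relative
    # to the compressed description length of the model.
    print(occam_bound_gap(compressed_bits=1e6, n_samples=1e7))  # ~0.19
    print(occam_bound_gap(compressed_bits=1e9, n_samples=1e7))  # > 1, vacuous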
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolut… (a block-diagonal layer sketch follows the link below)
External link: http://arxiv.org/abs/2406.06248
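For contrast with the low-rank example further up, here is another generic structured family sometimes used in place of dense layers: a block-diagonal weight matrix, which applies an independent small dense matrix to each chunk of the input. Again the class and sizes are illustrative assumptions, not the specific structures studied in the paper.

    # Block-diagonal linear layer: n_blocks independent (block_size x block_size)
    # matrices instead of one dense (d x d) matrix.
    import torch
    import torch.nn as nn

    class BlockDiagonalLinear(nn.Module):
        def __init__(self, d, n_blocks):
            super().__init__()
            assert d % n_blocks == 0
            self.n_blocks = n_blocks
            self.block_size = d // n_blocks
            self.weight = nn.Parameter(
                torch.randn(n_blocks, self.block_size, self.block_size) / self.block_size ** 0.5
            )

        def forward(self, x):                       # x: (batch, d)
            b = x.shape[0]
            x = x.view(b, self.n_blocks, self.block_size)
            y = torch.einsum("nio,bni->bno", self.weight, x)
            return y.reshape(b, -1)

    layer = BlockDiagonalLinear(1024, n_blocks=16)
    print(layer(torch.randn(8, 1024)).shape)        # torch.Size([8, 1024])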
Author: Tuckute, Greta; Finzi, Dawn; Margalit, Eshed; Zylberberg, Joel; Chung, SueYeon; Fyshe, Alona; Fedorenko, Evelina; Kriegeskorte, Nikolaus; Yates, Jacob; Grill-Spector, Kalanit; Kar, Kohitij
In recent years, neuroscience has made significant progress in building large-scale artificial neural network (ANN) models of brain activity and behavior. However, there is no consensus on the most efficient ways to collect data and design experiment…
External link: http://arxiv.org/abs/2401.03376
Author: Lotfi, Sanae; Finzi, Marc; Kuang, Yilun; Rudner, Tim G. J.; Goldblum, Micah; Wilson, Andrew Gordon
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained lar…
External link: http://arxiv.org/abs/2312.17173
By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot e… (a digit-encoding sketch follows the link below)
External link: http://arxiv.org/abs/2310.07820
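A minimal sketch of the encoding idea described above: render each value as space-separated digits with a fixed number of decimals, join the values with a separator, and let ordinary next-token prediction do the forecasting. The exact formatting choices here (two decimals, " , " separator) are assumptions for illustration, not necessarily the paper's scheme.

    # Encode a numeric series as a digit string an autoregressive LM can continue.
    def encode_series(values, decimals=2):
        """Turn [1.23, 4.5, ...] into a string like '1 2 3 , 4 5 0'."""
        tokens = []
        for v in values:
            digits = f"{v:.{decimals}f}".replace(".", "")  # fixed-point, drop the dot
            tokens.append(" ".join(digits))
        return " , ".join(tokens)

    def decode_series(text, decimals=2):
        """Invert encode_series back into floats."""
        return [int(chunk.replace(" ", "")) / 10 ** decimals
                for chunk in text.split(" , ")]

    series = [1.23, 4.50, 7.89]
    encoded = encode_series(series)
    print(encoded)                 # 1 2 3 , 4 5 0 , 7 8 9
    print(decode_series(encoded))  # [1.23, 4.5, 7.89]
    # A forecast would append the encoded history to a prompt, sample a
    # continuation from the LLM, and decode it the same way.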
Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, blo… (a Kronecker matvec example follows the link below)
External link: http://arxiv.org/abs/2309.03060
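As a small, self-contained example of why such structure matters, the sketch below multiplies by a Kronecker product without ever materializing it, using the standard identity kron(A, B) @ vec(X) = vec(A @ X @ B.T) under NumPy's row-major flattening. This is a generic linear-algebra identity, not the API of the library described above.

    # Exploiting Kronecker structure: the matrix-vector product never needs the
    # full dense Kronecker factorized matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 20))
    B = rng.normal(size=(40, 50))
    X = rng.normal(size=(20, 50))           # vec(X) has length 20 * 50 = 1000

    dense = np.kron(A, B) @ X.reshape(-1)   # materializes a 1200 x 1000 matrix
    structured = (A @ X @ B.T).reshape(-1)  # never forms the Kronecker product

    print(np.allclose(dense, structured))   # True

The structured route costs two small matrix multiplies instead of one huge one, which is the kind of saving that compounds in large linear algebra workloads.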
Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and… (a toy guided-sampling sketch follows the link below)
External link: http://arxiv.org/abs/2306.07526
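The abstract above is truncated, so the following is only a toy illustration of the general pattern of conditioning a generative prior on an observation: Langevin-style sampling that adds a measurement-likelihood gradient to the prior's score. The Gaussian prior, observation model, and step sizes are all synthetic assumptions, not the paper's models or method.

    # Toy guided sampling: draw from p(x | y) by combining the prior's score with
    # the gradient of the observation log-likelihood at every update.
    import numpy as np

    rng = np.random.default_rng(0)

    def prior_score(x):
        # Score of a standard normal prior, standing in for a learned generative prior.
        return -x

    def likelihood_score(x, y, noise_std):
        # Observation model: y = x[0] + Gaussian noise; gradient of log p(y | x).
        g = np.zeros_like(x)
        g[0] = (y - x[0]) / noise_std ** 2
        return g

    y_obs, noise_std, step = 1.5, 0.2, 1e-3
    x = rng.normal(size=2)
    for _ in range(20000):  # Langevin updates toward the posterior p(x | y)
        score = prior_score(x) + likelihood_score(x, y_obs, noise_std)
        x = x + step * score + np.sqrt(2 * step) * rng.normal(size=2)

    # The first coordinate of a sample should land near the posterior mean
    # y_obs / (1 + noise_std**2) ~ 1.44 (plus posterior noise); the second stays near 0.
    print(x)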