Showing 1 - 10 of 36 for search: '"Finzi, Marc"'
Author:
Potapczynski, Andres, Qiu, Shikai, Finzi, Marc, Ferri, Christopher, Chen, Zixi, Goldblum, Micah, Bruss, Bayan, De Sa, Christopher, Wilson, Andrew Gordon
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate…
External link:
http://arxiv.org/abs/2410.02117
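For intuition, a minimal sketch of swapping a dense layer for a structured one, using a low-rank factorization as a stand-in (the paper searches a much more general continuous space of structures; the class below is hypothetical, not the paper's code):

import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Stand-in for nn.Linear(d_in, d_out): W = U @ V with rank r,
    cutting parameters and matvec FLOPs from d_in*d_out
    down to r*(d_in + d_out)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.V = nn.Linear(d_in, rank, bias=False)  # project d_in -> r
        self.U = nn.Linear(rank, d_out)             # expand r -> d_out

    def forward(self, x):
        return self.U(self.V(x))

layer = LowRankLinear(1024, 1024, rank=64)
print(layer(torch.randn(8, 1024)).shape)  # torch.Size([8, 1024])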
Author:
Lotfi, Sanae, Kuang, Yilun, Amos, Brandon, Goldblum, Micah, Finzi, Marc, Wilson, Andrew Gordon
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale…
External link:
http://arxiv.org/abs/2407.18158
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks…
External link:
http://arxiv.org/abs/2406.06248
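To make the compute argument concrete (my arithmetic, not a figure from the abstract): a dense d-by-d matrix-vector product costs $d^2$ multiplies, while a Kronecker-structured matrix $A \otimes B$ with $A, B \in \mathbb{R}^{\sqrt{d}\times\sqrt{d}}$ applies in about $2\,d^{3/2}$, via the identity

\[ (A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{\top}), \]

so for d = 1024 that is roughly $10^6$ versus $6.6 \times 10^4$ multiplies per matvec.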
Author:
Lotfi, Sanae, Finzi, Marc, Kuang, Yilun, Rudner, Tim G. J., Goldblum, Micah, Wilson, Andrew Gordon
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models…
External link:
http://arxiv.org/abs/2312.17173
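The abstract is truncated above; for context, the generic shape of a compression-based generalization bound (the standard finite-hypothesis/Occam form, not necessarily the exact inequality in the paper) is: with probability at least $1 - \delta$ over $n$ i.i.d. samples, a model $h$ describable in $C(h)$ bits satisfies

\[ R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{C(h)\ln 2 + \ln(1/\delta)}{2n}}, \]

so the bound is non-vacuous only when the compressed size $C(h)$ is small relative to the sample size $n$.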
By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series…
External link:
http://arxiv.org/abs/2310.07820
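A minimal sketch of the encoding idea (the formatting choices here are hypothetical; the paper's scheme is more careful about tokenizer-dependent digit grouping and rescaling):

def encode_series(values, decimals=1):
    # Render numbers as space-separated digits so common BPE tokenizers
    # split them into individual digit tokens; forecasting then becomes
    # sampling the continuation of this string from an LLM.
    def fmt(v):
        digits = f"{v:.{decimals}f}".replace(".", "")
        return " ".join(digits)
    return " , ".join(fmt(v) for v in values)

print(encode_series([6.1, 7.3, 8.2, 9.0]))
# -> "6 1 , 7 3 , 8 2 , 9 0"; the sampled continuation is parsed back
# into numbers to obtain the forecast.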
Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal…
External link:
http://arxiv.org/abs/2309.03060
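A hedged NumPy illustration of why exploiting such structure pays off: the Kronecker identity lets you apply A ⊗ B without ever materializing the m²-by-m² matrix (this is the underlying trick, not the library's actual API):

import numpy as np

rng = np.random.default_rng(0)
m = 64
A, B = rng.normal(size=(m, m)), rng.normal(size=(m, m))
v = rng.normal(size=m * m)

# Structured matvec in O(m^3) instead of O(m^4). NumPy reshape is
# row-major, hence A @ X @ B.T rather than the column-major B X A^T.
X = v.reshape(m, m)
y_fast = (A @ X @ B.T).reshape(-1)

y_dense = np.kron(A, B) @ v           # dense reference, O(m^4) cost
print(np.allclose(y_fast, y_dense))   # True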
Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and…
External link:
http://arxiv.org/abs/2306.07526
Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult…
External link:
http://arxiv.org/abs/2304.14994
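The paper contributes its own stable, scalable method; purely as a generic illustration of the neural-PDE-solver idea, here is a textbook PINN-style residual for the 1-D heat equation u_t = u_xx (not the paper's algorithm):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def heat_residual(t, x):
    # Residual u_t - u_xx of u = net(t, x), computed with autograd.
    u = net(torch.stack([t, x], dim=-1)).squeeze(-1)
    u_t, = torch.autograd.grad(u.sum(), t, create_graph=True)
    u_x, = torch.autograd.grad(u.sum(), x, create_graph=True)
    u_xx, = torch.autograd.grad(u_x.sum(), x, create_graph=True)
    return u_t - u_xx

t = torch.rand(128, requires_grad=True)   # random collocation points
x = torch.rand(128, requires_grad=True)
loss = heat_residual(t, x).pow(2).mean()  # drive to zero with any optimizer
print(loss.item())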
No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced…
External link:
http://arxiv.org/abs/2304.05366
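One standard formalization of the "same accuracy on average" statement (a Wolpert-style form for binary classification, stated from memory rather than quoted from the paper): averaging uniformly over all labelings f of a finite input space, every learner A has the same expected off-training-set error,

\[ \frac{1}{|\mathcal{F}|} \sum_{f \in \mathcal{F}} \mathbb{E}\big[ \mathrm{err}_{\mathrm{OTS}}(A, f) \big] = \frac{1}{2} \quad \text{for every algorithm } A. \]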
Author:
Lotfi, Sanae, Finzi, Marc, Kapoor, Sanyam, Potapczynski, Andres, Goldblum, Micah, Wilson, Andrew Gordon
While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters…
External link:
http://arxiv.org/abs/2211.13609
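A toy sketch of the bookkeeping such a bound needs (a deliberate simplification: uniform scalar quantization and a naive bit count; the paper quantizes in a learned linear subspace with far better coding):

import numpy as np

def quantize(w, levels=16):
    # Uniformly quantize weights to `levels` values and count a naive
    # description length: log2(levels) bits per weight plus the two
    # float32 range endpoints.
    lo, hi = w.min(), w.max()
    idx = np.round((w - lo) / (hi - lo) * (levels - 1)).astype(int)
    w_hat = lo + idx * (hi - lo) / (levels - 1)
    bits = w.size * np.log2(levels) + 2 * 32
    return w_hat, bits

w = np.random.randn(10_000)
w_hat, bits = quantize(w)
print(f"{bits:.0f} bits, max quantization error {np.abs(w - w_hat).max():.3f}")
# `bits` plays the role of C(h) in a compression bound like the one above.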