Showing 1 - 10 of 36 for search: '"Finzi, Marc"'
Author:
Potapczynski, Andres, Qiu, Shikai, Finzi, Marc, Ferri, Christopher, Chen, Zixi, Goldblum, Micah, Bruss, Bayan, De Sa, Christopher, Wilson, Andrew Gordon
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate…
External link:
http://arxiv.org/abs/2410.02117
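For intuition, a minimal sketch of swapping a dense layer for a structured one, using a low-rank factorization as a stand-in (the paper searches a much more general continuous space of structures; the class below is hypothetical, not the paper's code):

import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Stand-in for nn.Linear(d_in, d_out): W = U @ V with rank r,
    cutting parameters and matvec FLOPs from d_in*d_out
    down to r*(d_in + d_out)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.V = nn.Linear(d_in, rank, bias=False)  # project d_in -> r
        self.U = nn.Linear(rank, d_out)             # expand r -> d_out

    def forward(self, x):
        return self.U(self.V(x))

layer = LowRankLinear(1024, 1024, rank=64)
print(layer(torch.randn(8, 1024)).shape)  # torch.Size([8, 1024])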
Author:
Lotfi, Sanae, Kuang, Yilun, Amos, Brandon, Goldblum, Micah, Finzi, Marc, Wilson, Andrew Gordon
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale…
External link:
http://arxiv.org/abs/2407.18158
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks…
External link:
http://arxiv.org/abs/2406.06248
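To make the compute argument concrete (my arithmetic, not a figure from the abstract): a dense d-by-d matrix-vector product costs $d^2$ multiplies, while a Kronecker-structured matrix $A \otimes B$ with $A, B \in \mathbb{R}^{\sqrt{d}\times\sqrt{d}}$ applies in about $2\,d^{3/2}$, via the identity

\[ (A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{\top}), \]

so for d = 1024 that is roughly $10^6$ versus $6.6 \times 10^4$ multiplies per matvec.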
Author:
Lotfi, Sanae, Finzi, Marc, Kuang, Yilun, Rudner, Tim G. J., Goldblum, Micah, Wilson, Andrew Gordon
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models…
External link:
http://arxiv.org/abs/2312.17173
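The abstract is truncated above; for context, the generic shape of a compression-based generalization bound (the standard finite-hypothesis/Occam form, not necessarily the exact inequality in the paper) is: with probability at least $1 - \delta$ over $n$ i.i.d. samples, a model $h$ describable in $C(h)$ bits satisfies

\[ R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{C(h)\ln 2 + \ln(1/\delta)}{2n}}, \]

so the bound is non-vacuous only when the compressed size $C(h)$ is small relative to the sample size $n$.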
By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series…
External link:
http://arxiv.org/abs/2310.07820
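A minimal sketch of the encoding idea (the formatting choices here are hypothetical; the paper's scheme is more careful about tokenizer-dependent digit grouping and rescaling):

def encode_series(values, decimals=1):
    # Render numbers as space-separated digits so common BPE tokenizers
    # split them into individual digit tokens; forecasting then becomes
    # sampling the continuation of this string from an LLM.
    def fmt(v):
        digits = f"{v:.{decimals}f}".replace(".", "")
        return " ".join(digits)
    return " , ".join(fmt(v) for v in values)

print(encode_series([6.1, 7.3, 8.2, 9.0]))
# -> "6 1 , 7 3 , 8 2 , 9 0"; the sampled continuation is parsed back
# into numbers to obtain the forecast.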
Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal…
External link:
http://arxiv.org/abs/2309.03060
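A hedged NumPy illustration of why exploiting such structure pays off: the Kronecker identity lets you apply A ⊗ B without ever materializing the m²-by-m² matrix (this is the underlying trick, not the library's actual API):

import numpy as np

rng = np.random.default_rng(0)
m = 64
A, B = rng.normal(size=(m, m)), rng.normal(size=(m, m))
v = rng.normal(size=m * m)

# Structured matvec in O(m^3) instead of O(m^4). NumPy reshape is
# row-major, hence A @ X @ B.T rather than the column-major B X A^T.
X = v.reshape(m, m)
y_fast = (A @ X @ B.T).reshape(-1)

y_dense = np.kron(A, B) @ v           # dense reference, O(m^4) cost
print(np.allclose(y_fast, y_dense))   # True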
Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and…
External link:
http://arxiv.org/abs/2306.07526
Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult…
External link:
http://arxiv.org/abs/2304.14994
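The paper contributes its own stable, scalable method; purely as a generic illustration of the neural-PDE-solver idea, here is a textbook PINN-style residual for the 1-D heat equation u_t = u_xx (not the paper's algorithm):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def heat_residual(t, x):
    # Residual u_t - u_xx of u = net(t, x), computed with autograd.
    u = net(torch.stack([t, x], dim=-1)).squeeze(-1)
    u_t, = torch.autograd.grad(u.sum(), t, create_graph=True)
    u_x, = torch.autograd.grad(u.sum(), x, create_graph=True)
    u_xx, = torch.autograd.grad(u_x.sum(), x, create_graph=True)
    return u_t - u_xx

t = torch.rand(128, requires_grad=True)   # random collocation points
x = torch.rand(128, requires_grad=True)
loss = heat_residual(t, x).pow(2).mean()  # drive to zero with any optimizer
print(loss.item())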
No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced…
External link:
http://arxiv.org/abs/2304.05366
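One standard formalization of the "same accuracy on average" statement (a Wolpert-style form for binary classification, stated from memory rather than quoted from the paper): averaging uniformly over all labelings f of a finite input space, every learner A has the same expected off-training-set error,

\[ \frac{1}{|\mathcal{F}|} \sum_{f \in \mathcal{F}} \mathbb{E}\big[ \mathrm{err}_{\mathrm{OTS}}(A, f) \big] = \frac{1}{2} \quad \text{for every algorithm } A. \]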
Author:
Lotfi, Sanae, Finzi, Marc, Kapoor, Sanyam, Potapczynski, Andres, Goldblum, Micah, Wilson, Andrew Gordon
While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters…
External link:
http://arxiv.org/abs/2211.13609
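A toy sketch of the bookkeeping such a bound needs (a deliberate simplification: uniform scalar quantization and a naive bit count; the paper quantizes in a learned linear subspace with far better coding):

import numpy as np

def quantize(w, levels=16):
    # Uniformly quantize weights to `levels` values and count a naive
    # description length: log2(levels) bits per weight plus the two
    # float32 range endpoints.
    lo, hi = w.min(), w.max()
    idx = np.round((w - lo) / (hi - lo) * (levels - 1)).astype(int)
    w_hat = lo + idx * (hi - lo) / (levels - 1)
    bits = w.size * np.log2(levels) + 2 * 32
    return w_hat, bits

w = np.random.randn(10_000)
w_hat, bits = quantize(w)
print(f"{bits:.0f} bits, max quantization error {np.abs(w - w_hat).max():.3f}")
# `bits` plays the role of C(h) in a compression bound like the one above.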