Showing 1 - 10 of 61 for search: '"Poli, Michael"'
Author:
Parnichkun, Rom N., Massaroli, Stefano, Moro, Alessandro, Smith, Jimmy T. H., Hasani, Ramin, Lechner, Mathias, An, Qi, Ré, Christopher, Asama, Hajime, Ermon, Stefano, Suzuki, Taiji, Yamashita, Atsushi, Poli, Michael
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms…
External link:
http://arxiv.org/abs/2405.06147
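The entry above works with the dual, transfer-function view of a state-space model rather than its recurrence. As a rough illustrative sketch only (not the paper's algorithm), the snippet below applies a rational transfer function H(z) = B(z)/A(z) to a sequence by evaluating H at FFT nodes and filtering in the frequency domain, so no state vector is ever materialized; the coefficient arrays, padding length, and example filter are assumptions made for the sketch.

    import numpy as np

    def transfer_function_filter(b, a, u):
        # Apply H(z) = (b[0] + b[1] z^-1 + ...) / (a[0] + a[1] z^-1 + ...) to input u
        # by evaluating H on the unit circle at FFT nodes; no recurrent state is kept.
        L = len(u)
        n = 2 * L                                   # zero-pad so circular wrap-around is negligible
        w = np.exp(-2j * np.pi * np.arange(n) / n)  # z^-1 at each FFT node
        H = np.polyval(b[::-1], w) / np.polyval(a[::-1], w)
        y = np.fft.ifft(np.fft.fft(u, n) * H)[:L]   # first L samples approximate the causal output
        return y.real

    # Example with assumed coefficients: a stable first-order filter H(z) = 1 / (1 - 0.9 z^-1)
    # y = transfer_function_filter(np.array([1.0]), np.array([1.0, -0.9]), np.random.randn(512))

The approximation is accurate whenever the filter is stable and its impulse response has effectively decayed within the padded window.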
Author:
Poli, Michael, Thomas, Armin W, Nguyen, Eric, Ponnusamy, Pragaash, Deiseroth, Björn, Kersting, Kristian, Suzuki, Taiji, Hie, Brian, Ermon, Stefano, Ré, Christopher, Zhang, Ce, Massaroli, Stefano
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by…
External link:
http://arxiv.org/abs/2403.17844
Author:
Arora, Simran, Eyuboglu, Sabri, Timalsina, Aman, Johnson, Isys, Poli, Michael, Zou, James, Rudra, Atri, Ré, Christopher
Attention-free language models that combine gating and convolutions are growing in popularity due to their efficiency and increasingly competitive performance. To better understand these architectures, we pretrain a suite of 17 attention and "gated-convolution"…
External link:
http://arxiv.org/abs/2312.04927
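The "gated-convolution" models surveyed above share one structural pattern: an elementwise gate multiplied with the output of a convolutional sequence mixer. The block below is a generic PyTorch sketch of that pattern, not any specific model from the pretrained suite; the layer names, the sigmoid gate, and the short depthwise kernel are illustrative assumptions.

    import torch
    import torch.nn as nn

    class GatedConvBlock(nn.Module):
        # Generic gated-convolution mixer: a depthwise causal convolution
        # modulated by an elementwise sigmoid gate.
        def __init__(self, d_model: int, kernel_size: int = 4):
            super().__init__()
            self.in_proj = nn.Linear(d_model, d_model)
            self.gate_proj = nn.Linear(d_model, d_model)
            self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                                  groups=d_model, padding=kernel_size - 1)
            self.out_proj = nn.Linear(d_model, d_model)

        def forward(self, x):                              # x: (batch, length, d_model)
            u = self.in_proj(x).transpose(1, 2)            # (B, D, L)
            u = self.conv(u)[..., : x.shape[1]]            # trim the overhang to stay causal
            gate = torch.sigmoid(self.gate_proj(x))        # (B, L, D)
            return self.out_proj(gate * u.transpose(1, 2))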
Author:
Massaroli, Stefano, Poli, Michael, Fu, Daniel Y., Kumbong, Hermann, Parnichkun, Rom N., Timalsina, Aman, Romero, David W., McIntyre, Quinn, Chen, Beidi, Rudra, Atri, Zhang, Ce, Ré, Christopher, Ermon, Stefano, Bengio, Yoshua
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains…
External link:
http://arxiv.org/abs/2310.18780
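The long-convolution models referenced above replace attention with convolutions whose filters span the entire sequence, computed in O(L log L) with the FFT. The function below is a minimal sketch of that operator only (shapes and padding are assumptions); the paper itself concerns distilling such models into fast recurrences, which this sketch does not cover.

    import torch

    def long_causal_conv(u, h):
        # u: (batch, length, channels) input; h: (channels, length) per-channel filter.
        # FFT-based causal convolution in O(L log L) instead of the O(L^2) direct form.
        L = u.shape[1]
        n = 2 * L                                    # pad to avoid circular wrap-around
        U = torch.fft.rfft(u.transpose(1, 2), n=n)   # (B, D, n//2 + 1)
        H = torch.fft.rfft(h, n=n)                   # (D, n//2 + 1)
        y = torch.fft.irfft(U * H, n=n)[..., :L]     # keep the causal part
        return y.transpose(1, 2)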
While complex simulations of physical systems have been widely used in engineering and scientific computing, lowering their often prohibitive computational requirements has only recently been tackled by deep learning approaches. In this paper, we pre…
External link:
http://arxiv.org/abs/2310.16397
Author:
Fu, Daniel Y., Arora, Simran, Grogan, Jessica, Johnson, Isys, Eyuboglu, Sabri, Thomas, Armin W., Spector, Benjamin, Poli, Michael, Rudra, Atri, Ré, Christopher
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: …
External link:
http://arxiv.org/abs/2310.12109
Author:
Nguyen, Eric, Poli, Michael, Faizi, Marjan, Thomas, Armin, Birch-Sykes, Callum, Wornow, Michael, Patel, Aman, Rabideau, Clayton, Massaroli, Stefano, Bengio, Yoshua, Ermon, Stefano, Baccus, Stephen A., Ré, Chris
Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled gen…
External link:
http://arxiv.org/abs/2306.15794
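One simple design choice for genomic sequence models like the one above is to tokenize DNA at single-nucleotide (character) resolution, so base-level detail is preserved over long contexts instead of being grouped into k-mers. The helper below is an illustrative sketch, not the paper's tokenizer; the vocabulary and the catch-all for ambiguous bases are assumptions.

    def encode_dna(seq: str) -> list[int]:
        # Map each nucleotide to its own token id; no k-mer vocabulary is built.
        vocab = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
        return [vocab.get(base, vocab["N"]) for base in seq.upper()]

    # encode_dna("ACCGTN") -> [0, 1, 1, 2, 3, 4]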
We present a methodology for formulating simplifying abstractions in machine learning systems by identifying and harnessing the utility structure of decisions. Machine learning tasks commonly involve high-dimensional output spaces (e.g., predictions…
External link:
http://arxiv.org/abs/2303.17062
Time series modeling is a well-established problem, which often requires that methods (1) expressively represent complicated dependencies, (2) forecast long horizons, and (3) efficiently train over long sequences. State-space models (SSMs) are classical…
External link:
http://arxiv.org/abs/2303.09489
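As a reminder of the classical object behind the entry above, the sketch below runs a plain discrete linear state-space recurrence x[t+1] = A x[t] + B u[t], y[t] = C x[t] over a univariate series. The matrices are arbitrary NumPy arrays chosen for illustration; how to parameterize and train such models for forecasting is the paper's subject and is not attempted here.

    import numpy as np

    def ssm_scan(A, B, C, u):
        # Sequentially evolve the state x and read out y for each input sample u[t].
        x = np.zeros(A.shape[0])
        ys = []
        for u_t in u:
            ys.append(C @ x)          # y[t] = C x[t]
            x = A @ x + B * u_t       # x[t+1] = A x[t] + B u[t] (scalar input channel)
        return np.array(ys)

    # Example with assumed matrices: a damped rotation as the state transition
    # A = 0.9 * np.array([[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]])
    # B = np.array([1.0, 0.0]); C = np.array([1.0, 1.0])
    # y = ssm_scan(A, B, C, np.random.randn(100))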
Author:
Poli, Michael, Massaroli, Stefano, Nguyen, Eric, Fu, Daniel Y., Dao, Tri, Baccus, Stephen, Bengio, Yoshua, Ermon, Stefano, Ré, Christopher
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting…
External link:
http://arxiv.org/abs/2302.10866
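The entry above motivates sub-quadratic replacements for attention; the operator proposed in this work (Hyena) interleaves elementwise gating with long convolutions whose filters are produced implicitly by a small network as a function of position, rather than stored as explicit weights. The module below sketches only that implicit-filter idea; the sizes, the positional input, and the decay window are assumptions, and the resulting filter would be applied with an FFT convolution like the one sketched further up.

    import torch
    import torch.nn as nn

    class ImplicitFilter(nn.Module):
        # A small MLP maps each (normalized) position t to filter values h(t),
        # so filter length is decoupled from parameter count. Illustrative only.
        def __init__(self, d_model: int, d_hidden: int = 64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(1, d_hidden), nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )

        def forward(self, seq_len: int):
            t = torch.linspace(0, 1, seq_len).unsqueeze(-1)  # (L, 1) positions
            decay = torch.exp(-5.0 * t)                      # soft exponential window (assumed)
            return (self.mlp(t) * decay).transpose(0, 1)     # (d_model, L) filter

    # h = ImplicitFilter(d_model=128)(seq_len=1024)   # filter of shape (128, 1024)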