Showing 1 - 10 of 61 for search: '"Poli, Michael"'
Author:
Parnichkun, Rom N., Massaroli, Stefano, Moro, Alessandro, Smith, Jimmy T. H., Hasani, Ramin, Lechner, Mathias, An, Qi, Ré, Christopher, Asama, Hajime, Ermon, Stefano, Suzuki, Taiji, Yamashita, Atsushi, Poli, Michael
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms…
External link:
http://arxiv.org/abs/2405.06147
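The entry above works with the dual, transfer-function view of a state-space model rather than its recurrence. As a rough illustrative sketch only (not the paper's algorithm), the snippet below applies a rational transfer function H(z) = B(z)/A(z) to a sequence by evaluating H at FFT nodes and filtering in the frequency domain, so no state vector is ever materialized; the coefficient arrays, padding length, and example filter are assumptions made for the sketch.

    import numpy as np

    def transfer_function_filter(b, a, u):
        # Apply H(z) = (b[0] + b[1] z^-1 + ...) / (a[0] + a[1] z^-1 + ...) to input u
        # by evaluating H on the unit circle at FFT nodes; no recurrent state is kept.
        L = len(u)
        n = 2 * L                                   # zero-pad so circular wrap-around is negligible
        w = np.exp(-2j * np.pi * np.arange(n) / n)  # z^-1 at each FFT node
        H = np.polyval(b[::-1], w) / np.polyval(a[::-1], w)
        y = np.fft.ifft(np.fft.fft(u, n) * H)[:L]   # first L samples approximate the causal output
        return y.real

    # Example with assumed coefficients: a stable first-order filter H(z) = 1 / (1 - 0.9 z^-1)
    # y = transfer_function_filter(np.array([1.0]), np.array([1.0, -0.9]), np.random.randn(512))

The approximation is accurate whenever the filter is stable and its impulse response has effectively decayed within the padded window.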
Author:
Poli, Michael, Thomas, Armin W, Nguyen, Eric, Ponnusamy, Pragaash, Deiseroth, Björn, Kersting, Kristian, Suzuki, Taiji, Hie, Brian, Ermon, Stefano, Ré, Christopher, Zhang, Ce, Massaroli, Stefano
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by…
External link:
http://arxiv.org/abs/2403.17844
Author:
Arora, Simran, Eyuboglu, Sabri, Timalsina, Aman, Johnson, Isys, Poli, Michael, Zou, James, Rudra, Atri, Ré, Christopher
Attention-free language models that combine gating and convolutions are growing in popularity due to their efficiency and increasingly competitive performance. To better understand these architectures, we pretrain a suite of 17 attention and "gated-convolution"…
External link:
http://arxiv.org/abs/2312.04927
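The "gated-convolution" models surveyed above share one structural pattern: an elementwise gate multiplied with the output of a convolutional sequence mixer. The block below is a generic PyTorch sketch of that pattern, not any specific model from the pretrained suite; the layer names, the sigmoid gate, and the short depthwise kernel are illustrative assumptions.

    import torch
    import torch.nn as nn

    class GatedConvBlock(nn.Module):
        # Generic gated-convolution mixer: a depthwise causal convolution
        # modulated by an elementwise sigmoid gate.
        def __init__(self, d_model: int, kernel_size: int = 4):
            super().__init__()
            self.in_proj = nn.Linear(d_model, d_model)
            self.gate_proj = nn.Linear(d_model, d_model)
            self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                                  groups=d_model, padding=kernel_size - 1)
            self.out_proj = nn.Linear(d_model, d_model)

        def forward(self, x):                              # x: (batch, length, d_model)
            u = self.in_proj(x).transpose(1, 2)            # (B, D, L)
            u = self.conv(u)[..., : x.shape[1]]            # trim the overhang to stay causal
            gate = torch.sigmoid(self.gate_proj(x))        # (B, L, D)
            return self.out_proj(gate * u.transpose(1, 2))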
Author:
Massaroli, Stefano, Poli, Michael, Fu, Daniel Y., Kumbong, Hermann, Parnichkun, Rom N., Timalsina, Aman, Romero, David W., McIntyre, Quinn, Chen, Beidi, Rudra, Atri, Zhang, Ce, Ré, Christopher, Ermon, Stefano, Bengio, Yoshua
Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains…
External link:
http://arxiv.org/abs/2310.18780
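The long-convolution models referenced above replace attention with convolutions whose filters span the entire sequence, computed in O(L log L) with the FFT. The function below is a minimal sketch of that operator only (shapes and padding are assumptions); the paper itself concerns distilling such models into fast recurrences, which this sketch does not cover.

    import torch

    def long_causal_conv(u, h):
        # u: (batch, length, channels) input; h: (channels, length) per-channel filter.
        # FFT-based causal convolution in O(L log L) instead of the O(L^2) direct form.
        L = u.shape[1]
        n = 2 * L                                    # pad to avoid circular wrap-around
        U = torch.fft.rfft(u.transpose(1, 2), n=n)   # (B, D, n//2 + 1)
        H = torch.fft.rfft(h, n=n)                   # (D, n//2 + 1)
        y = torch.fft.irfft(U * H, n=n)[..., :L]     # keep the causal part
        return y.transpose(1, 2)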
While complex simulations of physical systems have been widely used in engineering and scientific computing, lowering their often prohibitive computational requirements has only recently been tackled by deep learning approaches. In this paper, we pre…
External link:
http://arxiv.org/abs/2310.16397
Author:
Fu, Daniel Y., Arora, Simran, Grogan, Jessica, Johnson, Isys, Eyuboglu, Sabri, Thomas, Armin W., Spector, Benjamin, Poli, Michael, Rudra, Atri, Ré, Christopher
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: …
External link:
http://arxiv.org/abs/2310.12109
Author:
Nguyen, Eric, Poli, Michael, Faizi, Marjan, Thomas, Armin, Birch-Sykes, Callum, Wornow, Michael, Patel, Aman, Rabideau, Clayton, Massaroli, Stefano, Bengio, Yoshua, Ermon, Stefano, Baccus, Stephen A., Ré, Chris
Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled gen…
External link:
http://arxiv.org/abs/2306.15794
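One simple design choice for genomic sequence models like the one above is to tokenize DNA at single-nucleotide (character) resolution, so base-level detail is preserved over long contexts instead of being grouped into k-mers. The helper below is an illustrative sketch, not the paper's tokenizer; the vocabulary and the catch-all for ambiguous bases are assumptions.

    def encode_dna(seq: str) -> list[int]:
        # Map each nucleotide to its own token id; no k-mer vocabulary is built.
        vocab = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
        return [vocab.get(base, vocab["N"]) for base in seq.upper()]

    # encode_dna("ACCGTN") -> [0, 1, 1, 2, 3, 4]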
We present a methodology for formulating simplifying abstractions in machine learning systems by identifying and harnessing the utility structure of decisions. Machine learning tasks commonly involve high-dimensional output spaces (e.g., predictions…
External link:
http://arxiv.org/abs/2303.17062
Time series modeling is a well-established problem, which often requires that methods (1) expressively represent complicated dependencies, (2) forecast long horizons, and (3) efficiently train over long sequences. State-space models (SSMs) are classical…
External link:
http://arxiv.org/abs/2303.09489
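As a reminder of the classical object behind the entry above, the sketch below runs a plain discrete linear state-space recurrence x[t+1] = A x[t] + B u[t], y[t] = C x[t] over a univariate series. The matrices are arbitrary NumPy arrays chosen for illustration; how to parameterize and train such models for forecasting is the paper's subject and is not attempted here.

    import numpy as np

    def ssm_scan(A, B, C, u):
        # Sequentially evolve the state x and read out y for each input sample u[t].
        x = np.zeros(A.shape[0])
        ys = []
        for u_t in u:
            ys.append(C @ x)          # y[t] = C x[t]
            x = A @ x + B * u_t       # x[t+1] = A x[t] + B u[t] (scalar input channel)
        return np.array(ys)

    # Example with assumed matrices: a damped rotation as the state transition
    # A = 0.9 * np.array([[np.cos(0.3), -np.sin(0.3)], [np.sin(0.3), np.cos(0.3)]])
    # B = np.array([1.0, 0.0]); C = np.array([1.0, 1.0])
    # y = ssm_scan(A, B, C, np.random.randn(100))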
Author:
Poli, Michael, Massaroli, Stefano, Nguyen, Eric, Fu, Daniel Y., Dao, Tri, Baccus, Stephen, Bengio, Yoshua, Ermon, Stefano, Ré, Christopher
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting…
External link:
http://arxiv.org/abs/2302.10866
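The entry above motivates sub-quadratic replacements for attention; the operator proposed in this work (Hyena) interleaves elementwise gating with long convolutions whose filters are produced implicitly by a small network as a function of position, rather than stored as explicit weights. The module below sketches only that implicit-filter idea; the sizes, the positional input, and the decay window are assumptions, and the resulting filter would be applied with an FFT convolution like the one sketched further up.

    import torch
    import torch.nn as nn

    class ImplicitFilter(nn.Module):
        # A small MLP maps each (normalized) position t to filter values h(t),
        # so filter length is decoupled from parameter count. Illustrative only.
        def __init__(self, d_model: int, d_hidden: int = 64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(1, d_hidden), nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )

        def forward(self, seq_len: int):
            t = torch.linspace(0, 1, seq_len).unsqueeze(-1)  # (L, 1) positions
            decay = torch.exp(-5.0 * t)                      # soft exponential window (assumed)
            return (self.mlp(t) * decay).transpose(0, 1)     # (d_model, L) filter

    # h = ImplicitFilter(d_model=128)(seq_len=1024)   # filter of shape (128, 1024)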