Showing 1 - 10 of 83 for search: '"Gu, Albert"'
Data-driven techniques have emerged as a promising alternative to traditional numerical methods for solving partial differential equations (PDEs). These techniques frequently offer a better trade-off between computational cost and accuracy for many PDEs…
External link:
http://arxiv.org/abs/2409.02313
Transformer architectures have become a dominant paradigm for domains like language modeling but suffer in many inference settings due to their quadratic-time self-attention. Recently proposed subquadratic architectures, such as Mamba, have shown promise…
External link:
http://arxiv.org/abs/2408.10189
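To make the quadratic-vs-subquadratic contrast in this snippet concrete, here is a minimal numpy sketch (mine, not the paper's code): causal self-attention materializes an L x L score matrix, while a fixed-state linear recurrence does constant work per position. All function names and dimensions are illustrative.

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    """Quadratic in sequence length L: builds an L x L score matrix."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (L, L) -- the O(L^2) term
    mask = np.tril(np.ones_like(scores, dtype=bool))  # causal mask
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_recurrence(x, a):
    """Linear in L: a constant-size state carried across positions."""
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = a * h + x[t]   # O(d) work per step, no L x L matrix
        out[t] = h
    return out

L, d = 8, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(L, d))
print(causal_attention(x, *(rng.normal(size=(d, d)) for _ in range(3))).shape)
print(linear_recurrence(x, a=0.9).shape)
```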
A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers. This paper studies a unifying matrix mixer view of sequence mixers that can be conceptualized as a linear map on the sequence dimension…
External link:
http://arxiv.org/abs/2407.09941
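The matrix mixer view can be stated in one line: for inputs X of shape (L, d), a sequence mixer computes Y = M X for some L x L matrix M, and architectures differ in the structure imposed on M. A small sketch under that reading (the specific matrices below are my illustrations, not the paper's parameterizations):

```python
import numpy as np

L, d = 6, 3
rng = np.random.default_rng(1)
X = rng.normal(size=(L, d))

# Any sequence mixer of this form is "mix along L": Y = M @ X.
# Different architectures = different structure on the L x L matrix M.
M_dense = rng.normal(size=(L, L))   # unstructured (attention-like scores)
M_causal = np.tril(M_dense)         # causal masking -> lower triangular
M_toeplitz = np.array(              # convolution -> causal Toeplitz
    [[1.0 / (1 + i - j) if i >= j else 0.0 for j in range(L)]
     for i in range(L)])

for M in (M_dense, M_causal, M_toeplitz):
    Y = M @ X                       # one matmul on the sequence dimension
    assert Y.shape == (L, d)
```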
Author:
Waleffe, Roger, Byeon, Wonmin, Riach, Duncan, Norick, Brandon, Korthikanti, Vijay, Dao, Tri, Gu, Albert, Hatamizadeh, Ali, Singh, Sudhakar, Narayanan, Deepak, Kulshreshtha, Garvit, Singh, Vartika, Casper, Jared, Kautz, Jan, Shoeybi, Mohammad, Catanzaro, Bryan
Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent…
External link:
http://arxiv.org/abs/2406.07887
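A back-of-the-envelope calculation makes the key-value-cache point concrete. The dimensions below are illustrative, not the configurations studied in the paper: Transformer cache memory grows linearly with context length, while a recurrent state does not.

```python
# Illustrative sizes only; not the models studied in the paper.
layers, heads, head_dim, dtype_bytes = 32, 32, 128, 2   # fp16
seq_len = 32_768

# Transformer: K and V stored per layer, per position.
kv_cache = 2 * layers * heads * head_dim * seq_len * dtype_bytes
print(f"KV cache: {kv_cache / 2**30:.1f} GiB per sequence")    # grows with seq_len

# SSM: one fixed-size recurrent state per layer, independent of seq_len.
d_model, state_dim = heads * head_dim, 16
ssm_state = layers * d_model * state_dim * dtype_bytes
print(f"SSM state: {ssm_state / 2**20:.1f} MiB per sequence")  # constant in seq_len
```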
Author:
Dao, Tri, Gu, Albert
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related…
External link:
http://arxiv.org/abs/2405.21060
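The connection hinted at here (developed in the paper as state space duality) can be seen already in the scalar case: unrolling a time-varying linear recurrence is exactly multiplication by a lower-triangular semiseparable matrix. A minimal sketch with my own variable names:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 5
a = rng.uniform(0.5, 1.0, L)   # state transition per step
b = rng.normal(size=L)         # input projection per step
c = rng.normal(size=L)         # output projection per step
x = rng.normal(size=L)

# Recurrent form: h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t
h, y_rec = 0.0, np.zeros(L)
for t in range(L):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = c[t] * h

# Matrix form: y = M x with M[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s, s <= t
M = np.zeros((L, L))
for t in range(L):
    for s in range(t + 1):
        M[t, s] = c[t] * np.prod(a[s + 1 : t + 1]) * b[s]

assert np.allclose(y_rec, M @ x)   # the two views compute the same map
```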
Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and reverse complementarity (RC) of DNA…
External link:
http://arxiv.org/abs/2403.03234
Author:
De, Soham, Smith, Samuel L., Fernando, Anushan, Botev, Aleksandar, Cristian-Muraru, George, Gu, Albert, Haroun, Ruba, Berrada, Leonard, Chen, Yutian, Srinivasan, Srivatsan, Desjardins, Guillaume, Doucet, Arnaud, Budden, David, Teh, Yee Whye, Pascanu, Razvan, De Freitas, Nando, Gulcehre, Caglar
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention…
External link:
http://arxiv.org/abs/2402.19427
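A gated linear recurrence of the kind Hawk is built on can be sketched in a few lines. Note this is a generic stand-in, not the paper's RG-LRU layer, whose gating parameterization differs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_linear_recurrence(x, Wg, Wi):
    """Generic gated linear recurrence (a stand-in for Hawk's RG-LRU).

    The gate a_t is input-dependent, but h_t is *linear* in h_{t-1}:
        h_t = a_t * h_{t-1} + (1 - a_t) * i_t
    so no nonlinearity is tangled into the state, unlike an LSTM/GRU.
    """
    L, d = x.shape
    h = np.zeros(d)
    out = np.empty((L, d))
    for t in range(L):
        a = sigmoid(x[t] @ Wg)          # per-channel forget gate in (0, 1)
        i = x[t] @ Wi                   # candidate input
        h = a * h + (1.0 - a) * i
        out[t] = h
    return out

rng = np.random.default_rng(3)
L, d = 10, 4
y = gated_linear_recurrence(rng.normal(size=(L, d)),
                            rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(y.shape)  # (10, 4)
```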
Author:
Gu, Albert, Dao, Tri
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs)…
External link:
http://arxiv.org/abs/2312.00752
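The selection mechanism that gives selective SSMs their name makes the state-space parameters functions of the input. The following heavily simplified sketch of a diagonal selective scan is mine; the actual Mamba block adds input/output projections, a convolution, and a hardware-aware fused kernel:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM: B_t, C_t, and step size dt depend on x_t."""
    L, d = x.shape
    n = A.shape[1]                             # state size per channel
    h = np.zeros((d, n))
    y = np.empty((L, d))
    for t in range(L):
        dt = softplus(x[t] @ W_delta)          # (d,) input-dependent step size
        B_t = x[t] @ W_B                       # (n,) input-dependent input proj
        C_t = x[t] @ W_C                       # (n,) input-dependent output proj
        Abar = np.exp(dt[:, None] * A)         # (d, n) discretized transition
        h = Abar * h + (dt[:, None] * x[t][:, None]) * B_t[None, :]
        y[t] = h @ C_t
    return y

rng = np.random.default_rng(4)
L, d, n = 12, 4, 8
A = -np.exp(rng.normal(size=(d, n)))           # negative for stability
y = selective_scan(rng.normal(size=(L, d)), A,
                   rng.normal(size=(d, d)), rng.normal(size=(d, n)),
                   rng.normal(size=(d, n)))
print(y.shape)  # (12, 4)
```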
Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models…
External link:
http://arxiv.org/abs/2309.08551
Author:
Orvieto, Antonio, Smith, Samuel L., Gu, Albert, Fernando, Anushan, Gulcehre, Caglar, Pascanu, Razvan, De, Soham
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference…
External link:
http://arxiv.org/abs/2303.06349
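The added benefits referred to here follow from the recurrence being linear: training can parallelize over the sequence, while inference steps the recurrence one token at a time. A small sketch (my formulation, using a cumulative-sum identity in place of a true associative scan) showing the two forms agree:

```python
import numpy as np

rng = np.random.default_rng(5)
L, d = 16, 3
lam = rng.uniform(0.8, 0.99, d)     # diagonal recurrence weights, |lam| < 1
x = rng.normal(size=(L, d))

# Sequential form (what runs at inference, one step per token):
h, seq = np.zeros(d), np.empty((L, d))
for t in range(L):
    h = lam * h + x[t]
    seq[t] = h

# Closed form h_t = sum_{s<=t} lam^(t-s) x_s, expressible with cumulative ops
# (in practice a parallel associative scan; a cumsum trick here for clarity):
powers = lam[None, :] ** np.arange(L)[:, None]      # lam^t, shape (L, d)
closed = powers * np.cumsum(x / powers, axis=0)     # lam^t * sum_s x_s / lam^s

assert np.allclose(seq, closed)
```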