Showing 1 - 10 of 83 for search: '"Gu, Albert"'
Data-driven techniques have emerged as a promising alternative to traditional numerical methods for solving partial differential equations (PDEs). These techniques frequently offer a better trade-off between computational cost and accuracy for many PDEs…
External link:
http://arxiv.org/abs/2409.02313
Transformer architectures have become a dominant paradigm for domains like language modeling but suffer in many inference settings due to their quadratic-time self-attention. Recently proposed subquadratic architectures, such as Mamba, have shown promise…
External link:
http://arxiv.org/abs/2408.10189
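To make the quadratic-vs-subquadratic contrast in this snippet concrete, here is a minimal numpy sketch (mine, not the paper's code): causal self-attention materializes an L x L score matrix, while a fixed-state linear recurrence does constant work per position. All function names and dimensions are illustrative.

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    """Quadratic in sequence length L: builds an L x L score matrix."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (L, L) -- the O(L^2) term
    mask = np.tril(np.ones_like(scores, dtype=bool))  # causal mask
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_recurrence(x, a):
    """Linear in L: a constant-size state carried across positions."""
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = a * h + x[t]   # O(d) work per step, no L x L matrix
        out[t] = h
    return out

L, d = 8, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(L, d))
print(causal_attention(x, *(rng.normal(size=(d, d)) for _ in range(3))).shape)
print(linear_recurrence(x, a=0.9).shape)
```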
A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers. This paper studies a unifying matrix mixer view of sequence mixers that can be conceptualized as a linear map on the sequence dimension…
External link:
http://arxiv.org/abs/2407.09941
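The matrix mixer view can be stated in one line: for inputs X of shape (L, d), a sequence mixer computes Y = M X for some L x L matrix M, and architectures differ in the structure imposed on M. A small sketch under that reading (the specific matrices below are my illustrations, not the paper's parameterizations):

```python
import numpy as np

L, d = 6, 3
rng = np.random.default_rng(1)
X = rng.normal(size=(L, d))

# Any sequence mixer of this form is "mix along L": Y = M @ X.
# Different architectures = different structure on the L x L matrix M.
M_dense = rng.normal(size=(L, L))   # unstructured (attention-like scores)
M_causal = np.tril(M_dense)         # causal masking -> lower triangular
M_toeplitz = np.array(              # convolution -> causal Toeplitz
    [[1.0 / (1 + i - j) if i >= j else 0.0 for j in range(L)]
     for i in range(L)])

for M in (M_dense, M_causal, M_toeplitz):
    Y = M @ X                       # one matmul on the sequence dimension
    assert Y.shape == (L, d)
```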
Author:
Waleffe, Roger, Byeon, Wonmin, Riach, Duncan, Norick, Brandon, Korthikanti, Vijay, Dao, Tri, Gu, Albert, Hatamizadeh, Ali, Singh, Sudhakar, Narayanan, Deepak, Kulshreshtha, Garvit, Singh, Vartika, Casper, Jared, Kautz, Jan, Shoeybi, Mohammad, Catanzaro, Bryan
Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent…
External link:
http://arxiv.org/abs/2406.07887
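A back-of-the-envelope calculation makes the key-value-cache point concrete. The dimensions below are illustrative, not the configurations studied in the paper: Transformer cache memory grows linearly with context length, while a recurrent state does not.

```python
# Illustrative sizes only; not the models studied in the paper.
layers, heads, head_dim, dtype_bytes = 32, 32, 128, 2   # fp16
seq_len = 32_768

# Transformer: K and V stored per layer, per position.
kv_cache = 2 * layers * heads * head_dim * seq_len * dtype_bytes
print(f"KV cache: {kv_cache / 2**30:.1f} GiB per sequence")    # grows with seq_len

# SSM: one fixed-size recurrent state per layer, independent of seq_len.
d_model, state_dim = heads * head_dim, 16
ssm_state = layers * d_model * state_dim * dtype_bytes
print(f"SSM state: {ssm_state / 2**20:.1f} MiB per sequence")  # constant in seq_len
```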
Author:
Dao, Tri, Gu, Albert
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these families of models are actually quite closely related…
External link:
http://arxiv.org/abs/2405.21060
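The connection hinted at here (developed in the paper as state space duality) can be seen already in the scalar case: unrolling a time-varying linear recurrence is exactly multiplication by a lower-triangular semiseparable matrix. A minimal sketch with my own variable names:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 5
a = rng.uniform(0.5, 1.0, L)   # state transition per step
b = rng.normal(size=L)         # input projection per step
c = rng.normal(size=L)         # output projection per step
x = rng.normal(size=L)

# Recurrent form: h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t
h, y_rec = 0.0, np.zeros(L)
for t in range(L):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = c[t] * h

# Matrix form: y = M x with M[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s, s <= t
M = np.zeros((L, L))
for t in range(L):
    for s in range(t + 1):
        M[t, s] = c[t] * np.prod(a[s + 1 : t + 1]) * b[s]

assert np.allclose(y_rec, M @ x)   # the two views compute the same map
```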
Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstream regions of the genome, and reverse complementarity (RC) of DNA…
External link:
http://arxiv.org/abs/2403.03234
Author:
De, Soham, Smith, Samuel L., Fernando, Anushan, Botev, Aleksandar, Cristian-Muraru, George, Gu, Albert, Haroun, Ruba, Berrada, Leonard, Chen, Yutian, Srinivasan, Srivatsan, Desjardins, Guillaume, Doucet, Arnaud, Budden, David, Teh, Yee Whye, Pascanu, Razvan, De Freitas, Nando, Gulcehre, Caglar
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention…
External link:
http://arxiv.org/abs/2402.19427
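A gated linear recurrence of the kind Hawk is built on can be sketched in a few lines. Note this is a generic stand-in, not the paper's RG-LRU layer, whose gating parameterization differs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_linear_recurrence(x, Wg, Wi):
    """Generic gated linear recurrence (a stand-in for Hawk's RG-LRU).

    The gate a_t is input-dependent, but h_t is *linear* in h_{t-1}:
        h_t = a_t * h_{t-1} + (1 - a_t) * i_t
    so no nonlinearity is tangled into the state, unlike an LSTM/GRU.
    """
    L, d = x.shape
    h = np.zeros(d)
    out = np.empty((L, d))
    for t in range(L):
        a = sigmoid(x[t] @ Wg)          # per-channel forget gate in (0, 1)
        i = x[t] @ Wi                   # candidate input
        h = a * h + (1.0 - a) * i
        out[t] = h
    return out

rng = np.random.default_rng(3)
L, d = 10, 4
y = gated_linear_recurrence(rng.normal(size=(L, d)),
                            rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(y.shape)  # (10, 4)
```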
Author:
Gu, Albert, Dao, Tri
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs)…
External link:
http://arxiv.org/abs/2312.00752
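The selection mechanism that gives selective SSMs their name makes the state-space parameters functions of the input. The following heavily simplified sketch of a diagonal selective scan is mine; the actual Mamba block adds input/output projections, a convolution, and a hardware-aware fused kernel:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM: B_t, C_t, and step size dt depend on x_t."""
    L, d = x.shape
    n = A.shape[1]                             # state size per channel
    h = np.zeros((d, n))
    y = np.empty((L, d))
    for t in range(L):
        dt = softplus(x[t] @ W_delta)          # (d,) input-dependent step size
        B_t = x[t] @ W_B                       # (n,) input-dependent input proj
        C_t = x[t] @ W_C                       # (n,) input-dependent output proj
        Abar = np.exp(dt[:, None] * A)         # (d, n) discretized transition
        h = Abar * h + (dt[:, None] * x[t][:, None]) * B_t[None, :]
        y[t] = h @ C_t
    return y

rng = np.random.default_rng(4)
L, d, n = 12, 4, 8
A = -np.exp(rng.normal(size=(d, n)))           # negative for stability
y = selective_scan(rng.normal(size=(L, d)), A,
                   rng.normal(size=(d, d)), rng.normal(size=(d, n)),
                   rng.normal(size=(d, n)))
print(y.shape)  # (12, 4)
```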
Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models…
External link:
http://arxiv.org/abs/2309.08551
Author:
Orvieto, Antonio, Smith, Samuel L., Gu, Albert, Fernando, Anushan, Gulcehre, Caglar, Pascanu, Razvan, De, Soham
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference…
External link:
http://arxiv.org/abs/2303.06349
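The added benefits referred to here follow from the recurrence being linear: training can parallelize over the sequence, while inference steps the recurrence one token at a time. A small sketch (my formulation, using a cumulative-sum identity in place of a true associative scan) showing the two forms agree:

```python
import numpy as np

rng = np.random.default_rng(5)
L, d = 16, 3
lam = rng.uniform(0.8, 0.99, d)     # diagonal recurrence weights, |lam| < 1
x = rng.normal(size=(L, d))

# Sequential form (what runs at inference, one step per token):
h, seq = np.zeros(d), np.empty((L, d))
for t in range(L):
    h = lam * h + x[t]
    seq[t] = h

# Closed form h_t = sum_{s<=t} lam^(t-s) x_s, expressible with cumulative ops
# (in practice a parallel associative scan; a cumsum trick here for clarity):
powers = lam[None, :] ** np.arange(L)[:, None]      # lam^t, shape (L, d)
closed = powers * np.cumsum(x / powers, axis=0)     # lam^t * sum_s x_s / lam^s

assert np.allclose(seq, closed)
```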