Výsledky vyhledávání

Report

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

Autor: Hwang, Sukjun, Lahoti, Aakash, Dao, Tri, Gu, Albert

A wide array of sequence models are built on a framework modeled after Transformers, comprising alternating sequence mixer and channel mixer layers. This paper studies a unifying matrix mixer view of sequence mixers that can be conceptualized as a li

Externí odkaz: http://arxiv.org/abs/2407.09941

Zobrazit plný text záznamu

Report

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Autor: Shah, Jay, Bikshandi, Ganesh, Zhang, Ying, Thakkar, Vijay, Ramani, Pradeep, Dao, Tri

Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications. FlashAttention elaborated an approach to speed up attention on GPUs through minimizing memory reads/writ

Externí odkaz: http://arxiv.org/abs/2407.08608

Zobrazit plný text záznamu

Report

An Empirical Study of Mamba-based Language Models

Autor: Waleffe, Roger, Byeon, Wonmin, Riach, Duncan, Norick, Brandon, Korthikanti, Vijay, Dao, Tri, Gu, Albert, Hatamizadeh, Ali, Singh, Sudhakar, Narayanan, Deepak, Kulshreshtha, Garvit, Singh, Vartika, Casper, Jared, Kautz, Jan, Shoeybi, Mohammad, Catanzaro, Bryan

Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory requirements from the key-value cache. Moreover, recent

Externí odkaz: http://arxiv.org/abs/2406.07887

Zobrazit plný text záznamu

Report

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Autor: Dao, Tri, Gu, Albert

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale. We show that these

Externí odkaz: http://arxiv.org/abs/2405.21060

Zobrazit plný text záznamu

Report

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

Autor: Schiff, Yair, Kao, Chia-Hsiang, Gokaslan, Aaron, Dao, Tri, Gu, Albert, Kuleshov, Volodymyr

Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics. However, modeling genomic sequences introduces challenges such as the need to model long-range token interactions, the effects of upstream and downstr

Externí odkaz: http://arxiv.org/abs/2403.03234

Zobrazit plný text záznamu

Report

StarCoder 2 and The Stack v2: The Next Generation

Autor: Lozhkov, Anton, Li, Raymond, Allal, Loubna Ben, Cassano, Federico, Lamy-Poirier, Joel, Tazi, Nouamane, Tang, Ao, Pykhtar, Dmytro, Liu, Jiawei, Wei, Yuxiang, Liu, Tianyang, Tian, Max, Kocetkov, Denis, Zucker, Arthur, Belkada, Younes, Wang, Zijian, Liu, Qian, Abulkhanov, Dmitry, Paul, Indraneil, Li, Zhuang, Li, Wen-Ding, Risdal, Megan, Li, Jia, Zhu, Jian, Zhuo, Terry Yue, Zheltonozhskii, Evgenii, Dade, Nii Osae Osae, Yu, Wenhao, Krauß, Lucas, Jain, Naman, Su, Yixuan, He, Xuanli, Dey, Manan, Abati, Edoardo, Chai, Yekun, Muennighoff, Niklas, Tang, Xiangru, Oblokulov, Muhtasham, Akiki, Christopher, Marone, Marc, Mou, Chenghao, Mishra, Mayank, Gu, Alex, Hui, Binyuan, Dao, Tri, Zebaze, Armel, Dehaene, Olivier, Patry, Nicolas, Xu, Canwen, McAuley, Julian, Hu, Han, Scholak, Torsten, Paquet, Sebastien, Robinson, Jennifer, Anderson, Carolyn Jane, Chapados, Nicolas, Patwary, Mostofa, Tajbakhsh, Nima, Jernite, Yacine, Ferrandis, Carlos Muñoz, Zhang, Lingming, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, de Vries, Harm

The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digita

Externí odkaz: http://arxiv.org/abs/2402.19173

Zobrazit plný text záznamu

Report

BitDelta: Your Fine-Tune May Only Be Worth One Bit

Autor: Liu, James, Xiao, Guangxuan, Li, Kai, Lee, Jason D., Han, Song, Dao, Tri, Cai, Tianle

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that fine-tuning ad

Externí odkaz: http://arxiv.org/abs/2402.10193

Zobrazit plný text záznamu

Report

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Autor: Cai, Tianle, Li, Yuhong, Geng, Zhengyang, Peng, Hongwu, Lee, Jason D., Chen, Deming, Dao, Tri

Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Ban

Externí odkaz: http://arxiv.org/abs/2401.10774

Zobrazit plný text záznamu

Report

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Autor: Gu, Albert, Dao, Tri

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convo

Externí odkaz: http://arxiv.org/abs/2312.00752

Zobrazit plný text záznamu

Report

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

Autor: Liu, Zichang, Wang, Jue, Dao, Tri, Zhou, Tianyi, Yuan, Binhang, Song, Zhao, Shrivastava, Anshumali, Zhang, Ce, Tian, Yuandong, Re, Christopher, Chen, Beidi

Publikováno v: Proceedings of the 40th International Conference on Machine Learning, 2023, 919

Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at inference time. Sparsity is a natural approach to reduce this cost, but existing

Externí odkaz: http://arxiv.org/abs/2310.17157

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání