Author:
Dolga, Rares; Maystre, Lucas; Cobzarenco, Marius; Barber, David
Publication Year:
2024
Subject:
Document Type:
Working Paper
Description:
The time complexity of the standard attention mechanism in transformers scales quadratically with sequence length. We propose a probabilistic framework for attention, enabling us to derive a novel low-rank linear re-parameterisation of both bidirectional and causal cases, based on defining a latent variable model. Our method can be seamlessly integrated as a drop-in replacement for the standard attention mechanism. Additionally, this framework provides a natural extension for combining local standard attention with our global linear attention. This approach allows us to extend the context length of existing large pre-trained models with only a few additional training steps. The resulting "Latte Transformer" achieves performance comparable to standard attention and other state-of-the-art models, while maintaining linear time and memory complexity, along with constant-time next-token prediction during inference.
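For orientation, the sketch below shows one way attention routed through a small set of latent states can be computed in linear time, which is the general idea the abstract describes. It is not the authors' implementation: the function name, the latent-embedding parameter W, and all shapes are illustrative assumptions. The point is that the output is assembled per latent state, so the T x T attention matrix is never materialised, and in the causal case the cumulative sums can be maintained as running totals, giving constant-time next-token prediction.

```python
import torch
import torch.nn.functional as F

def latent_linear_attention(Q, K, V, W, causal=True):
    """Illustrative latent-state linear attention (not the paper's code).

    Q, K: (T, d) query/key projections; V: (T, d_v) values;
    W: (L, d) hypothetical latent-state embeddings.
    Effective attention weight a[t, s] = sum_l p(l | q_t) * p(s | l),
    computed without forming the T x T matrix.
    """
    # p(l | q_t): distribution over latent states for each query position.
    q_latent = F.softmax(Q @ W.T, dim=-1)                                   # (T, L)

    # Unnormalised score of each key position under each latent state.
    k_scores = torch.exp(K @ W.T)                                           # (T, L)

    if causal:
        # Cumulative sums restrict each latent state's context to s <= t;
        # kept as running totals, they allow O(1) updates per new token.
        num = torch.cumsum(k_scores.unsqueeze(-1) * V.unsqueeze(1), dim=0)  # (T, L, d_v)
        den = torch.cumsum(k_scores, dim=0).unsqueeze(-1)                   # (T, L, 1)
        context = num / den                                                 # (T, L, d_v)
        out = torch.einsum('tl,tld->td', q_latent, context)
    else:
        # Bidirectional case: one global normalisation per latent state.
        probs = k_scores / k_scores.sum(dim=0, keepdim=True)                # (T, L)
        context = probs.T @ V                                               # (L, d_v)
        out = q_latent @ context
    return out

# Example usage with assumed toy dimensions.
T, d, d_v, L = 16, 8, 8, 4
Q, K, V = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d_v)
W = torch.randn(L, d)
out = latent_linear_attention(Q, K, V, W, causal=True)  # (T, d_v)
```

With L latent states the cost is O(T * L * d) time and memory, linear in sequence length, in line with the complexity claimed in the abstract.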
Database:
arXiv
External Link: