Do Transformers Need Deep Long-Range Memory?
Authors: | Ali Razavi, Jack W. Rae |
---|---|
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Computer Science - Computation and Language; Computer science; Machine Learning (stat.ML); 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Machine Learning (cs.LG); Computer engineering; Statistics - Machine Learning; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computation and Language (cs.CL); 0105 earth and related environmental sciences; Transformer (machine learning model) |
Source: | ACL |
Description: | Deep attention models have advanced the modelling of sequential data across many domains. For language modelling in particular, the Transformer-XL, a Transformer augmented with a long-range memory of past activations, has been shown to be state-of-the-art across a variety of well-studied benchmarks. The Transformer-XL incorporates a long-range memory at every layer of the network, which makes its state thousands of times larger than that of its RNN predecessors. However, it is unclear whether this is necessary. We perform a set of interventions to show that comparable performance can be obtained with 6x fewer long-range memories, and that better performance can be obtained by limiting the range of attention in the lower layers of the network. Published at the 58th Annual Meeting of the Association for Computational Linguistics. 6 pages, 4 figures, 1 table. |
Database: | OpenAIRE |
External link: |