Sketching as a Tool for Understanding and Accelerating Self-attention for Long Sequences
Author: Chen, Y., Zeng, Q., Hakkani-Tur, D., Jin, D., Ji, H., Yang, Y.
Publication year: 2021
Subject:
Source: Web of Science; Scopus-Elsevier
DOI: 10.48550/arxiv.2112.05359
Description: Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules. To address this limitation, Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively. These two models are intrinsically connected, and to understand their connection, we introduce a theoretical framework of matrix sketching. Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of the matrix approximation to self-attention with three carefully designed components: column sampling, adaptive row normalization and pilot sampling reutilization. Experiments on the Long Range Arena (LRA) benchmark demonstrate that our methods outperform alternatives with a consistently smaller time/space footprint.
Database: OpenAIRE
External link:
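
To make the column-sampling idea from the abstract concrete, here is a minimal, hypothetical sketch of how sampling key/value columns can approximate softmax attention in sub-quadratic time. It uses plain NumPy with uniform sampling and is not the authors' Skeinformer implementation (which additionally uses adaptive row normalization and pilot sampling reutilization); all function names and parameters below are illustrative assumptions.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Standard softmax attention: O(n^2) time and space in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)
    return A @ V

def column_sampled_attention(Q, K, V, k, rng):
    """Approximate the attention output by sampling k key/value columns.

    Illustrative uniform column sampling only; Skeinformer itself adds
    adaptive row normalization and reuses pilot samples.
    """
    n, d = Q.shape
    idx = rng.choice(n, size=k, replace=False)    # sampled column (key) indices
    scores = Q @ K[idx].T / np.sqrt(d)            # n x k scores, O(nk) cost
    scores -= scores.max(axis=1, keepdims=True)
    A_sub = np.exp(scores)
    A_sub /= A_sub.sum(axis=1, keepdims=True)     # renormalize rows over sampled keys
    return A_sub @ V[idx]

rng = np.random.default_rng(0)
n, d, k = 1024, 64, 128
Q, K, V = rng.standard_normal((3, n, d))
approx = column_sampled_attention(Q, K, V, k, rng)
exact = exact_attention(Q, K, V)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

The sampled version forms only an n x k slice of the attention matrix, so its cost scales linearly in the sequence length n for fixed k, which is the kind of saving the abstract refers to when reducing quadratic complexity to linear.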