Context-aware positional representation for self-attention networks

Authors: Masao Utiyama, Rui Wang, Eiichiro Sumita, Kehai Chen
Year: 2021
Source: Neurocomputing 451:46-56
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2021.04.055
Description: In self-attention networks (SANs), positional embeddings model order dependencies between the words of the input sentence and are added to word embeddings to form the input representation, which lets a SAN-based neural model apply self-attentive functions in parallel (multi-head) and stack them (multi-layer) to learn a representation of the input sentence. However, this input representation captures only static order dependencies based on the discrete position indexes of words; that is, it is independent of context, which may limit its ability to model the input sentence. To address this issue, we propose a novel positional representation method that models order dependencies based on n-gram context or sentence context in the input sentence, allowing SANs to learn a more effective sentence representation. To validate the proposed method, we apply it to neural machine translation, a typical application of SAN-based neural models. Experimental results on two widely used translation tasks, WMT14 English-to-German and WMT17 Chinese-to-English, show that the proposed approach significantly improves translation performance over a strong Transformer baseline.
Database: OpenAIRE
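The contrast the abstract draws — static positional encodings that depend only on the discrete position index versus encodings modulated by local n-gram context — can be illustrated with a minimal NumPy sketch. The sinusoidal encoding below is the standard Transformer formulation; the n-gram mean-pooling and sigmoid gating are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Standard Transformer positional encoding: a function of the
    position index alone, hence identical for every input sentence."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def context_aware_positions(word_emb, n=3):
    """Hypothetical context-aware variant: gate the static encoding with
    a signal derived from each position's n-gram neighbourhood, so the
    same index yields different encodings in different contexts."""
    seq_len, d_model = word_emb.shape
    static = sinusoidal_positions(seq_len, d_model)
    # mean-pool an n-gram window around each position (local context)
    padded = np.pad(word_emb, ((n // 2, n // 2), (0, 0)), mode="edge")
    context = np.stack([padded[i:i + n].mean(axis=0)
                        for i in range(seq_len)])
    gate = 1.0 / (1.0 + np.exp(-context))  # elementwise sigmoid gate
    return static * gate

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))              # 5 tokens, d_model = 8
inp = emb + context_aware_positions(emb)   # input representation for the SAN
```

Because the gate depends on the word embeddings, two sentences of equal length receive different positional signals, which is the property the static index-based scheme lacks.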