Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation
Author: | Xiao Pu, Nikolaos Pappas, Andrei Popescu-Belis, James Henderson |
---|---|
Year of publication: | 2018 |
Subject: |
FOS: Computer and information sciences
Linguistics and Language; Machine translation; Computer science; Context (language use); 02 engineering and technology; 010501 environmental sciences; computer.software_genre; 01 natural sciences; attention-based models; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 0105 earth and related environmental sciences; Computer Science - Computation and Language; Word-sense disambiguation; business.industry; Communication; neural machine translation; Computer Science Applications; Human-Computer Interaction; word sense disambiguation; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing |
Source: | Transactions of the Association for Computational Linguistics |
DOI: | 10.5281/zenodo.2275709 |
Description: | This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are above one BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words. Comment: To appear in TACL |
Database: | OpenAIRE |