End-to-End Neural Word Alignment Outperforms GIZA++

Authors:	Thomas Zenkel, Joern Wuebker, John DeNero
Language:	English
Year of publication:	2020
Subject:	FOS: Computer and information sciences; Computer Science - Computation and Language; Machine translation; Computer science; Speech recognition; 05 social sciences; 010501 environmental sciences; computer.software_genre; Lexicon; 01 natural sciences; End-to-end principle; 0502 economics and business; Unsupervised learning; 050207 economics; Computation and Language (cs.CL); computer; 0105 earth and related environmental sciences; Transformer (machine learning model)
Source:	ACL
Description:	Word alignment was once a core unsupervised learning task in natural language processing because of its essential role in training statistical machine translation (MT) models. Although unnecessary for training neural MT models, word alignment still plays an important role in interactive applications of neural machine translation, such as annotation transfer and lexicon injection. While statistical MT methods have been replaced by neural approaches with superior performance, the twenty-year-old GIZA++ toolkit remains a key component of state-of-the-art word alignment systems. Prior work on neural word alignment has only been able to outperform GIZA++ by using its output during training. We present the first end-to-end neural word alignment method that consistently outperforms GIZA++ on three data sets. Our approach repurposes a Transformer model trained for supervised translation to also serve as an unsupervised word alignment model in a manner that is tightly integrated and does not affect translation quality. Accepted at ACL 2020.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a7b9730f32ee54622c36da2d90b9c5f3 http://arxiv.org/abs/2004.14675