Addressing data sparsity for neural machine translation between morphologically rich languages

Authors:	Mercedes García-Martínez, Walid Aransa, Loïc Barrault, Fethi Bougares
Year of publication:	2020
Subject:	Linguistics and Language; Point (typography); Machine translation; Arabic; Computer science; Data availability; Domain (software engineering); Artificial intelligence; Computational linguistics; Software; Natural language processing
Source:	Machine Translation. 34:1-20
ISSN:	1573-0573 0922-6567
Description:	Translating between morphologically rich languages is still challenging for current machine translation systems. In this paper, we experiment with various neural machine translation (NMT) architectures to address the data sparsity problem caused by data availability (quantity), domain shift and the languages involved (Arabic and French). We show that the Factored NMT (FNMT) model, which uses linguistically motivated factors, is able to outperform standard NMT systems using subword units by more than 1 BLEU point even when a large quantity of data is available. Our work shows the benefits of applying linguistic factors in NMT when faced with low- and high-resource conditions.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c14fdedc994e751afc64b10efb40f7be https://doi.org/10.1007/s10590-019-09242-9