Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
Authors: | Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez |
---|---|
Contributors: | Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos; Transducens |
Language: | English |
Year of publication: | 2020 |
Subject: |
Measure (data warehouse); Machine translation; Computer science; Neural machine translation; 02 engineering and technology; Part of speech; computer.software_genre; Linguistics; 030507 speech-language pathology & audiology; 03 medical and health sciences; Annotation; Under-resourced; Rule-based machine translation; Lenguajes y Sistemas Informáticos; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Grammaticality; 0305 other medical science; computer; Word (computer architecture); Word-level linguistic annotations |
Source: | Proceedings of the 28th International Conference on Computational Linguistics (COLING); RUA. Repositorio Institucional de la Universidad de Alicante; Universidad de Alicante (UA) |
Description: | This paper studies the effects of word-level linguistic annotations in under-resourced neural machine translation, for which the evidence in the literature is incomplete. The study covers eight language pairs, different training corpus sizes, two architectures and three types of annotation: dummy tags (which carry no linguistic information at all), part-of-speech tags, and morpho-syntactic description tags, which consist of the part of speech plus morphological features. These annotations are interleaved in the input or output stream as a single tag placed before each word. To measure performance in each scenario, we use automatic evaluation metrics and perform automatic error classification. Our experiments show that, in general, source-language annotations are helpful, and that morpho-syntactic description tags outperform part-of-speech tags for some language pairs. Conversely, when words are annotated in the target language, part-of-speech tags systematically outperform morpho-syntactic description tags in terms of automatic evaluation metrics, even though morpho-syntactic description tags improve the grammaticality of the output. We provide a detailed analysis of the reasons behind this result. Work funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 825299, project Global Under-Resourced Media Translation (GoURMET). |
Database: | OpenAIRE |
External link: |
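The annotation scheme summarized in the description (a single tag token placed before each word, in either the source or the target stream) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `<TAG>` token format, the `DUMMY` tag string, the MSD tag spelling and the function names are all hypothetical. Tags are assumed to come from an external tagger, and subword segmentation is ignored.

```python
from typing import List, Sequence


# Hypothetical tag-token format: a tag such as NOUN becomes the token "<NOUN>".
def interleave_tags(words: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Return a stream with exactly one tag token placed before each word."""
    if len(words) != len(tags):
        raise ValueError("expected exactly one tag per word")
    stream: List[str] = []
    for word, tag in zip(words, tags):
        stream.append(f"<{tag}>")  # tag token first...
        stream.append(word)        # ...then the word it annotates
    return stream


def dummy_tags(words: Sequence[str]) -> List[str]:
    """Dummy annotation: the same uninformative (hypothetical) tag for every word."""
    return ["DUMMY"] * len(words)


def strip_tags(stream: Sequence[str]) -> List[str]:
    """Drop tag tokens from an annotated stream (e.g. target-side output), keeping words.

    Assumes no real word has the form "<...>"; a real system would need an
    unambiguous tag vocabulary rather than this string test.
    """
    return [tok for tok in stream if not (tok.startswith("<") and tok.endswith(">"))]


if __name__ == "__main__":
    words = ["the", "cats", "sleep"]
    pos = ["DET", "NOUN", "VERB"]
    # Hypothetical MSD tags: part of speech plus morphological features.
    msd = ["DET", "NOUN|Number=Plur", "VERB|Number=Plur|Person=3"]
    for tags in (dummy_tags(words), pos, msd):
        annotated = interleave_tags(words, tags)
        print(" ".join(annotated))
        # Removing the tags must give back the original words.
        assert strip_tags(annotated) == words
```

For example, `interleave_tags(["the", "cats", "sleep"], ["DET", "NOUN", "VERB"])` returns `['<DET>', 'the', '<NOUN>', 'cats', '<VERB>', 'sleep']`. The three tag sets differ only in how much information each tag carries; the stream layout is identical. When tags are generated on the target side, a step like `strip_tags` is needed before the output can be compared with a plain-text reference.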