Can Automatic Post-Editing Improve NMT?
Author: | Ewa Szymanska, Liling Tan, Raymond Hendy Susanto, Shamil Chollampatt |
---|---|
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences
Computer Science - Computation and Language; Machine translation; business.industry; Computer science; 02 engineering and technology; 010501 environmental sciences; computer.software_genre; 01 natural sciences; language.human_language; Field (computer science); Task (project management); German; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Relevance (information retrieval); Artificial intelligence; Compiler; business; Computation and Language (cs.CL); computer; Natural language processing; 0105 earth and related environmental sciences |
Source: | EMNLP (1) |
DOI: | 10.18653/v1/2020.emnlp-main.217 |
Description: | Automatic post-editing (APE) aims to improve machine translations, thereby reducing human post-editing effort. APE has had notable success when used with statistical machine translation (SMT) systems but has not been as successful with neural machine translation (NMT) systems. This has raised questions about the relevance of the APE task in the current scenario. However, the training of APE models has relied heavily on large-scale artificial corpora combined with only limited human post-edited data. We hypothesize that APE models have been underperforming in improving NMT translations due to the lack of adequate supervision. To test our hypothesis, we compile a larger corpus of human post-edits of English-to-German NMT. We empirically show that a state-of-the-art neural APE model trained on this corpus can significantly improve a strong in-domain NMT system, challenging the current understanding in the field. We further investigate the effects of varying training data sizes, using artificial training data, and domain specificity for the APE task. We release this new corpus under the CC BY-NC-SA 4.0 license at https://github.com/shamilcm/pedra. In EMNLP 2020. |
Database: | OpenAIRE |
External link: |