Active Learning for Post-Editing Based Incrementally Retrained MT

Autor: Aswarth Abhilash Dara, Josef van Genabith, Antonio Toral, Qun Liu, John Judge
Rok vydání: 2014
Předmět:
Zdroj: EACL
DOI: 10.3115/v1/e14-4036
Popis: Machine translation, in particular statistical machine translation (SMT), is making big inroads into the localisation and translation industry. In typical workflows (S)MT output is checked and (where required) manually post-edited by human translators. Recently, a significant amount of research has concentrated on capturing human post-editing outputs as early as possible to incrementally update/modify SMT models to avoid repeat mistakes. Typically in these approaches, MT and post-edits happen sequentially and chronologically, following the way unseen data (the translation job) is presented. In this paper, we add to the existing literature addressing the question whether and if so, to what extent, this process can be improved upon by Active Learning, where input is not presented chronologically but dynamically selected according to criteria that maximise performance with respect to (whatever is) the remaining data. We explore novel (source side-only) selection criteria and show performance increases of 0.67-2.65 points TER absolute on average on typical industry data sets compared to sequential PEbased incrementally retrained SMT.
Databáze: OpenAIRE