Training, Enhancing, Evaluating and Using MT Systems with Comparable Data

Authors: Andrejs Vasiļjevs, Andreas Eisele, Yu Chen, Raivis Skadiņš, Xiaojun Zhang, Bogdan Babych, Sabine Hunsicker, Inguna Skadiņa, Mārcis Pinnis, Mateja Verlic, Gregor Thurmair
Year of publication: 2019
Subject:
Source: Using Comparable Corpora for Under-Resourced Areas of Machine Translation, ISBN: 9783319990033
Description: This chapter describes how parallel and semi-parallel data extracted from comparable corpora can be used to enhance machine translation (MT) systems: which methods are used for this task in statistical and rule-based MT systems, and which showcases illustrate the use of such enhanced systems. The impact of data extracted from comparable corpora on MT quality is evaluated for 17 language pairs, and detailed studies involving human evaluation are carried out for 11 language pairs. First, baseline statistical machine translation (SMT) systems were built using traditional SMT techniques. These systems were then improved by integrating additional data extracted from comparable corpora, and a comparative evaluation was performed to measure the improvements. Comparable corpora were also used to enrich the linguistic knowledge of rule-based machine translation (RBMT) systems by applying terminology extraction technology. Finally, SMT systems were adapted to a narrow domain and extended with domain-specific knowledge, including terminology, named entities (NEs), and domain-specific language models (LMs).
Database: OpenAIRE