Machine Translation in the Covid domain: an English-Irish case study for LoResMT 2021

Author:	Lankford, Séamus; Afli, Haithem; Way, Andy
Year of publication:	2024
Subject:	Computer Science - Computation and Language; Computer Science - Artificial Intelligence
Source:	Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)
Document type:	Working Paper
Description:	Models for translating Covid data from English to Irish were developed for the LoResMT 2021 shared task. Domain adaptation techniques were applied, using a Covid-adapted generic 55k corpus from the Directorate General of Translation. Fine-tuning, mixed fine-tuning and combined dataset approaches were compared with models trained on an extended in-domain dataset. As part of this study, an English-Irish dataset of Covid-related data from the Health and Education domains was developed. The highest-performing model used a Transformer architecture trained with an extended in-domain Covid dataset. In the context of this study, we have demonstrated that extending an 8k in-domain baseline dataset by just 5k lines improved the BLEU score by 27 points.
Database:	arXiv
External link:	http://arxiv.org/abs/2403.01196