Efficient incremental training using a novel NMT-SMT hybrid framework for translation of low-resource languages

Authors:	Kumar Bhuvaneswari, Murugesan Varalakshmi
Language:	English
Year of publication:	2024
Subject:	hybrid NMT-SMT; incremental training; beam search; SMT phrase table; low-resource languages; Electronic computers. Computer science; QA75.5-76.95
Source:	Frontiers in Artificial Intelligence, Vol 7 (2024)
Document type:	article
ISSN:	2624-8212
DOI:	10.3389/frai.2024.1381290
Description:	Data-hungry statistical machine translation (SMT) and neural machine translation (NMT) models deliver state-of-the-art results for languages with abundant data resources. However, substantial further research is needed to make these models perform equally well for low-resource languages. This paper proposes a novel approach that integrates the best features of NMT and SMT systems to improve translation performance for the low-resource English–Tamil language pair. A suboptimal NMT model trained on a small parallel corpus translates a monolingual corpus and keeps only the best translations, which it uses to retrain itself in the next iteration. The proposed method uses the SMT phrase-pair table to identify the best translations, based on the maximum match between the words of the phrase-pair dictionary and each individual translation. Repeating this cycle of translation and retraining generates a large quasi-parallel corpus, making the NMT model progressively stronger. SMT-integrated incremental training yields substantially better translation performance than existing incremental-training approaches. The model is further strengthened by a beam search decoding strategy that produces the k best translations for each input sentence. Empirical results show that the proposed model, with BLEU scores of 19.56 (Eng-to-Tam) and 23.49 (Tam-to-Eng), outperforms the baseline NMT, which scores 11.06 and 17.06, respectively. METEOR evaluation further corroborates these results, confirming the superiority of the proposed model.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/e3acad548ae043aa92cc8cad027e5f69
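The description outlines a translate-select-retrain loop in which the SMT phrase-pair table is used to pick the best of the k-best NMT outputs. The Python sketch below illustrates that selection step under stated assumptions; it is not the authors' implementation. The phrase table is simplified to a dict from source phrase to a set of target phrases, a candidate's score is assumed to be the fraction of its tokens licensed by phrase pairs whose source side occurs in the input sentence, and the names `select_pairs` and `incremental_round`, the `threshold` cutoff, and the `translate_kbest`/`retrain` callables are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical, simplified phrase table: source phrase -> set of target phrases.
PhraseTable = Dict[str, Set[str]]


def source_ngrams(tokens: List[str], max_n: int = 3) -> Set[str]:
    """All contiguous n-grams (n <= max_n) of the source tokens, space-joined."""
    grams: Set[str] = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def expected_target_words(source: str, table: PhraseTable, max_n: int = 3) -> Set[str]:
    """Target-side words from every phrase pair whose source side occurs in `source`."""
    words: Set[str] = set()
    for gram in source_ngrams(source.split(), max_n):
        for target_phrase in table.get(gram, ()):
            words.update(target_phrase.split())
    return words


def match_score(candidate: str, licensed: Set[str]) -> float:
    """Assumed score: fraction of candidate tokens found in the licensed word set."""
    tokens = candidate.split()
    if not tokens:
        return 0.0
    return sum(tok in licensed for tok in tokens) / len(tokens)


def select_pairs(
    sources: List[str],
    kbest: List[List[str]],
    table: PhraseTable,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Pick the best-matching k-best candidate per sentence; keep it only if the
    score reaches the (hypothetical) threshold."""
    selected: List[Tuple[str, str]] = []
    for src, candidates in zip(sources, kbest):
        if not candidates:
            continue
        licensed = expected_target_words(src, table)
        best = max(candidates, key=lambda c: match_score(c, licensed))
        if match_score(best, licensed) >= threshold:
            selected.append((src, best))
    return selected


def incremental_round(
    parallel: List[Tuple[str, str]],
    monolingual: List[str],
    translate_kbest: Callable[[str], List[str]],
    retrain: Callable[[List[Tuple[str, str]]], None],
    table: PhraseTable,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """One translate-select-retrain cycle; returns the enlarged quasi-parallel corpus."""
    kbest = [translate_kbest(s) for s in monolingual]
    corpus = parallel + select_pairs(monolingual, kbest, table, threshold)
    retrain(corpus)
    return corpus


if __name__ == "__main__":
    # Toy data with placeholder target tokens (wA, wB, ...), not real Tamil.
    table = {"good morning": {"wA wB"}, "friend": {"wC"}}
    sources = ["good morning friend", "see you"]
    kbest = [["wA wB wC", "wA zz qq"], ["yy zz"]]
    print(select_pairs(sources, kbest, table))  # [('good morning friend', 'wA wB wC')]
```

In the toy run, the first sentence's top candidate uses only licensed words and is kept, while the second sentence has no phrase-table support and is discarded. A per-sentence cutoff is only one plausible reading of "keeps only the best translations"; ranking the whole batch and keeping a top fraction would be an equally reasonable choice.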