Unsupervised SMT: an analysis of Indic languages and a low resource language.

Authors: Saxena, Shefali; Chauhan, Shweta; Arora, Paras; Daniel, Philemon
Subject:
Source: Journal of Experimental & Theoretical Artificial Intelligence; Aug 2024, Vol. 36, Issue 6, pp. 865-884, 20 pp.
Abstract: Rapid globalisation in language technology and the fast expansion of the Internet have brought nations and their cultures closer together, and the demand for inter-language interaction has risen enormously. However, for many low-resource language (LRL) pairs and areas, Machine Translation (MT) is still not viable because of a lack of parallel data, so the challenge of MT remains unsolved. Recent studies employing monolingual datasets have shown excellent outcomes in Phrase-based Statistical MT (PBSMT) and Neural MT (NMT) systems. However, earlier researchers have demonstrated that unsupervised Statistical MT (SMT) surpasses unsupervised NMT, especially for different language pairings. This study presents a compendium of ten unsupervised SMT translation tasks that use monolingual datasets from the Dravidian and Indo-Aryan language families and from a low-resource endangered language. The systems were examined with different tokenizers and evaluated across language pairs using different evaluation metrics over several iterations. The statistical significance of the test results was computed for each evaluation metric to check the true quality of each translation system. [ABSTRACT FROM AUTHOR]
Database: Complementary Index