Statistical Machine Learning for Transliteration: Transliterating names between Sinhala, Tamil and English

Authors:	M.M.S.P. Ranasinghe, H.S. Priyadarshani, K. Sarveswaran, M.D.W. Rajapaksha, Gihan Dias
Year of publication:	2019
Subject:	Machine translation; Computer science; Machine learning; Hybrid approach; Focus (linguistics); Naive Bayes classifier; Test case; Factor (programming language); Tamil language; Transliteration; Artificial intelligence
Source:	IALP
DOI:	10.1109/ialp48816.2019.9037651
Description:	This paper builds models for transliterating personal names between the primary languages of Sri Lanka, namely Sinhala, Tamil and English. A rule-based system is currently used to transliterate names between Sinhala and Tamil, but we found that it fails in several cases, and no system was available for transliterating names into English. We present a hybrid approach that combines machine learning with statistical machine translation. We first built a parallel trilingual corpus of personal names. We then trained a machine learner to classify names by ethnicity, which we found to be an influencing factor in transliteration. Finally, we treated transliteration as a translation problem and applied statistical machine translation to generate the most probable transliteration of each personal name. The system shows very promising results compared with the existing rule-based system: it gives a BLEU score of 89 in all the test cases and reaches its top BLEU score of 93.7 for Sinhala-to-English transliteration.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::1b624da0477586362ffd7775b3ae7e14 https://doi.org/10.1109/ialp48816.2019.9037651
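
Illustration:	The description mentions a classification step that labels names by ethnicity before transliteration, and the subject terms list a Naive Bayes classifier. The following is a minimal sketch of what such a step could look like, not the authors' implementation. The character-bigram features, the add-one smoothing, the class name `NameOriginClassifier`, and the romanized toy names are all assumptions made for illustration; none of them come from the paper's corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=2):
    """Split a name into character n-grams, with ^ and $ marking the boundaries."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NameOriginClassifier:
    """Hypothetical multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=2):
        self.n = n
        self.label_counts = Counter()             # number of training names per label
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.label_counts[label] += 1
            for gram in char_ngrams(name, self.n):
                self.ngram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, name):
        total_names = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(count / total_names)
            denom = sum(self.ngram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(name, self.n):
                score += math.log((self.ngram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (illustrative only, not taken from the paper).
train_names = [
    "Bandara", "Jayasinghe", "Wickramasinghe", "Gunawardena",
    "Sivakumar", "Rajendran", "Selvarajah", "Nadarajah",
]
train_labels = ["sinhala"] * 4 + ["tamil"] * 4

clf = NameOriginClassifier().fit(train_names, train_labels)
print(clf.predict("Gunasinghe"))
print(clf.predict("Kumaran"))
```

In a pipeline like the one the description outlines, the predicted label could be used to select or condition the statistical machine translation model that produces the transliteration; how the authors actually combine the two steps is not stated in this record.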