Statistical Machine Learning for Transliteration: Transliterating names between Sinhala, Tamil and English

Authors: M.M.S.P. Ranasinghe, H.S. Priyadarshani, K. Sarveswaran, M.D.W. Rajapaksha, Gihan Dias
Year of publication: 2019
Subject:
Source: IALP
DOI: 10.1109/ialp48816.2019.9037651
Description: In this paper, we focus on building models for transliterating personal names between the primary languages of Sri Lanka, namely Sinhala, Tamil and English. Currently, a rule-based system is used to transliterate names between Sinhala and Tamil; however, we found that it fails in several cases. Furthermore, no system was available to transliterate names into English. We present a hybrid approach that combines machine learning and statistical machine translation to perform the transliteration. We built a parallel trilingual corpus of personal names. We then trained a machine learner to classify names by ethnicity, as we found ethnicity to be an influencing factor in transliteration. Next, we treated transliteration as a translation problem and applied statistical machine translation to generate the most probable transliteration of each personal name. The system shows very promising results compared with the existing rule-based system. It gives a BLEU score of 89 in all the test cases and produces the top BLEU score of 93.7 for Sinhala-to-English transliteration.
Database: OpenAIRE