Statistical Machine Learning for Transliteration: Transliterating names between Sinhala, Tamil and English
Authors: | M.M.S.P. Ranasinghe, H.S. Priyadarshani, K. Sarveswaran, M.D.W. Rajapaksha, Gihan Dias |
---|---|
Publication year: | 2019 |
Subject: |
Machine translation
Computer science; Machine learning; Hybrid approach; Focus (linguistics); Naive Bayes classifier; Test case; Factor (programming language); Tamil language; Transliteration; Artificial intelligence |
Source: | IALP |
DOI: | 10.1109/ialp48816.2019.9037651 |
Description: | In this paper, we focus on building models for transliterating personal names between the primary languages of Sri Lanka, namely Sinhala, Tamil and English. Currently, a rule-based system is used to transliterate names between Sinhala and Tamil; however, we found that it fails in several cases. Further, no systems were available to transliterate names to English. We present a hybrid approach that uses machine learning and statistical machine translation to perform the transliteration. We built a parallel trilingual corpus of personal names. We then trained a machine learner to classify names by ethnicity, which we found to be an influencing factor in transliteration. Next, we treated transliteration as a translation problem and applied statistical machine translation to generate the most probable transliteration for each personal name. The system shows very promising results compared with the existing rule-based system. It gives a BLEU score of 89 in all the test cases and achieves a top BLEU score of 93.7 for Sinhala-to-English transliteration. |
Database: | OpenAIRE |
External link: |
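
The two-stage approach in the abstract (classify a name by ethnicity, then treat transliteration as a translation problem) can be illustrated with a minimal sketch. The record lists a Naive Bayes classifier among its subjects but gives no implementation details, so everything below is an assumption made for illustration: the character-bigram features, the add-one smoothing, the toy romanized names and their labels, and the helper names `char_ngrams`, `NaiveBayesNameClassifier` and `to_char_tokens`. None of it is taken from the authors' system or corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=2):
    """Return character n-grams of a name, padded with boundary markers."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesNameClassifier:
    """Multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        self.vocab = set()
        for name, label in zip(names, labels):
            grams = char_ngrams(name)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(count / total)
            denom = sum(self.ngram_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(name):
                score += math.log((self.ngram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def to_char_tokens(name):
    """Space-separate characters so a word-level SMT toolkit sees each as a token."""
    return " ".join(name)


# Toy, hand-picked training data (hypothetical; not the authors' corpus).
train_names = ["Bandara", "Jayasinghe", "Wickramasinghe",
               "Sivakumar", "Rajendran", "Kumaran"]
train_labels = ["sinhala", "sinhala", "sinhala",
                "tamil", "tamil", "tamil"]

clf = NaiveBayesNameClassifier().fit(train_names, train_labels)
predicted = clf.predict("Gunasinghe")
tokens = to_char_tokens("Gihan")
```

In a pipeline of this shape, the predicted class could select which transliteration model to apply, and the character-tokenized name would be the input to a statistical machine translation system trained on character-tokenized name pairs. Whether the authors route names this way is not stated in the record.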