Neural machine translation for Tamil to English

Author: Jain, Minni; Punia, Ravneet; Hooda, Ishika
Source: Journal of Statistics and Management Systems; October 2020, Vol. 23, Issue 7, pp. 1251-1264 (14 pages)
Abstract: The Tamil language is spoken by 80 million people around the world. Translation between Tamil and English has a significant impact because it helps readers understand Tamil texts, which would otherwise be a tedious, costly, and time-consuming process. An automated system for Tamil-to-English translation would therefore save human time and effort. We publicly release a new high-quality corpus for standard training and evaluation, and report experimental results for two different Encoder-Decoder architectures that translate Tamil to English. We further tried to improve these models by experimenting with pre-trained word embeddings and tuning hyperparameters. Although Google Translate also offers translation between Tamil and English in both directions, our implemented architectures, trained on the new dataset, outperformed it by a margin of 7.5 BLEU points. Moreover, our proposed model handles out-of-vocabulary and polysemy problems to a greater extent. Our dataset and implementation are available at: https://github.com/Ishikahooda/Tamil-English-Dataset
Database: Supplemental Index