Unsupervised dialectal neural machine translation
Authors: | Ahmad Bisher Tarakji, Ruba Waleed Jaikat, Wael Farhan, Bashar Talafha, Mahmoud Al-Ayyoub, Anas Toma, Analle Abuammar |
---|---|
Year of publication: | 2020 |
Subject: | Machine translation; Computer science; Cosine similarity; Translation (geometry); Standard language; Modern Standard Arabic; Language model; Artificial intelligence; Natural language processing; Word (computer architecture); Networking & telecommunications; Engineering and technology; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Library and Information Sciences; Management Science and Operations Research; Computer Science Applications; Media Technology; Information Systems |
Source: | Information Processing & Management. 57:102181 |
ISSN: | 0306-4573 |
DOI: | 10.1016/j.ipm.2019.102181 |
Description: | In this paper, we present the first work on unsupervised dialectal Neural Machine Translation (NMT), where the source dialect is not represented in the parallel training corpus. Two systems are proposed for this problem. The first one is the Dialectal to Standard Language Translation (D2SLT) system, which is based on the standard attentional sequence-to-sequence model while introducing two novel ideas leveraging similarities among dialects: using common words as anchor points when learning word embeddings and a decoder scoring mechanism that depends on cosine similarity and language models. The second system is based on the celebrated Google NMT (GNMT) system. We first evaluate these systems in a supervised setting (where the training and testing are done using our parallel corpus of Jordanian dialect and Modern Standard Arabic (MSA)) before going into the unsupervised setting (where we train each system once on a Saudi-MSA parallel corpus and once on an Egyptian-MSA parallel corpus and test them on the Jordanian-MSA parallel corpus). The highest BLEU score obtained in the unsupervised setting is 32.14 (by D2SLT trained on Saudi-MSA data), which is remarkably high given that the highest BLEU score obtained in the supervised setting is 48.25. |
Database: | OpenAIRE |
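
The description mentions a decoder scoring mechanism that depends on cosine similarity and language models, but this record does not give the formula. The following is only a minimal sketch of how such a score could be formed, assuming a simple linear interpolation. The function names, the weight `alpha`, and the toy vectors and log-probabilities are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def rank_candidates(source_vec, candidates, lm_logprob, alpha=0.5):
    """Rank candidate target words, best first.

    ASSUMPTION: the score is a linear interpolation
        alpha * cosine(source, candidate) + (1 - alpha) * LM log-probability.
    The paper's actual scoring rule may differ.

    candidates: dict mapping word -> embedding vector
    lm_logprob: dict mapping word -> language-model log-probability
    """
    scored = []
    for word, vec in candidates.items():
        similarity = cosine_similarity(source_vec, vec)
        score = alpha * similarity + (1.0 - alpha) * lm_logprob[word]
        scored.append((score, word))
    scored.sort(reverse=True)
    return [word for _, word in scored]


# Toy data (hypothetical): a dialect word vector and two standard-language candidates.
source_vec = [1.0, 0.0]
candidates = {"candidate_a": [1.0, 0.0], "candidate_b": [0.0, 1.0]}
lm_logprob = {"candidate_a": -1.0, "candidate_b": -0.5}

ranking = rank_candidates(source_vec, candidates, lm_logprob, alpha=0.5)
```

With `alpha=0.5` the embedding similarity outweighs the language-model preference in this toy case; setting `alpha=0.0` ranks by the language model alone.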