Neural machine translation for Kashmiri to English and Hindi using pre-trained embeddings
Author: | Shailashree K Sheshadri, Deepa Gupta, Marta R. Costa-Jussá |
---|---|
Language: | English |
Year of publication: | 2022 |
Subject: |
Indic Languages; Neural Machine Translation; Transformer modeling; Knowledge management; MBPE; Ensenyament i aprenentatge::Aprenentatge de llengües [Àrees temàtiques de la UPC]; Transformer-model; Computational linguistics; Pre-trained embeddings; Transfer learning; Embeddings; Machine translations; Parallel corpora; Attention Mechanism; Pre-trained embedding; Traducció automàtica; Machine translating; Computer aided language translation; Indic language; Zero-shot learning; Attention mechanisms; Machine translation systems; Translation systems |
Description: | Neural Machine Translation (NMT) is an advanced approach to Machine Translation (MT) that has recently gained popularity. A sound translation system requires a large parallel corpus, but most of the world's languages lack one. Many state-of-the-art (SoTA) NMT systems for low-resource languages are developed using transfer learning, knowledge transfer, and zero-shot learning mechanisms. Most Indic languages fall into the low-resource or zero-resource categories because rich parallel and monolingual corpora are not available. Although many Indian border languages have social and economic significance, they lack resources and automated machine translation systems. Kashmiri, one such Indian border language, belongs to the zero-resource category, with limited corpora and no significant translation system. This paper uses pre-trained word embeddings to create the first NMT system specifically for Kashmiri-English and Kashmiri-Hindi translation. mBPE pre-trained word embeddings for the Kashmiri language are used to develop the NMT system. The model with pre-trained word embeddings shows a +2.58 BLEU improvement over vanilla NMT. |
Database: | OpenAIRE |