Author: |
Tukeyev Ualsher, Karibayeva Aidana, Abduali Balzhan |
Language: |
English, French |
Year of publication: |
2019 |
Subject: |
|
Source: |
MATEC Web of Conferences, Vol 252, p 03006 (2019) |
Document type: |
article |
ISSN: |
2261-236X |
DOI: |
10.1051/matecconf/201925203006 |
Description: |
Large parallel corpora are lacking for the Kazakh language, which seriously impairs the quality of machine translation from and into Kazakh. This article considers neural machine translation of Kazakh on the basis of synthetic corpora. Kazakh belongs to the Turkic languages, which are characterised by rich morphology, and neural machine translation of natural languages requires large amounts of training data. The article presents a model for creating synthetic corpora, namely the generation of sentences based on complete suffixes of the Kazakh language. The novelty of this approach is that sentence generation draws on the complete system of Kazakh suffixes. Using the generated synthetic corpora, we improve translation quality in neural machine translation for the Kazakh-English and Kazakh-Russian pairs. |
Database: |
Directory of Open Access Journals |
External link: |
|