Statistical Machine Translation Customization between Turkish and 11 Languages

Author:	Gökhan Doğru
Language:	English
Year of publication:	2020
Subject:	statistical machine translation customization; Turkish; automatic evaluation metrics; translation quality evaluation; parallel corpus; Machine translation; Natural language processing; Artificial intelligence; Computer science; Language and Linguistics; Arabic; Catalan; Personalization; Quality (business); Variety (linguistics); lcsh:Language and Literature; lcsh:P; lcsh:Language. Linguistic theory. Comparative grammar; lcsh:P101-410; lcsh:Translating and interpreting; lcsh:P306-310
Source:	transLogos: Translation Studies Journal, Vol 3, Iss 1, Pp 98-121 (2020)
ISSN:	2667-4629
Description:	Statistical Machine Translation (SMT) has been the dominant corpus-based machine translation (MT) approach for the last twenty years. While SMT has been studied in detail for European languages, it has not been studied sufficiently for language pairs that include Turkish as the source or target language, and existing work has been limited mostly to the English ↔ Turkish pair. This study aims to broaden the perspective on Turkish corpus-based MT by training MT engines between Turkish and a wide variety of languages with different features. It surveys customized SMT between Turkish and 11 different languages. Twenty-two SMT engines were trained in KantanMT on open parallel corpora, using Turkish as both source and target language. Three automatic evaluation metrics (F-Measure, BLEU, and TER) were used to evaluate MT quality. Because corpus quality and size vary, the results vary widely. The Turkish ↔ Catalan engines achieved the highest automatic evaluation scores, and the Turkish ↔ Arabic engines the lowest. Although quality varies widely across languages, the study obtains baseline scores for a wide variety of languages paired with Turkish. These results may provide a reference point for evaluating future MT systems that include Turkish.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8d84e7549738a4ab082aacdcc51ae7f2 https://dergipark.org.tr/en/download/article-file/1180416