Syntactic Transfer to Kyrgyz Using the Treebank Translation Method

Autor: Alekseev, Anton, Tillabaeva, Alina, Kabaeva, Gulnara Dzh., Nikolenko, Sergey I.
Jazyk: ruština
Rok vydání: 2024
Předmět:
Druh dokumentu: Working Paper
Popis: The Kyrgyz language, as a low-resource language, requires significant effort to create high-quality syntactic corpora. This study proposes an approach to simplify the development process of a syntactic corpus for Kyrgyz. We present a tool for transferring syntactic annotations from Turkish to Kyrgyz based on a treebank translation method. The effectiveness of the proposed tool was evaluated using the TueCL treebank. The results demonstrate that this approach achieves higher syntactic annotation accuracy compared to a monolingual model trained on the Kyrgyz KTMU treebank. Additionally, the study introduces a method for assessing the complexity of manual annotation for the resulting syntactic trees, contributing to further optimization of the annotation process.
Comment: To be published in the Journal of Math. Sciences. Zapiski version (in Russian): http://www.pdmi.ras.ru/znsl/2024/v540/abs252.html
Databáze: arXiv