Description: |
We investigate automatic interlinear glossing in low-resource settings. We augment a hard-attentional neural model with embedded translation information extracted from interlinear glossed text. After encoding these translations with large language models, specifically BERT and T5, we introduce a character-level decoder for generating glossed output. Aided by these enhancements, our model achieves an average improvement of 3.97 percentage points over the previous state of the art on datasets from the SIGMORPHON 2023 Shared Task on Interlinear Glossing. In a simulated ultra-low-resource setting, trained on as few as 100 sentences, our system achieves an average improvement of 9.78 percentage points over the plain hard-attentional baseline. These results highlight the critical role of translation information in boosting performance, especially when only limited training data are available. Our findings suggest a promising avenue for language documentation and preservation, with our experiments on the shared task datasets demonstrating substantial gains over the existing state of the art.