Analysing cross-lingual transfer in lemmatisation for Indian languages
Authors: | Kumar Saunack, Kumar Saurav, Pushpak Bhattacharyya |
---|---|
Year of publication: | 2020 |
Subject: | |
Source: | COLING |
DOI: | 10.18653/v1/2020.coling-main.534 |
Description: | Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form. However, most of the prior work on this topic has focused on high-resource languages. In this paper, we evaluate cross-lingual approaches for low-resource languages, especially in the context of morphologically rich Indian languages. We test our model on six languages from two different families and develop linguistic insights into each model’s performance. |
Database: | OpenAIRE |
External link: |