Toward an Effective Igbo Part-of-Speech Tagger

Authors: Ignatius Ezeani, Mark Hepple, Uchechukwu Chinedu, Ikechukwu E. Onyenwe
Year of publication: 2019
Source: ACM Transactions on Asian and Low-Resource Language Information Processing. 18:1-26
ISSN: 2375-4702, 2375-4699
DOI: 10.1145/3314942
Description: Part-of-speech (POS) tagging is a well-established technology for most Western European languages and a few other world languages, but it has not been evaluated on Igbo, an agglutinative African language. This article presents POS tagging experiments that use an Igbo corpus as a test bed for identifying the POS taggers and Machine Learning (ML) methods that can achieve good performance with the small dataset available for the language. The experiments use several well-known POS taggers developed for English or other European languages, together with training data of different styles and sizes. Igbo has a number of language-specific characteristics that make effective POS tagging challenging. One interesting case is the widespread use of verbs (and their nominalizations) that take an inherent noun complement; the verb and its complement form "linked pairs" in the POS tagging scheme, but they may appear discontinuously. Another issue is Igbo's highly productive agglutinative morphology, which can produce many variant word forms from a given root. This productivity is a key cause of the out-of-vocabulary (OOV) words observed during Igbo tagging. We report the results of experiments on a promising direction for improving tagging performance on such morphologically inflected OOV words.
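The link between productive morphology and OOV words can be made concrete with the usual OOV rate: the fraction of test tokens whose exact word form never occurs in the training data. The sketch below is a minimal illustration under that assumption only. It is not the authors' evaluation code, and the function name `oov_rate` and the placeholder tokens (`ROOT`, `ROOT-a`, and so on) are hypothetical stand-ins, not real Igbo words.

```python
from typing import Iterable, List


def oov_rate(train_tokens: Iterable[str], test_tokens: List[str]) -> float:
    """Return the fraction of test tokens whose word form is absent from training.

    Hypothetical helper for illustration: it compares surface forms only,
    so a new affix combination on a known root still counts as OOV.
    """
    vocab = set(train_tokens)  # every word form seen during training
    if not test_tokens:
        return 0.0  # avoid dividing by zero on an empty test set
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Placeholder tokens: "ROOT-a-b" and "ROOT-c" share a known root,
# but their full forms never appear in the training tokens.
train = ["ROOT", "ROOT-a", "other"]
test = ["ROOT", "ROOT-a-b", "ROOT-c", "other"]
print(oov_rate(train, test))  # 2 of 4 test tokens are unseen -> 0.5
```

Because the comparison uses whole surface forms, every unseen affix combination on a familiar root increases the OOV count. This is why an agglutinative language with a small training corpus tends to produce many OOV words.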
Database: OpenAIRE