Toward an Effective Igbo Part-of-Speech Tagger
Authors: | Ignatius Ezeani, Mark Hepple, Uchechukwu Chinedu, Ikechukwu E. Onyenwe |
---|---|
Year of publication: | 2019 |
Subject: | Agglutinative language; 050101 languages & linguistics; Root (linguistics); General Computer Science; Computer science; 05 social sciences; Igbo; 02 engineering and technology; Part of speech; Nominalization; Noun; Language technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Artificial intelligence; Productivity (linguistics); Natural language processing |
Source: | ACM Transactions on Asian and Low-Resource Language Information Processing. 18:1-26 |
ISSN: | 2375-4702 2375-4699 |
DOI: | 10.1145/3314942 |
Description: | Part-of-speech (POS) tagging is a well-established technology for most Western European languages and a few other world languages, but it has not been evaluated on Igbo, an agglutinative African language. This article presents POS tagging experiments conducted using an Igbo corpus as a test bed for identifying the POS taggers and the machine learning (ML) methods that can achieve good performance with the small dataset available for the language. Experiments have been conducted using different well-known POS taggers developed for English or European languages, and different training data styles and sizes. Igbo has a number of language-specific characteristics that present a challenge for effective POS tagging. One interesting case is the wide use of verbs (and nominalizations thereof) that have an inherent noun complement, which form “linked pairs” in the POS tagging scheme, but which may appear discontinuously. Another issue is Igbo’s highly productive agglutinative morphology, which can produce many variant word forms from a given root. This productivity is a key cause of the out-of-vocabulary (OOV) words observed during Igbo tagging. We report results of experiments on a promising direction for improving tagging performance on such morphologically inflected OOV words. |
Database: | OpenAIRE |
External link: |