Lexical Disambiguation of Igbo using Diacritic Restoration

Authors: Ikechukwu E. Onyenwe, Mark Hepple, Ignatius Ezeani
Year of publication: 2017
Subject:
Source: Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications.
DOI: 10.18653/v1/w17-1907
Description: Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing the distinctions in pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in diacritic languages are written without diacritics. This makes diacritic restoration a necessary step in corpus building and language processing tasks for languages with diacritics. In our previous work, we built some n-gram models with simple smoothing techniques based on a closed-world assumption. However, as a classification task, diacritic restoration is well suited to machine learning, which should also generalise better. This paper, therefore, presents a more standard approach to the task, one that applies machine learning algorithms.
Database: OpenAIRE