Automatic Restoration of Diacritics for Igbo Language
Authors: | Mark Hepple, Ikechukwu E. Onyenwe, Ignatius Ezeani |
---|---|
Year of publication: | 2016 |
Subject: | 060201 languages & linguistics; Process (engineering); Computer science; business.industry; Igbo; 06 humanities and the arts; 02 engineering and technology; Pronunciation; computer.software_genre; language.human_language; Linguistics; Task (project management); Vowel; 0602 languages and literature; Diacritic; 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; Value (semiotics); business; computer; Natural language processing; Meaning (linguistics) |
Source: | Text, Speech, and Dialogue (TSD), ISBN: 9783319455099 |
DOI: | 10.1007/978-3-319-45510-5_23 |
Description: | Igbo is a low-resource African language with orthographic and tonal diacritics, which capture distinctions between words that are important for both meaning and pronunciation, and hence of potential value for a range of language processing tasks. Such diacritics, however, are often largely absent from the electronic texts we might want to process or assemble into corpora, so effective methods for automatic diacritic restoration for Igbo are needed. In this paper, we experiment with an Igbo Bible corpus, which is extensively marked for vowel distinctions and partially marked for tonal distinctions, and attempt the task of reinstating these diacritics after they have been deleted. We investigate a number of word-level diacritic restoration methods based on n-grams, under a closed-world assumption, achieving an accuracy of 98.83% with our most effective method. |
Database: | OpenAIRE |
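
The description mentions word-level diacritic restoration based on n-grams under a closed-world assumption. The sketch below is one possible illustration of that general idea, not the authors' implementation: it strips diacritics to form a lookup key, counts the marked variants seen in training, and picks the most frequent variant given the previous restored word, backing off to unigram counts. The names `strip_diacritics`, `train`, `restore`, and `TOY_SENTENCES` are hypothetical, and the toy token sequences are invented for illustration; they are not taken from the Bible corpus used in the paper.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word):
    """Remove combining marks (tone marks, dot-below, etc.) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(sentences):
    """Count diacritic variants per stripped form, alone and after a given word."""
    unigrams = defaultdict(Counter)  # stripped form -> counts of marked variants
    bigrams = defaultdict(Counter)   # (previous marked word, stripped form) -> counts
    for sentence in sentences:
        prev = "<s>"  # sentence-start symbol
        for word in sentence:
            word = unicodedata.normalize("NFC", word)
            key = strip_diacritics(word)
            unigrams[key][word] += 1
            bigrams[(prev, key)][word] += 1
            prev = word
    return unigrams, bigrams


def restore(tokens, unigrams, bigrams):
    """Restore diacritics word by word: bigram context first, then unigram backoff."""
    restored = []
    prev = "<s>"
    for token in tokens:
        key = strip_diacritics(token)
        candidates = bigrams.get((prev, key)) or unigrams.get(key)
        # Closed-world assumption: every stripped form was seen in training.
        # Unseen forms are simply passed through unchanged.
        best = candidates.most_common(1)[0][0] if candidates else token
        restored.append(best)
        prev = best
    return restored


# Invented toy data (assumption): one stripped form, "akwa", with two marked variants.
TOY_SENTENCES = [
    ["o", "ri", "àkwá"],
    ["o", "yi", "ákwà"],
    ["o", "yi", "ákwà"],
]

unigrams, bigrams = train(TOY_SENTENCES)
print(restore(["o", "ri", "akwa"], unigrams, bigrams))
print(restore(["o", "yi", "akwa"], unigrams, bigrams))
```

Conditioning on the previous restored word lets the same stripped form resolve to different variants in different contexts, while the unigram backoff covers contexts that never occurred in training. Whether the paper's best-performing method uses this exact backoff scheme is not stated in the record above.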