A Basic Language Resource Kit Implementation for the Igbo NLP Project

Authors:	Ikechukwu E. Onyenwe, Mark Hepple, Uchechukwu Chinedu, Ignatius Ezeani
Year of publication:	2018
Subject:	060201 languages & linguistics; Text corpus; General Computer Science; Computer science; business.industry; Igbo; 06 humanities and the arts; 02 engineering and technology; computer.software_genre; Basic language; language.human_language; Text processing; 0602 languages and literature; Language technology; 0202 electrical engineering, electronic engineering, information engineering; language; Preprocessor; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Natural language processing
Source:	ACM Transactions on Asian and Low-Resource Language Information Processing. 17:1-23
ISSN:	2375-4702 2375-4699
DOI:	10.1145/3146387
Description:	Igbo, an African language with around 32 million speakers worldwide, is one of the many languages having few or none of the language processing resources needed for advanced language technology applications. In this article, we describe the approach taken to creating an initial set of resources for Igbo, including an electronic text corpus, a part-of-speech (POS) tagset, and a POS-tagged subcorpus. We discuss the approach taken in gathering texts, the preprocessing of these texts, and the development of the POS-tagged corpus. We also discuss some of the problems encountered during corpus and tagset development and the solutions arrived at for these problems.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::7102b964b80c23f104267993b1a90e7a https://doi.org/10.1145/3146387