Unsupervised detection of morpheme boundaries

Authors: Abeer Alsheddi, Ahmed Khorsi
Year of publication: 2016
Subject:
Source: 2016 4th Saudi International Conference on Information Technology (Big Data Analysis) (KACSTIT).
DOI: 10.1109/kacstit.2016.7756076
Description: The main drawback of unsupervised approaches in Natural Language Processing (NLP) is often their low accuracy. Nevertheless, they remain a practical shortcut for handling a language that lacks formal linguistic description and/or computational resources. The present article reports an unsupervised method for identifying morpheme boundaries. The approach rests on an intuitive yet formal definition of event dependence and requires nothing more than plain text as input. Tests on two languages with entirely different types of morphology, Arabic and English, show quite acceptable precision and recall. A further refinement of the output reached 89% precision and 78% recall on Arabic. While most published NLP research on Arabic focuses on Modern Standard Arabic (MSA), we built a traditional Standard Arabic data set to test our approach.
Database: OpenAIRE
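The record does not spell out the event-dependence measure, so the following is not the authors' method. For comparison only, here is a minimal Python sketch of a well-known plain-text baseline for the same task, Harris-style successor variety. It proposes a boundary after any prefix that many distinct characters follow across the vocabulary. The function names `successor_variety` and `segment`, the `#` end-of-word marker, and the `threshold` parameter are hypothetical choices made for illustration.

```python
from collections import defaultdict


def successor_variety(words):
    """Map each prefix to the set of characters that follow it in the vocabulary.

    Illustrative baseline (Harris-style successor variety), NOT the
    event-dependence measure described in the record.
    """
    followers = defaultdict(set)
    for w in words:
        w = w + "#"  # hypothetical end-of-word marker, so a complete word counts as one successor
        for i in range(1, len(w)):
            followers[w[:i]].add(w[i])
    return followers


def segment(word, followers, threshold=3):
    """Cut `word` after every prefix whose successor variety is >= `threshold`.

    The default threshold of 3 is an arbitrary assumption for this toy example.
    """
    cuts = [i for i in range(1, len(word))
            if len(followers.get(word[:i], ())) >= threshold]
    pieces, start = [], 0
    for c in cuts:
        pieces.append(word[start:c])
        start = c
    pieces.append(word[start:])
    return pieces


if __name__ == "__main__":
    vocab = ["play", "plays", "played", "playing", "player"]
    followers = successor_variety(vocab)
    for w in vocab:
        print(w, segment(w, followers))
```

With the toy vocabulary above, the prefix `play` is followed by four distinct symbols (`#`, `s`, `e`, `i`), so `segment("played", followers)` returns `["play", "ed"]`. A fixed threshold is a simplification: with `threshold=2` the same call also cuts after `playe` (followed by `d` and `r`), which shows why plain-text baselines of this kind tend to trade precision against recall.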