Unsupervised detection of morpheme boundaries
Author: | Abeer Alsheddi, Ahmed Khorsi |
---|---|
Year of publication: | 2016 |
Subject: |
Vocabulary; Morphology (linguistics); Computer science; Arabic; media_common.quotation_subject; 02 engineering and technology; computer.software_genre; 03 medical and health sciences; 0302 clinical medicine; Morpheme; 0202 electrical engineering electronic engineering information engineering; Hidden Markov model; media_common; Plain text; business.industry; computer.file_format; language.human_language; Identification (information); ComputingMethodologies_PATTERNRECOGNITION; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 030221 ophthalmology & optometry; Modern Standard Arabic; language; 020201 artificial intelligence & image processing; Artificial intelligence; Precision and recall; business; computer; Natural language processing |
Source: | 2016 4th Saudi International Conference on Information Technology (Big Data Analysis) (KACSTIT). |
DOI: | 10.1109/kacstit.2016.7756076 |
Description: | The main drawback of unsupervised approaches in Natural Language Processing (NLP) is often their low accuracy. Nevertheless, they remain a practical shortcut for accommodating a language that lacks theorization and/or computerization. The present article reports an unsupervised identification of morpheme boundaries. Based on an intuitive yet formal definition of event dependence, the approach requires no input other than plain text. Tests on two languages with very different morphology, Arabic and English, show acceptable precision and recall. Further refinement of the output yielded 89% precision and 78% recall on Arabic. While most published NLP research on Arabic focuses on Modern Standard Arabic (MSA), we built a traditional Standard Arabic data set to test our approach. |
Database: | OpenAIRE |
External link: |
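
The description reports precision and recall for detected morpheme boundaries but does not say how they were scored. The following is a minimal sketch of one common way to score boundaries, assuming that a boundary is identified by its character offset within the word and that segmentations are written with a `+` separator. The function names `boundaries` and `precision_recall` and the English example words are hypothetical and do not come from the paper.

```python
def boundaries(segmented, sep="+"):
    """Return the set of character offsets where a morpheme boundary falls."""
    positions = set()
    offset = 0
    # The last morpheme ends at the word end, which is not counted as a boundary.
    for morph in segmented.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def precision_recall(gold, predicted, sep="+"):
    """Score predicted segmentations against gold ones at the boundary level."""
    tp = fp = fn = 0
    for gold_word, pred_word in zip(gold, predicted):
        gold_b = boundaries(gold_word, sep)
        pred_b = boundaries(pred_word, sep)
        tp += len(gold_b & pred_b)   # boundaries found and correct
        fp += len(pred_b - gold_b)   # boundaries found but wrong
        fn += len(gold_b - pred_b)   # boundaries missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
gold = ["un+break+able", "walk+ed", "cats"]
pred = ["un+breakable", "walk+ed", "cat+s"]
precision, recall = precision_recall(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy run two of the three predicted boundaries are correct and two of the three gold boundaries are found, so both scores come out at about 0.67. Scoring whole morphemes instead of boundary positions is an equally plausible convention and would give different numbers.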