Automatic Identification of Moroccan Colloquial Arabic

Authors: Karim Bouzoubaa, Si Lhoussaine Aouragh, Ridouane Tachicart, Hamid Jaafa
Year of publication: 2018
Subject:
Source: Communications in Computer and Information Science ISBN: 9783319734996
ICALP
Description: Language identification is an NLP task that aims to predict the language of a given text. Many attempts have been made to address this task for Arabic dialects. In this paper, we present our approach to building a language identification system that distinguishes between Moroccan Colloquial Arabic and Arabic languages using two different methods. The first is rule-based and relies on stop word frequency, while the second is statistically based and uses several machine learning classifiers. The results show that the statistical approach outperforms the rule-based approach. Furthermore, the Support Vector Machines classifier is more accurate than the other statistical classifiers. Our goal in this paper is to pave the way toward building advanced NLP tools for the Moroccan dialect, such as a morphological analyzer and a machine translation system.
Database: OpenAIRE
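
Illustration: the rule-based method in the description relies on stop word frequency. The record does not give the paper's lexicons or decision rule, so the following is only a minimal sketch under stated assumptions: the two stop word sets are tiny hand-picked samples, the labels "MCA" and "MSA" are shorthand chosen here, and the names `stop_word_counts` and `identify` are hypothetical. It counts how many tokens of a text fall into each list and returns the label with the higher count.

```python
from collections import Counter

# Tiny illustrative stop word lists. These are assumptions made for this
# sketch, not the lexicons used in the paper.
MCA_STOP_WORDS = {"ديال", "بزاف", "واش", "هاد", "ماشي", "غادي", "كاين"}
MSA_STOP_WORDS = {"الذي", "التي", "هذا", "ليس", "سوف", "لكن", "هناك"}


def stop_word_counts(text):
    """Count whitespace-separated tokens that appear in each stop word list."""
    counts = Counter()
    for token in text.split():
        if token in MCA_STOP_WORDS:
            counts["MCA"] += 1
        elif token in MSA_STOP_WORDS:
            counts["MSA"] += 1
    return counts


def identify(text):
    """Return "MCA", "MSA", or "unknown" when the counts are tied."""
    counts = stop_word_counts(text)
    if counts["MCA"] > counts["MSA"]:
        return "MCA"
    if counts["MSA"] > counts["MCA"]:
        return "MSA"
    return "unknown"
```

A call such as `identify("واش هاد الكتاب ديال خويا")` matches three entries of the Moroccan list and none of the other, so the sketch labels it "MCA". A real system would need proper tokenization, orthographic normalization, and far larger lexicons; ties and texts without any stop word are simply reported as "unknown" here.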