Machine Learning for Arabic Text Classification: A Comparative Study

Authors: Djelloul Bouchiha, Abdelghani Bouziane, Noureddine Doumi
Language: English
Year of publication: 2022
Subject:
Source: Malaysian Journal of Science and Advanced Technology, Vol 2, Iss 4 (2022)
Document type: article
ISSN: 2785-8901
DOI: 10.56532/mjsat.v2i4.83
Description: The ultimate aim of Machine Learning (ML) is to make machines act like humans. In particular, ML algorithms are widely used to classify texts. Text classification is the process of assigning texts to a predefined set of categories based on their content. It contributes to improving information retrieval on the Web. In this paper, we focus on Arabic text classification, since a large community worldwide uses this language. The Arabic text classification process consists of three main steps: preprocessing, feature extraction, and applying an ML algorithm. This paper presents a comparative empirical study to determine which combination of feature extraction technique and ML algorithm performs best on Arabic documents. To that end, we implemented 160 classifiers by combining 5 feature extraction techniques with 32 ML algorithms. We then made these classifiers open access for the benefit of the AI and NLP communities. Experiments were carried out on a large open dataset. The comparison reveals that TFIDF-Perceptron is the best-performing combination.
Database: Directory of Open Access Journals
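
Illustrative sketch: the abstract above names three steps (preprocessing, feature extraction, ML algorithm) and reports TFIDF-Perceptron as the best combination. The following minimal Python example shows what such a pipeline could look like on a toy scale. It is not the authors' implementation. The whitespace tokenizer, the smoothed IDF variant, the multiclass perceptron update, the four toy sentences, and all function names are assumptions made here for illustration only.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Placeholder preprocessing (assumption): whitespace split only.
    # Real Arabic preprocessing (normalization, stop words, stemming) is omitted.
    return text.split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped.
    tf = Counter(term for term in tokenize(doc) if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}

def score(weights, label, vec):
    # Dot product between one class's weight vector and a document vector.
    w = weights[label]
    return sum(w.get(term, 0.0) * v for term, v in vec.items())

def predict(weights, labels, vec):
    # Pick the highest-scoring class (ties go to the first label).
    return max(labels, key=lambda label: score(weights, label, vec))

def train_perceptron(vectors, targets, epochs=10):
    # Multiclass perceptron: on a mistake, move the gold class toward the
    # document vector and the wrongly predicted class away from it.
    labels = sorted(set(targets))
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for vec, gold in zip(vectors, targets):
            guess = predict(weights, labels, vec)
            if guess != gold:
                for term, v in vec.items():
                    weights[gold][term] += v
                    weights[guess][term] -= v
    return weights, labels

# Toy corpus (hypothetical data, not from the paper's dataset).
docs = [
    "فاز الفريق في المباراة",
    "سجل اللاعب هدفا في المباراة",
    "ارتفع سعر النفط في السوق",
    "انخفض سعر الذهب في السوق",
]
targets = ["sports", "sports", "economy", "economy"]

idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
weights, labels = train_perceptron(vectors, targets)

prediction = predict(weights, labels, tfidf("فاز اللاعب في المباراة", idf))
print(prediction)
```

In practice, a library implementation of TF-IDF and the perceptron would replace these hand-written functions; the sketch only makes the structure of the three steps concrete.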