Machine Learning for Multilabel Emotion Classification in Arabic Tweets

Authors: Enas A. H. Khalil, Enas M.F. El Houby, Hoda K. Mohamed
Year of publication: 2022
Subject:
Source: International Journal of Computer Science and Mobile Computing. 11:222-233
ISSN: 2320-088X
DOI: 10.47760/ijcsmc.2022.v11i05.010
Description: Multilabel emotion classification is a high priority because it mimics real-life scenarios in which people display a variety of emotions. A text can express a collection of emotions such as happiness, love, and optimism, or sadness, anger, and pessimism. In this framework, the Arabic tweet data provided by the SemEval-2018 Task 1 E-c subtask were first preprocessed through several normalization steps, including stemming, stop word removal, and removal of special characters and digits. An emotion lexicon was built to replace the emotions with their meaning related to emotion classes. The pre-trained word embedding model AraVec was used for feature extraction because word embeddings performed better in this task than other features such as the N-gram model. In the classification stage of the framework, several machine learning techniques were implemented, including Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), K Nearest Neighbor (KNN), Ensemble Random Forest (RF), and Ensemble Extra Tree. The best performance was achieved using MLP, whereas SVM performed best among the traditional machine learning techniques (SVM, KNN, RF, and Extra Tree). Extra Tree achieved a multilabel Jaccard accuracy of 26.2%, KNN 37.5%, RF 29.1%, and SVM 46.3%. The neural network model, MLP, achieved an accuracy of 48%. The proposed framework was compared with previous machine learning models built for this task; its results outperform those of the previous models in most cases.
Database: OpenAIRE
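
The description reports results as multilabel Jaccard accuracy. As a minimal sketch of how such a score is commonly computed (not the authors' code), the function below averages, over all tweets, the size of the intersection of gold and predicted label sets divided by the size of their union. The function name `jaccard_accuracy`, the toy labels, and the choice to score a tweet as 1.0 when both label sets are empty are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def jaccard_accuracy(gold: Sequence[Iterable[str]],
                     pred: Sequence[Iterable[str]]) -> float:
    """Mean per-sample Jaccard index between gold and predicted label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of samples")
    if not gold:
        raise ValueError("at least one sample is required")

    total = 0.0
    for g, p in zip(gold, pred):
        g_set, p_set = set(g), set(p)
        union = g_set | p_set
        if not union:
            # Assumption: no gold labels and no predicted labels counts as a perfect match.
            total += 1.0
        else:
            total += len(g_set & p_set) / len(union)
    return total / len(gold)


# Toy example (hypothetical data, not taken from the SemEval set).
gold = [{"happiness", "love", "optimism"}, {"sadness", "anger"}, set()]
pred = [{"happiness", "love"}, {"anger", "pessimism"}, set()]
score = jaccard_accuracy(gold, pred)
print(round(score, 4))
```

In the toy example the three tweets score 2/3, 1/3, and 1, so the mean is 2/3. Unlike exact-match accuracy, this measure gives partial credit when only some of a tweet's emotions are predicted correctly.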