Impact of Text Pre-processing and Ensemble Learning on Arabic Sentiment Analysis

Authors:	Samir Belfkih, Ayoub Ait Lahcen, Ahmed Oussous
Year of publication:	2019
Subject:	Stop words; business.industry; Computer science; Arabic; Sentiment analysis; 02 engineering and technology; computer.software_genre; Ensemble learning; Field (computer science); language.human_language; Support vector machine; Naive Bayes classifier; Text mining; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; language; Web application; 020201 artificial intelligence & image processing; Artificial intelligence; Precision and recall; business; computer; Natural language processing
Source:	NISS
DOI:	10.1145/3320326.3320399
Description:	With the rapid growth and spread of web platforms such as social networks, online review websites and blogs, people can openly express and share their opinions. They can rate products or comment on various subjects. Thus, a new field called web-based Sentiment Analysis (SA), or Opinion Mining, has emerged. In general, SA is the process of classifying opinions and sentiments as positive, negative or neutral. Many studies have been performed on SA for languages such as English, Spanish and French. However, research on SA of Arabic text remains very limited. The first goal of this paper is to measure the impact of the preprocessing phase on Arabic Sentiment Analysis in terms of metrics such as accuracy, precision and recall. We conducted experiments using different stemmers (Khoja, ISRI, Tashaphyne, Light10, and MOTAZ), n-grams, and stop words. The second goal is to study the impact of combining multiple classifiers on Arabic sentiment analysis. For this purpose, the vote algorithm was used in conjunction with three classifiers, namely Naive Bayes, Support Vector Machine (SVM), and Maximum Entropy, and evaluated using k-fold cross-validation.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::8f863c5bec0780b439339e72bf2ebcb9 https://doi.org/10.1145/3320326.3320399
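
Illustrative note: the description above names stop words, n-grams, a vote algorithm over several classifiers, and k-fold cross-validation. The following is a minimal Python sketch of those generic building blocks, not the authors' implementation. The stop-word list, the function names, the use of hard (majority) voting, and the contiguous fold layout are all assumptions made for illustration; stemming is omitted because it requires a dedicated Arabic stemmer.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"في", "من", "على", "و"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def majority_vote(predictions):
    """Combine one label per classifier by hard (majority) voting.

    Ties are broken in favour of the label that appears first in the list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size
```

In a full pipeline, each fold from `k_fold_indices` would train the three classifiers on the training indices, and `majority_vote` would combine their per-document predictions on the test indices before accuracy, precision and recall are computed.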