A tweet sentiment classification approach using an ensemble classifier

Author: Vidyashree KP, Rajendra AB, Gururaj HL, Vinayakumar Ravi, Moez Krichen
Language: English
Year of publication: 2024
Subject:
Source: International Journal of Cognitive Computing in Engineering, Vol 5, Iss , Pp 170-177 (2024)
Document type: article
ISSN: 2666-3074
DOI: 10.1016/j.ijcce.2024.04.001
Description: Social media users are more receptive to products or events and share their thoughts through raw textual data, which is classified as semi-structured data. This data, expressed in a variety of terminologies, is noisy by nature, yet it contains important information alongside superfluous details, giving analysts a way to identify patterns and knowledge. This hidden information must be extracted from language data in order to make informed decisions and create strategic plans for entering new markets. Natural language processing (NLP) and data mining techniques are among the most prominent fields of study, especially for sentiment analysis, the process of identifying the feelings and insights concealed in the data. Twitter is one of the most significant microblogging platforms, with millions of users. These users share sentiments on different topics using hashtags and post status updates known as tweets. Twitter is therefore regarded as a significant real-time source and as one of the most active opinion indicators. The volume of information produced by Twitter is enormous, and manually scanning the entire data set is a difficult process. The paper proposes an ensemble classifier to categorize the emotion of tweets on the basis of polarity, namely positive and negative. In our study, the ensemble combines Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT) classifiers. The data is collected from the Twitter API, and the Twitter data is analysed autonomously to determine the public view on a particular topic. The features obtained after dimensionality reduction using LDA undergo feature selection using a wrapper-based technique. The iterative wrapper-based technique predicts a score for each feature; features with low scores are discarded, and features with high scores proceed to classification.
The ensemble classifier uses the Adaptive Boosting (AdaBoost) technique, in which the outputs of the machine learning (ML) classifiers are combined to produce a single output. AdaBoost combines weak classifiers and weights their predictions to form a stronger classifier. The experimental results show that the proposed ensemble classifier achieves an accuracy of 93.42%, which is higher than that of the existing Convolutional Bidirectional Long Short-Term Memory (ConvBiLSTM) classifier and the Hybrid Lexicon-Naïve Bayes Classifier (HL-NBC), which achieve classification accuracies of 91.53% and 89.61%, respectively.
Database: Directory of Open Access Journals
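
The abstract describes AdaBoost as combining weak classifiers into a stronger one for positive/negative tweet polarity. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes a simplification: the weak learners are single-word presence tests ("stumps") rather than the RF, SVM, and DT classifiers named in the abstract, and the LDA and wrapper-based feature selection stages are omitted. The toy tweets and the names `TWEETS`, `stump_predict`, `train_adaboost`, and `predict` are hypothetical and exist only for illustration.

```python
import math

# Made-up toy data for illustration: +1 = positive, -1 = negative.
TWEETS = [
    ("love this phone great battery", 1),
    ("great service love it", 1),
    ("happy with the fast delivery", 1),
    ("awesome camera very happy", 1),
    ("terrible battery hate it", -1),
    ("worst service ever", -1),
    ("hate the slow delivery", -1),
    ("terrible camera very slow", -1),
]


def stump_predict(word, polarity, tokens):
    """Weak classifier: vote `polarity` if `word` is present, else the opposite."""
    return polarity if word in tokens else -polarity


def train_adaboost(data, rounds=3):
    """Discrete AdaBoost over word-presence stumps (an assumed weak learner)."""
    samples = [(set(text.split()), label) for text, label in data]
    vocab = sorted({w for tokens, _ in samples for w in tokens})
    weights = [1.0 / len(samples)] * len(samples)
    ensemble = []  # list of (alpha, word, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for word in vocab:
            for polarity in (1, -1):
                err = sum(w for w, (tokens, y) in zip(weights, samples)
                          if stump_predict(word, polarity, tokens) != y)
                if best is None or err < best[0]:
                    best = (err, word, polarity)
        err, word, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, word, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalise.
        weights = [w * math.exp(-alpha * y * stump_predict(word, polarity, tokens))
                   for w, (tokens, y) in zip(weights, samples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, text):
    """Sign of the alpha-weighted vote of all stumps."""
    tokens = set(text.split())
    score = sum(alpha * stump_predict(word, polarity, tokens)
                for alpha, word, polarity in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    model = train_adaboost(TWEETS, rounds=3)
    for text in ("love the great camera", "hate this terrible phone"):
        print(text, "->", predict(model, text))
```

Each round reweights the training tweets so that the next stump concentrates on the examples the earlier stumps got wrong; the final label is the sign of the weighted vote. A version closer to the abstract would substitute stronger base learners and add the dimensionality reduction and feature selection steps before boosting.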