Employing synthetic data for addressing the class imbalance in aspect-based sentiment classification

Authors: Vaishali Ganganwar, Ratnavel Rajalakshmi
Language: English
Year of publication: 2024
Subject:
Source: Journal of Information and Telecommunication, Vol 8, Iss 2, Pp 167-188 (2024)
Document type: article
ISSN: 2475-1839
2475-1847
DOI: 10.1080/24751839.2023.2270824
Description: Abstract: The class imbalance problem, in which the distribution of classes in the training data is unequal or skewed, is a prevalent problem. It can bias classification algorithms and degrade performance on the minority class. In this paper, we address the class imbalance problem in datasets for aspect-based sentiment classification. Aspect-based Sentiment Classification (AbSC) is a type of fine-grained sentiment analysis in which sentiments about particular aspects of an entity are extracted. We address class imbalance by creating synthetic data. Two techniques are proposed for synthetic data generation: paraphrasing using a fine-tuned PEGASUS model and backtranslation using the M2M100 neural machine translation model. We compare these techniques with two other class-balancing techniques: weighted oversampling and cross-entropy loss with class weights. An extensive experimental study was conducted on three benchmark datasets of restaurant reviews: SemEval-2014, SemEval-2015, and SemEval-2016. We applied these methods to the BERT-based deep learning model for aspect-based sentiment classification and studied the effect of balancing the data on the performance of these models. Our proposed balancing technique, using synthetic data, yielded better results than the two existing methods for dealing with multi-class imbalance.
Database: Directory of Open Access Journals
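
Note: The two baseline techniques named in the abstract, cross-entropy loss with class weights and weighted oversampling, can be illustrated with a minimal sketch. This is not the authors' implementation. The inverse-frequency weighting scheme, the function names, and the toy label counts are all assumptions chosen for illustration; the record does not state which weighting formula the paper uses.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


def oversampling_probabilities(labels):
    """Per-example sampling probability, so each class is drawn equally often."""
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical imbalanced toy data: 6 positive, 3 negative, 1 neutral.
labels = ["positive"] * 6 + ["negative"] * 3 + ["neutral"] * 1
weights = inverse_frequency_weights(labels)
sample_probs = oversampling_probabilities(labels)

# A classifier that predicts a uniform distribution over the three classes.
uniform = [{"positive": 1 / 3, "negative": 1 / 3, "neutral": 1 / 3}] * len(labels)
loss = weighted_cross_entropy(uniform, labels, weights)
```

With these assumptions, the single neutral example receives the largest loss weight and the largest sampling probability, so errors on the minority class count more during training. The synthetic-data approach proposed in the paper instead adds new minority-class sentences rather than reweighting or repeating existing ones.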