Sentiment analysis on labeled and unlabeled datasets using BERT architecture

Authors: Koyel Chakraborty, Siddhartha Bhattacharyya, Rajib Bag
Year of publication: 2022
DOI: 10.21203/rs.3.rs-1822017/v1
Description: Sentiment analysis (SA) is the study of human perception of a given subject. It extracts information from datasets using Natural Language Processing (NLP) methods and algorithms that are rule-based, hybrid, or based on machine learning. SA is gaining popularity for its capacity to process large volumes of user evaluations, uncover trends, and reach conclusions derived from real data rather than from hypotheses built on a limited number of observations. The flexibility of sentiment gathering has given it a critical role in both commercial and research applications in the last few years. This study presents new sentiment analysis models based on Bidirectional Encoder Representations from Transformers (BERT) for both labeled and unlabeled datasets. The labeled datasets are modeled with supervised learning in a hybrid architecture of fine-tuned BERT and interval Type-2 fuzzy sets. The inclusion of interval Type-2 fuzzy logic for handling uncertainty or imprecision in data yields commendable results for the labeled datasets. For the prediction of sentiments in unlabeled datasets, the data are embedded through a BERT tokenizer with the help of a threshold and activation functions. Coupling a multi-layer perceptron with the BERT parser substantially decreases the time and complexity compared to supervised learning. Both models have been implemented on multiple datasets and have outperformed existing state-of-the-art techniques in this field.
Database: OpenAIRE