Modelling sentiments based on objectivity and subjectivity with self-attention mechanisms.

Authors: Ng H; Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia., Chia GJW; Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia., Yap TTV; Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia., Goh VT; Faculty of Engineering, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia.
Language: English
Source: F1000Research [F1000Res] 2021 Oct 04; Vol. 10, pp. 1001. Date of Electronic Publication: 2021 Oct 04 (Print Publication: 2021).
DOI: 10.12688/f1000research.73131.2
Abstract: Background: The proliferation of digital commerce has allowed merchants to reach a wider customer base, prompting the study of customer reviews to gauge service and product quality through sentiment analysis. Sentiment analysis can be enhanced through subjectivity and objectivity classification with attention mechanisms. Methods: This research uses input corpora with contrasting levels of subjectivity and objectivity, drawn from different databases, to perform sentiment analysis on user reviews, incorporating attention mechanisms at the aspect level. Three large corpora are chosen as the subjectivity and objectivity datasets: the Shopee user review dataset (ShopeeRD) for subjectivity, together with the Wikipedia English dataset (Wiki-en) and the Internet Movie Database (IMDb) for objectivity. Word embeddings are created using Word2Vec with Skip-Gram. A bidirectional LSTM with an attention layer (LSTM-ATT) is then imposed on the word vectors. The performance of the model is evaluated and benchmarked against two classification models, Logistic Regression (LR) and Linear SVC (L-SVC). The three models are trained with the subjectivity (70% of ShopeeRD) and objectivity (Wiki-en) embeddings, using ten-fold cross-validation. Next, the three models are evaluated against two datasets (IMDb and 20% of ShopeeRD). The experiments comprise benchmark comparisons, an embedding comparison and a model comparison, with 70-10-20 train-validation-test splits. Data augmentation using AUG-BERT is performed, and selected models incorporating AUG-BERT are compared. Results: L-SVC scored the highest accuracy, 56.9%, for objective embeddings (Wiki-en), while LSTM-ATT scored 69.0% on subjective embeddings (ShopeeRD). Performance improved with data augmentation using AUG-BERT: the LSTM-ATT+AUG-BERT model scored the highest accuracy, at 60.0% for objective embeddings and 70.0% for subjective embeddings, compared to 57% (objective) and 69% (subjective) for L-SVC+AUG-BERT, and 56% (objective) and 68% (subjective) for L-SVC. Conclusions: Utilizing attention layers together with the notions of subjectivity and objectivity improved the accuracy of sentiment analysis models.
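Illustrative note: The abstract does not specify how the attention layer or the 70-10-20 split is implemented, so the following is only a minimal sketch under stated assumptions. It shows generic dot-product attention pooling over a sequence of hidden-state vectors, plus a shuffled 70-10-20 split. The names `attention_pool`, `split_70_10_20`, the `query` vector and the toy inputs are hypothetical; in a model like the one described, the hidden states would come from a bidirectional LSTM and the query would be a learned parameter.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention pooling (an assumed, generic variant).

    Scores each time step against `query`, normalizes the scores with
    softmax, and returns (weights, context), where context is the
    weighted sum of the hidden states.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


def split_70_10_20(items, seed=0):
    """Shuffle and split into 70% train, 10% validation, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy stand-ins for per-word hidden states (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(states, query=[1.0, 1.0])
train, val, test = split_70_10_20(range(100))
```

Here the third toy state aligns best with the query, so it receives the largest attention weight; the pooled `context` vector is what a downstream classifier layer would consume.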
Competing Interests: No competing interests were disclosed.
(Copyright: © 2022 Ng H et al.)
Database: MEDLINE