Effective Automated Transformer Model based Sarcasm Detection Using Multilingual Data.

Authors: Sukhavasi, Vidyullatha; Dondeti, Venkatesulu
Subject:
Source: Multimedia Tools & Applications; May 2024, Vol. 83 Issue 16, p47531-47562, 32p
Abstract: Sarcasm detection is crucial for social media users who want to understand the underlying facts. However, detecting sarcasm from text alone is inadequate for modern social networks, a limitation that can be overcome by analyzing both emoji and text data. Bilingual data in Hindi and English, with emojis, is therefore given as input to the proposed model. Various transformer models have been developed for sarcasm detection, but they have not reached satisfactory performance. The proposed approach therefore develops an attention-based transformer model that performs effectively on both emoji and text data. Because feeding raw data to the transformer model reduces accuracy, the input is first pre-processed with stop word removal, case folding, filtering, lemmatization, stemming, and tokenization. After pre-processing, the Average based Term Frequency-Inverse Document Frequency (ATF-IDF) approach is used to extract the textual features. The Gated Temporal Bidirectional Convolution Network (GT-BiCNet) is used to create the text model. The emoji-to-vector model (E-VM) is used to construct the emoji model and express its features as vectors. The TexMoJ features obtained from the two models are concatenated using a deep feature fusion method. The resulting vectors are classified by the deep learning model Attention LSTM based on Amended Bidirectional Encoder Representation from Transformers (ALABerT). The network model's losses are reduced using the Enhanced Pelican Optimization Algorithm (EpoA). The softmax layer separates the data into sarcasm and non-sarcasm. The proposed method is compared with several existing methods on multiple performance measures. On the English Twitter dataset, the method attains 99.1% accuracy, 99.2% precision, 99.1% recall, and a 99.1% F-measure, with an execution time of 56.66 s and an average threshold of 12364.365 s. On the Hindi Twitter dataset, the accuracy, recall, precision, and F-measure are 98.1%, 98.41%, 98.2%, and 69.6%, respectively. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
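
Illustrative note: the pipeline outlined in the abstract (pre-processing, TF-IDF text features, emoji vectors, feature concatenation, softmax) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain smoothed TF-IDF as a stand-in for ATF-IDF, whose averaging step the abstract does not define; it handles English tokens only; and the stop word list, the `EMOJI_VECTORS` table, and all function names are hypothetical. GT-BiCNet, ALABerT, and EpoA are not modeled.

```python
import math
import re

# Toy stop word list (assumption; the abstract does not give one).
STOP_WORDS = {"a", "an", "the", "is", "so", "to", "of", "and"}

# Hypothetical 2-dimensional emoji embeddings standing in for the E-VM model.
EMOJI_VECTORS = {"🙄": [0.9, 0.1], "😊": [0.1, 0.8]}


def preprocess(text):
    """Case-fold, tokenize English words, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        vec = []
        for t in vocab:
            tf = d.count(t) / len(d) if d else 0.0
            idf = math.log((1 + n) / (1 + df[t])) + 1.0  # smoothed IDF
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors


def emoji_vector(text, dim=2):
    """Average the vectors of known emojis in the text; zeros if none occur."""
    found = [EMOJI_VECTORS[ch] for ch in text if ch in EMOJI_VECTORS]
    if not found:
        return [0.0] * dim
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def fuse(text_vec, emoji_vec):
    """Feature fusion by simple concatenation of text and emoji features."""
    return list(text_vec) + list(emoji_vec)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    posts = ["The weather is SO great 🙄", "Great job 😊"]
    docs = [preprocess(p) for p in posts]
    vocab, text_vecs = tfidf_vectors(docs)
    fused = [fuse(tv, emoji_vector(p)) for tv, p in zip(text_vecs, posts)]
    print(vocab)
    print(fused[0])
    # Hypothetical logits for (sarcasm, non-sarcasm); a trained classifier
    # would produce these from the fused features.
    print(softmax([2.0, 0.5]))
```

In a real system the fused vector would feed a trained classifier; here the logits passed to `softmax` are placeholders chosen only to show the final two-class step.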