Character gated recurrent neural networks for Arabic sentiment analysis.

Autor: Omara E; Agriculture Directorate, Cairo Governorate, Cairo, Egypt. e_omara@hotmail.com., Mousa M; Faculty of Electronic Engineering, Menofia University, Menouf, Egypt., Ismail N; Faculty of Electronic Engineering, Menofia University, Menouf, Egypt.
Jazyk: angličtina
Zdroj: Scientific reports [Sci Rep] 2022 Jun 13; Vol. 12 (1), pp. 9779. Date of Electronic Publication: 2022 Jun 13.
DOI: 10.1038/s41598-022-13153-w
Abstrakt: Sentiment analysis is a Natural Language Processing (NLP) task concerned with opinions, attitudes, emotions, and feelings. It applies NLP techniques for identifying and detecting personal information from opinionated text. Sentiment analysis deduces the author's perspective regarding a topic and classifies the attitude polarity as positive, negative, or neutral. In the meantime, deep architectures applied to NLP reported a noticeable breakthrough in performance compared to traditional approaches. The outstanding performance of deep architectures is related to their capability to disclose, differentiate and discriminate features captured from large datasets. Recurrent neural networks (RNNs) and their variants Long-Short Term Memory (LSTM), Gated Recurrent Unit (GRU), Bi-directional Long-Short Term Memory (Bi-LSTM), and Bi-directional Gated Recurrent Unit (Bi-GRU) architectures are robust at processing sequential data. They are commonly used for NLP applications as they-unlike RNNs-can combat vanishing and exploding gradients. Also, Convolution Neural Networks (CNNs) were efficiently applied for implicitly detecting features in NLP tasks. In the proposed work, different deep learning architectures composed of LSTM, GRU, Bi-LSTM, and Bi-GRU are used and compared for Arabic sentiment analysis performance improvement. The models are implemented and tested based on the character representation of opinion entries. Moreover, deep hybrid models that combine multiple layers of CNN with LSTM, GRU, Bi-LSTM, and Bi-GRU are also tested. Two datasets are used for the models implementation; the first is a hybrid combined dataset, and the second is the Book Review Arabic Dataset (BRAD). The proposed application proves that character representation can capture morphological and semantic features, and hence it can be employed for text representation in different Arabic language understanding and processing tasks.
(© 2022. The Author(s).)
Databáze: MEDLINE
Nepřihlášeným uživatelům se plný text nezobrazuje