Author: |
Muhammad Shahid Iqbal Malik, Anna Nazarova, Mona Mamdouh Jamjoom, Dmitry I. Ignatov |
Language: |
English |
Year of publication: |
2023 |
Subject: |
|
Source: |
Journal of King Saud University: Computer and Information Sciences, Vol 35, Iss 8, Pp 101736- (2023) |
Document type: |
article |
ISSN: |
1319-1578 |
DOI: |
10.1016/j.jksuci.2023.101736 |
Description: |
Hope Speech Detection (HSD) from social media is a new research direction that promotes and supports positive content to encourage harmony and positivity in society. Although social media users belong to different linguistic communities, hope speech detection has rarely been studied as a multilingual task that includes low-resource languages. Moreover, prior studies explored only monolingual techniques, and none addressed the Russian language. This study tackles Multi-lingual Hope Speech Detection (MHSD) in English and Russian using transfer learning with a fine-tuning approach. We explore two ways of handling multilingualism: a joint multi-lingual approach and a translation-based approach. The translation-based approach translates all content into one language and then classifies it, whereas the joint multi-lingual approach designs a universal classifier for various languages. We build on the Robustly Optimized BERT Pre-Training Approach (RoBERTa), which has shown benchmark performance in capturing the semantics and contextual information within content. The proposed framework consists of four stages: 1) data preprocessing, 2) representation of data using RoBERTa models, 3) fine-tuning, and 4) classification of content into two labels. A new Russian corpus for hope speech detection is built from YouTube comments. Several experiments are conducted in English and Russian using semi-supervised bilingual English and Russian datasets. The findings show that the proposed framework achieves benchmark performance and outperforms the baselines. Furthermore, the translation-based approach (Russian-RoBERTa) offered the best performance, achieving 94% accuracy and an 80.24% F1-score. |
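The difference between the two approaches named in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the keyword-based toy classifiers, and the dictionary-based toy translator are all hypothetical stand-ins for fine-tuned RoBERTa models and a real translation system, and the choice of pivot language is left as a parameter.

```python
from typing import Callable

# Hypothetical type aliases: a classifier returns 1 for hope speech, 0 otherwise.
Classifier = Callable[[str], int]
Translator = Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text


def translation_based(text: str, lang: str, pivot_lang: str,
                      translate: Translator, pivot_classifier: Classifier) -> int:
    """Translate all content into one pivot language, then classify it
    with a single monolingual classifier."""
    if lang != pivot_lang:
        text = translate(text, lang, pivot_lang)
    return pivot_classifier(text)


def joint_multilingual(text: str, universal_classifier: Classifier) -> int:
    """Classify content in any language directly with one universal classifier."""
    return universal_classifier(text)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_translate(text: str, src: str, tgt: str) -> str:
    table = {("en", "ru"): {"hope": "надежда"}}
    mapping = table.get((src, tgt), {})
    return " ".join(mapping.get(word, word) for word in text.lower().split())


def toy_russian_classifier(text: str) -> int:
    return int("надежда" in text.lower())


def toy_universal_classifier(text: str) -> int:
    lowered = text.lower()
    return int("hope" in lowered or "надежда" in lowered)


label_translated = translation_based(
    "There is hope", "en", "ru", toy_translate, toy_russian_classifier
)
label_joint = joint_multilingual("Есть надежда", toy_universal_classifier)
```

In the translation-based route only one monolingual classifier is needed, but its quality depends on the translation step; the joint route skips translation and relies on a single model that must cover every language.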
Database: |
Directory of Open Access Journals |
External link: |
|