Bengali & Banglish: A monolingual dataset for emotion detection in linguistically diverse contexts

Authors: Moshiur Rahman Faisal, Ashrin Mobashira Shifa, Md Hasibur Rahman, Mohammed Arif Uddin, Rashedur M. Rahaman
Language: English
Year of publication: 2024
Subject:
Source: Data in Brief, Vol 55, Iss , Pp 110760- (2024)
Document type: article
ISSN: 2352-3409
DOI: 10.1016/j.dib.2024.110760
Description: The evolving global landscape of communication, driven by advances in information technology, underscores the importance of emotion detection in natural language processing. However, challenges persist in interpreting emotions within linguistically diverse contexts, notably in low-resource languages like Bengali, compounded by the emergence of Banglish. To address this gap, we present “Bengali & Banglish,” an extensive dataset comprising 80,098 labelled samples across six emotion classes. Our dataset fills a void in fine-grained emotion classification for Bengali and pioneers emotion detection in Banglish. With meticulous annotation and rigorous evaluation, BanglaBERT achieves a weighted F1 score of 71.30% for Bengali and 64.59% for Banglish. The dataset also supports Bengali-to-Banglish machine translation, contributing to the advancement of language processing models. Furthermore, the annotations reach a high Cohen's Kappa score of 93.5%, affirming their reliability and consistency. This research underscores the importance of linguistic diversity in NLP and provides a valuable resource for enhancing emotion detection in Bengali and Banglish across digital platforms.
Database: Directory of Open Access Journals
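
Note on the reported metrics: the description quotes a weighted F1 score and a Cohen's Kappa score without defining them. The following is a minimal sketch of how these two quantities are conventionally computed, not the authors' evaluation code. The function names, the label strings ("joy", "anger", "sad"), and the four-sample toy data are all hypothetical and do not come from the dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label in support:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * support[label] / total
    return score


def cohen_kappa(rater_a, rater_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data, for illustration only.
gold = ["joy", "joy", "anger", "sad"]
pred = ["joy", "anger", "anger", "sad"]
f1_value = weighted_f1(gold, pred)
kappa_value = cohen_kappa(gold, pred)
```

Weighting by support means frequent classes dominate the F1 average, which matters when the six emotion classes are unevenly represented. Kappa subtracts the agreement two annotators would reach by chance, so it is a stricter measure than raw percent agreement.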