Twitter dataset on public sentiments towards biodiversity policy in Indonesia.

Authors: Uliniansyah MT; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Budi I; Faculty of Computer Science, University of Indonesia, Depok 16424, Indonesia., Nurfadhilah E; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Afra DIN; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Santosa A; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Latief AD; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Jarin A; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Gunarso; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Jiwanggi MA; Faculty of Computer Science, University of Indonesia, Depok 16424, Indonesia., Hidayati NN; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Fajri R; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Suryono RR; Universitas Teknokrat Indonesia, Bandar Lampung 35142, Indonesia., Pebiana S; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Shaleha S; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia., Ramdhani TW; Faculty of Computer Science, University of Indonesia, Depok 16424, Indonesia., Sampurno T; Research Organization for Electronics and Informatics, National Research and Innovation Agency, Jakarta Pusat 10340, Indonesia.
Language: English
Source: Data in brief [Data Brief] 2023 Dec 01; Vol. 52, pp. 109890. Date of Electronic Publication: 2023 Dec 01 (Print Publication: 2024).
DOI: 10.1016/j.dib.2023.109890
Abstract: In recent years, biodiversity has emerged as a prominent and pressing topic due to the urgent need to address biodiversity loss and the recognition of its connections to climate change and sustainable development. Increased public awareness and the consideration of economic factors have further underscored the significance of biodiversity conservation. To investigate the sentiment of the Indonesian people towards biodiversity, we collected data from Twitter using a predefined set of keywords. We amassed 500,000 Indonesian tweets posted from January 2020 to March 2023. These tweets covered a wide range of discussions on biodiversity, including subdomains such as food security, health, and environmental management. Three annotators labeled each tweet with a sentiment class (positive, negative, or neutral), or with the label "none" for an unrelated tweet. The final label was determined by majority voting. Tweets whose final label was "none" and tweets with an undecided sentiment class were considered invalid and excluded from subsequent processing. Before labeling, a team of 18 experts jointly developed a labeling guide, which served as the reference during labeling. After cleaning (removing duplicates, irrelevant tweets, and tweets written in languages other than Indonesian) and preprocessing, we prepared a dataset containing 13,435 tweets. We measured the inter-annotator agreement, built several models using different algorithms with K-fold cross-validation, and evaluated the models. The inter-annotator agreement, measured by Fleiss' Kappa, was 0.62187, and the best model, based on the pre-trained IndoBERT model, achieved an F1-score of 0.7959. The Fleiss' Kappa value suggests that the annotators had a substantial shared understanding of how to label a tweet, which supports the consistency and reliability of the dataset, and the F1-score suggests that the dataset is suitable for further research on sentiment analysis of biodiversity. This dataset will benefit a range of research, including topic modeling, sentiment analysis, and public opinion analysis on Twitter, especially regarding biodiversity-related policies.
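Illustrative note: the majority-voting rule and the Fleiss' Kappa statistic mentioned in the abstract can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code. The function names, the toy votes, and the choice to treat a tweet as undecided when no label wins more than half of the votes are assumptions made for illustration; the abstract does not say whether Kappa was computed before or after invalid tweets were excluded.

```python
from collections import Counter

# Label set taken from the abstract; "none" marks a tweet unrelated to the topic.
LABELS = ("positive", "negative", "neutral", "none")


def majority_label(votes):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def is_valid(votes):
    """Assumed exclusion rule: drop tweets labeled "none" or left undecided."""
    final = majority_label(votes)
    return final is not None and final != "none"


def fleiss_kappa(rows, categories=LABELS):
    """Fleiss' Kappa for items that were each rated by the same number of raters.

    rows: list of per-item label tuples, e.g. [("positive", "neutral", "positive"), ...]
    """
    n_items = len(rows)
    n_raters = len(rows[0])
    # counts[i][j] = number of raters who assigned category j to item i
    counts = [[Counter(row)[c] for c in categories] for row in rows]

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy votes from three hypothetical annotators (not taken from the dataset).
toy_votes = [
    ("positive", "positive", "neutral"),
    ("negative", "negative", "negative"),
    ("positive", "neutral", "negative"),  # no majority: undecided
    ("none", "none", "positive"),         # majority "none": unrelated tweet
]
kept = [votes for votes in toy_votes if is_valid(votes)]
kappa = fleiss_kappa(toy_votes)
```

On the toy votes, two of the four tweets survive the exclusion rule, which mirrors how the abstract describes "none" and undecided tweets being dropped before modeling.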
(© 2023 The Author(s).)
Database: MEDLINE