An augmented multilingual Twitter dataset for studying the COVID-19 infodemic

Authors:	Christian E. Lopez, Caleb Gallemore
Language:	English
Year of publication:	2021
Subject:	Review Paper; Data collection; Coronavirus disease 2019 (COVID-19); Computer science; Communication; Sentiment analysis; Twitter; COVID-19; Data science; Named Entity Recognition; Semantic network; Computer Science Applications; Human-Computer Interaction; Named-entity recognition; Media Technology; Social media; Duration (project management); computer; Information Systems; Statistical hypothesis testing
Source:	Social Network Analysis and Mining
ISSN:	1869-5469, 1869-5450
Description:	This work presents an openly available dataset to facilitate researchers' exploration and hypothesis testing about the social discourse of the COVID-19 pandemic. The dataset currently consists of over 2.2 billion tweets (as of September 2021) from all over the world, in multiple languages. Tweets start from January 22, 2020, when fewer than 600 COVID-19 cases had been reported worldwide. The dataset was collected using the Twitter API and by rehydrating tweets from other available datasets; data collection is ongoing at the time of writing. To facilitate hypothesis testing and exploration of social discourse, the English and Spanish tweets have been augmented with state-of-the-art Twitter sentiment analysis and named entity recognition (NER) algorithms. The dataset and the accompanying summary files allow researchers to avoid some computationally intensive analyses, facilitating more widespread use of social media data to gain insights into issues such as (mis)information diffusion, semantic networks, sentiment, and the evolution of COVID-19 discussions. In addition, the dataset provides an archive for social science researchers who need access to a dataset covering the entire duration of the pandemic.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7b4c8a02f342d52f482157be52f7ba81, http://europepmc.org/articles/PMC8528187
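The description says the English and Spanish tweets carry sentiment and named-entity annotations, and that summary files spare researchers some heavy computation. As a minimal sketch of that kind of summary, the Python below counts sentiment labels per day and named-entity mentions for one language. The record layout (fields created_at, lang, sentiment, entities), the label names, and the sample tweets are hypothetical and for illustration only; they are not the dataset's documented schema.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical annotated records; the real dataset's columns and labels may differ.
tweets = [
    {"created_at": "2020-01-22T10:15:00", "lang": "en", "sentiment": "negative", "entities": ["Wuhan"]},
    {"created_at": "2020-01-22T18:40:00", "lang": "en", "sentiment": "neutral", "entities": ["WHO", "Wuhan"]},
    {"created_at": "2020-01-23T09:05:00", "lang": "es", "sentiment": "negative", "entities": ["China"]},
    {"created_at": "2020-01-23T12:30:00", "lang": "en", "sentiment": "negative", "entities": ["WHO"]},
]


def daily_sentiment_summary(records, lang="en"):
    """Count sentiment labels per calendar day for tweets in one language."""
    summary = defaultdict(Counter)
    for rec in records:
        if rec["lang"] != lang:
            continue
        # Bucket each tweet by the date part of its ISO 8601 timestamp.
        day = datetime.fromisoformat(rec["created_at"]).date().isoformat()
        summary[day][rec["sentiment"]] += 1
    # Return plain dicts, ordered by day, so the result is easy to serialize.
    return {day: dict(counts) for day, counts in sorted(summary.items())}


def entity_counts(records, lang="en"):
    """Count how often each named entity is mentioned in one language."""
    counts = Counter()
    for rec in records:
        if rec["lang"] == lang:
            counts.update(rec["entities"])
    return dict(counts)


if __name__ == "__main__":
    print(daily_sentiment_summary(tweets, "en"))
    print(entity_counts(tweets, "en"))
```

Precomputing aggregates like these once, rather than rerunning sentiment or NER models on every query, is the general saving the summary files are described as offering.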