'Thought I'd Share First' and Other Conspiracy Theory Tweets from the COVID-19 Infodemic: Exploratory Study

Authors: Chrysm Watson Ross, Geoffrey Fairchild, Courtney D. Shelley, Ashlynn R. Daughton, Nidhi Parikh, Travis Pitts, Nidia Yadria Vaquera Chavez, Dax Gerts
Language: English
Year of publication: 2020
Subject:
conspiracy
FOS: Computer and information sciences
Computer Science - Machine Learning
020205 medical informatics
Conspiracy theory
coronavirus
02 engineering and technology
Machine Learning (cs.LG)
0302 clinical medicine
infodemic
Statistics - Machine Learning
vaccine
0202 electrical engineering, electronic engineering, information engineering
030212 general & internal medicine
Misinformation
communication
lcsh:Public aspects of medicine
public health
Computer Science - Social and Information Networks
Data matching
machine learning
vaccine hesitancy
Psychology
medicine.medical_specialty
Coronavirus disease 2019 (COVID-19)
social media
Twitter
Internet privacy
Exploratory research
Health Informatics
Machine Learning (stat.ML)
unsupervised learning
infodemiology
supervised learning
03 medical and health sciences
conspiracy theories
active learning
medicine
health communication
Humans
Social media
misinformation
Social and Information Networks (cs.SI)
Original Paper
Information Dissemination
business.industry
Public health
Public Health, Environmental and Occupational Health
COVID-19
lcsh:RA1-1270
business
5G
random forest
Source: JMIR Public Health and Surveillance, Vol 7, Iss 4, p e26527 (2021)
Description:
Background: The COVID-19 outbreak has left many people isolated within their homes; these people are turning to social media for news and social connection, which leaves them vulnerable to believing and sharing misinformation. Health-related misinformation threatens adherence to public health messaging, and monitoring its spread on social media is critical to understanding the evolution of ideas that have potentially negative public health impacts.
Objective: The aim of this study is to use Twitter data to explore methods to characterize and classify four COVID-19 conspiracy theories and to provide context for each of these conspiracy theories through the first 5 months of the pandemic.
Methods: We began with a corpus of COVID-19 tweets (approximately 120 million) spanning late January to early May 2020. We first filtered tweets using regular expressions (n=1.8 million) and used random forest classification models to identify tweets related to four conspiracy theories. Our classified data sets were then used in downstream sentiment analysis and dynamic topic modeling to characterize the linguistic features of COVID-19 conspiracy theories as they evolve over time.
Results: Analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. Random forest classifier metrics varied across the four conspiracy theories considered (F1 scores between 0.347 and 0.857); this performance increased as the given conspiracy theory was more narrowly defined. We showed that misinformation tweets demonstrate more negative sentiment when compared to nonmisinformation tweets and that theories evolve over time, incorporating details from unrelated conspiracy theories as well as real-world events.
Conclusions: Although we focus here on health-related misinformation, this combination of approaches is not specific to public health and is valuable for characterizing misinformation in general, which is an important first step in creating targeted messaging to counteract its spread. Initial messaging should aim to preempt generalized misinformation before it becomes widespread, while later messaging will need to target evolving conspiracy theories and the new facets of each as they become incorporated.
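Illustrative note: the abstract describes a regular-expression pre-filter applied before classification and reports classifier quality as F1 scores. The sketch below shows, under stated assumptions, what those two pieces could look like in plain Python. The keyword patterns, the sample tweets, and the helper names `matched_topics` and `f1_score` are hypothetical and are not taken from the study; the random forest step itself is not reproduced here.

```python
import re

# Hypothetical keyword patterns; the study's actual regular expressions
# are not given in this record.
PATTERNS = {
    "5g": re.compile(r"\b5g\b", re.IGNORECASE),
    "vaccine": re.compile(r"\bvaccin\w*", re.IGNORECASE),
}


def matched_topics(text):
    """Return the sorted topic names whose pattern occurs in the text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(text))


def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive): harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example tweets, used only to exercise the filter.
tweets = [
    "5G towers are spreading the virus",
    "Wash your hands and stay home",
    "The vaccine will contain a tracking chip",
]

# Keep only tweets that match at least one pattern; these would be the
# candidates passed on to a classifier.
candidates = [t for t in tweets if matched_topics(t)]
```

A keyword filter of this kind only narrows the corpus; deciding whether a matching tweet actually promotes a conspiracy theory is the job of the downstream classifier, whose predictions can then be compared against hand labels with `f1_score`.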
Database: OpenAIRE