How the world's collective attention is being paid to a pandemic: COVID-19 related n-gram time series for 24 languages on Twitter

Author: Andrew J. Reagan, Jane Lydia Adams, Roby Muhamad, Thayer Alshaabi, Peter Sheridan Dodds, Michael Vincent Arnold, David Rushing Dewhurst, Joshua R. Minot, Christopher M. Danforth
Language: English
Year of publication: 2020
Subject:
FOS: Computer and information sciences
Viral Diseases
Epidemiology
Social Sciences
02 engineering and technology
Geographical locations
Medical Conditions
0302 clinical medicine
Sociology
Pandemic
Medicine and Health Sciences
0202 electrical engineering, electronic engineering, information engineering
Psychology
Attention
030212 general & internal medicine
Virus Testing
Data Management
Language
Multidisciplinary
Social Communication
Computer Science - Social and Information Networks
Public relations
Infectious Diseases
Social Networks
Medicine
020201 artificial intelligence & image processing
Coronavirus Infections
Network Analysis
Brazil
Research Article
Computer and Information Sciences
Physics - Physics and Society
Science
Twitter
FOS: Physical sciences
Physics and Society (physics.soc-ph)
03 medical and health sciences
Politics
Diagnostic Medicine
Humans
Social media
China
Set (psychology)
Pandemics
Socioeconomic status
Retrospective Studies
Social and Information Networks (cs.SI)
Series (stratigraphy)
Divergence (linguistics)
SARS-CoV-2
business.industry
Data Visualization
Biology and Life Sciences
COVID-19
Covid 19
South America
Communications
Collective Human Behavior
People and places
business
Social Media
Source: PLoS ONE
PLoS ONE, Vol 16, Iss 1, p e0244476 (2021)
Description: In confronting the global spread of the coronavirus disease (COVID-19) pandemic, we must have coordinated medical, operational, and political responses. In all efforts, data is crucial. Fundamentally, and in the possible absence of a vaccine for 12 to 18 months, we need universal, well-documented testing both for the presence of the disease and for confirmed recovery through serological tests for antibodies, and we need to track major socioeconomic indices. But we also need auxiliary data of all kinds, including data on how populations are talking about the unfolding pandemic through news and stories. To help in part on the social media side, we curate a set of 2000 day-scale time series of 1- and 2-grams across 24 languages on Twitter that are the most 'important' for April 2020 relative to April 2019. We determine importance through our allotaxonometric instrument, rank-turbulence divergence. We make some basic observations about some of the time series, including a comparison with the number of confirmed deaths due to COVID-19 over time. Broadly, across all languages, we observe a peak for the language-specific word for 'virus' in January 2020, followed by a decline through February and then a surge through March and April. The world's collective attention dropped away while the virus spread out from China. We host the time series on GitLab and update them daily while they remain relevant. Our main intent is for other researchers to use these time series to enhance whatever analyses may be of use during the pandemic, as well as for retrospective investigations.
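The description names rank-turbulence divergence (RTD) as the measure of 'importance' but does not state it. The following is a minimal sketch, assuming the unnormalized form of RTD from the authors' allotaxonometry work, in which each n-gram τ contributes (α+1)/α · |r_{τ,1}^(−α) − r_{τ,2}^(−α)|^(1/(α+1)), where r_{τ,1} and r_{τ,2} are its ranks in the two periods (for example, April 2019 and April 2020). Further assumptions: the overall normalization constant is omitted because it does not change the ordering of contributions; n-grams missing from one period receive a tied bottom rank via fractional (average) ranking; and the function names, toy counts, and α = 1/4 are hypothetical illustrations, not values taken from the paper.

```python
def fractional_ranks(counts, universe):
    """Rank every type in `universe` by descending count (rank 1 = most frequent).

    Types with equal counts, including the zero counts of types absent from
    this system (an assumption of this sketch), share the average of the
    ranks they span.
    """
    ordered = sorted(universe, key=lambda t: -counts.get(t, 0))
    ranks = {}
    i = 0
    while i < len(ordered):
        c = counts.get(ordered[i], 0)
        j = i
        # Extend j over the block of types tied at count c.
        while j < len(ordered) and counts.get(ordered[j], 0) == c:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; their average is (i + 1 + j) / 2.
        avg = (i + 1 + j) / 2
        for t in ordered[i:j]:
            ranks[t] = avg
        i = j
    return ranks


def rtd_contributions(counts1, counts2, alpha):
    """Per-type contributions to unnormalized rank-turbulence divergence.

    Returns (type, contribution) pairs sorted from largest to smallest,
    i.e. from most to least 'important' under this sketch.
    """
    universe = set(counts1) | set(counts2)
    r1 = fractional_ranks(counts1, universe)
    r2 = fractional_ranks(counts2, universe)
    prefactor = (alpha + 1) / alpha
    power = 1 / (alpha + 1)
    contrib = {
        t: prefactor * abs(r1[t] ** -alpha - r2[t] ** -alpha) ** power
        for t in universe
    }
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy counts standing in for two months of 1-gram usage.
    april_2019 = {"the": 100, "rt": 50, "virus": 1}
    april_2020 = {"virus": 120, "the": 100, "rt": 50, "covid": 40}
    for ngram, score in rtd_contributions(april_2019, april_2020, alpha=0.25):
        print(f"{ngram}\t{score:.4f}")
```

In this toy run, 'virus' jumps from rank 3 to rank 1 and so contributes most, while 'covid' sits at rank 4 in both periods (tied at the bottom in 2019, fourth by count in 2020) and contributes nothing. The parameter α tunes how strongly rank changes near the top of the list are emphasized relative to changes further down.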
13 pages, 6 figures, 3 tables, website: http://compstorylab.org/covid19ngrams/
Database: OpenAIRE