Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter
Author: | Michael Vincent Arnold, David Rushing Dewhurst, Jane Lydia Adams, Thayer Alshaabi, Peter Sheridan Dodds, Joshua R. Minot, Christopher M. Danforth, Andrew J. Reagan |
---|---|
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences; Physics - Physics and Society; History; Sociotechnical system; Language change; FOS: Physical sciences; Popular culture; Social Sciences; Physics and Society (physics.soc-ph); 02 engineering and technology; World Wide Web; Numeral system; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; Research Resource; 030304 developmental biology; Social and Information Networks (cs.SI); 0303 health sciences; Computer Science - Computation and Language; Multidisciplinary; SciAdv r-resources; Computer Science - Social and Information Networks; Timeline; Popularity; Compendium; Dynamics (music); Computer Science; 020201 artificial intelligence & image processing; Computation and Language (cs.CL) |
Source: | Science Advances |
ISSN: | 2375-2548 |
Description: | In real time, social media data strongly imprints world events, popular culture, and day-to-day conversations by millions of ordinary people at a scale that is scarcely conventionalized and recorded. Vitally, and absent from many standard corpora such as books and news archives, sharing and commenting mechanisms are native to social media platforms, enabling us to quantify social amplification (i.e., popularity) of trending storylines and contemporary cultural phenomena. Here, we describe Storywrangler, a natural language processing instrument designed to carry out an ongoing, day-scale curation of over 100 billion tweets containing roughly 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into unigrams, bigrams, and trigrams spanning over 100 languages. We track n-gram usage frequencies, and generate Zipf distributions, for words, hashtags, handles, numerals, symbols, and emojis. We make the data set available through an interactive time series viewer, and as downloadable time series and daily distributions. Although Storywrangler leverages Twitter data, our method of extracting and tracking dynamic changes of n-grams can be extended to any similar social media platform. We showcase a few examples of the many possible avenues of study we aim to enable, including how social amplification can be visualized through 'contagiograms'. We also present some example case studies that bridge n-gram time series with disparate data sources to explore sociotechnical dynamics of famous individuals, box office success, and social unrest. Main text: 15 pages, 6 figures; Supplementary text: 23 pages, 11 figures, 15 tables. Website: https://storywrangling.org/ |
Database: | OpenAIRE |
External link: |