Automatically Identifying Childhood Health Outcomes on Twitter for Digital Epidemiology in Pregnancy

Authors: Ari Z. Klein, José Agustín Gutiérrez Gómez, Lisa D. Levine, Graciela Gonzalez-Hernandez
Year of publication: 2022
DOI: 10.1101/2022.11.01.22281813
Description: Data are limited regarding associations between pregnancy exposures and childhood outcomes. The objectives of this preliminary study were to (1) assess the availability of Twitter data during pregnancy for users who reported having a child with attention deficit/hyperactivity disorder (ADHD), autism spectrum disorders (ASD), delayed speech, or asthma, and (2) automate the detection of these outcomes. We annotated 9734 tweets that mentioned these outcomes and were posted by users who had reported their pregnancy, and we used them to train and evaluate classifiers that automatically detect tweets reporting these outcomes in the users' children. A classifier based on a pretrained RoBERTa-Large model achieved the highest F1-score, 0.93 (precision = 0.92; recall = 0.94). Through manual annotation and automatic classification, we identified a total of 3806 users who reported having a child with ADHD (678 users), ASD (1744 users), delayed speech (902 users), or asthma (1255 users); because some users reported more than one outcome, the per-outcome counts sum to more than the total. These users enable the use of Twitter data for large-scale observational studies.
Database: OpenAIRE
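As a quick consistency check on the figures reported above (not part of the study itself), the sketch below recomputes the F1-score from the reported precision and recall using the standard definition of F1 as their harmonic mean. The abstract does not state which averaging scheme produced its score, so this only shows that the three numbers agree to two decimal places. It also sums the per-outcome user counts to show that they exceed the 3806 distinct users, which implies that some users are counted under more than one outcome. The function and variable names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the RoBERTa-Large classifier.
f1 = f1_score(0.92, 0.94)
print(round(f1, 2))  # 0.93

# Per-outcome user counts and the reported number of distinct users.
per_outcome = {"ADHD": 678, "ASD": 1744, "delayed speech": 902, "asthma": 1255}
total_users = 3806

outcome_labels = sum(per_outcome.values())
print(outcome_labels)  # 4579

# The excess is the number of outcome labels beyond one per user,
# i.e., extra labels held by users who reported multiple outcomes.
print(outcome_labels - total_users)  # 773
```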