Automatic Breast Cancer Survivor Detection from Social Media for Studying Latent Factors Affecting Treatment Success

Authors:	Abeed Sarker, Mohammed Ali Al-Garadi, Yuan-Chi Yang, Sahithi Lakamana, Jie Lin, Sabrina Li, Angel Xie, Whitney Hogg-Bremer, Mylin Torres, Imon Banerjee
Year of publication:	2020
Subject:	Learning classifier system; Machine learning; Mental health; Cancer recurrence; Treatment success; Breast cancer; Drug side effects; Social media; Artificial intelligence; Hormone therapy
DOI:	10.1101/2020.05.17.20104778
Description:	Breast cancer patients often discontinue their long-term treatments, such as hormone therapy, increasing the risk of cancer recurrence. These discontinuations are often caused by adverse patient-centered outcomes (PCOs) due to hormonal drug side effects or other factors. PCOs are not detectable through laboratory tests and are sparsely documented in electronic health records. Thus, there is a need to explore other sources of information for PCOs associated with breast cancer treatments. Social media is a promising resource, but extracting true PCOs from it first requires the accurate detection of breast cancer patients. We describe a natural language processing (NLP) architecture for automatically detecting breast cancer patients from Twitter based on their self-reports. The architecture employs breast cancer-related keywords to collect streaming data from Twitter, applies NLP patterns to pre-filter noisy posts, and then employs a machine learning classifier trained using manually annotated data (n=5019) to distinguish firsthand self-reports of breast cancer from other tweets. A classifier based on bidirectional encoder representations from transformers (BERT) showed human-like performance and achieved an F1-score of 0.857 (inter-annotator agreement: 0.845; Cohen's kappa) for the positive class, considerably outperforming the next best classifier, a deep neural network (F1-score: 0.665). Qualitative analyses of posts from automatically detected users revealed discussions about side effects, non-adherence, and mental health conditions, illustrating the feasibility of our social media-based approach for studying breast cancer-related PCOs from a large population.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fde5f74980eec3099ccba022b661c9d7 https://doi.org/10.1101/2020.05.17.20104778
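
The description outlines a keyword match, a pattern-based pre-filter, and an evaluation by positive-class F1-score. The following is a minimal Python sketch of those two ideas only, not the authors' implementation: the keyword list, the self-report pattern, and the function names are all hypothetical, since the record does not publish them, and the trained classifier stage is omitted.

```python
import re

# Hypothetical keyword list; the record does not give the actual collection keywords.
KEYWORDS = ("breast cancer", "mastectomy", "tamoxifen")

# Hypothetical self-report pattern: a first-person word followed, within the
# same sentence, by one of the keywords. The real NLP patterns are not given.
SELF_REPORT = re.compile(
    r"\b(i|my|me)\b[^.!?]*\b(breast cancer|mastectomy|tamoxifen)\b",
    re.IGNORECASE,
)


def matches_keywords(text: str) -> bool:
    """Return True if the post mentions any collection keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def passes_prefilter(text: str) -> bool:
    """Keep posts that match a keyword and look like a first-person report."""
    return matches_keywords(text) and SELF_REPORT.search(text) is not None


def f1_positive(gold: list[int], predicted: list[int]) -> float:
    """F1-score for the positive class (label 1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a full pipeline, posts that pass the pre-filter would be handed to a trained classifier (BERT in the description), and `f1_positive` would compare its predictions against the manually annotated labels.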