Modeling Gender Dysphoria with Machine Learning and Natural Language Processing: Preliminary Implications for Technology-Delivered Interventions (Preprint)

Authors: Cory J. Cascalheira, Ryan E. Flinn, Yuxuan Zhao, Dannie Klooster, Danica Laparade, Shah M. Hamdi, Jillian R. Scheer, Alejandra Gonzalez, Emily M. Lund, Ivan N. Gomez, Koustuv Saha, Munmun De Choudhury
Publication year: 2023
Description: BACKGROUND: Many transgender and nonbinary (TNB) people face significant treatment barriers (e.g., healthcare discrimination) when seeking help for gender dysphoria. Technology-delivered interventions for TNB people can be used discreetly, safely, and flexibly, thereby reducing such barriers. These interventions are beginning to incorporate machine learning (ML) and natural language processing (NLP) to automate intervention components and tailor intervention content. A critical step in using ML and NLP in technology-delivered interventions is demonstrating how accurately these methods model gender dysphoria. OBJECTIVE: The present study sought to determine the preliminary effectiveness of modeling gender dysphoria with ML and NLP. METHODS: Six ML models and 949 NLP-generated independent variables were used to model gender dysphoria from the text of 1,573 Reddit posts created on TNB-specific online forums. Qualitative content analysis was used to determine whether gender dysphoria was present in each post (i.e., the dependent variable). NLP transformed the linguistic content of each post into predictors for the ML algorithms. RESULTS: A supervised ML algorithm (i.e., optimized extreme gradient boosting; XGBoost) modeled gender dysphoria with a high degree of accuracy (.84), precision (.83), and speed (1.23 seconds). Of the NLP-generated independent variables, DSM-5 clinical keywords (e.g., dysphoria, disorder) were most predictive of gender dysphoria. CONCLUSIONS: These preliminary findings and initial validation evidence suggest that ML- and NLP-based models of gender dysphoria have significant potential to be integrated into TNB-specific technology-delivered interventions.
Database: OpenAIRE
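
Note: The description reports accuracy and precision for a binary classifier and names clinical keywords as the most predictive text features. The following is a minimal sketch of how such quantities are commonly computed. It is not the authors' pipeline: the function names, the keyword list, and the toy labels are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def keyword_count(post: str, keywords: Sequence[str]) -> int:
    """Count whitespace-delimited tokens in a post that match any keyword.

    Hypothetical stand-in for one NLP-generated predictor; the study's
    actual feature extraction is not described at this level of detail.
    """
    tokens = post.lower().split()
    return sum(tokens.count(k) for k in keywords)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of posts whose predicted label equals the coded label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of posts predicted positive (1) that were coded positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy labels (invented): 1 = dysphoria coded present, 0 = absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
n_hits = keyword_count("I feel dysphoria and more dysphoria today",
                       ("dysphoria", "disorder"))
```

In this reading, the reported accuracy of .84 is the share of posts classified correctly, and the reported precision of .83 is the share of posts flagged as expressing gender dysphoria that human coders had also labeled that way.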