Generating interpretable predictions about antidepressant treatment stability using supervised topic models

Authors: Thomas H. McCoy, Michael C. Hughes, Roy H. Perlis, Andrew S. Ross, Finale Doshi-Velez, Melanie F. Pradier
Year of publication: 2020
Subject:
DOI: 10.1101/2020.03.18.20038232
Description:
Importance: In the absence of readily assessed and clinically validated predictors of treatment response, pharmacologic management of major depressive disorder (MDD) often relies on trial and error.
Objective: To use electronic health records to identify predictors of treatment response while preserving the interpretability of predictions despite large numbers of covariates.
Design: Retrospective cohort study.
Setting: Two academic medical centers in Boston, including outpatient primary and specialty care clinics.
Participants: 81,630 adults with a coded diagnosis of MDD.
Exposure: Treatment with 1 or more of 11 standard antidepressants.
Main Outcomes and Methods: Stable treatment, intended as a proxy for treatment effectiveness, defined as continued prescription of an antidepressant for 90 days. We trained supervised topic models to extract 10 interpretable covariates from coded clinical data for stability prediction. Then, using data from one hospital system (Site A), we trained generalized linear models and ensembles of decision trees to predict stability outcomes from topic features that summarize patient history. We evaluated on held-out patients from Site A as well as on all individuals from a second hospital system (Site B).
Results: Among the 81,630 adults (31% male; age 18-80 with mean 48.46), we identified 55,303 who reached a stable treatment regimen during follow-up. For held-out patients from Site A, mean discrimination for the general stability outcome, measured as area under the receiver operating characteristic curve (AUC), was 0.627 (95% confidence interval (CI) 0.615-0.639) for our supervised topic model with 10 covariates. In evaluation on Site B, our approach achieved a similar AUC of 0.619 (95% CI 0.610-0.627). Building models to predict stability specific to a particular drug did not improve upon predicting general stability, even when using a harder-to-interpret ensemble classifier and 9,256 coded covariates (specific AUC = 0.647, 95% CI 0.635-0.658; general AUC = 0.661, 95% CI 0.648-0.672). Topics coherently captured clinical concepts associated with treatment response.
Conclusions and Relevance: Coded clinical data available in electronic health records facilitated prediction of general treatment response, but not response to specific medications. While greater discrimination is likely required for clinical application, our results provide a simple and transparent baseline for such studies.
Funding: Oracle Labs, Harvard SEAS, and National Institute of Mental Health.
Links:
Supplement document providing more results, links to interactive visualizations, and detailed procedures for reproducibility: https://www.michaelchughes.com/papers/HughesEtAl_medRxiv2020_Supplement.pdf
STROBE checklist: https://www.michaelchughes.com/papers/HughesEtAl_medRxiv2020_STROBE_checklist.pdf
Open-source code for our proposed machine learning methods: https://github.com/dtak/prediction-constrained-topic-models
Key Points:
Question: How well can coded clinical data from electronic health records be used to predict achievement of a stable antidepressant regimen in major depressive disorder?
Findings: In this in silico cohort study of 81,630 adults, we identified 55,303 who reached a stable antidepressant treatment regimen. Predictions using generalized linear models or ensembles of decision trees applied to diagnosis, procedure, and medication codes, as well as low-dimensional summaries of these codes via supervised topic models, achieved area under the receiver operating characteristic curve values of approximately 0.62-0.65; treatment-specific models performed no better than general treatment outcome models.
Meaning: Coded clinical data can facilitate prediction of antidepressant treatment outcomes, but medication-specific models do not outperform general response prediction models.
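Illustrative note: The abstract reports discrimination as AUC with 95% confidence intervals but does not say how the intervals were obtained. The sketch below shows one common way such figures can be computed: AUC as the fraction of (stable, not stable) patient pairs that the model ranks correctly, with a percentile bootstrap interval. The bootstrap choice, the function names `auc` and `bootstrap_auc_ci`, and the toy labels and scores are all assumptions made for illustration; none of them come from the study or its code repository.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap interval (an assumed method)."""
    point = auc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Hypothetical toy data: 1 = reached a stable regimen, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point, lower, upper = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
print(point)  # 0.75: three of the four positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic in cohort size, so on a cohort as large as the one described here a rank-based formulation or a library routine would be the practical choice; the pairwise version is used only because it makes the definition of AUC explicit.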
Database: OpenAIRE