Validation of a machine learning approach to estimate Clinical Disease Activity Index Scores for rheumatoid arthritis.

Authors: Spencer AK (Data Science, OM1 Inc, Boston, Massachusetts, USA); Bandaria J (Data Science, OM1 Inc, Boston, Massachusetts, USA); Leavy MB (Research, OM1 Inc, Boston, Massachusetts, USA; mleavy@om1.com); Gliklich B (Research, Noble and Greenough School, Dedham, Massachusetts, USA); Su Z (Biostatistics, OM1 Inc, Boston, Massachusetts, USA); Curhan G (Research, OM1 Inc, Boston, Massachusetts, USA); Boussios C (Data Science, OM1 Inc, Boston, Massachusetts, USA).
Language: English
Source: RMD open [RMD Open] 2021 Nov; Vol. 7 (3).
DOI: 10.1136/rmdopen-2021-001781
Abstract: Objective: Disease activity measures, such as the Clinical Disease Activity Index (CDAI), are important tools for informing treatment decisions and monitoring patient outcomes in rheumatoid arthritis (RA). Yet, documentation of CDAI scores in electronic medical records and other real-world data sources is inconsistent, making it challenging to use these data for research. The purpose of this study was to validate a machine learning model to estimate CDAI scores for patients with RA using clinical notes.
Methods: A machine learning model was developed to estimate CDAI score values using clinical notes from a specific rheumatology visit. Data from the OM1 RA Registry were used to create a training cohort of 56 177 encounters and a separate validation cohort of 18 726 encounters, 11 985 of which passed a model-derived confidence filter; all included encounters had both a clinician-recorded CDAI score and a clinical note. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), positive predictive value (PPV) and negative predictive value (NPV), calculated using a binarised version of the outcome. The Spearman's R and Pearson's R values were also calculated.
Results: The model had a PPV of 0.80, NPV of 0.84 and AUC of 0.88 when evaluating performance using the binarised version of the outcome. The model had a Spearman's R value of 0.72 and a Pearson's R value of 0.69 when evaluating performance using the continuous CDAI numeric scores.
Conclusion: A machine learning model estimates CDAI scores from clinical notes with good performance. Application of the model to real-world data sets may allow estimated CDAI scores to be used for research purposes.
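The evaluation metrics named in the Methods (PPV, NPV and AUC on a binarised outcome, plus Spearman's and Pearson's R on the continuous scores) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the abstract does not state the cut-off used to binarise CDAI, so the threshold of 10 is hypothetical, the function names are invented here, and the six score pairs are made-up toy values, not study data. This is not the authors' implementation.

```python
from math import sqrt

# Hypothetical cut-off: CDAI > 10 is treated as the positive class.
# The abstract does not state the threshold used to binarise the outcome.
THRESHOLD = 10.0


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson_r(average_ranks(x), average_ranks(y))


def ppv_npv(recorded, estimated, threshold=THRESHOLD):
    """PPV and NPV after binarising both score lists at the threshold."""
    truth = [r > threshold for r in recorded]
    pred = [e > threshold for e in estimated]
    tp = sum(t and p for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    tn = sum((not t) and (not p) for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    return tp / (tp + fp), tn / (tn + fn)


def auc(recorded, estimated, threshold=THRESHOLD):
    """AUC as the Mann-Whitney probability that a positive outranks a negative."""
    truth = [r > threshold for r in recorded]
    ranks = average_ranks(estimated)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    rank_sum = sum(rk for rk, t in zip(ranks, truth) if t)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy values invented for illustration; NOT data from the study.
recorded_cdai = [2.0, 8.0, 12.0, 25.0, 5.0, 18.0]
estimated_cdai = [3.0, 11.0, 9.0, 22.0, 4.0, 20.0]

ppv, npv = ppv_npv(recorded_cdai, estimated_cdai)
auc_value = auc(recorded_cdai, estimated_cdai)
rho = spearman_r(recorded_cdai, estimated_cdai)
r = pearson_r(recorded_cdai, estimated_cdai)
print(ppv, npv, auc_value, rho, r)
```

Note that the AUC here binarises only the clinician-recorded score and ranks the continuous estimate against it, whereas PPV and NPV binarise both; whether the study did the same is not stated in the abstract.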
Competing Interests: The authors indicated are employees of OM1, which is involved in issues related to the topic of this manuscript.
(© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.)
Database: MEDLINE