Machine learning application to find patients with lower-risk myelodysplastic syndrome from real-world data
Author: | Colden Johanson, Hu T. Huang, Danny Idryo, Ronda Broome, Matthew J. Rioth, Rayna K. Matsuno |
---|---|
Year of publication: | 2022 |
Subject: | |
Source: | Journal of Clinical Oncology. 40:1555-1555 |
ISSN: | 1527-7755 0732-183X |
DOI: | 10.1200/jco.2022.40.16_suppl.1555 |
Description: | 1555 Background: Identifying patients with myelodysplastic syndrome (MDS) from structured data in electronic health records (EHRs) is challenging. Current claims-based algorithms incorporating diagnosis codes, clinical labs, and procedures have not been validated against an expert reference standard. A machine learning-based approach was investigated to identify erythropoiesis-stimulating agent (ESA)-treated, lower-risk (LR)-MDS patients from structured EHR data.
Methods: A sample of 1,549 patients from the Syapse Learning Health Network (SLHN) was identified as potential ESA-treated LR-MDS patients by a team of clinicians and epidemiologists based on diagnosis and medication data from multiple health systems’ EHRs and cancer registries. Of these, 404 (25%) were confirmed as ESA-treated LR-MDS patients through a review of patient records by certified cancer registrars (CTRs). The sample was divided into training and validation sets at a ratio of 80/20, stratified by the outcome. The predictive variables for the models were age, sex, diagnosis codes corresponding to MDS and chronic kidney disease, medications (ESA, luspatercept, lenalidomide), clinical lab tests (hemoglobin, absolute neutrophil count, platelet count, blast percentage), and evidence of bone marrow biopsy. Gradient boosting machines with a nested cross-validation scheme were used to build the optimal model on the training set. Model acceptance was evaluated based on precision and recall on the validation set. The optimal model was then applied to the remaining unscreened SLHN patient population.
Results: The optimal model identified an additional cohort of 157 patients based on the predicted likelihood. Among these, 69 (44%) were CTR-confirmed ESA-treated LR-MDS patients, all of whom had been missed by the initial expert-determined selection criteria, as shown in the table.
Conclusions: The application of machine learning methods identified additional ESA-treated LR-MDS patients even after the expert-determined population was depleted. This suggests that applying machine learning models to EHR data may improve the efficiency of MDS patient identification and screening efforts for research, quality improvement, and clinical care. [Table: see text] |
Database: | OpenAIRE |
External link: |
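
The Methods describe an 80/20 split stratified by outcome and an evaluation based on precision and recall. The abstract does not say which software was used, so the following is only a minimal standard-library Python sketch of those two steps, applied to the label counts reported above (404 confirmed of 1,549 screened). The function names `stratified_split` and `precision_recall` are hypothetical, and the gradient boosting model and nested cross-validation are not reproduced here.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each outcome class keeps the same train/validation ratio.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN), for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Outcome labels built from the counts in the abstract:
# 1 = CTR-confirmed ESA-treated LR-MDS, 0 = not confirmed.
labels = [1] * 404 + [0] * (1549 - 404)
train_idx, test_idx = stratified_split(labels)

# Share of confirmed patients in each partition (should be close to equal).
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(train_rate, 3), round(test_rate, 3))
```

In a real pipeline the predictions passed to `precision_recall` would come from the fitted model on the validation partition; the 69 of 157 confirmation rate in the Results is the analogous precision-style figure for the newly screened cohort.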