A predictive model for survival in non-small cell lung cancer (NSCLC) based on electronic health record (EHR) and tumor sequencing data at the Department of Veterans Affairs (VA)
Author: | Kelly Gaynor, Colleen Shannon, Samuel Ajjarapu, Mary T. Brophy, Karen E. Pierce-Murray, Brett Johnson, Ayesha Rizwan Sheikh, Siamack Ayandeh, Danne Elbers, Nhan Do, Jamie Ramos-Cejudo, Nathanael Fillmore, Feng-Chi Sung, Corri Dedomenico, David Cheng, David P. Tuck, Sarah Schiller, Daniel Chen, Frank Meng, Robert B. Hall |
---|---|
Publication year: | 2019 |
Subject: | |
Source: | Journal of Clinical Oncology. 37:109-109 |
ISSN: | 1527-7755 0732-183X |
DOI: | 10.1200/jco.2019.37.15_suppl.109 |
Description: | Abstract 109. Background: Machine learning tools based on EHR data hold promise to help avoid unnecessary risks associated with lung cancer and its treatment. Additionally, molecular genetic profiling is becoming an integral tool for clinicians to individualize treatment for lung cancer. However, relatively few survival models integrate these data into individualized predictions. Here, we combine real-world EHR and tumor sequencing data from the VA Precision Oncology Data Repository (PODR) to build accurate individualized survival predictions for newly diagnosed NSCLC patients. Methods: We identified a cohort of 356 VA patients newly diagnosed with NSCLC for whom EHR, cancer registry, and targeted tumor sequencing data are available in PODR. We defined 41 features reflecting 15 baseline clinical and demographic characteristics from the EHR and registry, such as age, race, stage, histology, and therapy. We also defined features reflecting 206 clinically actionable somatic variants. We selected 5 important variants for inclusion in the model, as well as the total number of mutations. We trained a random forest model to predict 1-year survival. Precision, recall, and area under the ROC curve (AUC) were assessed using 5-fold cross-validation. Results: Mean age at diagnosis was 66 years. The majority of patients had late-stage disease (15% stage I, 6% II, 15% III, 44% IV), and 59% of patients received systemic therapy. 45% died within 1 year of diagnosis, and 55% survived past 1 year. Cross-validated AUC is 0.79 (SD 0.08), precision is 0.79 (SD 0.07), and recall is 0.74 (SD 0.07), suggesting that the trained model combining clinical and genomic features is effective at predicting 1-year survival. Conclusions: By integrating real-world EHR and sequencing data, we built a highly accurate predictive model of 1-year survival in NSCLC patients at the VA. Such a model, after ongoing validation in a larger cohort, offers the ability to make individualized predictions that could inform patient care to improve outcomes. |
Database: | OpenAIRE |
External link: |
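
The evaluation procedure described in the Methods (a random forest scored by 5-fold cross-validation on AUC, precision, and recall, each reported as mean and SD across folds) could be sketched roughly as follows with scikit-learn. This is only an illustration under stated assumptions: the data are synthetic random numbers shaped like the cohort (356 rows; 47 columns for the 41 clinical features, 5 variant features, and 1 mutation count), and the number of trees, the stratified shuffling, the random seeds, and the use of a population SD are choices made here, not details given in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for the cohort (NOT patient data): 356 rows and
# 47 columns (41 clinical/demographic + 5 variants + 1 mutation count).
# Label 1 is taken to mean "survived past 1 year" (about 55% of rows).
X, y = make_classification(
    n_samples=356,
    n_features=47,
    n_informative=10,      # assumption: arbitrary choice for the toy data
    weights=[0.45, 0.55],  # approximate class balance from the Results
    random_state=0,
)

# Assumed hyperparameters; the abstract does not report any.
model = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validation; stratification and shuffling are assumptions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_validate(
    model, X, y, cv=cv, scoring=["roc_auc", "precision", "recall"]
)

# Summarize each metric as mean and (population) SD over the 5 folds.
summary = {}
for metric in ("roc_auc", "precision", "recall"):
    fold_values = scores[f"test_{metric}"]
    summary[metric] = (float(np.mean(fold_values)), float(np.std(fold_values)))
    print(f"{metric}: {summary[metric][0]:.2f} (SD {summary[metric][1]:.2f})")
```

The printed numbers describe the synthetic data only and are not comparable to the values reported in the abstract.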