Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients
Authors: | Bella Pajares, Sofia Ruiz-Medina, Enrique Saez, Antonia Márquez, Laura Galvez, Begoña Jimenez, Ana Godoy, Pablo Rodriguez-Brazzarola, Maria E. Dominguez-Recio, Francisco Carabantes, Alfonso Sánchez-Muñoz, María José Bermejo, Irene López, José M. Jerez, Tamara Diaz-Redondo, Ester Villar, Héctor Mesa, Leo Franco, Nuria Ribelles, Emilio Alba |
---|---|
Year of publication: | 2021 |
Subject: | Adult; 0301 basic medicine; Cancer Research; Receptor ErbB-2; Advanced breast; Breast Neoplasms; computer.software_genre; Machine learning; Machine Learning; Young Adult; 03 medical and health sciences; 0302 clinical medicine; Breast cancer; Antineoplastic Combined Chemotherapy Protocols; Electronic Health Records; Humans; Medicine; Aged; Natural Language Processing; Retrospective Studies; Aged 80 and over; business.industry; HER2 negative; Area under the curve; Cancer; Middle Aged; Prognosis; medicine.disease; Metastatic breast cancer; Survival Rate; First line treatment; 030104 developmental biology; Receptors Estrogen; Oncology; Hormone receptor; 030220 oncology & carcinogenesis; Disease Progression; Female; Artificial intelligence; Receptors Progesterone; business; computer; Natural language processing; Follow-Up Studies |
Source: | European Journal of Cancer. 144:224-231 |
ISSN: | 0959-8049 |
DOI: | 10.1016/j.ejca.2020.11.030 |
Description: | Background: CDK4/6 inhibitors plus endocrine therapies are the current standard of care in the first-line treatment of HR+/HER2-negative metastatic breast cancer, but there are no well-established clinical or molecular predictive factors for patient response. In the era of personalised oncology, new approaches for developing predictive models of response are needed. Materials and methods: Data derived from the electronic health records (EHRs) of real-world patients with HR+/HER2-negative advanced breast cancer were used to develop predictive models for early and late progression to first-line treatment. Two machine learning approaches were used: a classic approach using a dataset of features manually extracted from the reviewed patient EHRs, and a second approach using natural language processing (NLP) of free-text clinical notes recorded during medical visits. Results: Of the 610 patients included, there were 473 (77.5%) progressions to first-line treatment, of which 126 (20.6% of all patients) occurred within the first 6 months. There were 152 patients (24.9%) who showed no disease progression within 28 months of starting first-line treatment. The best predictive model for early progression using the manually extracted dataset achieved an area under the curve (AUC) of 0.734 (95% CI 0.687–0.782). Using the NLP free-text processing approach, the best model obtained an AUC of 0.758 (95% CI 0.714–0.800). The best model to predict long responders using manually extracted data obtained an AUC of 0.669 (95% CI 0.608–0.730). With NLP free-text processing, the best model attained an AUC of 0.752 (95% CI 0.705–0.799). Conclusions: Using machine learning methods, we developed predictive models for early and late progression to first-line treatment of HR+/HER2-negative metastatic breast cancer. NLP-based machine learning models performed slightly better than models based on manually extracted data (AUC 0.758 vs 0.734 for early progression; 0.752 vs 0.669 for long responders). |
Database: | OpenAIRE |
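
The description reports each model's performance as an AUC with a 95% confidence interval. The record does not state how those intervals were computed, so the following is only a minimal sketch of one common way to obtain such figures: a rank-based AUC with a percentile bootstrap. The data are synthetic (only the cohort size of 610 and the 126 early progressors are taken from the description), and the names `auc_score` and `bootstrap_ci` are hypothetical helpers, not the authors' code.

```python
import random
from bisect import bisect_left, bisect_right


def auc_score(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        below = bisect_left(neg, p)          # negatives scored strictly lower
        ties = bisect_right(neg, p) - below  # negatives with an equal score
        wins += below + 0.5 * ties
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method;
    the source does not say which one was used)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain a single class
            continue
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Synthetic cohort: 610 patients, 126 early progressors (label 1).
# The scores are made-up model outputs, not real predictions.
_rng = random.Random(42)
labels = [1] * 126 + [0] * 484
scores = [_rng.gauss(0.6, 0.2) if y == 1 else _rng.gauss(0.4, 0.2) for y in labels]

auc = auc_score(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUC {auc:.3f} (95% CI {low:.3f}-{high:.3f})")
```

The pairwise definition used here is equivalent to the area under the ROC curve. Other interval methods (for example DeLong's) would give somewhat different bounds on the same data.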