Description: |
The rise of machine learning (ML) has recently bolstered efforts toward big data-driven precision oncology. This study used ensemble ML for precision oncology in breast cancer, one of the most common malignancies worldwide, with marked heterogeneity in its underlying molecular mechanisms. We analyzed clinical and RNA-seq data from The Cancer Genome Atlas (TCGA) (844 patients with breast cancer and 113 healthy individuals) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (1784 patients with breast cancer and 202 healthy individuals). We evaluated six algorithms in the context of ensemble modeling and identified a candidate mRNA diagnostic panel that can differentiate patients from healthy controls and stratify breast cancer into molecular subtypes. The ensemble model included 50 mRNAs and displayed 82.55% accuracy, 79.22% specificity, and 84.55% sensitivity in stratifying patients into molecular subtypes in the TCGA cohort. Its performance was markedly higher, however, in distinguishing the basal, LumB, and Her2+ breast cancer subtypes from healthy individuals. In overall survival analysis, the mRNA panel showed a hazard ratio of 2.25 (