A Novel Hybrid Feature Selection and Ensemble Learning Framework for Unbalanced Cancer Data Diagnosis With Transcriptome and Functional Proteomic
Authors: | Lijun Cai, Xianfang Tang, Yajie Meng, Changlong Gu, Jialiang Yang, Jiasheng Yang |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: |
0301 basic medicine; Functional proteomic; General Computer Science; Computer science; Feature extraction; Feature selection; hybrid feature selection; Machine learning; computer.software_genre; Data type; 03 medical and health sciences; 0302 clinical medicine; cancer diagnosis; Classifier (linguistics); General Materials Science; Biological data; business.industry; General Engineering; Ensemble learning; the Cancer Genome Atlas (TCGA); ensemble method; Support vector machine; 030104 developmental biology; Feature (computer vision); 030220 oncology & carcinogenesis; Artificial intelligence; transcriptome profiles; lcsh:Electrical engineering. Electronics. Nuclear engineering; business; computer; lcsh:TK1-9971 |
Source: | IEEE Access, Vol 9, Pp 51659-51668 (2021) |
ISSN: | 2169-3536 |
Description: | High dimensionality, high redundancy, and class imbalance in cancer multi-omics data are the main challenges for cancer diagnosis. Existing studies have neglected the role of functional proteomics in the occurrence and development of cancer. In this study, a novel hybrid feature selection and ensemble learning framework, referred to as the three-stage feature selection and twice-competitional ensemble learning method (TSFS-TCEM), is proposed for cancer diagnosis. First, we combine transcriptome and functional proteomics data to construct a multi-omics dataset for breast cancer; this is the first time these combined biological data have been applied to breast cancer diagnosis. Second, the proposed method introduces multiple models during both feature selection and diagnostic model construction. The three-stage feature selection integrates features from the different data types, and the twice-competitional ensemble learning framework addresses the data imbalance problem from which a single classifier suffers. TSFS-TCEM achieves a diagnostic accuracy of 99.64%, outperforming all compared methods. In addition, the sensitivity, specificity, and F-measure of the method under 5-fold cross-validation are all above 99.63%. |
Database: | OpenAIRE |
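
The description reports sensitivity, specificity, and F-measure alongside accuracy, which matters for unbalanced data because accuracy alone can look high for a classifier that ignores the minority class. The following is a minimal sketch of how these metrics are computed from confusion-matrix counts. It is not the TSFS-TCEM method, the function name `binary_metrics` is hypothetical, and the counts are made up for illustration rather than taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute (sensitivity, specificity, F-measure, accuracy) from counts.

    tp/fp/tn/fn are true/false positives/negatives for a binary classifier.
    Ratios with a zero denominator are reported as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    # F-measure (F1): harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return sensitivity, specificity, f_measure, accuracy


# Hypothetical unbalanced fold: 90 positive samples, 10 negative samples,
# and a trivial classifier that labels every sample as positive.
sens, spec, f1, acc = binary_metrics(tp=90, fp=10, tn=0, fn=0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} "
      f"F-measure={f1:.3f} accuracy={acc:.3f}")
```

With these illustrative counts the accuracy is 0.9 even though the specificity is 0, which is why results on unbalanced data are usually reported with all of these metrics together.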