Machine Learning-Based Ensemble Recursive Feature Selection of Circulating miRNAs for Cancer Tumor Classification

Authors: Lopez-Rincon, Alejandro; Mendoza-Maldonado, Lucero; Martinez-Archundia, Marlet; Schönhuth, Alexander; Kraneveld, Aletta D.; Garssen, Johan; Tonda, Alberto
Contributors: Utrecht University [Utrecht], Nuevo Hospital Civil de Guadalajara 'Dr. Juan I. Menchaca', Instituto Politecnico Nacional [Mexico] (IPN), Centrum voor Wiskunde en Informatica (CWI), Centrum Wiskunde & Informatica (CWI)-Netherlands Organisation for Scientific Research, Universität Bielefeld = Bielefeld University, Danone Nutricia Research [Utrecht], Mathématiques et Informatique Appliquées (MIA-Paris), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-AgroParisTech-Université Paris-Saclay, division of Pharmacology, Department of Pharmaceutical Sciences, Faculty of Science, Utrecht University, SURF Cooperative, Pharmacology, Afd Pharmacology
Language: English
Publication year: 2020
Subject:
0301 basic medicine
Cancer Research
Computer science
Feature selection
[SDV.CAN]Life Sciences [q-bio]/Cancer
Machine learning
computer.software_genre
lcsh:RC254-282
Article
03 medical and health sciences
0302 clinical medicine
Breast cancer
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
Feature (machine learning)
medicine
Circulating
Set (psychology)
Triple-negative breast cancer
business.industry
Dimensionality reduction
Precision medicine
medicine.disease
lcsh:Neoplasms. Tumors. Oncology. Including cancer and carcinogens
3. Good health
Statistical classification
030104 developmental biology
Oncology
030220 oncology & carcinogenesis
miRNAs
[SDV.IB]Life Sciences [q-bio]/Bioengineering
Artificial intelligence
business
computer
TNBC
Source: Cancers
Cancers, MDPI, 2020, 12 (7), pp.1785. ⟨10.3390/cancers12071785⟩
Cancers, 12(7), 1-27
Volume 12
Issue 7
Cancers, Vol 12, Iss 7, p 1785 (2020)
Cancers, 12(7), 1. Multidisciplinary Digital Publishing Institute (MDPI)
ISSN: 2072-6694
DOI: 10.3390/cancers12071785
Description: Circulating microRNAs (miRNAs) are small noncoding RNA molecules that can be detected in bodily fluids without the need for major invasive procedures on patients. miRNAs have shown great promise as biomarkers for tumors, both to assess their presence and to predict their type and subtype. Recently, thanks to the availability of miRNA datasets, machine learning techniques have been successfully applied to tumor classification. The results, however, are difficult for medical experts to assess and interpret because the algorithms exploit information from thousands of miRNAs. In this work, we propose a novel technique that aims at reducing the necessary information to the smallest possible set of circulating miRNAs. The dimensionality reduction achieved reflects a very important first step in a potential, clinically actionable, circulating miRNA-based precision medicine pipeline. While it is currently under discussion whether this first step can be taken, we demonstrate here that it is possible to perform classification tasks by exploiting a recursive feature elimination procedure that integrates a heterogeneous ensemble of high-quality, state-of-the-art classifiers on circulating miRNAs. Heterogeneous ensembles can compensate for inherent biases of classifiers by using different classification algorithms. Selecting features then further eliminates biases emerging from using data from different studies or batches, yielding more robust and reliable outcomes. The proposed approach is first tested on a tumor classification problem in order to separate 10 different types of cancer, with samples collected over 10 different clinical trials, and is later assessed on a cancer subtype classification task, with the aim of distinguishing triple-negative breast cancer from other subtypes of breast cancer. Overall, the presented methodology proves to be effective and compares favorably to other state-of-the-art feature selection methods.
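Illustrative note: the general idea of recursive feature elimination driven by a heterogeneous ensemble can be sketched in a few lines of Python. This is a minimal toy sketch, not the authors' implementation. The two scorers (a perceptron and a class-centroid gap), the rank-sum aggregation, the one-feature-per-round elimination, the function names, and the toy data are all assumptions made for illustration; the record does not specify which classifiers or aggregation rule the paper uses.

```python
from statistics import mean, pstdev


def perceptron_scores(X, y, feats, epochs=20):
    """Hypothetical scorer 1: fit a perceptron on the surviving features
    and use the absolute weight of each feature as its importance."""
    w = {f: 0.0 for f in feats}
    b = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            target = 1 if label == 1 else -1
            activation = b + sum(w[f] * row[f] for f in feats)
            if target * activation <= 0:  # misclassified: update weights
                for f in feats:
                    w[f] += target * row[f]
                b += target
    return {f: abs(w[f]) for f in feats}


def centroid_scores(X, y, feats):
    """Hypothetical scorer 2: gap between the two class means,
    scaled by the within-class spread."""
    scores = {}
    for f in feats:
        a = [row[f] for row, label in zip(X, y) if label == 0]
        c = [row[f] for row, label in zip(X, y) if label == 1]
        scores[f] = abs(mean(a) - mean(c)) / (pstdev(a) + pstdev(c) + 1e-9)
    return scores


def ensemble_rfe(X, y, n_keep, scorers=(perceptron_scores, centroid_scores)):
    """Drop one feature per round until n_keep remain. Each round, every
    scorer is re-run on the surviving features and ranks them; the feature
    with the worst summed rank across scorers is eliminated."""
    feats = list(range(len(X[0])))
    while len(feats) > n_keep:
        rank_sum = {f: 0 for f in feats}
        for scorer in scorers:
            scores = scorer(X, y, feats)
            ordered = sorted(feats, key=lambda f: scores[f], reverse=True)
            for rank, f in enumerate(ordered):
                rank_sum[f] += rank
        worst = max(feats, key=lambda f: rank_sum[f])
        feats.remove(worst)
    return feats


# Toy data (invented): columns 0 and 2 separate the classes, 1 and 3 do not.
X = [
    [-2.0, 1.0, -1.5, -0.5],
    [-1.8, -1.0, -1.7, 1.0],
    [-2.2, 1.0, -1.3, 1.0],
    [-2.0, -1.0, -1.5, -1.0],
    [2.0, 1.0, 1.5, -1.0],
    [2.2, -1.0, 1.3, 1.0],
    [1.8, 1.0, 1.7, 1.0],
    [2.0, 1.0, 1.5, -1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = ensemble_rfe(X, y, n_keep=2)
print(selected)
```

In a realistic setting each column would be one miRNA expression value and the scorers would be replaced by full classifiers evaluated with cross-validation; summing ranks is only one of several plausible ways to combine their votes.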
Database: OpenAIRE