biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data

Authors: Christophe Junot, Philippe Rinaudo, Samia Boudah, Etienne A. Thévenot
Contributors: Laboratoire d'analyse des données et d'intelligence des systèmes (LADIS); Département Métrologie Instrumentation & Information (DM2I); Laboratoire d'Intégration des Systèmes et des Technologies (LIST); Direction de Recherche Technologique (CEA) (DRT (CEA)); Commissariat à l'énergie atomique et aux énergies alternatives (CEA); Université Paris-Saclay; Laboratoire d'Etude du Métabolisme des Médicaments (LEMM); Service de Pharmacologie et Immunoanalyse (SPI); Médicaments et Technologies pour la Santé (MTS); Direction de Recherche Fondamentale (CEA) (DRF (CEA)); Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE); ANR-11-INBS-0010, METABOHUB, Développement d'une infrastructure française distribuée pour la métabolomique dédiée à l'innovation (2011)
Language: English
Year of publication: 2016
Subject:
0301 basic medicine
Workflow4metabolomics
Support Vector Machine
Computer science
Feature selection
bile
computer.software_genre
Biochemistry, Genetics and Molecular Biology (miscellaneous)
Biochemistry
Partial Least Squares
wrapper approach
Bioconductor
03 medical and health sciences
transcriptomics
feature selection
proteomics
Resampling
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
Partial least squares regression
taurochenodeoxycholic acid
Molecular Biosciences
Molecular Biology
reference binary classifier
Original Research
Random Forest
[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]
biosigner algorithm
data mining
molecular signature
Linear discriminant analysis
metabolomics
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Random forest
Support vector machine
omics data
030104 developmental biology
diabetic patients
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
discovery of biomarkers
biomarker
Data mining
Classifier (UML)
computer
[PHYS.PHYS.PHYS-DATA-AN]Physics [physics]/Physics [physics]/Data Analysis, Statistics and Probability [physics.data-an]
Source: Frontiers in Molecular Biosciences
Frontiers in Molecular Biosciences, Frontiers Media, 2016, 3, ⟨10.3389/fmolb.2016.00026⟩
ISSN: 2296-889X
Description: High-throughput technologies such as transcriptomics, proteomics, and metabolomics show great promise for the discovery of biomarkers for diagnosis and prognosis. Selection of the most promising candidates between the initial untargeted step and the subsequent validation phases is critical within the pipeline leading to clinical tests. Several statistical and data mining methods have been described for feature selection: in particular, wrapper approaches iteratively assess the performance of the classifier on distinct subsets of variables. Current wrappers, however, do not estimate the significance of the selected features. We therefore developed a new methodology to find the smallest feature subset which significantly contributes to the model performance, by using a combination of resampling, ranking of variable importance, significance assessment by permutation of the feature values in the test subsets, and half-interval search. We wrapped our biosigner algorithm around three reference binary classifiers (Partial Least Squares-Discriminant Analysis, Random Forest, and Support Vector Machines) which have been shown to achieve specific performances depending on the structure of the dataset. By using three real biological and clinical metabolomics and transcriptomics datasets (containing up to 7000 features), complementary signatures were obtained in a few minutes, generally providing higher prediction accuracies than the initial full model. Comparison with alternative feature selection approaches further indicated that our method provides signatures of restricted size and high stability. Finally, by using our methodology to seek metabolites discriminating type 1 from type 2 diabetic patients, several features were selected, including a fragment from the taurochenodeoxycholic bile acid. Our methodology, implemented in the biosigner R/Bioconductor package and Galaxy/Workflow4metabolomics module, should be of interest for both experimenters and statisticians to identify robust molecular signatures from large omics datasets in the process of developing new diagnostics.
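The four ingredients named in the description (resampling, importance ranking, permutation of feature values in the test subsets, and half-interval search) can be illustrated with a minimal Python sketch. This is not the biosigner implementation: the nearest-centroid classifier, the mean-difference ranking, the single-pass search, the thresholds, and all function names (`fit_centroids`, `rank_features`, `is_significant`, `select_signature`) are assumptions chosen to keep the example dependency-free, and the half-interval search additionally assumes that significance decreases monotonically with rank.

```python
import random

def fit_centroids(X, y):
    """Per-class mean vectors: a toy stand-in for PLS-DA, Random Forest, or SVM."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def accuracy(cents, X, y):
    """Fraction of rows whose nearest centroid carries the true label."""
    hits = 0
    for x, lab in zip(X, y):
        pred = min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))
        hits += pred == lab
    return hits / len(y)

def rank_features(X, y):
    """Rank feature indices by absolute class-mean difference (assumed importance)."""
    cents = fit_centroids(X, y)
    c0, c1 = (cents[k] for k in sorted(cents))
    return sorted(range(len(c0)), key=lambda j: -abs(c0[j] - c1[j]))

def is_significant(X, y, feat, n_boot=50, alpha=0.05, rng=random):
    """Permute `feat` in each out-of-bag test subset. The feature counts as
    significant when accuracy fails to drop in fewer than `alpha` of the resamples."""
    n, no_drop, valid = len(y), 0, 0
    for _ in range(n_boot):
        train = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        test = sorted(set(range(n)) - set(train))      # out-of-bag rows
        if not test or len({y[i] for i in train}) < 2:
            continue                                   # skip degenerate resamples
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        X_test = [list(X[i]) for i in test]
        y_test = [y[i] for i in test]
        base = accuracy(cents, X_test, y_test)
        shuffled = [row[feat] for row in X_test]
        rng.shuffle(shuffled)
        for row, value in zip(X_test, shuffled):
            row[feat] = value                          # permuted copy of the feature
        no_drop += accuracy(cents, X_test, y_test) >= base
        valid += 1
    return valid > 0 and no_drop / valid < alpha

def select_signature(X, y, rng=random):
    """Half-interval search for the rank at which features stop being significant."""
    ranked = rank_features(X, y)
    lo, hi = 0, len(ranked)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_significant(X, y, ranked[mid], rng=rng):
            lo = mid + 1                               # boundary lies further down
        else:
            hi = mid                                   # discard this rank and below
    return ranked[:lo]

# Synthetic data: feature 0 separates the classes, features 1 to 5 are noise.
rng = random.Random(0)
y = [0] * 20 + [1] * 20
X = [[rng.gauss(8.0 * lab, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(5)] for lab in y]
signature = select_signature(X, y, rng=rng)
print(signature)
```

The half-interval search tests only about log2(p) of the p ranked features instead of all of them, which is one way such a wrapper could stay fast on datasets with thousands of features.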
Database: OpenAIRE