Classification based on extensions of LS-PLS using logistic regression: application to clinical and multiple genomic data
Author: | Caroline Bazzoli, Sophie Lambert-Lacroix |
---|---|
Contributors: | Statistique pour le Vivant et l’Homme (SVH), Laboratoire Jean Kuntzmann (LJK), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Biologie Computationnelle et Mathématique (TIMC-IMAG-BCM), Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques et Applications, Grenoble - UMR 5525 (TIMC-IMAG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]) |
Language: | English |
Publication year: | 2018 |
Subject: |
0301 basic medicine; Clustering high-dimensional data; Computer science; Datasets as Topic; Logistic regression; Context (language use); lcsh:Computer applications to medicine. Medical informatics; computer.software_genre; Biochemistry; Least squares; Field (computer science); reduction dimension; Reduction (complexity); 03 medical and health sciences; 0302 clinical medicine; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; Structural Biology; Neoplasms; [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Partial least squares regression; Humans; Least-Squares Analysis; lcsh:QH301-705.5; Molecular Biology; [STAT.AP]Statistics [stat]/Applications [stat.AP]; Genome Human; Methodology Article; Gene Expression Profiling; Applied Mathematics; Dimensionality reduction; Genomics; Classification; Computer Science Applications; Data set; High-dimensional data; Logistic Models; 030104 developmental biology; lcsh:Biology (General); LS-PLS; 030220 oncology & carcinogenesis; lcsh:R858-859.7; Clinico-genomic model; Data mining; computer; [STAT.ME]Statistics [stat]/Methodology [stat.ME]; Algorithms |
Source: | BMC Bioinformatics, BioMed Central, 2018, Vol 19, Iss 1, Pp 1-13, ⟨10.1186/s12859-018-2311-2⟩ |
ISSN: | 1471-2105 |
Description: | Prediction from high-dimensional genomic data is an active field in current medical research. Most proposed prediction methods use genomic data alone and ignore established clinical data, which are often available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions. In this paper we consider classification methods that use both types of variables simultaneously while applying dimension reduction only to the high-dimensional genomic ones. The usual way to do this is a two-step approach. In step one, a dimension reduction technique is applied to the genomic data set alone. In step two, the selected genomic variables are merged with the clinical variables to build a classification model on the combined data set. However, the dimension reduction is then carried out without taking into account the link between the response variable and the clinical data. To address this issue, using Partial Least Squares (PLS) as the reduction technique, we propose a one-step approach based on three extensions of the LS-PLS method (LS for Least Squares) to the logistic regression context. We perform a simulation study to evaluate these approaches against methods that use only the clinical data or only the genomic data. We then illustrate their performance by classifying two real data sets containing both clinical information and gene expression data. |
Database: | OpenAIRE |