Classification based on extensions of LS-PLS using logistic regression: application to clinical and multiple genomic data

Authors: Caroline Bazzoli, Sophie Lambert-Lacroix
Contributors: Statistique pour le Vivant et l’Homme (SVH), Laboratoire Jean Kuntzmann (LJK), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Biologie Computationnelle et Mathématique (TIMC-IMAG-BCM), Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques et Applications, Grenoble - UMR 5525 (TIMC-IMAG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])
Language: English
Publication year: 2018
Subject:
0301 basic medicine
Clustering high-dimensional data
Computer science
Datasets as Topic
Logistic regression
Context (language use)
lcsh:Computer applications to medicine. Medical informatics
Biochemistry
Least squares
Field (computer science)
reduction dimension
Reduction (complexity)
03 medical and health sciences
0302 clinical medicine
[STAT.ML]Statistics [stat]/Machine Learning [stat.ML]
Structural Biology
Neoplasms
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
Partial least squares regression
Humans
Least-Squares Analysis
lcsh:QH301-705.5
Molecular Biology
[STAT.AP]Statistics [stat]/Applications [stat.AP]
Genome, Human
Methodology Article
Gene Expression Profiling
Applied Mathematics
Dimensionality reduction
Genomics
Classification
Computer Science Applications
Data set
High-dimensional data
Logistic Models
030104 developmental biology
lcsh:Biology (General)
LS-PLS
030220 oncology & carcinogenesis
lcsh:R858-859.7
Clinico-genomic model
Data mining
[STAT.ME]Statistics [stat]/Methodology [stat.ME]
Algorithms
Source: BMC Bioinformatics
BMC Bioinformatics, BioMed Central, 2018, 19 (1), ⟨10.1186/s12859-018-2311-2⟩
BMC Bioinformatics, Vol 19, Iss 1, Pp 1-13 (2018)
ISSN: 1471-2105
Description: Prediction from high-dimensional genomic data is an active field in today's medical research. Most proposed prediction methods use genomic data alone, without considering established clinical data that are often available and known to have predictive value. Recent studies suggest that combining clinical and genomic information may improve predictions. In this paper we consider classification methods that use both types of variables simultaneously, but apply dimension reduction only to the high-dimensional genomic ones. A usual way to deal with this is a two-step approach. In step one, a dimensionality reduction technique is applied to the genomic dataset only. In step two, the resulting reduced genomic variables are merged with the clinical variables to build a classification model on the combined dataset. However, the dimension reduction is then built without taking into account the link between the response variable and the clinical data. To address this issue, using Partial Least Squares (PLS) as the reduction technique, we propose a one-step approach based on three extensions of the LS-PLS method (LS for Least Squares) to the logistic regression context. We perform a simulation study to evaluate these approaches against methods that use only the clinical data or only the genomic data. We then illustrate their performance by classifying two real data sets containing both clinical information and gene expression.
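The two-step baseline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' LS-PLS extensions: the data are simulated, only the first PLS component is used, and the logistic model is fitted with a small ridge penalty by a fixed-Hessian iteration (based on the bound p(1-p) <= 1/4) chosen here for numerical stability. All function and variable names (pls_first_component, fit_logistic, predict_proba, clinical, genes) are hypothetical.

```python
import numpy as np

def pls_first_component(X, y):
    """Step one: first PLS direction for a single response, w proportional to Xc^T yc."""
    mu = X.mean(axis=0)
    w = (X - mu).T @ (y - y.mean())
    return mu, w / np.linalg.norm(w)

def fit_logistic(Z, y, n_iter=200, ridge=1.0):
    """Ridge-penalized logistic regression (intercept included in the penalty).

    Uses a fixed curvature matrix B = A^T A / 4 + ridge * I, which bounds the
    true Hessian because p(1-p) <= 1/4, so no step can decrease the
    penalized log-likelihood. This is an assumption of the sketch.
    """
    A = np.column_stack([np.ones(len(Z)), Z])
    B = 0.25 * A.T @ A + ridge * np.eye(A.shape[1])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        beta = beta + np.linalg.solve(B, A.T @ (y - p) - ridge * beta)
    return beta

def predict_proba(Z, beta):
    """Predicted class-1 probabilities for design matrix Z (intercept added here)."""
    A = np.column_stack([np.ones(len(Z)), Z])
    return 1.0 / (1.0 + np.exp(-A @ beta))

# Simulated data (assumption): 2 clinical variables, 500 genomic variables.
rng = np.random.default_rng(0)
n, p = 200, 500
clinical = rng.normal(size=(n, 2))
genes = rng.normal(size=(n, p))
logit = 1.5 * clinical[:, 0] + 0.8 * genes[:, :5].sum(axis=1)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
train, test = np.arange(100), np.arange(100, 200)

# Step one: reduce the genomic block only, ignoring the clinical variables.
mu, w = pls_first_component(genes[train], y[train])
t_train = (genes[train] - mu) @ w
t_test = (genes[test] - mu) @ w  # held-out scores reuse the training mean and direction

# Step two: merge the PLS score with the clinical variables and classify.
Z_train = np.column_stack([clinical[train], t_train])
Z_test = np.column_stack([clinical[test], t_test])
beta = fit_logistic(Z_train, y[train])
prob_test = predict_proba(Z_test, beta)
train_acc = np.mean((predict_proba(Z_train, beta) > 0.5) == (y[train] == 1))
```

Note that step one never sees the clinical variables. This is the limitation the abstract points out and that the one-step LS-PLS extensions are proposed to address.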
Database: OpenAIRE