A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology

Authors: Hélène Dumond, Charlène Thiébaut, Taha Boukhobza, Bérangère Bastien, Aurélie Muller-Gueudin, Anne Gégout-Petit
Contributors: Transgene SA [Illkirch], Centre de Recherche en Automatique de Nancy (CRAN), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Biology, genetics and statistics (BIGS), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut Élie Cartan de Lorraine (IECL), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Publication year: 2020
Subject:
FOS: Computer and information sciences
Statistics and Probability
Clustering high-dimensional data
Variable selection
Application Notes
Computer science
0211 other engineering and technologies
Correlated covariates selection
Mathematics - Statistics Theory
Context (language use)
Feature selection
Statistics Theory (math.ST)
02 engineering and technology
Machine learning
computer.software_genre
Statistics - Applications
01 natural sciences
Methodology (stat.ME)
010104 statistics & probability
[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]
Covariate
FOS: Mathematics
Applications (stat.AP)
0101 mathematics
Statistics - Methodology
[STAT.AP]Statistics [stat]/Applications [stat.AP]
021103 operations research
business.industry
Genetic profiles
High dimension
Personalized medicine
3. Good health
Variable (computer science)
Ranking
Multiple testing procedures
Aggregated methods
Artificial intelligence
Statistics, Probability and Uncertainty
business
[STAT.ME]Statistics [stat]/Methodology [stat.ME]
computer
Source: Journal of Applied Statistics
Journal of Applied Statistics, Taylor & Francis (Routledge), In press, pp.23. ⟨10.1080/02664763.2020.1837083⟩
Journal of Applied Statistics, 2022, 49 (3), pp.764-781. ⟨10.1080/02664763.2020.1837083⟩
J Appl Stat
ISSN: 1360-0532
0266-4763
DOI: 10.1080/02664763.2020.1837083
Description: We propose a new methodology for selecting and ranking covariates associated with a variable of interest in high-dimensional data with dependent covariates and few observations. The methodology successively combines clustering of the covariates, decorrelation of the covariates within clusters using latent factor analysis, selection by aggregating adapted methods, and finally ranking. A simulation study shows the benefit of decorrelating within the different clusters of covariates. We first apply the method to transcriptomic data from 37 patients with advanced non-small-cell lung cancer who received chemotherapy, to select the transcriptomic covariates that explain survival after treatment. We then apply it to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and its associated gene network, with the aim of personalizing treatments.
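The four steps named in the description (clustering, within-cluster decorrelation, aggregated selection, ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: every function name is hypothetical, and each step uses a simple stand-in chosen for brevity. The clustering is a greedy correlation threshold, the decorrelation removes a single shared factor estimated as the cluster mean (in place of a fitted latent factor model), and the aggregation counts bootstrap votes from Pearson and Spearman scores (in place of the adapted methods the record mentions). The toy data are simulated.

```python
import math
import random
from statistics import mean

def corr(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def standardize(v):
    m = mean(v)
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / len(v)) or 1.0
    return [(x - m) / sd for x in v]

def ranks(v):
    """Ordinal ranks (ties broken by position), used for a Spearman-type score."""
    r = [0.0] * len(v)
    for pos, i in enumerate(sorted(range(len(v)), key=lambda i: v[i])):
        r[i] = float(pos)
    return r

def cluster_covariates(cols, threshold=0.5):
    """Step 1 (stand-in): a covariate joins the first cluster whose seed
    covariate it correlates with at or above `threshold` in absolute value."""
    clusters = []
    for j, col in enumerate(cols):
        for cl in clusters:
            if abs(corr(cols[cl[0]], col)) >= threshold:
                cl.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def decorrelate(cols, cluster):
    """Step 2 (stand-in): estimate one shared factor as the sign-aligned mean of
    the standardized covariates, then keep each covariate's residual."""
    if len(cluster) < 2:
        return {j: list(cols[j]) for j in cluster}
    z = {j: standardize(cols[j]) for j in cluster}
    seed = z[cluster[0]]
    sign = {j: 1.0 if corr(seed, z[j]) >= 0 else -1.0 for j in cluster}
    factor = [mean(sign[j] * z[j][i] for j in cluster) for i in range(len(seed))]
    ff = sum(f * f for f in factor) or 1.0
    out = {}
    for j in cluster:
        beta = sum(a * f for a, f in zip(z[j], factor)) / ff
        out[j] = [a - beta * f for a, f in zip(z[j], factor)]
    return out

def select_and_rank(cols, y, top_k=1, n_boot=50, seed=0):
    """Steps 3-4 (stand-in): on each bootstrap resample, each scorer votes for
    its top_k covariates; covariates are ranked by total votes."""
    rng = random.Random(seed)
    n, votes = len(y), [0] * len(cols)
    scorers = (lambda a, b: abs(corr(a, b)),
               lambda a, b: abs(corr(ranks(a), ranks(b))))
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        for score in scorers:
            s = [score([c[i] for i in idx], yb) for c in cols]
            for j in sorted(range(len(cols)), key=lambda j: -s[j])[:top_k]:
                votes[j] += 1
    return sorted(range(len(cols)), key=lambda j: -votes[j]), votes

# Simulated toy data: covariates 0-3 share a factor f; the outcome y depends
# only on the specific part of covariate 2; covariates 4-5 are pure noise.
rng = random.Random(1)
n = 60
f = [rng.gauss(0, 1) for _ in range(n)]
e = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(6)]
cols = [[f[i] + 0.5 * e[j][i] for i in range(n)] for j in range(4)] + [e[4], e[5]]
y = [e[2][i] + 0.1 * rng.gauss(0, 1) for i in range(n)]

clusters = cluster_covariates(cols)
clean = list(cols)
for cl in clusters:
    for j, resid in decorrelate(cols, cl).items():
        clean[j] = resid
ranking, votes = select_and_rank(clean, y)
print("clusters:", clusters)
print("ranking:", ranking, "votes:", votes)
```

The sketch decorrelates inside each cluster rather than across all covariates at once, mirroring the point the simulation study in the description is said to support. With few observations, the bootstrap voting is one simple way to aggregate unstable individual scores into a ranking.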
Database: OpenAIRE