A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles associated with outcome of a non-small-cell lung cancer treatment

Authors: Bastien, Bérangère, Chakir, Hafid, Gégout-Petit, Anne, Muller-Gueudin, Aurélie, Shi, Yaojie
Contributors: Transgene SA [Illkirch], Institut Élie Cartan de Lorraine (IECL), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Biology, genetics and statistics (BIGS), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut Élie Cartan de Lorraine (IECL), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Toulouse School of Economics (TSE-R), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National de la Recherche Agronomique (INRA)-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS), Toulouse School of Economics (TSE), École des hautes études en sciences sociales (EHESS)-Institut National de la Recherche Agronomique (INRA)-Centre National de la Recherche Scientifique (CNRS)-Université Toulouse 1 Capitole (UT1), Muller-Gueudin, Aurélie, Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées
Language: English
Publication year: 2018
Subject:
Description: We propose a new methodology to select and rank covariates associated with a variable of interest in a context of high-dimensional data under dependence but with few observations. The methodology successively chains clustering of covariates, decorrelation of covariates within each cluster using latent factor analysis, selection by aggregation of adapted methods, and finally ranking. A simulation study shows the benefit of decorrelating within the different clusters of covariates. The objective of our method is to determine patient profiles linked with the outcome of a treatment. We apply our method to transcriptomic data from n = 37 patients with advanced non-small-cell lung cancer who received chemotherapy. The survival time of these patients being known, we apply our method to select the covariates most strongly linked with the treatment outcome among a set of more than 50 000 transcriptomic covariates. We obtain different transcriptomic profiles for the patients whose survival time was short versus the other patients, whose survival time was longer.
Database: OpenAIRE
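
The four steps named in the description (clustering, decorrelation within clusters, selection, ranking) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: clustering is a greedy correlation threshold, the latent factor analysis is replaced by a one-factor proxy (least-squares removal of the cluster mean), and the aggregation of selection methods is replaced by a single absolute Pearson score. All function names, the threshold of 0.7, and the simulated toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cluster_covariates(columns, threshold=0.7):
    """Greedy clustering (assumption): join the first cluster whose seed correlates enough."""
    clusters = []
    for j, col in enumerate(columns):
        for members in clusters:
            if pearson(col, columns[members[0]]) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def decorrelate(columns, members):
    """One-factor proxy (assumption): regress each member on the cluster mean, keep residuals."""
    if len(members) < 2:
        return {j: list(columns[j]) for j in members}
    n = len(columns[0])
    factor = [sum(columns[j][i] for j in members) / len(members) for i in range(n)]
    f_mean = sum(factor) / n
    f_var = sum((v - f_mean) ** 2 for v in factor)
    if f_var == 0:
        return {j: list(columns[j]) for j in members}
    residuals = {}
    for j in members:
        col = columns[j]
        c_mean = sum(col) / n
        beta = sum((col[i] - c_mean) * (factor[i] - f_mean) for i in range(n)) / f_var
        residuals[j] = [col[i] - c_mean - beta * (factor[i] - f_mean) for i in range(n)]
    return residuals


def select_and_rank(columns, outcome, threshold=0.7, top=3):
    """Score decorrelated covariates against the outcome and return the top ones."""
    scores = {}
    for members in cluster_covariates(columns, threshold):
        for j, resid in decorrelate(columns, members).items():
            scores[j] = abs(pearson(resid, outcome))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Simulated toy data (hypothetical): covariates 0-4 share a latent factor,
# covariates 5-9 are pure noise, and the outcome depends only on the part of
# covariate 0 that is specific to it.
rng = random.Random(0)
n = 37
latent = [rng.gauss(0, 1) for _ in range(n)]
noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
columns = [[latent[i] + 0.3 * noise[j][i] for i in range(n)] for j in range(5)]
columns += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
outcome = list(noise[0])

ranking = select_and_rank(columns, outcome)
print(ranking)
```

In this toy setting the shared latent factor masks the covariate-specific signal, so scoring the residuals rather than the raw columns is what lets the informative covariate stand out, which is the intuition behind decorrelating inside each cluster.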