A Sparse Mixture-of-Experts Model With Screening of Genetic Associations to Guide Disease Subtyping
Author: | Marie Courbariaux, Kylliann De Santiago, Cyril Dalmasso, Fabrice Danjou, Samir Bekadar, Jean-Christophe Corvol, Maria Martinez, Marie Szafranski, Christophe Ambroise |
---|---|
Contributors: | Sorbonne Université Maison des Modélisations Ingénieries et Technologies (SUMMIT), Sorbonne Université (SU), Laboratoire de Mathématiques et Modélisation d'Evry (LaMME), Université d'Évry-Val-d'Essonne (UEVE)-ENSIIE-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Université d'Évry-Val-d'Essonne (UEVE), Institut du Cerveau = Paris Brain Institute (ICM), Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Institut National de la Santé et de la Recherche Médicale (INSERM)-CHU Pitié-Salpêtrière [AP-HP], Assistance publique - Hôpitaux de Paris (AP-HP) (AP-HP)-Sorbonne Université (SU)-Sorbonne Université (SU)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS), Agence Technique de l'Information sur l'Hospitalisation (ATIH), ATIH, Institut de Recherche en Santé Digestive (IRSD), Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université de Toulouse (UT)-Ecole Nationale Vétérinaire de Toulouse (ENVT), Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE), ANR-16-CE37-0008, MeMoDeeP, Méthodes et Modèles pour la caractérisation phénotypique fine de la Maladie de Parkinson (2016) |
Publication year: | 2022 |
Subject: |
[STAT.AP]Statistics [stat]/Applications [stat.AP];
Disease subtyping with clinical and genotyping data; High-dimensionality and variable selection; Longitudinal data; Mixture (of experts) models; Parkinson's disease; Genetics; Molecular Medicine; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Genetics (clinical) |
Source: | Frontiers in Genetics, 2022, Statistical Methods, Computing, and Resources for Genome-Wide Association Studies, Volume II, 13, ⟨10.3389/fgene.2022.859462⟩ |
ISSN: | 1664-8021 |
DOI: | 10.3389/fgene.2022.859462 |
Description: | Motivation: Identifying new genetic associations in non-Mendelian complex diseases is an increasingly difficult challenge. These diseases sometimes appear to have a significant component of heritability requiring explanation, and this missing heritability may be due to the existence of subtypes involving different genetic factors. Taking genetic information into account in clinical trials might potentially have a role in guiding the process of subtyping a complex disease. Most methods dealing with multiple sources of information rely on data transformation, and in disease subtyping, the two main strategies used are 1) the clustering of clinical data followed by posterior genetic analysis and 2) the concomitant clustering of clinical and genetic variables. Both of these strategies have limitations that we propose to address. Contribution: This work proposes an original method for disease subtyping on the basis of both longitudinal clinical variables and high-dimensional genetic markers via a sparse mixture-of-regressions model. The added value of our approach lies in its interpretability in relation to two aspects. First, our model links both clinical and genetic data with regard to their initial nature (i.e., without transformation) and does not require post-processing where the original information is accessed a second time to interpret the subtypes. Second, it can address large-scale problems because of a variable selection step that is used to discard genetic variables that may not be relevant for subtyping. Results: The proposed method was validated on simulations. A dataset from a cohort of Parkinson’s disease patients was also analyzed. Several subtypes of the disease and genetic variants that potentially have a role in this typology were identified. Software availability: The R code for the proposed method, named DiSuGen, and a tutorial are available for download (see the references). |
Database: | OpenAIRE |
External link: |