A novel application of data-consistent inversion to overcome spurious inference in genome-wide association studies.
Authors: | Janani N; Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, Colorado, USA., Young KA; Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, USA., Kinney G; Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, USA., Strand M; Division of Biostatistics, National Jewish Health, Denver, Colorado, USA., Hokanson JE; Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, USA., Liu Y; Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, Colorado, USA., Butler T; Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, Colorado, USA., Austin E; Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, Colorado, USA. |
---|---|
Language: | English |
Source: | Genetic epidemiology [Genet Epidemiol] 2024 Sep; Vol. 48 (6), pp. 270-288. Date of Electronic Publication: 2024 Apr 21. |
DOI: | 10.1002/gepi.22563 |
Abstract: | Genome-wide association studies (GWAS) typically use linear or logistic regression models to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. However, the use of regression with the additive assumption has potential limitations. First, the assumption of normally distributed residuals rarely holds in practice, and deviation from normality increases the Type-I error rate. Second, building a model on the additive assumption ignores genetic structures such as dominant, recessive, and protective-risk cases. Ignoring genetic variants may result in spurious conclusions about the associations between a variant and a trait. We propose an assumption-free model built upon data-consistent inversion (DCI), a recently developed measure-theoretic framework for uncertainty quantification. The proposed DCI-derived model builds a nonparametric distribution on model inputs that propagates to the distribution of observed data without requiring the normality assumption on residuals made by the regression model. This characteristic enables the proposed DCI-derived model to cover all genetic variants without relying on the additivity of the classic-GWAS model. Simulations and a replication GWAS with data from the COPDGene study demonstrate the ability of this model to control the Type-I error rate at least as well as the classic-GWAS (additive linear model) approach while having similar or greater power to discover variants in different genetic modes of transmission. (© 2024 Wiley Periodicals LLC.) |
Database: | MEDLINE |
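
The abstract contrasts the additive assumption of classic GWAS regression with dominant and recessive modes of transmission. As a rough illustration only, the sketch below shows the conventional way a minor-allele count (0, 1, or 2) is coded as a regression covariate under each mode. The function name `encode_genotype` and the dictionary `codings` are hypothetical and do not come from the paper; the protective-risk case is left out because the abstract does not define its coding, and this is not the DCI-derived model itself.

```python
# Illustrative sketch: standard genotype codings, not the paper's method.
# encode_genotype is a hypothetical helper name introduced for this example.

def encode_genotype(minor_allele_count, mode):
    """Map a minor-allele count (0, 1, or 2) to a regression covariate."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if mode == "additive":
        # Each copy of the minor allele contributes equally.
        return minor_allele_count
    if mode == "dominant":
        # One copy is enough to produce the full effect.
        return 1 if minor_allele_count >= 1 else 0
    if mode == "recessive":
        # Two copies are needed to produce any effect.
        return 1 if minor_allele_count == 2 else 0
    raise ValueError(f"unknown mode: {mode}")


genotypes = [0, 1, 2]
codings = {
    mode: [encode_genotype(g, mode) for g in genotypes]
    for mode in ("additive", "dominant", "recessive")
}

for mode, values in codings.items():
    print(mode, values)
```

Under the additive coding the heterozygote sits halfway between the two homozygotes, whereas the dominant and recessive codings group it with one homozygote or the other. A model that fits only the additive coding can therefore misrepresent a variant whose true effect follows one of the other patterns.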