Proper Conditional Analysis in the Presence of Missing Data Identified Novel Independently Associated Low Frequency Variants in Nicotine Dependence Genes

Authors: Gonçalo R. Abecasis, Matt McGue, Michael Boehnke, Bibo Jiang, Yu Jiang, Anita Pandit, Markku Laakso, Mengzhen Liu, Gregory J.M. Zajac, Dajiang J. Liu, Sai Chen, Sharon M. Lutz, Kevin Li, John E. Hokanson, Daniel McGuire, Scott I. Vrieze, John K. Hewitt, William G. Iacono, Xiaowei Zhan, Kenneth Krauter
Year of publication: 2017
Subject:
Description: Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values. In practice, genotype imputation is not always effective, e.g. when targeted genotyping/sequencing assays are used or when the untyped genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Naïve extensions of existing methods either replace missing summary statistics with 0 or discard studies with missing data. These approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain a large number of missing values. Based on this estimator, we propose a score statistic we call PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to analyze the CHRNA5-CHRNB4-CHRNA3 locus in a large-scale meta-analysis for cigarettes-per-day. Using the new method, we identified three novel variants, independent of known association signals, which were otherwise missed by alternative methods. Together, the phenotypic variance explained by these variants is 0.46%, improving that of previously reported associations by 17%.
These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.

AUTHOR SUMMARY

It is of great interest to estimate the joint and conditional effects of multiple correlated variants from large-scale meta-analysis, in order to fine-map causal variants and understand the genetic architecture of complex traits. The summary statistics contributed by participating studies in a meta-analysis often contain missing values, because imputation is not always effective, especially when the underlying genetic variant is rare or the participating studies use a targeted genotyping array that is not suitable for imputation. Existing meta-analysis methods do not properly handle missing data and can incorrectly estimate correlations between score statistics. As a result, they can produce highly biased estimates of joint effects and highly inflated type-I errors for conditional analysis, which in turn result in overestimated phenotypic variance explained and incorrect identification of causal variants. We systematically evaluated this bias and proposed a novel partial correlation based score statistic. The new statistic has valid type-I errors for conditional analysis and much higher power than existing methods, even when the contributed summary statistics in the meta-analysis contain a large fraction of missing values. We expect this method to be highly useful in the sequencing age for complex trait genetics.
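
The bias described above can be illustrated with a small simulation. The sketch below is not the authors' PCBS implementation. It only contrasts zero-filling missing summary statistics with a simple rescaling of each pooled statistic by the sample size that actually contributed to it. Everything in it is an assumption made for illustration: the two "variants" are standardized, normally distributed scores rather than real genotypes, variant 2 is the only causal variant, it is missing in every second study, and the function names `study_summary` and `pooled_summaries` are hypothetical.

```python
import numpy as np

def study_summary(g, y, observed):
    """Score vector U, covariance matrix V, and pairwise sample counts for one study.

    Entries that involve an unobserved variant are set to 0, which mimics
    zero-filling of missing summary statistics.
    """
    gc = g - g.mean(axis=0)          # centered variant scores
    yc = y - y.mean()                # centered phenotype
    mask = np.asarray(observed, dtype=float)
    pair_mask = np.outer(mask, mask)
    U = (gc.T @ yc) * mask
    V = (gc.T @ gc) * pair_mask
    n_pair = len(y) * pair_mask      # samples contributing to each entry of V
    return U, V, n_pair

def pooled_summaries(n_studies=10, n=5000, r=0.7, beta2=0.5, seed=0):
    """Sum per-study summaries; variant 2 is missing in every second study."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])
    U_sum, V_sum, N_pair = np.zeros(2), np.zeros((2, 2)), np.zeros((2, 2))
    for k in range(n_studies):
        g = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = beta2 * g[:, 1] + rng.normal(size=n)   # only variant 2 is causal
        U, V, n_pair = study_summary(g, y, observed=[True, k % 2 == 0])
        U_sum += U
        V_sum += V
        N_pair += n_pair
    return U_sum, V_sum, N_pair

U, V, N = pooled_summaries()

# Zero-fill: solve the joint model directly on the pooled, zero-filled sums.
beta_zero_fill = np.linalg.solve(V, U)

# Rescaling: divide each entry by the number of samples that contributed to it,
# so every entry estimates a per-sample covariance, then solve for joint effects.
beta_rescaled = np.linalg.solve(V / N, U / np.diag(N))

print("zero-fill joint effects:", beta_zero_fill)
print("rescaled joint effects: ", beta_rescaled)
```

Under these assumptions, zero-filling assigns a spurious joint effect to the non-causal variant 1, because its pooled statistics come from all studies while those of variant 2 come from only half of them. Rescaling by the contributing sample size puts all entries on the same per-sample scale before the joint model is solved. Note that a rescaled covariance matrix assembled from different subsets of studies is not guaranteed to be positive definite in general.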
Database: OpenAIRE