Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies.

Authors: Wojcik GL; Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305., Fuchsberger C; Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109.; Center for Biomedicine, European Academy of Bolzano/Bozen (EURAC), affiliated with the University of Lübeck, Bolzano, Bozen, 39100, Italy., Taliun D; Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109., Welch R; Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109., Martin AR; Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305., Shringarpure S; Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305., Carlson CS; Fred Hutchinson Cancer Center, University of Washington, 1100 Fairview Ave. N., Seattle, WA 98109., Abecasis G; Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109., Kang HM; Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109., Boehnke M; Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109., Bustamante CD; Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305.; Department of Biomedical Data Science, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305., Gignoux CR; Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305; chris.gignoux@ucdenver.edu., Kenny EE; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029; eimear.kenny@mssm.edu.; The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029.; The Icahn Institute of Multiscale Biology and Genomics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029.; The Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029.
Language: English
Source: G3 (Bethesda, Md.) [G3 (Bethesda)] 2018 Oct 03; Vol. 8 (10), pp. 3255-3267. Date of Electronic Publication: 2018 Oct 03.
DOI: 10.1534/g3.118.200502
Abstract: The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy, and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means that tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
(Copyright © 2018 Wojcik et al.)
Database: MEDLINE
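
The abstract scores tag SNP sets by mean imputed r² at untyped sites. As a rough illustration of that metric only, the sketch below computes the squared Pearson correlation between true genotypes and imputed dosages at each site and averages it across sites. The function names (`site_r2`, `mean_imputed_r2`) and the toy data are hypothetical and are not taken from the paper; the imputation step and the leave-one-out validation are assumed to have produced the dosages already, and skipping sites with no variation is an assumption of this sketch.

```python
from math import fsum


def site_r2(truth, dosage):
    """Squared Pearson correlation between true genotypes (0/1/2) and
    imputed dosages at one site. Returns None when either vector has
    no variation, because the correlation is undefined there."""
    n = len(truth)
    if n != len(dosage) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_t = fsum(truth) / n
    mean_d = fsum(dosage) / n
    var_t = fsum((t - mean_t) ** 2 for t in truth)
    var_d = fsum((d - mean_d) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return None
    cov = fsum((t - mean_t) * (d - mean_d) for t, d in zip(truth, dosage))
    return cov * cov / (var_t * var_d)


def mean_imputed_r2(truth_by_site, dosage_by_site):
    """Average per-site r2 over all untyped sites that can be scored.
    Sites where r2 is undefined are skipped (an assumption of this sketch)."""
    values = [site_r2(t, d) for t, d in zip(truth_by_site, dosage_by_site)]
    scored = [v for v in values if v is not None]
    if not scored:
        raise ValueError("no site could be scored")
    return fsum(scored) / len(scored)


# Toy data: three untyped sites, four held-out samples each.
truth = [[0, 1, 2, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
dosages = [[0.0, 1.0, 2.0, 1.0], [0.1, 0.9, 0.2, 0.8], [0.5, 0.5, 0.5, 0.5]]
score = mean_imputed_r2(truth, dosages)
```

In this toy input the third site has constant dosages, so it is skipped and the mean is taken over the first two sites. Competing candidate tag SNP sets could be compared by imputing from each set and ranking them on this score, possibly stratified by population and allele frequency.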