Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs.

Authors: Pistis G; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA [3] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy., Porcu E; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA [3] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy., Vrieze SI; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA., Sidore C; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA [3] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy., Steri M; Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy., Danjou F; Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy., Busonero F; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA., Mulas A; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy., Zoledziewska M; Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy., Maschio A; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA., Brennan C; University of Michigan Sequencing Core, University of Michigan Medical School, Ann Arbor, MI, USA., Lai S; Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy., Miller MB; Department of Psychology, University of Minnesota, Minneapolis, MN, USA., Marcelli M; CRS4, Parco tecnologico della Sardegna, Pula, Cagliari, Italy., Urru MF; CRS4, Parco tecnologico della Sardegna, Pula, Cagliari, Italy., Pitzalis M; Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy., Lyons RH; University of Michigan Sequencing Core, University of Michigan Medical School, Ann Arbor, MI, USA., Kang HM; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA., Jones CM; CRS4, Parco tecnologico della Sardegna, Pula, Cagliari, Italy., Angius A; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] University of Michigan Sequencing Core, University of Michigan Medical School, Ann Arbor, MI, USA., Iacono WG; Department of Psychology, University of Minnesota, Minneapolis, MN, USA., Schlessinger D; Laboratory of Genetics, NIA, Baltimore, MD, USA., McGue M; Department of Psychology, University of Minnesota, Minneapolis, MN, USA., Cucca F; [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy., Abecasis GR; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA., Sanna S; Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy.
Language: English
Source: European journal of human genetics : EJHG [Eur J Hum Genet] 2015 Jul; Vol. 23 (7), pp. 975-83. Date of Electronic Publication: 2014 Oct 08.
DOI: 10.1038/ejhg.2014.216
Abstract: The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies.
Database: MEDLINE
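The abstract's closing point concerns filtering badly imputed rare variants by a quality metric such as MACH-Rsq. As a rough illustration only, the sketch below computes a simplified, genotype-dosage version of the variance-ratio idea behind MACH's Rsq (observed dosage variance divided by the variance expected under Hardy-Weinberg equilibrium) and then applies a hypothetical filter with a separate cutoff for rare variants. The function names `mach_rsq` and `passes_filter`, the MAF boundary, and all threshold values are illustrative assumptions; they are not the cutoffs recommended in the paper, and the actual MaCH/minimac computation may differ in detail.

```python
from statistics import fmean, pvariance


def mach_rsq(dosages):
    """Approximate a MACH-style Rsq for one variant (simplified sketch).

    Each dosage is the expected count (0 to 2) of the coded allele for one
    sample. The estimate is the observed variance of the dosages divided by
    the variance expected under Hardy-Weinberg equilibrium, 2p(1 - p), with
    p estimated as mean dosage / 2.
    """
    p = fmean(dosages) / 2.0
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        # Monomorphic in the imputed data: there is no variance to compare.
        return 0.0
    return pvariance(dosages) / expected_var


def passes_filter(dosages, *, common_cutoff, rare_cutoff, maf_boundary):
    """Hypothetical MAF-stratified quality filter.

    Variants whose estimated minor allele frequency falls below
    maf_boundary must reach rare_cutoff; all others must reach
    common_cutoff.
    """
    p = fmean(dosages) / 2.0
    maf = min(p, 1.0 - p)
    cutoff = rare_cutoff if maf < maf_boundary else common_cutoff
    return mach_rsq(dosages) >= cutoff


if __name__ == "__main__":
    # Arbitrary illustrative thresholds, not values from the study.
    thresholds = dict(common_cutoff=0.3, rare_cutoff=0.8, maf_boundary=0.01)
    informative_rare = [0] * 99 + [1]  # dosage concentrated on one carrier
    flattened_rare = [0.01] * 100      # every sample imputed near the mean
    print(passes_filter(informative_rare, **thresholds))  # True
    print(passes_filter(flattened_rare, **thresholds))    # False
```

The flattened case shows why a variance ratio works as a quality signal: when the imputation model cannot distinguish carriers, every sample receives roughly the population mean dosage, the observed variance collapses toward zero, and the estimated Rsq drops accordingly, even though the average allele frequency looks plausible.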