Efficient Association Study Design Via Power-Optimized Tag SNP Selection
Authors: | Myeong Seong Seo, Buhm Han, Hyun Min Kang, Eleazar Eskin, Noah Zaitlen |
---|---|
Year of publication: | 2008 |
Subject: | Genetics; Linkage disequilibrium; Models, Statistical; Genome, Human; Population; Single-nucleotide polymorphism; Computational biology; Tag SNP; Biology; Polymorphism, Single Nucleotide; Statistical power; SNP genotyping; Gene Frequency; Genetic Techniques; Humans; Genetic Predisposition to Disease; Genetics (clinical); Selection (genetic algorithm); Genetic association |
Source: | Annals of Human Genetics. 72:834-847 |
ISSN: | 1469-1809 0003-4800 |
DOI: | 10.1111/j.1469-1809.2008.00469.x |
Description: | Discovering statistical correlation between causal genetic variation and clinical traits through association studies is an important method for identifying the genetic basis of human diseases. Since fully resequencing a cohort is prohibitively costly, genetic association studies take advantage of local correlation structure (linkage disequilibrium) between single nucleotide polymorphisms (SNPs) by selecting a subset of SNPs to be genotyped (tag SNPs). While many current association studies are performed using commercially available high-throughput genotyping products that define a set of tag SNPs, choosing tag SNPs remains an important problem both for custom follow-up studies and for designing the high-throughput genotyping products themselves. The most widely used tag SNP selection method optimizes the correlation between SNPs (r²). However, tag SNPs chosen based on an r² criterion do not necessarily maximize the statistical power of an association study. We propose a study design framework that chooses SNPs to maximize power and efficiently measures the power through empirical simulation. Empirical results based on the HapMap data show that our method gains considerable power over a widely used r²-based method or, equivalently, reduces the number of tag SNPs required to attain the desired power of a study. Our power-optimized 100k whole-genome tag set provides equivalent power to the Affymetrix 500k chip for the CEU population. For the design of custom follow-up studies, our method provides up to twice the power increase using the same number of tag SNPs as r²-based methods. Our method is publicly available via web server at http://design.cs.ucla.edu. |
Database: | OpenAIRE |
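
To make the r² criterion mentioned in the description concrete, the following is a minimal sketch of the r²-based baseline that the abstract contrasts with, not the authors' power-optimized method. It assumes phased haplotypes coded as 0/1 per SNP and a coverage threshold of 0.8 (a commonly used value, chosen here as an assumption). The function names `r_squared` and `greedy_tags` and the SNP labels are hypothetical.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per
    phased haplotype (an assumed input format for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal in length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:                            # a monomorphic SNP carries no LD signal
        return 0.0
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Greedy r^2-based tag selection (illustrative baseline only).

    haplotypes: dict mapping SNP name -> list of 0/1 alleles.
    Repeatedly picks the SNP that covers the most still-uncovered SNPs,
    where "covers" means r^2 >= threshold (every SNP covers itself).
    """
    names = list(haplotypes)
    covers = {
        s: {t for t in names
            if s == t or r_squared(haplotypes[s], haplotypes[t]) >= threshold}
        for s in names
    }
    uncovered = set(names)
    tags = []
    while uncovered:
        best = max(names, key=lambda s: len(covers[s] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Hypothetical toy data: rs1 and rs2 are perfectly correlated, rs3 is not.
example = {
    "rs1": [0, 0, 1, 1, 0, 1, 0, 1],
    "rs2": [0, 0, 1, 1, 0, 1, 0, 1],
    "rs3": [0, 1, 0, 1, 0, 1, 0, 1],
}
selected = greedy_tags(example)
```

A selection of this kind guarantees only that every SNP is correlated with some tag above the threshold. As the description notes, that does not necessarily maximize the statistical power of the study, which is the quantity the paper's framework optimizes instead.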