Discovering Genome-Wide Tag SNPs Based on the Mutual Information of the Variants
Author: | Xiaodong Wang, Abdulkadir Elmas, Tai-Hsien Ou Yang, Dimitris Anastassiou |
---|---|
Year of publication: | 2016 |
Subject: |
0301 basic medicine
Heredity lcsh:Medicine Genome-wide association study Genomics--Data processing Linkage Disequilibrium Database and Informatics Methods 0302 clinical medicine Human genetics--Data processing Databases Genetic Cluster Analysis lcsh:Science Genetics Expressed Sequence Tags Expressed sequence tag Multidisciplinary Chromosome Biology Applied Mathematics Simulation and Modeling Chromosome Mapping Mutual information Genomics Tag SNP Genomic Databases Genetic Mapping Chromosome 22 030220 oncology & carcinogenesis Physical Sciences Algorithms Research Article Genotyping Variant Genotypes Computational biology Biology Research and Analysis Methods Genome Complexity Polymorphism Single Nucleotide Chromosomes 03 medical and health sciences Sequence Homology Nucleic Acid SNP Humans 1000 Genomes Project Molecular Biology Techniques Molecular Biology Gene mapping Biology--Classification Selection (genetic algorithm) Genetic Association Studies Genetic association Evolutionary Biology Base Sequence Population Biology lcsh:R Biology and Life Sciences Computational Biology Epistasis Genetic Single nucleotide polymorphisms Cell Biology Genome Analysis Chromosome Pairs 030104 developmental biology ComputingMethodologies_PATTERNRECOGNITION Biological Databases Haplotypes FOS: Biological sciences lcsh:Q Population Genetics Mathematics Genome-Wide Association Study |
Source: | PLoS ONE, Vol 11, Iss 12, p e0167994 (2016) |
ISSN: | 1932-6203 |
Description: | Exploring linkage disequilibrium (LD) patterns among single nucleotide polymorphism (SNP) sites can improve the accuracy and cost-effectiveness of genomic association studies, whereby representative (tag) SNPs are identified to sufficiently represent the genomic diversity in populations. Considerable effort has gone into developing efficient algorithms to select tag SNPs from the growing large-scale data sets. Methods using the classical pairwise-LD and multi-locus LD measures have been proposed that aim to reduce the computational complexity and to increase the accuracy, respectively. The present work addresses the tag SNP selection problem by efficiently balancing computational complexity and accuracy, and improves the coverage of genomic diversity in a cost-effective manner. The employed algorithm makes use of mutual information to explore the multi-locus association between SNPs and can handle different data types and conditions. Experiments with benchmark HapMap data sets show comparable or better performance against the state-of-the-art algorithms. In particular, as a novel application, genome-wide SNP tagging is performed on the 1000 Genomes Project data sets, producing a well-annotated database of tagging variants that capture the common genotype diversity in 2,504 samples from 26 human populations. Compared to conventional methods, the algorithm requires as input only the genotype (or haplotype) sequences, can scale up to genome-wide analyses, and produces accurate solutions with more information-rich output, providing an improved platform for researchers for subsequent association studies. |
Database: | OpenAIRE |
External link: |
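
The description above says the algorithm uses mutual information to measure association between SNPs. As a rough illustration only, the sketch below computes the empirical pairwise mutual information (in bits) between two genotype vectors coded 0/1/2. The function name `mutual_information` and the genotype coding are assumptions made for this example; the record does not describe the authors' actual multi-locus procedure, and this is not it.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (bits) between two equal-length
    sequences of discrete genotype codes (hypothetical helper)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)           # marginal counts for SNP x
    count_y = Counter(y)           # marginal counts for SNP y
    count_xy = Counter(zip(x, y))  # joint counts over samples
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Assumed coding: 0, 1, 2 = number of minor alleles per sample.
snp_a = [0, 0, 1, 1, 2, 2]
snp_b = [0, 0, 1, 1, 2, 2]  # identical to snp_a, so fully informative
snp_c = [0, 1, 0, 1]
snp_d = [0, 0, 1, 1]        # independent of snp_c in this toy sample

mi_same = mutual_information(snp_a, snp_b)
mi_indep = mutual_information(snp_c, snp_d)
```

In this toy setting, a SNP whose genotypes fully determine another's reaches a mutual information equal to that SNP's entropy, while statistically independent SNPs score zero; a tag SNP would be one that scores highly against many neighbours.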