Discovering Genome-Wide Tag SNPs Based on the Mutual Information of the Variants

Author: Xiaodong Wang, Abdulkadir Elmas, Tai-Hsien Ou Yang, Dimitris Anastassiou
Year of publication: 2016
Subjects:
0301 basic medicine
Heredity
lcsh:Medicine
Genome-wide association study
Genomics--Data processing
Linkage Disequilibrium
Database and Informatics Methods
0302 clinical medicine
Human genetics--Data processing
Databases, Genetic
Cluster Analysis
lcsh:Science
Genetics
Expressed Sequence Tags
Expressed sequence tag
Multidisciplinary
Chromosome Biology
Applied Mathematics
Simulation and Modeling
Chromosome Mapping
Mutual information
Genomics
Tag SNP
Genomic Databases
Genetic Mapping
Chromosome 22
030220 oncology & carcinogenesis
Physical Sciences
Algorithms
Research Article
Genotyping
Variant Genotypes
Computational biology
Biology
Research and Analysis Methods
Genome Complexity
Polymorphism, Single Nucleotide
Chromosomes
03 medical and health sciences
Sequence Homology, Nucleic Acid
SNP
Humans
1000 Genomes Project
Molecular Biology Techniques
Molecular Biology
Gene mapping
Biology--Classification
Selection (genetic algorithm)
Genetic Association Studies
Genetic association
Evolutionary Biology
Base Sequence
Population Biology
lcsh:R
Biology and Life Sciences
Computational Biology
Epistasis, Genetic
Single nucleotide polymorphisms
Cell Biology
Genome Analysis
Chromosome Pairs
030104 developmental biology
ComputingMethodologies_PATTERNRECOGNITION
Biological Databases
Haplotypes
FOS: Biological sciences
lcsh:Q
Population Genetics
Mathematics
Genome-Wide Association Study
Source: PLoS ONE
PLoS ONE, Vol 11, Iss 12, p e0167994 (2016)
ISSN: 1932-6203
Description: Exploring linkage disequilibrium (LD) patterns among single nucleotide polymorphism (SNP) sites can improve the accuracy and cost-effectiveness of genomic association studies, in which representative (tag) SNPs are identified that sufficiently represent the genomic diversity in populations. A considerable amount of effort has gone into developing efficient algorithms to select tag SNPs from increasingly large-scale data sets. Methods based on the classical pairwise-LD measure and on multi-locus LD measures have been proposed, aiming respectively to reduce the computational complexity and to increase the accuracy. The present work solves the tag SNP selection problem by efficiently balancing computational complexity and accuracy, and improves the coverage of genomic diversity in a cost-effective manner. The algorithm uses mutual information to explore the multi-locus association between SNPs and can handle different data types and conditions. Experiments with benchmark HapMap data sets show performance comparable to or better than state-of-the-art algorithms. In particular, as a novel application, genome-wide SNP tagging was performed on the 1000 Genomes Project data sets, producing a well-annotated database of tagging variants that capture the common genotype diversity in 2,504 samples from 26 human populations. Compared to conventional methods, the algorithm requires only the genotype (or haplotype) sequences as input, scales up to genome-wide analyses, and produces accurate solutions with more information-rich output, providing an improved platform for researchers conducting subsequent association studies.
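The abstract states only that the algorithm uses mutual information to explore multi-locus association between SNPs; it does not describe the selection procedure itself. As a rough illustration of that idea, and not the authors' method, the Python sketch below estimates mutual information from genotype counts as I(X;Y) = H(X) + H(Y) - H(X,Y). It then runs a hypothetical greedy loop that adds tag SNPs until every SNP j satisfies I(T; X_j) / H(X_j) >= threshold, where T is the joint (multi-locus) genotype of the chosen tags. The function names, the 0/1/2 genotype encoding, the coverage criterion, and the 0.9 default threshold are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a sequence of hashable labels, from empirical counts."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def mutual_information(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def joint_labels(genotypes, snp_ids):
    """Combine several SNPs into one multi-locus genotype label per sample."""
    return [tuple(sample[j] for j in snp_ids) for sample in genotypes]


def greedy_tag_snps(genotypes, threshold=0.9):
    """Hypothetical greedy tagger (assumption, not the paper's algorithm).

    genotypes: list of samples, each a list of per-SNP genotypes (e.g. 0/1/2).
    A SNP j counts as covered when I(T; X_j) / H(X_j) >= threshold, where T is
    the joint genotype of the current tag set. Use threshold < 1 so that
    floating-point rounding cannot block coverage of a SNP by itself.
    """
    m = len(genotypes[0])
    columns = [[sample[j] for sample in genotypes] for j in range(m)]

    def covered(tag_set):
        t = joint_labels(genotypes, tag_set)  # empty tag set -> constant label
        result = set()
        for j in range(m):
            h = entropy(columns[j])
            # Monomorphic SNPs (h == 0) carry no diversity and are trivially covered.
            if h == 0 or mutual_information(t, columns[j]) / h >= threshold:
                result.add(j)
        return result

    tags = []
    current = covered(tags)
    while len(current) < m:
        # Add the candidate whose inclusion covers the most SNPs (first one on ties).
        best = max((j for j in range(m) if j not in tags),
                   key=lambda j: len(covered(tags + [j])))
        tags.append(best)
        current = covered(tags)
    return tags


if __name__ == "__main__":
    # Toy data: 4 samples x 3 SNPs; SNPs 0 and 1 are in perfect LD.
    genotypes = [
        [0, 0, 1],
        [1, 1, 0],
        [2, 2, 1],
        [0, 0, 0],
    ]
    print(greedy_tag_snps(genotypes))  # → [0, 2]
```

With a threshold below 1 the loop always terminates, because a SNP chosen as a tag is fully informative about itself. The brute-force coverage check rescans every SNP for each candidate, so this sketch suits toy inputs only, not genome-wide data.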
Database: OpenAIRE