A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing

Authors: Yao-Ting Huang, Sheng-Yu Chuang, Choun-Sea Lin, Ming-Tsai Chan, Jian-Wei Chen, Chuan-Kang Ting
Publication year: 2016
Subjects:
0301 basic medicine
Cancer genome sequencing
Molecular biology
lcsh:Medicine
Genome
DNA library construction
Genome Sequencing
lcsh:Science
Paired-end tag
Genetics
Multidisciplinary
Chromosome Mapping
High-Throughput Nucleotide Sequencing
Genomics
Genome project
Genomic Library Construction
Epigenetics
Sequence Analysis
Algorithms
Research Article
Heterozygote
Single-nucleotide polymorphism
DNA construction
Biology
Genome Complexity
Polymorphism, Single Nucleotide
Evolution, Molecular
Genomic Imprinting
03 medical and health sciences
Computer Simulation
Sequencing Techniques
Genetic association
Evolutionary Biology
Sequence Assembly Tools
Population Biology
Gene Mapping
lcsh:R
Haplotype
Biology and Life Sciences
Computational Biology
Reproducibility of Results
Sequence Analysis, DNA
Genome Analysis
Diploidy
Research and analysis methods
Molecular biology techniques
030104 developmental biology
Haplotypes
Mutation
lcsh:Q
Sequence Alignment
Population Genetics
Software
Developmental Biology
Source: PLoS ONE, Vol 11, Iss 11, p e0166721 (2016)
PLoS ONE
ISSN: 1932-6203
Description: The genomes of many species are diploid, consisting of paternal and maternal haplotypes. The differences between these two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct a single consensus sequence, which is a mosaic of the two parental haplotypes. Reconstructing the paternal and maternal haplotypes is an important task in linkage analysis and association studies. This study designs and implements HapSVAssembler on the basis of a genetic algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with a GA. Experimental results indicate that HapSVAssembler achieves high accuracy and contiguity under various sequencing coverages, error rates, and insert sizes. The program is tested on pilot sequencing of a highly heterozygous genome, and 12,781 heterozygous SNPs and 602 hemizygous SVs are identified. We observe that, although the number of SVs is much smaller than that of SNPs, the genomic regions occupied by SVs are much larger, implying that the heterozygosity computed using SNPs or the k-mer spectrum may be underestimated.
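The abstract does not state how HapSVAssembler encodes haplotypes or scores candidates, so the following is only a minimal sketch of the general idea of phasing with a GA, not the authors' method. It assumes a standard minimum error correction (MEC) objective over biallelic heterozygous sites, and all names (`mec_cost`, `phase_with_ga`, the toy `reads`) are hypothetical. Each candidate is a 0/1 vector for one haplotype; the other haplotype is taken to be its complement.

```python
import random

def mec_cost(hap, reads):
    """MEC cost (assumed objective): each read is charged the smaller of its
    mismatch counts against `hap` and against the complementary haplotype."""
    total = 0
    for read in reads:
        mism = sum(1 for site, allele in read.items() if hap[site] != allele)
        # Alleles are binary, so mismatches against the complement = len - mism.
        total += min(mism, len(read) - mism)
    return total

def phase_with_ga(reads, n_sites, pop_size=40, generations=100,
                  mutation_rate=0.05, seed=0):
    """Toy GA: tournament selection, uniform crossover, bit-flip mutation,
    and elitism (the best individual always survives)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(pop_size)]
    best = min(pop, key=lambda h: mec_cost(h, reads))

    def pick():
        # Binary tournament: the lower-cost of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if mec_cost(a, reads) <= mec_cost(b, reads) else b

    for _ in range(generations):
        children = [best[:]]  # elitism keeps the best cost from getting worse
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=lambda h: mec_cost(h, reads))
    return best, mec_cost(best, reads)

# Hypothetical error-free reads: {site index: observed allele}.
true_hap = [0, 1, 1, 0, 1, 0, 0, 1]
reads = [
    {0: 0, 1: 1, 2: 1},  # sampled from true_hap
    {1: 0, 2: 0, 3: 1},  # sampled from the complementary haplotype
    {2: 1, 3: 0, 4: 1},
    {4: 0, 5: 1, 6: 1},  # complement
    {5: 0, 6: 0, 7: 1},
    {3: 1, 4: 0, 5: 1},  # complement
]
best, cost = phase_with_ga(reads, len(true_hap))
print("haplotype:", best, "MEC cost:", cost)
```

Because a haplotype and its complement have the same MEC cost, the search may return either one of the pair. A realistic implementation would also have to handle sequencing errors, structural variants, and paired-end insert-size constraints, none of which this sketch models.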
Database: OpenAIRE