A Genetic Algorithm for Diploid Genome Reconstruction Using Paired-End Sequencing
Authors: | Yao-Ting Huang, Sheng-Yu Chuang, Choun-Sea Lin, Ming-Tsai Chan, Jian-Wei Chen, Chuan-Kang Ting |
---|---|
Publication year: | 2016 |
Subject: |
0301 basic medicine
Cancer genome sequencing Molecular biology lcsh:Medicine Genome DNA library construction Genome Sequencing lcsh:Science Paired-end tag Genetics Multidisciplinary Chromosome Mapping High-Throughput Nucleotide Sequencing Genomics Genome project Genomic Library Construction Epigenetics Sequence Analysis Algorithms Research Article Heterozygote Single-nucleotide polymorphism DNA construction Biology Genome Complexity Polymorphism Single Nucleotide Evolution Molecular Genomic Imprinting 03 medical and health sciences Computer Simulation Sequencing Techniques Genetic association Evolutionary Biology Sequence Assembly Tools Population Biology Gene Mapping lcsh:R Haplotype Biology and Life Sciences Computational Biology Reproducibility of Results Sequence Analysis DNA Genome Analysis Diploidy Research and analysis methods Molecular biology techniques 030104 developmental biology Haplotypes Mutation lcsh:Q Sequence Alignment Population Genetics Software Developmental Biology |
Source: | PLoS ONE, Vol 11, Iss 11, p e0166721 (2016) |
ISSN: | 1932-6203 |
Description: | The genomes of many species are diploid, consisting of paternal and maternal haplotypes. The differences between the two haplotypes range from single nucleotide polymorphisms (SNPs) to large-scale structural variations (SVs). Existing genome assemblers for next-generation sequencing platforms attempt to reconstruct a single consensus sequence, which is a mosaic of the two parental haplotypes. Reconstructing the paternal and maternal haplotypes separately is an important task in linkage analysis and association studies. This study designs and implements HapSVAssembler, which is based on a genetic algorithm (GA) and paired-end sequencing. The proposed method builds a consensus sequence, identifies various types of heterozygous variants, and reconstructs the paternal and maternal haplotypes by solving an optimization problem with the GA. Experimental results indicate that HapSVAssembler achieves high accuracy and contiguity across a range of sequencing coverages, error rates, and insert sizes. The program is tested on pilot sequencing of a highly heterozygous genome, in which 12,781 heterozygous SNPs and 602 hemizygous SVs are identified. We observe that, although SVs are far fewer than SNPs, the genomic regions they occupy are much larger, implying that heterozygosity computed from SNPs or the k-mer spectrum may be underestimated. |
Database: | OpenAIRE |
External link: |
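
The description says the haplotypes are reconstructed by solving an optimization problem with a GA, but it does not state the objective or the operators. As a rough illustration only, the sketch below phases a handful of heterozygous SNP sites with a small genetic algorithm. It assumes a minimum error correction (MEC) objective, a common formulation in haplotype assembly that is swapped in here and is not confirmed to be the one HapSVAssembler uses. The names `mec_score`, `phase_with_ga`, `truth`, and `fragments`, and all parameter values, are hypothetical.

```python
import random

def mec_score(haplotype, fragments):
    """Assumed objective (MEC): each fragment is charged the smaller number of
    mismatches against the haplotype or against its complement."""
    total = 0
    for frag in fragments:  # frag maps SNP site index -> observed allele (0/1)
        mismatches = sum(1 for site, allele in frag.items()
                         if haplotype[site] != allele)
        total += min(mismatches, len(frag) - mismatches)
    return total

def phase_with_ga(fragments, n_sites, pop_size=40, generations=60,
                  mutation_rate=0.05, seed=0):
    """Toy GA over binary haplotype strings.
    Requires n_sites >= 2 and pop_size >= 3."""
    rng = random.Random(seed)  # local generator, so runs are reproducible

    def cost(h):
        return mec_score(h, fragments)

    population = [[rng.randint(0, 1) for _ in range(n_sites)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        # Elitism: carry the two best individuals over unchanged.
        next_gen = [population[0][:], population[1][:]]
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            parent_a = min(rng.sample(population, 3), key=cost)
            parent_b = min(rng.sample(population, 3), key=cost)
            # One-point crossover followed by per-site bit-flip mutation.
            cut = rng.randint(1, n_sites - 1)
            child = parent_a[:cut] + parent_b[cut:]
            child = [1 - a if rng.random() < mutation_rate else a
                     for a in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=cost)
    return best, cost(best)

# Made-up example data: six heterozygous sites and five error-free fragments,
# each sampled from `truth` or from its complement.
truth = [0, 1, 1, 0, 1, 0]
fragments = [
    {0: 0, 1: 1, 2: 1},
    {1: 0, 2: 0, 3: 1},
    {2: 1, 3: 0, 4: 1},
    {3: 1, 4: 0, 5: 1},
    {4: 1, 5: 0},
]

if __name__ == "__main__":
    best, score = phase_with_ga(fragments, n_sites=len(truth))
    print("best haplotype:", best, "MEC:", score)
```

Because a haplotype and its complement describe the same diploid pair, both score identically under this objective; a run that returns the complement of `truth` is as good as one that returns `truth` itself. A real pipeline would also have to handle sequencing errors, paired-end insert sizes, and structural variants, none of which this sketch models.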