GRASShopPER – An algorithm for de novo assembly based on GPU alignments

Authors: Jan Badura, Michal Kierzynka, Wojciech Frohmberg, Jacek Blazewicz, Artur Laskowski, Aleksandra Swiercz, Piotr Zurkowski, Paweł T. Wojciechowski, Marta Kasprzak
Year of publication: 2018
Subjects:
0301 basic medicine
Genomics Statistics
Nematoda
Computer science
0211 other engineering and technologies
Sequence assembly
lcsh:Medicine
02 engineering and technology
Genome
Database and Informatics Methods
lcsh:Science
021103 operations research
Multidisciplinary
Contig
Bacterial Genomics
Chromosome Biology
Autosomes
Microbial Genetics
High-Throughput Nucleotide Sequencing
Eukaryota
Genomics
Animal Models
Chromosome 14
Insects
Experimental Organism Systems
Graph (abstract data type)
Algorithm
Sequence Analysis
Algorithms
Research Article
Arthropoda
Sequence analysis
Bioinformatics
Sequence alignment
Grasshoppers
Microbial Genomics
Research and Analysis Methods
Microbiology
DNA sequencing
Chromosomes
03 medical and health sciences
Model Organisms
Actinomycetales
Genetics
Animals
Humans
Bacterial Genetics
Caenorhabditis elegans
Massively parallel
Whole genome sequencing
Chromosomes, Human, Pair 14
Sequence Assembly Tools
lcsh:R
Organisms
Chromosome
Biology and Life Sciences
Computational Biology
Bacteriology
Sequence Analysis, DNA
Cell Biology
Genome Analysis
Genomic Libraries
Chromosome Pairs
Invertebrates
030104 developmental biology
Caenorhabditis
lcsh:Q
Sequence Alignment
Source: PLoS ONE
PLoS ONE, Vol 13, Iss 8, p e0202355 (2018)
ISSN: 1932-6203
Description: Next-generation sequencers produce billions of short DNA sequences in a massively parallel manner, which poses a great computational challenge for accurately reconstructing a genome sequence de novo from these short sequences. Here, we propose the GRASShopPER assembler, which follows the overlap-layout-consensus approach. It uses an efficient GPU implementation of sequence alignment during the graph construction stage and a greedy hyper-heuristic algorithm at the fork detection stage. A two-part fork detection method allows us to identify repeated fragments of a genome and to reconstruct them without misassemblies. Assemblies of data sets from the bacterium Candidatus Microthrix, the nematode Caenorhabditis elegans, and human chromosome 14 were evaluated with the gold-standard tool QUAST. In comparison with other assemblers, GRASShopPER provided contigs that covered the largest part of the genomes while maintaining good values of other metrics, e.g., NG50 and misassembly rate.
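The description cites NG50 as one of the QUAST metrics used in the evaluation. As a minimal sketch (not code from QUAST or GRASShopPER, and possibly differing from QUAST in edge cases), NG50 can be computed from contig lengths and a reference genome size. Sort the contigs longest first and report the length of the contig at which the running total first reaches half of the genome size. The function name `ng50` is hypothetical.

```python
from typing import Iterable, Optional


def ng50(contig_lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of an assembly (hypothetical helper, illustrative only).

    NG50 is the length of the shortest contig in the smallest set of longest
    contigs whose combined length is at least half the reference genome size.
    Returns None when the whole assembly covers less than half the genome,
    in which case NG50 is undefined.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    threshold = genome_size / 2
    covered = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= threshold:
            return length
    return None


if __name__ == "__main__":
    contigs = [100, 80, 50, 30, 20]
    print(ng50(contigs, 400))   # threshold 200 is reached at the 50-bp contig
    print(ng50(contigs, 1000))  # assembly totals 280 bp < 500 bp, so None
```

For example, contigs of lengths 100, 80, 50, 30 and 20 bp against a 400-bp reference give an NG50 of 50. The plain N50, which uses the 280-bp assembly total instead of the genome size, would be 80. Normalizing by genome size is what lets NG50 compare assemblies of different total lengths on the same scale.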
Database: OpenAIRE