Polypolish: Short-read polishing of long-read bacterial genome assemblies

Authors: Ryan Wick, Kathryn Holt
Language: English
Year of publication: 2022
Subjects:
DNA, Bacterial
Multiple Alignment Calculation
Bioinformatics
QH301-705.5
Microbial Genomics
Research and Analysis Methods
Pathology and Laboratory Medicine
Microbiology
Biochemistry
Database and Informatics Methods
Cellular and Molecular Neuroscience
Klebsiella
Computational Techniques
Medicine and Health Sciences
Genetics
Bacterial Genetics
Nanotechnology
Genome Sequencing
Biology (General)
Molecular Biology Techniques
Sequencing Techniques
Microbial Pathogens
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Repetitive Sequences, Nucleic Acid
Bacterial Genomics
Bacteria
Ecology
Nucleotides
Microbial Genetics
Organisms
High-Throughput Nucleotide Sequencing
Biology and Life Sciences
Bacteriology
Genomics
Sequence Analysis, DNA
Split-Decomposition Method
Bacterial Pathogens
Computational Theory and Mathematics
Medical Microbiology
Modeling and Simulation
Engineering and Technology
Pathogens
Klebsiella Oxytoca
Sequence Alignment
Sequence Analysis
Genome, Bacterial
Research Article
Source: PLoS Computational Biology, Vol 18, Iss 1, p e1009802 (2022)
ISSN: 1553-734X
Description: Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment, which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher which uses all per-read alignments to repair errors in repeat sequences that other polishers cannot. Polypolish performed well in benchmarking tests using both simulated and real reads, and it almost never introduced errors during polishing. The best results were achieved by using Polypolish in combination with other short-read polishers.
Author summary: Recent improvements in Oxford Nanopore Technologies sequencing platforms and assembly algorithms have made it easier than ever to generate complete bacterial genome sequences. However, Oxford Nanopore genome sequences suffer from errors that limit their utility in downstream analyses. To fix these errors, one can ‘polish’ the genome with Illumina sequencing, exploiting the fact that Oxford Nanopore and Illumina sequencing have different error profiles. There are several polishing tools which can fix most errors in an Oxford Nanopore genome, but they struggle with errors in repetitive regions of the genome. With this in mind, we have developed a polisher, Polypolish, which uses a novel approach that allows it to fix more errors in genomic repeats. Our results show that Polypolish is both effective at repairing sequence errors and very unlikely to introduce new errors. Polypolish can often fix errors that other polishers cannot and vice versa, so the best results come from using a combination of tools. Polypolish therefore has an important role in bacterial genome assembly methods that aim for the highest possible sequence accuracy.
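The toy Python sketch below illustrates the idea behind using all per-read alignments. If every alignment of a read is kept, not just its best one, then reads from a repeat also support the copy of that repeat that carries an error. This is not Polypolish's implementation. The brute-force ungapped aligner, the function names `find_alignments` and `polish`, and the `min_depth` and `min_fraction` thresholds are all hypothetical simplifications. The sketch also models only substitution errors, even though homopolymer-length errors (indels) are the most common kind in long-read assemblies.

```python
from collections import Counter
from typing import List, Tuple


def find_alignments(read: str, assembly: str, max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Toy ungapped aligner (hypothetical): return (start, mismatches) for every
    placement of `read` on `assembly` with at most `max_mismatches` mismatches."""
    hits = []
    for start in range(len(assembly) - len(read) + 1):
        window = assembly[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def polish(assembly: str, reads: List[str], all_alignments: bool = True,
           min_depth: int = 3, min_fraction: float = 0.5) -> str:
    """Toy pileup polisher (hypothetical thresholds). With all_alignments=True
    every placement of a read adds its bases to the pileup; otherwise only the
    read's best-scoring placement does."""
    pileup = [Counter() for _ in assembly]
    for read in reads:
        hits = find_alignments(read, assembly)
        if not hits:
            continue
        if not all_alignments:
            # Keep only the fewest-mismatch hit; ties go to the leftmost hit.
            hits = [min(hits, key=lambda hit: hit[1])]
        for start, _ in hits:
            for offset, base in enumerate(read):
                pileup[start + offset][base] += 1

    polished = list(assembly)
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no read evidence: keep the assembly base
        base, support = counts.most_common(1)[0]
        if support >= min_depth and support / depth >= min_fraction:
            polished[pos] = base
    return "".join(polished)


# Two copies of a repeat; the second copy in the draft carries one error.
repeat = "GATCCA"
truth = "TTTTT" + repeat + "AAAAA" + repeat + "CCCCC"
draft = "TTTTT" + repeat + "AAAAA" + "GATGCA" + "CCCCC"
reads = [repeat] * 4  # error-free short reads drawn from the repeat

print(polish(draft, reads, all_alignments=False) == truth)  # False: error remains
print(polish(draft, reads, all_alignments=True) == truth)   # True: error fixed
```

When only the best placement is kept, every read lands on the error-free first copy. The second copy then gets no coverage, and its error survives. When all placements are counted, the erroneous copy receives enough consistent support to be corrected.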
Database: OpenAIRE