A hybrid approach for the automated finishing of bacterial genomes

Authors:	David Hsu, James H. Bullard, Dale R. Webster, Jackie Yen, Ellen E. Paxinos, Lori A. Rowe, Susana Wang, Paul Peluso, John J. Mekalanos, Brigid M. Davis, Khai Luong, Jon M. Sorenson, Andrew Kasarskis, Marie Valdovino, Matthew K. Waldor, Amruta Joshi, Maryann Turnsek, Eric E. Schadt, Cheryl L. Tarr, Brianna Lamay, Michael Frace, Chen-Shan Chin, Robert Sebra, Emilia Mollova, Ali Bashir, William P. Robins, Aaron Klammer, Steven Lin, Meredith Ashby
Year of publication:	2012
Subject:	Sequence analysis; Molecular Sequence Data; Biomedical Engineering; Sequence assembly; Bioengineering; Bacterial genome size; Computational biology; Biology; Applied Microbiology and Biotechnology; Genome; Article; DNA sequencing; Contig Mapping; Cholera; Genetics; Base Sequence; Contig; Computational Biology; Genes, rRNA; Sequence Analysis, DNA; Hybrid approach; Molecular Medicine; Algorithms; Genome, Bacterial; Biotechnology
Source:	Nature Biotechnology. 30:701-707
ISSN:	1546-1696 1087-0156
DOI:	10.1038/nbt.2288
Description:	The multikilobase reads that can be produced by single-molecule sequencing technologies may span complex, repetitive genomic regions but have high error rates. Bashir et al. use these reads to organize contigs assembled from accurate, short-read data, facilitating the analysis of clinically important regions of an outbreak strain of cholera. Advances in DNA sequencing technology have improved our ability to characterize most genomic diversity. However, accurate resolution of large structural events is challenging because of the short read lengths of second-generation technologies. Third-generation sequencing technologies, which can yield longer multikilobase reads, have the potential to address limitations associated with genome assembly. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at >99.9% accuracy. Complex regions with clinically relevant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 cholera reference strain, we obtained 14 scaffolds of greater than 1 kb for the experimental data and 8 scaffolds of greater than 1 kb for the simulated data, which allowed us to correct several errors in contigs assembled from the short-read data alone. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a15d8723ca843cdd0b8a8d2eaa794df1 https://doi.org/10.1038/nbt.2288