Capturing diverse microbial sequence with comprehensive and scalable probe design

Authors: James Qu, Ikponmwonsa Odia, Douglas S. Kwon, Yasmine Rangel Vieira, Etienne Simon-Loriere, Hayden C. Metsky, Patrick Brehio, Leda Parham, Giselle Barbosa-Lima, Scott F. Michael, Scott Hennigan, David K. Yang, Andreas Gnirke, Gregory D. Ebel, Augustine Goba, Eva Harris, Shirlee Wohl, Adrianne Gladden-Young, Fernando A. Bozza, Kayla G. Barnes, Amber Carter, Katherine J. Siddle, Lauren M. Paul, Aaron E. Lin, T. M. L. Souza, Sandra Smole, Jonathan A. Runstadler, Pardis C. Sabeti, Damien C. Tully, Anne Piantadosi, Daniel J. Park, Christian T. Happi, Sharon Isern, Ivette Lorenzana, Andrew Goldfarb, Lee Gehrke, Björn Corleis, Todd M. Allen, Amanda L. Tan, Angel Balmaseda, Philomena Eromon, Kimberly García, Irene Bosch, Donald S. Grant, Lisa E. Hensley, Onikepe A. Folarin, Christian B. Matranga
Year of publication: 2018
Subject:
DOI: 10.1101/279570
Description: Metagenomic sequencing has the potential to transform microbial detection and characterization, but new tools are needed to improve its sensitivity. We developed CATCH (Compact Aggregation of Targets for Comprehensive Hybridization), a computational method to enhance nucleic acid capture for enrichment of diverse microbial taxa. CATCH designs compact probe sets that achieve full coverage of known sequence diversity and that scale well with this diversity. To illustrate applications of CATCH, we focused on capturing viral genomes. We designed, synthesized, and validated multiple probe sets, including one that targets whole genomes of the 356 viral species known to infect humans. Capture with these probe sets enriched unique viral content on average 18× and allowed us to assemble genomes that we could not otherwise recover, while accurately preserving within-sample diversity. We used this approach to recover genomes from the 2018 Lassa fever outbreak in Nigeria and to improve detection of viral infections in samples with unknown content. Together, this work demonstrates a path toward more sensitive, cost-effective metagenomic sequencing.
Database: OpenAIRE