SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning.

Authors: Balaji A; Department of Computer Science, Rice University, Houston, TX, USA., Kille B; Department of Computer Science, Rice University, Houston, TX, USA., Kappell AD; Signature Science, LLC, 8329 North Mopac Expressway, Austin, TX, USA., Godbold GD; Signature Science, LLC, 1670 Discovery Drive, Charlottesville, VA, USA., Diep M; Fraunhofer USA Center Mid-Atlantic CMA, Riverdale, MD, USA., Elworth RAL; Department of Computer Science, Rice University, Houston, TX, USA., Qian Z; Department of Computer Science, Rice University, Houston, TX, USA., Albin D; Department of Computer Science, Rice University, Houston, TX, USA., Nasko DJ; Department of Computer Science, University of Maryland, College Park, MD, USA., Shah N; Department of Computer Science, University of Maryland, College Park, MD, USA., Pop M; Department of Computer Science, University of Maryland, College Park, MD, USA., Segarra S; Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA., Ternus KL; Signature Science, LLC, 8329 North Mopac Expressway, Austin, TX, USA. kternus@signaturescience.com., Treangen TJ; Department of Computer Science, Rice University, Houston, TX, USA. treangen@rice.edu.
Language: English
Source: Genome biology [Genome Biol] 2022 Jun 20; Vol. 23 (1), pp. 133. Date of Electronic Publication: 2022 Jun 20.
DOI: 10.1186/s13059-022-02695-x
Abstract: The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, available for download at www.gitlab.com/treangenlab/seqscreen.
(© 2022. The Author(s).)
Database: MEDLINE