Popis: |
High-throughput DNA sequencing technologies open the gate to tremendous (meta)genomic data from yet-to-be-explored microbial dark matter. However, accurately assigning protein functions to new gene sequences remains challenging. To this end, we developed FunGeneTyper, an expandable deep learning-based framework with models, structured databases and tools for ultra-accurate (>0.99) and fine-grained classification and discovery of antibiotic resistance genes (ARGs) and virulence factor or toxin genes. Specifically, this new framework achieves superior performance in discovering new ARGs from human gut (accuracy: 0.8512; and F1-score: 0.6948), wastewater (0.7273; 0.6072), and soil (0.8269; 0.5445) samples, beating the state-of-the-art bioinformatics tools and protein sequence-based (F1-score: 0.0556-0.5065) and domain-based (F1-score: 0.2630-0.5224) alignment approaches. We empowered the generalized application of the framework by implementing a lightweight, privacy-preserving and plug-and-play neural network module shareable among global developers and users. The FunGeneTyper*is released to promote the monitoring of key functional genes and discovery of precious enzymatic resources from diverse microbiomes. |