Benchmarking bacterial taxonomic classification using nanopore metagenomics data of several mock communities.
Author: | Van Uffelen A; Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium.; Department of Information Technology, Internet Technology and Data Science Lab (IDLab), Interuniversity Microelectronics Centre (IMEC), Ghent University, Ghent, Belgium.; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium., Posadas A; Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium.; Department of Information Technology, Internet Technology and Data Science Lab (IDLab), Interuniversity Microelectronics Centre (IMEC), Ghent University, Ghent, Belgium.; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium., Roosens NHC; Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium., Marchal K; Department of Information Technology, Internet Technology and Data Science Lab (IDLab), Interuniversity Microelectronics Centre (IMEC), Ghent University, Ghent, Belgium.; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.; Department of Genetics, University of Pretoria, Pretoria, South Africa., De Keersmaecker SCJ; Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium., Vanneste K; Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium. kevin.vanneste@sciensano.be. |
---|---|
Language: | English |
Source: | Scientific data [Sci Data] 2024 Aug 10; Vol. 11 (1), pp. 864. Date of Electronic Publication: 2024 Aug 10. |
DOI: | 10.1038/s41597-024-03672-8 |
Abstract: | Taxonomic classification is crucial in identifying organisms within diverse microbial communities when using metagenomics shotgun sequencing. While second-generation Illumina sequencing still dominates, third-generation nanopore sequencing promises improved classification through longer reads. However, extensive benchmarking studies on nanopore data are lacking. We systematically evaluated the performance of bacterial taxonomic classification for metagenomics nanopore sequencing data for several commonly used classifiers, using standardized reference sequence databases, on the largest collection of publicly available data for defined mock communities thus far (nine samples), representing different research domains and application scopes. Our results sort classifiers into three categories: low precision/high recall, medium precision/medium recall, and high precision/medium recall. Most fall into the first group, although precision can be improved without excessively penalizing recall with suitable abundance filtering. No definitive 'best' classifier emerges, and classifier selection depends on application scope and practical requirements. Although few classifiers designed for long reads exist, they generally exhibit better performance. Our comprehensive benchmarking provides concrete recommendations, supported by publicly available code for reassessment and fine-tuning by other scientists. (© 2024. The Author(s).) |
Database: | MEDLINE |
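The abstract's observation that abundance filtering can raise precision without excessively penalizing recall can be illustrated with a minimal sketch. It assumes species-level presence/absence scoring against the known composition of a mock community and a hypothetical relative-abundance cut-off (`min_fraction`). The function names and toy read counts are illustrative assumptions only and are not taken from the authors' published code or data.

```python
"""Hypothetical sketch: presence/absence precision and recall of a
classifier's output against a known mock community, with an optional
relative-abundance filter. Not the benchmark's published code."""


def filter_by_abundance(read_counts: dict[str, int], min_fraction: float) -> set[str]:
    """Keep taxa whose share of classified reads is at least min_fraction (hypothetical filter)."""
    total = sum(read_counts.values())
    if total == 0:
        return set()
    return {taxon for taxon, n in read_counts.items() if n / total >= min_fraction}


def precision_recall(predicted: set[str], truth: set[str]) -> tuple[float, float]:
    """Presence/absence precision and recall at a single taxonomic rank."""
    tp = len(predicted & truth)  # taxa reported that are truly in the mock community
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    truth = {"Escherichia coli", "Staphylococcus aureus", "Bacillus subtilis"}
    counts = {
        "Escherichia coli": 5000,
        "Staphylococcus aureus": 3000,
        "Bacillus subtilis": 1900,
        "Shigella flexneri": 60,  # low-abundance false positive
        "Bacillus cereus": 40,    # low-abundance false positive
    }
    for threshold in (0.0, 0.01):
        predicted = filter_by_abundance(counts, threshold)
        p, r = precision_recall(predicted, truth)
        print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

With these toy counts, no filtering gives precision 0.60 and recall 1.00, while a 1% cut-off removes both spurious taxa and gives precision 1.00 at unchanged recall. A real mock community with low-abundance members would show the trade-off the abstract describes, since too high a cut-off also discards true taxa.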