Lightweight taxonomic profiling of long-read metagenomic datasets with Lemur and Magnet.

Authors: Sapoval N; Department of Computer Science, Rice University, Houston, TX 77005, USA., Liu Y; Department of Computer Science, Rice University, Houston, TX 77005, USA., Curry KD; Department of Computer Science, Rice University, Houston, TX 77005, USA., Kille B; Department of Computer Science, Rice University, Houston, TX 77005, USA., Huang W; Department of Computer Science, Rice University, Houston, TX 77005, USA., Kokroko N; Department of Computer Science, Rice University, Houston, TX 77005, USA., Nute MG; Department of Computer Science, Rice University, Houston, TX 77005, USA., Tyshaieva A; Department of Computer Science, University of Maryland, College Park, MD 20742, USA., Dilthey A; Department of Computer Science, University of Maryland, College Park, MD 20742, USA., Molloy EK; Department of Bioengineering, Rice University, Houston, TX 77005, USA., Treangen TJ; Department of Computer Science, Rice University, Houston, TX 77005, USA.; Department of Bioengineering, Rice University, Houston, TX 77005, USA.
Language: English
Source: BioRxiv : the preprint server for biology [bioRxiv] 2024 Aug 25. Date of Electronic Publication: 2024 Aug 25.
DOI: 10.1101/2024.06.01.596961
Abstract: The advent of long-read sequencing of microbiomes necessitates the development of new taxonomic profilers tailored to long-read shotgun metagenomic datasets. Here, we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling of long-read shotgun metagenomic datasets. Lemur is a marker-gene-based method that leverages an EM algorithm to reduce false positive calls while preserving true positives; Magnet is a whole-genome read-mapping-based method that provides detailed presence and absence calls for bacterial genomes. We demonstrate that Lemur and Magnet can run in minutes to hours on a laptop with 32 GB of RAM, even for large inputs, a crucial feature given the portability of long-read sequencing machines. Furthermore, the marker gene database used by Lemur is only 4 GB and contains information from over 300,000 RefSeq genomes. Lemur and Magnet are open-source and available at https://github.com/treangenlab/lemur and https://github.com/treangenlab/magnet.
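Illustrative note: the abstract states only that Lemur uses an EM algorithm to suppress false positive calls. The sketch below is not Lemur's implementation; it is a generic expectation-maximization abundance estimator of the kind commonly used in read-based profiling, included to show why EM tends to shrink taxa that are supported only by ambiguous reads. The function name `em_abundance`, the input layout (one row of per-taxon likelihoods per read), and the toy data are all assumptions made for illustration.

```python
def em_abundance(likelihoods, n_iter=200, tol=1e-9):
    """Estimate relative abundances with a plain EM loop.

    likelihoods: list of rows, one per read; row[j] is an assumed
    likelihood P(read | taxon j). Hypothetical input format, not Lemur's.
    """
    n_taxa = len(likelihoods[0])
    theta = [1.0 / n_taxa] * n_taxa  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_taxa
        n_used = 0
        for row in likelihoods:
            # E-step: posterior responsibility of each taxon for this read
            weighted = [p * t for p, t in zip(row, theta)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read matches no taxon under current estimates
            n_used += 1
            for j, w in enumerate(weighted):
                counts[j] += w / total
        if n_used == 0:
            raise ValueError("no read has nonzero likelihood for any taxon")
        # M-step: new abundance is the average responsibility per taxon
        new_theta = [c / n_used for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy data (made up): 8 reads unique to taxon 0, 2 reads shared equally
# between taxa 0 and 1, and 2 reads unique to taxon 2.
reads = [[1.0, 0.0, 0.0]] * 8 + [[1.0, 1.0, 0.0]] * 2 + [[0.0, 0.0, 1.0]] * 2
theta = em_abundance(reads)
print([round(t, 4) for t in theta])
```

In this toy input, taxon 1 has no uniquely assigned reads, so each iteration reassigns more of the shared reads to taxon 0 and the estimate for taxon 1 decays toward zero. A profiler could then drop such a taxon with an abundance threshold; whether Lemur does exactly this is not stated in the abstract.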
Competing Interests: The authors declare that they have no competing interests.
Database: MEDLINE