MetaBakery: a Singularity implementation of bioBakery tools as a skeleton application for efficient HPC deconvolution of microbiome metagenomic sequencing data to machine learning ready information.
Authors: | Murovec B; University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia., Deutsch L; University of Ljubljana, Department of Animal Science, Biotechnical Faculty, Ljubljana, Slovenia.; The NU, The Nu B.V., Leiden, Netherlands., Osredkar D; Department of Pediatric Neurology, University Children's Hospital, University Medical Centre Ljubljana, Ljubljana, Slovenia.; University of Ljubljana, Medical Faculty, Ljubljana, Slovenia., Stres B; University of Ljubljana, Department of Animal Science, Biotechnical Faculty, Ljubljana, Slovenia.; D13 Department of Catalysis and Chemical Reaction Engineering, National Institute of Chemistry, Ljubljana, Slovenia.; University of Ljubljana, Faculty of Civil and Geodetic Engineering, Ljubljana, Slovenia.; Department of Automation, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia. |
---|---|
Language: | English |
Source: | Frontiers in microbiology [Front Microbiol] 2024 Jul 30; Vol. 15, pp. 1426465. Date of Electronic Publication: 2024 Jul 30 (Print Publication: 2024). |
DOI: | 10.3389/fmicb.2024.1426465 |
Abstract: | In this study, we present MetaBakery (http://metabakery.fe.uni-lj.si), an integrated application designed as a framework for synergistically executing the bioBakery workflow and associated utilities. MetaBakery streamlines the processing of any number of paired or unpaired fastq files, or a mixture of both, with optional compression (gzip, zip, bzip2, xz, or mixed) within a single run. MetaBakery uses programs such as KneadData (https://github.com/bioBakery/kneaddata), MetaPhlAn, HUMAnN and StrainPhlAn as well as integrated utilities and extends the original functionality of bioBakery. In particular, it includes MelonnPan for the prediction of metabolites and Mothur for the calculation of microbial alpha diversity. Written in Python 3 and C++, the whole pipeline was encapsulated as a Singularity container for efficient execution on various computing infrastructures, including large High-Performance Computing clusters. MetaBakery facilitates crash recovery, efficient re-execution upon parameter changes, and processing of large data sets through subset handling. It is offered in three editions, with bioBakery ingredients of versions 4, 3 and 2, and is designed to be versatile, transparent and well documented in the MetaBakery Users' Manual (http://metabakery.fe.uni-lj.si/metabakery_manual.pdf). It provides automatic handling of command line parameters and file formats, together with comprehensive hierarchical storage of output to simplify navigation and debugging. MetaBakery filters out potential human contamination and excludes samples with low read counts. It calculates estimates of alpha diversity and represents a comprehensive and augmented re-implementation of the bioBakery workflow. The robustness and flexibility of the system enable efficient exploration of changing parameters and input datasets, increasing its utility for microbiome analysis.
Furthermore, we have shown that the MetaBakery tool can be used in modern biostatistical and machine learning approaches including large-scale microbiome studies. Competing Interests: LD was employed by The Nu B.V. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2024 Murovec, Deutsch, Osredkar and Stres.) |
Database: | MEDLINE |
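
The abstract mentions fastq input with mixed compression and the exclusion of samples with low read counts. A minimal Python sketch of that idea follows. It is not MetaBakery's code: the function names, the suffix-based detection and the `min_reads` threshold are all assumptions made for illustration, zip archives are left out for brevity, and reads are counted under the common four-lines-per-record FASTQ layout.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical mapping from file suffix to an opener; zip is omitted here.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_fastq(path):
    """Open a FASTQ file as text, picking a decompressor by file suffix."""
    opener = OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, "rt")


def count_reads(path):
    """Count reads, assuming four lines per FASTQ record."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def keep_samples(paths, min_reads):
    """Return the paths whose read count reaches the assumed threshold."""
    return [p for p in paths if count_reads(p) >= min_reads]
```

A call such as `keep_samples(["a.fastq", "b.fastq.gz"], min_reads=1000)` would return only the files with at least 1000 reads; the file names and the threshold here are placeholders.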