Cluster-efficient pangenome graph construction with nf-core/pangenome.

Authors: Heumos S; Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany.; Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany.; M3 Research Center, University Hospital Tübingen, Tübingen, Germany.; Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, Germany., Heuer ML; University of California, Berkeley, Berkeley, California 94720, United States., Hanssen F; Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany.; Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany.; M3 Research Center, University Hospital Tübingen, Tübingen, Germany.; Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, Germany., Heumos L; Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Germany.; Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Zentrum Munich, Member of the German Center for Lung Research (DZL), Munich, Germany.; TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany., Guarracino A; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, United States.; Human Technopole, Milan 20157, Italy., Heringer P; Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany.; Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany.; M3 Research Center, University Hospital Tübingen, Tübingen, Germany.; Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, Germany., Ehmele P; Institute of Computational Biology, Department of Computational Health, Helmholtz Munich, Germany., Prins P; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, United States., Garrison E; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee 38163, United States., Nahnsen S; Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany.; Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany.; M3 Research Center, University Hospital Tübingen, Tübingen, Germany.; Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, Germany.
Language: English
Source: Bioinformatics (Oxford, England) [Bioinformatics] 2024 Oct 14. Date of Electronic Publication: 2024 Oct 14.
DOI: 10.1093/bioinformatics/btae609
Abstract: Motivation: Pangenome graphs offer a comprehensive way of capturing genomic variability across multiple genomes. However, current construction methods often introduce biases, excluding complex sequences or relying on references. The PanGenome Graph Builder (PGGB) addresses these issues. To date, though, there is no state-of-the-art pipeline allowing for easy deployment, efficient and dynamic use of available resources, and scalable usage at the same time.
Results: To overcome these limitations, we present nf-core/pangenome, a reference-unbiased approach implemented in Nextflow following nf-core's best practices. Leveraging biocontainers ensures portability and seamless deployment in HPC environments. Unlike PGGB, nf-core/pangenome distributes alignments across cluster nodes, enabling scalability. Demonstrating its efficiency, we constructed pangenome graphs for 1000 human chromosome 19 haplotypes and 2146 E. coli sequences, achieving a two to threefold speedup compared to PGGB without increasing greenhouse gas emissions.
Availability: Nf-core/pangenome is released under the MIT open-source license, available on GitHub and Zenodo, with documentation accessible at https://nf-co.re/pangenome/1.1.2/docs/usage.
Supplementary: Supplementary data are available at Bioinformatics online.
(© The Author(s) 2024. Published by Oxford University Press.)
Database: MEDLINE