MetaFlow|mics: Scalable and Reproducible Nextflow Pipelines for the Analysis of Microbiome Marker Data

Authors: Sean B. Cleveland, Cedric Arisdakessian, Mahdi Belcaid
Publication year: 2020
Subject:
Source: PEARC
Description: Computational scalability has become an important requirement for processing the massive amounts of data generated in contemporary sequencing-based experiments. The availability of large computational resources through academic, regional or national cyber-infrastructure efforts, as well as through inexpensive cloud offerings, has shifted the bottleneck, which now lies in the extensive expertise necessary to create reproducible and scalable bioinformatics pipelines and deploy them to such diverse infrastructures. We present here MetaFlow|mics, a comprehensive pipeline for the analysis of microbiome marker data using best practices and state-of-the-art cyberinfrastructure standards to ensure reproducibility. MetaFlow|mics provides seamless scalability and extensibility, allowing users to build and test their pipelines on a laptop with small datasets and to subsequently run them on large datasets on an HPC or on the Cloud with a change to a single line of code. Our framework is built on top of the Nextflow workflow management system and provides an interoperable architecture that leverages self-contained Docker and Singularity instances with all the dependencies and requirements needed to quickly deploy and use the pipeline.
Database: OpenAIRE