CellHeap: A Workflow for Optimizing COVID-19 Single-Cell RNA-Seq Data Processing in the Santos Dumont Supercomputer

Authors: Anna Cristina Calçada Carvalho, Maiana O. C. Costa, Maria Emilia M. T. Walter, Andrea Henriques-Pons, Marcelo dos Santos, Vanessa S. Silva, Marisa Fabiana Nicolás, Maria Clicia S. Castro, Fabricio Alves Barbosa da Silva, Kary A. C. S. Ocaña, Alba Cristina Magalhaes Alves de Melo, Helena Schubert da Incarnação Lima Silva
Year of publication: 2021
Subject:
Source: Advances in Bioinformatics and Computational Biology ISBN: 9783030918132
Description: Currently, several hundred terabytes of COVID-19 single-cell RNA-seq (scRNA-seq) data are available in public repositories. These data cover multiple tissues, comorbidities, and conditions. We expect this trend to continue, and it is realistic to predict that the volume of COVID-19 scRNA-seq data will grow to several petabytes in the coming years. However, thoughtful analysis of these data requires large-scale computing infrastructures and software systems optimized for such platforms in order to generate biological knowledge. This paper presents CellHeap, a portable and robust workflow for customizable scRNA-seq analyses, with quality control throughout the execution steps and deployable on supercomputers. Furthermore, we present the deployment of CellHeap on the Santos Dumont supercomputer for analyzing COVID-19 scRNA-seq datasets, and discuss a case study that processed dozens of terabytes of raw COVID-19 scRNA-seq data.
Database: OpenAIRE