Author: |
Dezordi FZ; Department of Entomology and Bioinformatics Core, Aggeu Magalhães Institute-Oswaldo Cruz Foundation (Fiocruz), Campus UFPE-Av. Prof. Moraes Rego s/n, Recife 50670-420, Brazil., Neto AMDS; Bioinformatics Core, Aggeu Magalhães Institute-Oswaldo Cruz Foundation (Fiocruz), Campus UFPE-Av. Prof. Moraes Rego s/n, Recife 50670-420, Brazil., Campos TL; Bioinformatics Core, Aggeu Magalhães Institute-Oswaldo Cruz Foundation (Fiocruz), Campus UFPE-Av. Prof. Moraes Rego s/n, Recife 50670-420, Brazil., Jeronimo PMC; Oswaldo Cruz Foundation (Fiocruz), Branch Ceará, Eusebio 61760-000, Brazil., Aksenen CF; Oswaldo Cruz Foundation (Fiocruz), Branch Ceará, Eusebio 61760-000, Brazil., Almeida SP; Oswaldo Cruz Foundation (Fiocruz), Branch Ceará, Eusebio 61760-000, Brazil., Wallau GL; Department of Entomology and Bioinformatics Core, Aggeu Magalhães Institute-Oswaldo Cruz Foundation (Fiocruz), Campus UFPE-Av. Prof. Moraes Rego s/n, Recife 50670-420, Brazil., On Behalf Of The Fiocruz Covid-Genomic Surveillance Network |
Abstract: |
The COVID-19 pandemic is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which emerged in 2019 and quickly spread worldwide. Genomic surveillance has become the gold-standard methodology for monitoring and studying this fast-spreading virus and its constantly emerging lineages. The current deluge of SARS-CoV-2 genomic data generated worldwide has made streamlined bioinformatics workflows an urgent need. Here, we describe ViralFlow, a workflow developed by our group to process and analyze large-scale SARS-CoV-2 Illumina amplicon sequencing data. This workflow automates all steps of reference-based SARS-CoV-2 genomic analysis: data processing, genome assembly, PANGO lineage assignment, mutation analysis and screening of intrahost variants. The pipeline can process a batch of around 100 samples in less than half an hour on a personal laptop, or in less than five minutes on a server with 50 threads. The workflow is distributed as Docker or Singularity images, so it can run on laptops for small-scale analyses or on high-capacity servers and clusters. Moreover, its low memory and CPU-core requirements and the standardized results it provides highlight ViralFlow as a versatile tool for SARS-CoV-2 genomic analysis.