RENANO: a REference-based compressor for NANOpore FASTQ files

Authors: Pablo Smircich, Gadiel Seroussi, José R. Sotelo-Silveira, Alvaro Martin, Guillermo Dufort y Álvarez, Idoia Ochoa
Year of publication: 2021
Subject:
Source: Bioinformatics (Oxford, England).
ISSN: 1367-4811
Description: Motivation: Nanopore sequencing technologies are rapidly gaining popularity, in part, due to the massive amounts of genomic data they produce in short periods of time (up to 8.5 TB of data in Results: We introduce RENANO, a reference-based lossless data compressor specifically tailored to FASTQ files generated with nanopore sequencing technologies. RENANO improves on its predecessor ENANO, currently the state of the art, by providing a more efficient base call sequence compression component. Two compression algorithms are introduced, corresponding to the following scenarios: (1) a reference genome is available without cost to both the compressor and the decompressor and (2) the reference genome is available only on the compressor side, and a compacted version of the reference is included in the compressed file. We compare the compression performance of RENANO against ENANO on several publicly available nanopore datasets. RENANO improves the base call sequence compression of ENANO by 39.8% in scenario (1), and by 33.5% in scenario (2), on average, over all the datasets. As for total file compression, the average improvements are 12.7% and 10.6%, respectively. We also show that RENANO consistently outperforms the recent general-purpose genomic compressor Genozip. Availability and implementation: RENANO is freely available for download at: https://github.com/guilledufort/RENANO. Supplementary information: Supplementary data are available at Bioinformatics online.
Database: OpenAIRE