Genome Compression: An Image-Based Approach

Authors: Roberto Hiroshi Herai, Juliano V. Martins, Kelvin V. Kredens, Edson Emílio Scalabrin, Osmar Betazzi Dordal, Bráulio Coelho Ávila
Year of publication: 2018
Source: Artificial Intelligence and Soft Computing, ISBN: 9783319912615
ICAISC (2)
DOI: 10.1007/978-3-319-91262-2_22
Description: With the advent of Next Generation Sequencing technologies, the cost and time of genome sequencing have been greatly reduced. As a result, the demand for assembled genomes has grown significantly, with new genomes being assembled daily. This demand calls for more efficient techniques for storing and transmitting genomic data. In this research, we discuss lossless horizontal compression of genomic sequences using two image formats, WEBP and FLIF. To do this, the genomic sequence is transformed into a matrix of colored pixels, in which an RGB color is assigned to each symbol of the alphabet {A, T, C, G} at an x-y position. Of the two formats, WEBP achieved the better data-rate saving (76.15%, SD = 0.84) compared with FLIF. In addition, we compared the data-rate saving of WEBP with that of two specialized genomic data compression tools, DELIMINATE and MFCompress. The results show that WEBP comes close to DELIMINATE (76.03%, SD = 2.54%) and MFCompress (76.97%, SD = 1.36%). Finally, we suggest using WEBP for genomic data compression.
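The sketch below illustrates the sequence-to-image step and the data-rate saving metric. It is a minimal sketch under stated assumptions, not the authors' implementation. The color table (`SYMBOL_TO_RGB`), the black padding pixel, the row-major x-y layout, the helper names, and the formula 100 × (1 − compressed / original) for data-rate saving are all hypothetical choices made for illustration. The record does not specify them. Only the four symbols A, T, C, G are handled.

```python
import math
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical symbol-to-color table; the paper's actual colors are not given here.
SYMBOL_TO_RGB: Dict[str, RGB] = {
    "A": (255, 0, 0),
    "T": (0, 255, 0),
    "C": (0, 0, 255),
    "G": (255, 255, 0),
}
RGB_TO_SYMBOL: Dict[RGB, str] = {rgb: s for s, rgb in SYMBOL_TO_RGB.items()}
PAD: RGB = (0, 0, 0)  # assumed filler for unused pixels in the last row


def sequence_to_pixels(seq: str, width: int) -> List[List[RGB]]:
    """Lay symbols out row by row: symbol i goes to (x, y) = (i % width, i // width)."""
    height = math.ceil(len(seq) / width) if seq else 0
    rows = [[PAD] * width for _ in range(height)]
    for i, symbol in enumerate(seq.upper()):
        # Raises KeyError for symbols outside {A, T, C, G} (e.g. N), by design of this sketch.
        rows[i // width][i % width] = SYMBOL_TO_RGB[symbol]
    return rows


def pixels_to_sequence(rows: List[List[RGB]], length: int) -> str:
    """Invert the mapping; `length` marks where the padding pixels begin."""
    flat = [px for row in rows for px in row][:length]
    return "".join(RGB_TO_SYMBOL[px] for px in flat)


def data_rate_saving(original_size: int, compressed_size: int) -> float:
    """Percentage saving, assumed here to be 100 * (1 - compressed / original)."""
    return 100.0 * (1.0 - compressed_size / original_size)


if __name__ == "__main__":
    seq = "ATCGGATTACA"
    img = sequence_to_pixels(seq, width=4)
    print(len(img), len(img[0]))  # 3 4
    print(pixels_to_sequence(img, len(seq)) == seq)  # True
    print(data_rate_saving(1000, 238))
```

The round trip is lossless because each symbol maps to a distinct color and the original length is kept alongside the image. In a full pipeline, the pixel matrix would then be passed to a lossless WEBP or FLIF encoder, which is not shown here.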
Database: OpenAIRE