Compression for population genetic data through finite-state entropy

Authors: Lloyd T. Elliott, Winfield Chen
Year of publication: 2021
Subject:
Source: Journal of Bioinformatics and Computational Biology. 19(5)
ISSN: 1757-6334
Description: We improve the efficiency of population genetic file formats and GWAS computation by leveraging the distribution of samples in population-level genetic data. We identify conditional exchangeability of these data and recommend finite-state entropy algorithms as an arithmetic code naturally suited to compressing population genetic data. We show between [Formula: see text] and [Formula: see text] speed and size improvements in computation and decompression tasks over modern dictionary compression methods, such as Zstd and Zlib, that are often used for population genetic data. We provide open-source prototype software for multi-phenotype GWAS with finite-state entropy compression, demonstrating significant space savings and speed comparable to the state of the art.
Database: OpenAIRE
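
Illustration: the description contrasts entropy coding with dictionary compressors such as Zlib. The sketch below is not the authors' software and does not implement finite-state entropy, which is not in the Python standard library. It only compares the zeroth-order entropy bound, which an entropy coder aims to approach, against the size Zlib achieves on simulated genotypes. The function names, the sample count, the minor allele frequency of 0.1, and the assumption of independent draws under Hardy-Weinberg proportions are all hypothetical choices made for the example.

```python
import math
import random
import zlib
from collections import Counter


def simulate_genotypes(n_samples, maf, seed=0):
    """Draw genotypes (0, 1 or 2 minor alleles) for one variant.

    Assumption for illustration: samples are i.i.d. under Hardy-Weinberg
    proportions, stored one genotype per byte.
    """
    rng = random.Random(seed)
    weights = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    return bytes(rng.choices([0, 1, 2], weights=weights, k=n_samples))


def empirical_entropy_bits(data):
    """Zeroth-order Shannon entropy of the byte symbols, in bits per symbol."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


genotypes = simulate_genotypes(100_000, maf=0.1)
h = empirical_entropy_bits(genotypes)

# Size an ideal zeroth-order entropy coder would approach, in bytes.
entropy_bytes = h * len(genotypes) / 8
# Size produced by a dictionary-based compressor (DEFLATE via zlib).
zlib_bytes = len(zlib.compress(genotypes, 9))

print(f"raw size:       {len(genotypes)} bytes")
print(f"entropy bound:  {entropy_bytes:.0f} bytes ({h:.3f} bits/genotype)")
print(f"zlib level 9:   {zlib_bytes} bytes")
```

Because the simulated samples are exchangeable, the order of genotypes carries no information and the symbol frequencies alone determine the achievable size. That is the setting in which a frequency-based entropy coder is a natural fit, while a dictionary compressor looks for repeated substrings.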