Compression for population genetic data through finite-state entropy
Author: | L. T. Elliott, Winfield Chen |
---|---|
Year of publication: | 2021 |
Subject: |
education.field_of_study
Computer science; Entropy (statistical thermodynamics); Computation; Population; Entropy compression; Sample (statistics); Dictionary coder; Entropy (classical thermodynamics); Compression (functional analysis); Entropy (information theory); Entropy (energy dispersal); education; Entropy (arrow of time); Algorithm; Entropy (order and disorder) |
DOI: | 10.1101/2021.02.17.431713 |
Description: | We improve the efficiency of population genetic file formats and GWAS computation by leveraging the distribution of sample ordering in population-level genetic data. We identify conditional exchangeability of these data and recommend finite-state entropy algorithms as an arithmetic code naturally suited to population genetic data. We show speed and size improvements of between 10% and 40% over dictionary compression methods for population genetic data, such as Zstd and Zlib, in computation and decompression tasks. We provide a prototype for genome-wide association studies with finite-state entropy compression, demonstrating significant space savings and speed comparable to the state of the art. |
Database: | OpenAIRE |
External link: |