Compression for population genetic data through finite-state entropy
Author: | Lloyd T. Elliott, Winfield Chen |
---|---|
Year of publication: | 2021 |
Subject: |
education.field_of_study
Computer science Computation Entropy Population Entropy compression Dictionary coder File format Data Compression Biochemistry Computer Science Applications Genetics Population Compression (functional analysis) Code (cryptography) Entropy (information theory) education Molecular Biology Algorithm Algorithms Software Genome-Wide Association Study |
Source: | Journal of Bioinformatics and Computational Biology. 19(5) |
ISSN: | 1757-6334 |
Description: | We improve the efficiency of population genetic file formats and GWAS computation by leveraging the distribution of samples in population-level genetic data. We identify conditional exchangeability in these data and recommend finite-state entropy algorithms as an arithmetic code naturally suited to compressing population genetic data. We show between [Formula: see text] and [Formula: see text] speed and size improvements over modern dictionary compression methods that are often used for population genetic data, such as Zstd and Zlib, in computation and decompression tasks. We provide open-source prototype software for multi-phenotype GWAS with finite-state entropy compression, demonstrating significant space savings and speed comparable to the state of the art. |
Database: | OpenAIRE |
External link: |