Lossless Compression of Cytometric Data

Authors: Vincent H.J. van der Velden, Anne E. Bras
Contributors: Immunology
Year of publication: 2019
Subject:
Source: Cytometry Part A, 95(10), 1108-1112. Wiley-Liss Inc.
ISSN: 1552-4922
DOI: 10.1002/cyto.a.23879
Description: Nowadays, most cytometrists apply lossless compression by storing their FCS files in ZIP archives. Unfortunately, ZIP achieves only modest space savings on cytometric data because it uses DEFLATE as the underlying lossless compression algorithm (LCA). Presumably, modern LCAs can outperform DEFLATE, especially in terms of space savings. Twenty-one codecs (programs implementing LCAs) were evaluated on 167,131 publicly available FCS files. Within floating-point data, as produced by modern instruments, the most favorable compression ratios (CRs) were achieved by ZPAQ (median 0.469), BCM (median 0.523), and LZMA (median 0.545). In comparison, the DEFLATE-based codecs achieved a median CR of only 0.728 under optimal conditions. By default, ZIP offers nine compression level (CL) settings, where a lower ZIP-CL optimizes for time efficiency and a higher ZIP-CL optimizes for space efficiency. Interestingly, the third ZIP-CL already resulted in a near-optimal CR in 90% of the files with floating-point data, as produced by digital cytometers. LZMA is well established, widely supported, and actively maintained (in sharp contrast to ZPAQ and BCM) and is therefore arguably the most attractive alternative to ZIP. Within floating-point data, shifting from ZIP (under optimal conditions) to LZMA (at default settings) can improve the median CR by 25%. Based on our results, cytometrists can benefit from state-of-the-art compression by choosing the appropriate codec for their situation. Our results are likely to speed up the adoption of modern codecs, as CRs around 0.5 were beyond all expectations, and such space savings will benefit the field of cytometry. © 2019 International Society for Advancement of Cytometry.
Database: OpenAIRE
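
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' benchmark. It assumes that CR means compressed size divided by original size (consistent with lower values being reported as more favorable) and that the 25% figure relates the two reported medians, 0.728 and 0.545. The helper names and the synthetic float32 data are hypothetical stand-ins for a real FCS data segment, and Python's `lzma` module writes the .xz container by default, which may differ from the LZMA codec evaluated in the paper.

```python
import lzma
import random
import struct
import zlib


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Assumed CR definition: compressed size / original size (lower is better)."""
    return len(compressed) / len(raw)


def relative_improvement(cr_old: float, cr_new: float) -> float:
    """Fractional reduction in CR when moving from cr_old to cr_new."""
    return (cr_old - cr_new) / cr_old


def synthetic_events(n_events: int = 5_000, n_params: int = 8, seed: int = 0) -> bytes:
    """Hypothetical stand-in for FCS event data: little-endian float32 values."""
    rng = random.Random(seed)
    values = [rng.lognormvariate(6.0, 1.0) for _ in range(n_events * n_params)]
    return struct.pack(f"<{len(values)}f", *values)


if __name__ == "__main__":
    raw = synthetic_events()

    # DEFLATE (the algorithm behind ZIP) at several compression levels.
    for level in (1, 3, 6, 9):
        cr = compression_ratio(raw, zlib.compress(raw, level))
        print(f"DEFLATE level {level}: CR = {cr:.3f}")

    # LZMA via the standard library at its default preset.
    cr_lzma = compression_ratio(raw, lzma.compress(raw))
    print(f"LZMA (default preset): CR = {cr_lzma:.3f}")

    # Medians reported in the abstract: 0.728 (DEFLATE, optimal) and 0.545 (LZMA).
    print(f"Relative improvement: {relative_improvement(0.728, 0.545):.1%}")  # 25.1%
```

Ratios obtained on the synthetic data say nothing about real cytometry files. To try the sketch on actual data, replace `synthetic_events()` with the bytes of an FCS file read from disk.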