Popis: |
Order-preserving compression can improve sorting and searching performance, and hence the performance of database systems. We describe a new parsing (tokenization) technique that can be applied to variable-length "keys", producing substantial compression. It can both compress and decompress data, permitting variable lengths for dictionary entries and compressed forms. The key notion is to partition the space of strings into ranges, encoding the common prefix of each range. We illustrate our method with padding character compression for multi-field keys, demonstrating the dramatic gains possible. A specific version of the method has been implemented in Digital's Rdb relational database system to enable effective multi-field compression. |