Optimistically Compressed Hash Tables & Strings in theUSSR

Autor:	Peter Boncz, Viktor Leis, Tim Gubner
Rok vydání:	2021
Předmět:	Computer science Value (computer science) 02 engineering and technology Parallel computing Hash table Prefix String operations 020204 information systems 0202 electrical engineering electronic engineering information engineering Benchmark (computing) Memory footprint 020201 artificial intelligence & image processing Tuple Software Information Systems Integer (computer science)
Zdroj:	ACM SIGMOD Record. 50:60-67
ISSN:	0163-5808
DOI:	10.1145/3471485.3471500
Popis:	Modern query engines rely heavily on hash tables for query processing. Overall query performance and memory footprint is often determined by how hash tables and the tuples within them are represented. In this work, we propose three complementary techniques to improve this representation: Domain-Guided Prefix Suppression bit-packs keys and values tightly to reduce hash table record width. Optimistic Splitting decomposes values (and operations on them) into (operations on) frequently- and infrequently-accessed value slices. By removing the infrequently-accessed value slices from the hash table record, it improves cache locality. The Unique Strings Self-aligned Region (USSR) accelerates handling frequently occurring strings, which are widespread in real-world data sets, by creating an on-the-fly dictionary of the most frequent strings. This allows executing many string operations with integer logic and reduces memory pressure. We integrated these techniques into Vectorwise. On the TPC-H benchmark, our approach reduces peak memory consumption by 2-4× and improves performance by up to 1.5×. On a real-world BI workload, we measured a 2× improvement in performance and in micro-benchmarks we observed speedups of up to 25×.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_________::aa0ef71f267758fa1afaf64e417694c6 https://doi.org/10.1145/3471485.3471500 Zobrazit plný text záznamu