Memory-efficient groupby-aggregate using compressed buffer trees

Authors: David G. Andersen, Karsten Schwan, Erik Zawadzki, Athula Balachandran, Michael Kaminsky, Wolfgang Richter, Hrishikesh Amur
Year of publication: 2013
Subject:
Source: SoCC
DOI: 10.1145/2523616.2523625
Description: The rapid growth of fast analytics systems that require data processing in memory makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called the Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction, which forms the basis not only of batch-processing models like MapReduce but also of recent fast analytics systems. For streaming workloads, aggregation using the CBT uses 21-42% less memory than Google SparseHash, with up to 16% better throughput. The CBT is also compared to batch-mode aggregators in MapReduce runtimes such as Phoenix++ and Metis: it consumes 4x and 5x less memory, with 1.5-2x and 3-4x higher performance, respectively.
Database: OpenAIRE