Memory-efficient groupby-aggregate using compressed buffer trees

Author:	David G. Andersen, Karsten Schwan, Erik Zawadzki, Athula Balachandran, Michael Kaminsky, Wolfgang Richter, Hrishikesh Amur
Publication year:	2013
Subject:	Tree (data structure); Hardware_MEMORYSTRUCTURES; Compressed data structure; Computer science; Analytics; business.industry; Serialization; Aggregate (data warehouse); Parallel computing; business; Throughput (business); Attribute–value pair; Abstraction (linguistics)
Source:	SoCC
DOI:	10.1145/2523616.2523625
Description:	The rapid growth of fast analytics systems that require data processing in memory makes memory capacity an increasingly precious resource. This paper introduces a new compressed data structure called a Compressed Buffer Tree (CBT). Using a combination of techniques including buffering, compression, and serialization, CBTs improve the memory efficiency and performance of the GroupBy-Aggregate abstraction, which forms the basis not only of batch-processing models like MapReduce but also of recent fast analytics systems. For streaming workloads, aggregation using the CBT uses 21–42% less memory than Google SparseHash, with up to 16% better throughput. The CBT is also compared to the batch-mode aggregators in the MapReduce runtimes Phoenix++ and Metis: it consumes 4x and 5x less memory, with 1.5–2x and 3–4x higher performance, respectively.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::6e96fdf6a9be7f0f0c5ed1f8b0a013c2 https://doi.org/10.1145/2523616.2523625