An index scheme for fast data stream to distributed append-only store

Authors: Li Wang, Zhenjie Zhang, Deokwoo Jung, Marianne Winslet, Parijat Mazumdar
Year of publication: 2016
Subject:
Source: WebDB
Description: Distributed systems are now commonly used to manage massive data flooding in from the physical world, such as user-generated content from online social media and communication records from mobile phones. The new generation of distributed data management systems, such as HBase, are designed to accept tuple insertions only, such that other database operations (e.g., deletion and update) are simply simulated by appending operation logs with keys associated with the target tuples. Such an append-only store architecture maximizes the processing throughput on incoming data, but potentially incurs higher costs on query processing, in which additional computation is needed to generate consistent snapshots of the database. Indexing is known to be the key to efficient query processing, through fast data retrieval and aggregation, under such a system architecture. This paper presents a new indexing scheme for distributed append-only stores. Our new scheme utilizes traditional index structures based on B-trees and their variants without the overhead of expensive node splits, by using template-based tree construction. Optimized domain partitioning and multi-threaded insertion techniques are further proposed to exploit the advantages of our template B-tree structure. Empirical evaluations show that our proposal outperforms existing solutions on insertion throughput by a large margin on a variety of real and synthetic workloads.
Database: OpenAIRE
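Illustration: The two ideas named in the description (simulating updates and deletes by appending operation records, and indexing through a tree whose shape is fixed in advance so that inserts never split a node) can be sketched in a few lines of Python. This is a toy illustration only, not the authors' scheme: the class names `AppendOnlyStore` and `TemplateIndex`, the `"put"`/`"delete"` operation names, and the hand-picked partition boundaries are all assumptions made for the example, and the sketch is single-threaded and not distributed.

```python
from bisect import bisect_right, insort


class AppendOnlyStore:
    """Toy append-only log (hypothetical): every operation is appended, never applied in place."""

    def __init__(self):
        self.log = []  # entries are (op, key, value)

    def append(self, op, key, value=None):
        self.log.append((op, key, value))
        return len(self.log) - 1  # position of the new record in the log

    def snapshot(self):
        """Replay the log in order to build a consistent view of the data."""
        view = {}
        for op, key, value in self.log:
            if op == "delete":
                view.pop(key, None)
            else:  # "put" covers both insert and update
                view[key] = value
        return view


class TemplateIndex:
    """Toy fixed-shape index (hypothetical): leaf boundaries are chosen up front and never change."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        # One leaf per key range; n boundaries define n + 1 leaves.
        self.leaves = [[] for _ in range(len(self.boundaries) + 1)]

    def insert(self, key, log_pos):
        # Route the key to its leaf; the leaf simply grows, so no split occurs.
        leaf = self.leaves[bisect_right(self.boundaries, key)]
        insort(leaf, (key, log_pos))

    def range_query(self, lo, hi):
        """Return log positions of all records with lo <= key <= hi."""
        first = bisect_right(self.boundaries, lo)
        last = bisect_right(self.boundaries, hi)
        hits = []
        for leaf in self.leaves[first:last + 1]:
            hits.extend(pos for key, pos in leaf if lo <= key <= hi)
        return hits


store = AppendOnlyStore()
index = TemplateIndex([10, 20, 30])  # assumed partition of the key domain
operations = [("put", 5, "a"), ("put", 25, "b"), ("put", 5, "c"), ("delete", 25, None)]
for op, key, value in operations:
    index.insert(key, store.append(op, key, value))

current_view = store.snapshot()
positions_for_low_keys = index.range_query(0, 9)
```

In this sketch the index returns log positions rather than values, so a reader still has to replay or filter the matching records to obtain the latest state of each key. That extra step is the query-side cost the description attributes to append-only stores.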