Scaling Distributed Database Joins by Decoupling Computation and Communication

Author: Abhirup Chakraborty
Publication year: 2023
Subject:
Source: International Journal of Database Management Systems. 15:1-18
ISSN: 0975-5985
DOI: 10.5121/ijdms.2023.15102
Description: To process large volumes of data, modern data management systems use a collection of machines connected through a network. This paper proposes frameworks and algorithms for processing distributed joins, a compute- and communication-intensive workload in modern data-intensive systems. By exploiting multiple processing cores within individual machines, we implement a join-processing system that parallelizes computation within each node, pipelines computation with communication, and parallelizes communication by allowing multiple simultaneous data transfers (send/receive). Our experimental results show that, using only four threads per node, the framework achieves a 3.5x gain in intra-node performance compared with a single-threaded counterpart. Moreover, for the join-processing workload, cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational load; this property yields near-linear speedup as nodes are added to the system, a feature much desired in modern large-scale data processing systems.
Database: OpenAIRE
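
Illustrative note: the description above names three ideas: parallel computation within a node, pipelining computation with communication, and multiple simultaneous transfers. The sketch below shows only that general structure and is not the paper's implementation. All names (`hash_join`, `pipelined_join`, `num_workers`, `num_senders`) are hypothetical, the "network send" is simulated by appending to a shared list, and Python threads are used purely to show the control flow (the interpreter lock means they would not reproduce the reported speedup).

```python
import queue
import threading
from collections import defaultdict


def hash_join(left, right):
    """Join two lists of (key, value) pairs on key using build and probe phases."""
    table = defaultdict(list)
    for key, value in left:                      # build phase
        table[key].append(value)
    return [(key, lv, rv)                        # probe phase
            for key, rv in right
            for lv in table.get(key, [])]


def pipelined_join(left, right, num_workers=4, num_senders=2):
    """Join hash partitions in worker threads while sender threads ship results."""
    # Partition both inputs by key so matching keys meet in the same partition.
    left_parts = [[] for _ in range(num_workers)]
    right_parts = [[] for _ in range(num_workers)]
    for key, value in left:
        left_parts[hash(key) % num_workers].append((key, value))
    for key, value in right:
        right_parts[hash(key) % num_workers].append((key, value))

    outbox = queue.Queue(maxsize=8)              # bounded buffer between the stages
    delivered = []                               # stand-in for the remote receiver
    delivered_lock = threading.Lock()

    def worker(index):
        # Computation stage: join one partition and hand the batch to the senders.
        outbox.put(hash_join(left_parts[index], right_parts[index]))

    def sender():
        # Communication stage: runs concurrently with the workers.
        while True:
            batch = outbox.get()
            if batch is None:                    # shutdown marker
                break
            with delivered_lock:
                delivered.extend(batch)          # simulated network send

    senders = [threading.Thread(target=sender) for _ in range(num_senders)]
    workers = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for thread in senders + workers:
        thread.start()
    for thread in workers:
        thread.join()
    for _ in senders:
        outbox.put(None)                         # one shutdown marker per sender
    for thread in senders:
        thread.join()
    return delivered
```

Because batches arrive in whatever order the workers finish, callers that need a stable order should sort the result, for example `sorted(pipelined_join(left, right))`.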