Scaling Distributed Database Joins by Decoupling Computation and Communication

Author:	Abhirup Chakraborty
Publication year:	2023
Subject:	General Medicine
Source:	International Journal of Database Management Systems. 15:1-18
ISSN:	0975-5985
DOI:	10.5121/ijdms.2023.15102
Description:	To process large volumes of data, modern data management systems use a collection of machines connected through a network. This paper proposes frameworks and algorithms for processing distributed joins, a compute- and communication-intensive workload in modern data-intensive systems. By exploiting multiple processing cores within individual machines, we implement a system for processing database joins that parallelizes computation within each node, pipelines computation with communication, and parallelizes communication by allowing multiple simultaneous data transfers (send/receive). Our experimental results show that, using only four threads per node, the framework achieves a 3.5x gain in intra-node performance compared with a single-threaded counterpart. Moreover, for the join-processing workload, cluster-wide performance (and speedup) is observed to be dictated by the intra-node computational load; this property yields near-linear speedup as nodes are added to the system, a feature much desired in modern large-scale data processing systems.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::424cc5a3804875b6eb9454721c434bea https://doi.org/10.5121/ijdms.2023.15102
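
Illustrative note: the description mentions parallelizing the join within a node and pipelining computation with communication. The sketch below is one possible way to picture that structure and is not the paper's implementation. The function name `pipelined_hash_join`, the bounded queue standing in for a network receive buffer, and the batch layout are all assumptions made for illustration. In CPython the worker threads share the interpreter lock, so this shows the shape of the pipeline rather than the reported speedup.

```python
import queue
import threading
from collections import defaultdict


def pipelined_hash_join(build_rows, probe_batches, num_workers=4):
    """Join (key, payload) probe batches against build_rows on key.

    Hypothetical sketch: a receiver thread stands in for network
    receives, and worker threads probe a shared hash table while
    further batches are still arriving.
    """
    # Build phase: hash table on the join key (read-only afterwards).
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    inbox = queue.Queue(maxsize=8)  # bounded buffer between receiver and workers
    results = []
    lock = threading.Lock()

    def receiver():
        # Stand-in for a network receive loop delivering probe batches.
        for batch in probe_batches:
            inbox.put(batch)
        for _ in range(num_workers):
            inbox.put(None)  # one stop marker per worker

    def worker():
        while True:
            batch = inbox.get()
            if batch is None:
                break
            # Probe phase: emit (key, build_payload, probe_payload) matches.
            local = [(k, b, p) for k, p in batch for b in table.get(k, [])]
            with lock:
                results.extend(local)

    threads = [threading.Thread(target=receiver)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Sort so the output does not depend on thread scheduling.
    return sorted(results)


if __name__ == "__main__":
    build = [(1, "a"), (2, "b"), (2, "c")]
    probe = [[(2, "x"), (3, "y")], [(1, "z")]]
    print(pipelined_hash_join(build, probe))
```

The bounded queue is the design choice worth noting: it lets receiving and joining overlap while capping how far the receiver can run ahead of the workers.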