Connected Components in MapReduce and Beyond

Authors:	Raimondas Kiveris, Vahab Mirrokni, Silvio Lattanzi, Vibhor Rastogi, Sergei Vassilvitskii
Year:	2014
Subjects:	Connected component; Theoretical computer science; Computer science; Subroutine; Large scale data; Graph; Data mining algorithm; Distributed hash table; Clustering coefficient
Source:	SoCC (ACM Symposium on Cloud Computing)
DOI:	10.1145/2670979.2670997
Abstract:	Computing connected components of a graph lies at the core of many data mining algorithms, and is a fundamental subroutine in graph clustering. This problem is well studied, yet many of the algorithms with good theoretical guarantees perform poorly in practice, especially when faced with graphs with hundreds of billions of edges. In this paper, we design improved algorithms based on traditional MapReduce architecture for large scale data analysis. We also explore the effect of augmenting MapReduce with a distributed hash table (DHT) service. We show that these algorithms have provable theoretical guarantees, and easily outperform previously studied algorithms, sometimes by more than an order of magnitude. In particular, our iterative MapReduce algorithms run 3 to 15 times faster than the best previously studied algorithms, and the MapReduce implementation using a DHT is 10 to 30 times faster than the best previously studied algorithms. These are the fastest algorithms that easily scale to graphs with hundreds of billions of edges.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::f35bbe0cbb950793b700a94dbfc34356 https://doi.org/10.1145/2670979.2670997
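The abstract summarizes the algorithms rather than describing them. For readers unfamiliar with the iterative MapReduce style it refers to, the sketch below simulates on a single machine the classic min-label propagation baseline: in each round, a "map" step turns every edge into label messages and a "reduce" step keeps, for each vertex, the smallest label it has seen, until nothing changes. This is a generic textbook approach, not the algorithms proposed in the paper, and the function name `connected_components` is hypothetical. One known weakness of this simple approach is that the number of rounds grows with the graph's diameter.

```python
from collections import defaultdict


def connected_components(edges):
    """Hypothetical helper: label each vertex with the smallest vertex id in its component.

    Simulates iterative map/reduce rounds of min-label propagation on one machine
    (a generic baseline, not the method from the paper).
    """
    # Initial state: every vertex that appears in an edge is labelled with itself.
    labels = {}
    for u, v in edges:
        labels.setdefault(u, u)
        labels.setdefault(v, v)

    changed = True
    while changed:
        # "Map": each edge sends each endpoint's current label to the other endpoint.
        messages = defaultdict(list)
        for u, v in edges:
            messages[v].append(labels[u])
            messages[u].append(labels[v])

        # "Reduce": each vertex keeps the minimum of its own label and incoming labels.
        # Messages were built from the previous round's labels, so updating in place is safe.
        changed = False
        for node, incoming in messages.items():
            best = min(incoming + [labels[node]])
            if best < labels[node]:
                labels[node] = best
                changed = True
    return labels


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

On the example graph the run prints `{1: 1, 2: 1, 3: 1, 4: 4, 5: 4}`: vertex 3 only receives label 1 in the second round, which illustrates why the round count depends on path lengths in the graph.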