Scalable Community Detection via Parallel Correlation Clustering

Autor:	Jakub Lacki, David Eisenstat, Vahab Mirrokni, Jessica Shi, Laxman Dhulipala
Rok vydání:	2021
Předmět:	Social and Information Networks (cs.SI) FOS: Computer and information sciences Computer Science - Machine Learning 020203 distributed computing Computer Science - Artificial Intelligence Computer science Correlation clustering General Engineering Computer Science - Social and Information Networks 02 engineering and technology computer.software_genre Machine Learning (cs.LG) Artificial Intelligence (cs.AI) Computer Science - Distributed Parallel and Cluster Computing 020204 information systems Scalability 0202 electrical engineering electronic engineering information engineering Data mining Distributed Parallel and Cluster Computing (cs.DC) computer
DOI:	10.48550/arxiv.2108.01731
Popis:	Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LambdaCC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our implementations achieve orders of magnitude speedups over other correlation clustering baselines, and up to 28.44x speedups over our own sequential baselines while maintaining or improving quality. Comment: This is a preliminary version of a paper that will appear at VLDB'21
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a897d5af99e6691f28675bcb41d1656a Zobrazit plný text záznamu