Efficient hierarchical clustering for single-dimensional data using CUDA

Autor:	Adam Rehn, Aidan L. Possemiers, Jason Holdsworth
Rok vydání:	2018
Předmět:	0301 basic medicine 020203 distributed computing Computer science Correlation clustering 02 engineering and technology Parallel computing Benchmarking computer.software_genre Hierarchical clustering 03 medical and health sciences CUDA 030104 developmental biology 0202 electrical engineering electronic engineering information engineering Data mining General-purpose computing on graphics processing units Cluster analysis Implementation Massively parallel computer
Zdroj:	ACSW
DOI:	10.1145/3167918.3167929
Popis:	Hierarchical clustering is a widely-used and well-researched clustering technique. The classical algorithm for agglomerative hierarchical clustering is prohibitively expensive for use with large datasets. Numerous algorithms exist to improve the efficiency of hierarchical clustering for various linkage metrics, and for large datasets. Recent research has proposed approaches for improving the efficiency of hierarchical clustering through parallelism. The newest approaches utilise GPGPU technologies, which facilitate massive parallelism on commodity consumer hardware. Existing GPGPU implementations fail to maximise the number of merges that can be performed in parallel, and feature high use of memory. These limitations prevent existing implementations from achieving the full performance offered by GPGPU. In this paper, we propose a novel GPGPU algorithm for hierarchical clustering of single-dimensional data. Our proposed algorithm exploits the unique characteristics of one-dimensional data to maximise merge parallelism and significantly reduce memory requirements. Validation demonstrates that our proposed algorithm produces equivalent results to the classical algorithm for both the single-linkage and complete-linkage metrics. Benchmarking results show that our algorithm scales well to large datasets, and offers a substantial speed-up over the classical algorithm. Future work will look to extend our proposed approach to larger datasets with higher dimensions.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_________::9b1e825c20102452b49b2b6a08318641 https://doi.org/10.1145/3167918.3167929 Zobrazit plný text záznamu