Genetic Algorithm Based Parallel K-Means Data Clustering Algorithm Using MapReduce Programming Paradigm on Hadoop Environment (GAPKCA)

Authors:	Rusli Bin Abdullah, Maslina Binti Zolkepli, Sayer Alshammari
Year of publication:	2019
Subject:	0209 industrial biotechnology; Speedup; Computer science; k-means clustering; Image processing; 02 engineering and technology; Document clustering; 020901 industrial engineering & automation; Genetic algorithm; 0202 electrical engineering, electronic engineering, information engineering; Programming paradigm; 020201 artificial intelligence & image processing; Document retrieval; Cluster analysis; Algorithm
Source:	Advances in Intelligent Systems and Computing ISBN: 9783030360559 SCDM
DOI:	10.1007/978-3-030-36056-6_10
Description:	Data clustering algorithms have received considerable attention in application areas such as data mining, document retrieval, image processing and pattern classification. This paper proposes a hybrid data clustering algorithm that combines a genetic algorithm (GA) with the parallel K-Means clustering algorithm (PKCA), a popular variant of K-Means clustering. The proposed algorithm uses the GA search process to generate new data clusters and applies parallel K-Means to further improve the speed and quality of the search during cluster formation. The approach is implemented using the MapReduce programming model on the Hadoop framework. Experiments were conducted on multiple synthetic datasets to evaluate its performance. Results show that the proposed algorithm sped up the document clustering process by 0.54 s on average and outperformed PKCA. Data analysts in marketing, finance, telecommunication and transport companies, as well as academic researchers, can use this algorithm to make sense of their large volumes of data.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::bfa4168200a105421f6932151e0f3690 https://doi.org/10.1007/978-3-030-36056-6_10