Genetic Algorithm Based Parallel K-Means Data Clustering Algorithm Using MapReduce Programming Paradigm on Hadoop Environment (GAPKCA)
Authors: | Rusli Bin Abdullah, Maslina Binti Zolkepli, Sayer Alshammari |
---|---|
Year of publication: | 2019 |
Subject: |
0209 industrial biotechnology
Speedup; Computer science; k-means clustering; Image processing; 02 engineering and technology; Document clustering; 020901 industrial engineering & automation; Genetic algorithm; 0202 electrical engineering, electronic engineering, information engineering; Programming paradigm; 020201 artificial intelligence & image processing; Document retrieval; Cluster analysis; Algorithm |
Source: | Advances in Intelligent Systems and Computing ISBN: 9783030360559 SCDM |
DOI: | 10.1007/978-3-030-36056-6_10 |
Description: | Data clustering algorithms have received considerable attention in application areas such as data mining, document retrieval, image processing and pattern classification. This paper proposes a hybrid data clustering algorithm that combines a genetic algorithm (GA) with the parallel K-Means clustering algorithm (PKCA), a popular variant of K-Means. The proposed algorithm uses the GA search process to generate new data clusters and applies parallel K-Means to further speed up the search process during cluster formation. The approach is implemented using the MapReduce programming model on the Hadoop framework. Experiments were conducted with multiple synthetic datasets to evaluate the performance of the proposed algorithm. Results show that the proposed algorithm sped up the document clustering process by 0.54 s on average and outperformed PKCA. Data analysts in marketing and finance, in telecommunication and transport companies, and researchers in academia can use this algorithm to make sense of their large volumes of data. |
Database: | OpenAIRE |
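
The abstract mentions running parallel K-Means under the MapReduce model. As a rough illustration only, and not the authors' implementation, one K-Means update can be phrased as a map step that assigns each point to its nearest centroid and a reduce step that averages the points in each group. The sketch below is plain Python with no Hadoop dependency; the function names (`map_assign`, `reduce_mean`, `kmeans_iteration`) are hypothetical, and the genetic algorithm component is omitted because the record does not describe its operators.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(point: Point, centroids: List[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Map step (assumed form): emit (nearest centroid index, (point, 1))."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )
    return nearest, (point, 1)


def reduce_mean(values: List[Tuple[Point, int]]) -> Point:
    """Reduce step (assumed form): average all points emitted under one key."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(p[d] for p, _ in values) / count for d in range(dims))


def kmeans_iteration(points: List[Point], centroids: List[Point]) -> List[Point]:
    """One K-Means update expressed as map -> shuffle -> reduce."""
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)  # grouping by key stands in for the shuffle phase
    # A centroid with no assigned points keeps its previous position.
    return [
        reduce_mean(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
new_centroids = kmeans_iteration(points, [(0.0, 0.0), (10.0, 10.0)])
```

On a real cluster the map calls would run on separate data splits and the grouping would be done by the framework's shuffle; here a dictionary plays that role so the data flow is visible in a few lines.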