Popis: |
Clustering is a crucial data-mining tool for extracting valuable information from massive volumes of data. Partition Around Medoids (PAM) is a clustering algorithm that is simple, scalable, and easy to implement, but it is sensitive to the choice of initial medoids and struggles with very large amounts of data. Meta-heuristic algorithms such as Ant Colony Optimization, the Bat algorithm, and the Bees algorithm have been combined with clustering algorithms to search for near-optimal medoids and thus improve cluster quality. For very large datasets, however, the main problems remain long running times and reduced cluster quality. To reduce running time, existing clustering approaches are often run on parallel frameworks. This paper therefore proposes a hybrid approach that integrates PAM with the Bat algorithm, a meta-heuristic: the Bat algorithm selects optimal initial medoids, and PAM then refines them to produce better clusters. To process large datasets quickly and in parallel, all experiments are performed on the Apache Spark framework.
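As a rough illustration of the two-stage idea (a Bat search for initial medoids, followed by PAM refinement), the Python sketch below runs on a single machine rather than on Apache Spark. It assumes Euclidean distance, the standard continuous Bat update rules (frequency, velocity, loudness, pulse rate), and a hypothetical decoding step, `snap_to_medoids`, that maps each bat position to the nearest distinct data points. All function and parameter names are illustrative; this is not the paper's implementation.

```python
import math
import random


def total_cost(data, medoids):
    """PAM objective: sum of each point's distance to its nearest medoid."""
    return sum(min(math.dist(p, data[m]) for m in medoids) for p in data)


def snap_to_medoids(data, position, k, dim):
    """Assumption: decode a continuous bat position (k concatenated points)
    into k distinct medoids by taking the nearest unused data point."""
    chosen = []
    for j in range(k):
        centre = position[j * dim:(j + 1) * dim]
        free = [i for i in range(len(data)) if i not in chosen]
        chosen.append(min(free, key=lambda i: math.dist(data[i], centre)))
    return chosen


def bat_initial_medoids(data, k, n_bats=10, iters=30, f_min=0.0, f_max=2.0,
                        alpha=0.9, gamma=0.9, seed=0):
    """Bat algorithm searching for k initial medoids that minimise total_cost."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Search bounds per coordinate, repeated once per medoid slot.
    lo = [min(p[d] for p in data) for d in range(dim)] * k
    hi = [max(p[d] for p in data) for d in range(dim)] * k
    n = k * dim

    def fitness(x):
        return total_cost(data, snap_to_medoids(data, x, k, dim))

    pos = [[rng.uniform(lo[d], hi[d]) for d in range(n)] for _ in range(n_bats)]
    vel = [[0.0] * n for _ in range(n_bats)]
    loud, rate0 = [1.0] * n_bats, 0.5
    rate = [rate0] * n_bats
    fit = [fitness(x) for x in pos]
    b = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]

    for t in range(1, iters + 1):
        avg_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Global move: a random frequency scales the offset from the best bat.
            f = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - g) * f for v, x, g in zip(vel[i], pos[i], best)]
            cand = [x + v for x, v in zip(pos[i], vel[i])]
            if rng.random() > rate[i]:
                # Local random walk around the best solution, scaled by loudness.
                cand = [g + rng.uniform(-1, 1) * avg_loud * 0.1 * (ub - lb)
                        for g, lb, ub in zip(best, lo, hi)]
            cand = [min(max(c, lb), ub) for c, lb, ub in zip(cand, lo, hi)]
            cand_fit = fitness(cand)
            # Accept with probability given by loudness; then get quieter, pulse more.
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = rate0 * (1 - math.exp(-gamma * t))
            if cand_fit <= best_fit:
                best, best_fit = cand[:], cand_fit
    return snap_to_medoids(data, best, k, dim)


def pam_swap(data, medoids, max_iter=100):
    """PAM SWAP phase: apply the best medoid/non-medoid swap until none helps."""
    medoids = list(medoids)
    cost = total_cost(data, medoids)
    for _ in range(max_iter):
        best_trial, best_cost = None, cost
        for mi in range(len(medoids)):
            for o in range(len(data)):
                if o in medoids:
                    continue
                trial = medoids[:mi] + [o] + medoids[mi + 1:]
                c = total_cost(data, trial)
                if c < best_cost:
                    best_trial, best_cost = trial, c
        if best_trial is None:
            break
        medoids, cost = best_trial, best_cost
    return medoids, cost


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
    init = bat_initial_medoids(data, k=3)
    medoids, cost = pam_swap(data, init)
    print(sorted(medoids), round(cost, 3))
```

Both stages spend nearly all their time in `total_cost`, so that is the natural computation to parallelize; distributing it (for example on Spark, as in the paper) is not attempted in this sketch.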