K‐Means Centroids Initialization Based on Differentiation Between Instances Attributes.

Authors:	Khan, Ali Akbar; Bashir, Muhammad Salman; Batool, Asma; Raza, Muhammad Summair; Bashir, Muhammad Adnan; Cirillo, Stefano
Subjects:	CLUSTERING algorithms; K-means clustering; ALGORITHMS
Source:	International Journal of Intelligent Systems; 11/18/2024, Vol. 2024, p1-27, 27p
Abstract:	The conventional K‐Means clustering algorithm is widely used for grouping similar data points by initially selecting random centroids. However, the accuracy of clustering results is significantly influenced by the initial centroid selection. Despite different approaches, including various K‐Means versions, suboptimal outcomes persist due to inadequate initial centroid choices and reliance on common normalization techniques like min‐max normalization. In this study, we propose an improved algorithm that selects initial centroids more effectively by utilizing a novel formula to differentiate between instance attributes, creating a single weight for differentiation. We introduce a preprocessing phase for dataset normalization without forcing values into a specific range, yielding significantly improved results compared to unnormalized datasets and those normalized using min‐max techniques. For our experiments, we used five real datasets and five simulated datasets. The proposed algorithm is evaluated using various metrics and an external benchmark measure, such as the Adjusted Rand Index (ARI), and compared with the traditional K‐Means algorithm and 11 other modified K‐Means algorithms. Experimental evaluations on these datasets demonstrate the superiority of our proposed methodologies, achieving an impressive average accuracy rate of up to 95.47% and an average ARI score of 0.95. Additionally, the number of iterations required is reduced compared to the conventional K‐Means algorithm. By introducing innovative techniques, this research provides significant contributions to the field of data clustering, particularly in addressing modern data‐driven clustering challenges. [ABSTRACT FROM AUTHOR]
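This record does not reproduce the authors' attribute-differentiation formula or their range-free normalization, so neither is sketched here. As a hedged point of reference only, the Python sketch below shows the baseline pieces the abstract names. These are min-max normalization, conventional K-Means (Lloyd's algorithm) seeded with randomly chosen instances, and the Adjusted Rand Index (ARI) used for evaluation. The function names (`min_max_normalize`, `kmeans`, `adjusted_rand_index`) and the toy data are hypothetical illustrations. They are not taken from the paper.

```python
import random
from collections import Counter
from math import comb


def min_max_normalize(data):
    """Rescale every attribute (column) to [0, 1]: x' = (x - min) / (max - min)."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # A constant column has no range, so map it to 0.0 instead of dividing by zero.
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in data
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two instances."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, max_iter=100, seed=0):
    """Conventional K-Means (Lloyd's algorithm) seeded with k randomly chosen instances.

    Returns (labels, centroids, iterations).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(data, k)]  # random initial centroids
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each instance joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # converged: no instance changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, iteration


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    n = len(labels_true)
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical toy data: two groups whose attributes live on very different scales.
data = [[1.0, 200.0], [1.2, 210.0], [0.9, 190.0], [8.0, 900.0], [8.3, 950.0], [7.9, 880.0]]
truth = [0, 0, 0, 1, 1, 1]
labels, centroids, iterations = kmeans(min_max_normalize(data), k=2, seed=1)
print("ARI:", adjusted_rand_index(truth, labels), "iterations:", iterations)
```

The toy groups are cleanly separated after rescaling, so every random seeding converges to the true grouping (ARI of 1.0). The sensitivity to initial centroids that the abstract describes shows up mainly on overlapping or less cleanly separated data.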
Database:	Complementary Index