WKA: The Analysis and Application of an Efficient Data Clustering Algorithm for Solving Data Mining Problems in Large Databases

Author: Chin-Jung Tsai, 蔡金榮
Year of publication: 2005
Document type: Thesis
Description: 93
Data clustering is an important and active research topic in data mining. In this thesis, we propose a new data clustering algorithm called the Weighted K-Means Algorithm (WKA). The K-Means algorithm is a well-known, powerful algorithm that can perform clustering quickly; however, it can become trapped in local optima. To reduce this risk, the first stage of our approach uses the minimum squared-error criterion of K-Means to obtain the initial centroids. Moreover, instead of Euclidean distance, we propose a proportional distance, combined with a decreasing or increasing weighting strategy, to compute the distances between data objects in different clusters. This improves the weighting of each data dimension, so WKA can reduce the clustering error rate while still clustering quickly. We compare WKA with K-Means, GMA, GKA, KGA, and DBSCAN and find that it is efficient for clustering large amounts of data: under evaluation methods based on cluster distance and time cost, WKA performs better than K-Means, GMA, GKA, KGA, and DBSCAN.
Database: Networked Digital Library of Theses & Dissertations
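The two-stage idea in the abstract (a squared-error K-Means stage that supplies initial centroids, followed by clustering under a per-dimension weighted distance) can be illustrated with a minimal sketch. This is not the thesis's WKA: the abstract does not define its "proportional distance" or its weighting strategy, so this sketch assumes (1) stage one means "run plain K-Means from several random starts and keep the lowest-SSE result", and (2) the weighted distance is a squared Euclidean distance with one fixed, caller-supplied weight per dimension. The names `two_stage_kmeans`, `weighted_sq_dist`, `restarts`, and `weights` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def weighted_sq_dist(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    # Hypothetical stand-in for WKA's proportional distance (not defined in the
    # abstract): squared Euclidean distance with one weight per dimension.
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w))


def sse(points, centroids, labels, w) -> float:
    # Squared-error criterion that K-Means minimizes.
    return sum(weighted_sq_dist(p, centroids[j], w) for p, j in zip(points, labels))


def assign(points, cents, w) -> List[int]:
    # Index of the nearest centroid for every point.
    return [min(range(len(cents)), key=lambda j: weighted_sq_dist(p, cents[j], w))
            for p in points]


def lloyd(points, centroids, w, max_iter: int = 100):
    # Standard Lloyd iterations: assign points, then move each centroid to the
    # mean of its members, until the centroids stop changing.
    cents = [list(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, cents, w)
        new_cents = []
        for j, old in enumerate(cents):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_cents.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_cents.append(old)  # keep an empty cluster's centroid in place
        if new_cents == cents:
            break
        cents = new_cents
    return cents, assign(points, cents, w)


def two_stage_kmeans(points: Sequence[Point], k: int,
                     weights: Optional[Sequence[float]] = None,
                     restarts: int = 10, seed: int = 0):
    unit = [1.0] * len(points[0])
    rng = random.Random(seed)
    # Stage 1 (assumed reading): plain K-Means from several random starts,
    # keeping the centroids with the minimum squared error.
    best_err, best_cents = float("inf"), None
    for _ in range(restarts):
        cents, labels = lloyd(points, rng.sample(list(points), k), unit)
        err = sse(points, cents, labels, unit)
        if err < best_err:
            best_err, best_cents = err, cents
    # Stage 2: refine from those centroids under the weighted distance.
    w = list(weights) if weights is not None else unit
    cents, labels = lloyd(points, best_cents, w)
    return cents, labels, sse(points, cents, labels, w)


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cents, labels, err = two_stage_kmeans(pts, k=2)
print(labels, round(err, 4))  # the SSE is 8/3, printed as 2.6667
```

Keeping the best of several restarts is one common way to lower the chance of a poor local optimum; the thesis may initialize differently. Passing, for example, `weights=(1.0, 0.5)` makes differences along the second dimension count half as much in both assignment and the reported error.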