Effects of some design factors on the distribution of similarity indices in cluster analysis

Authors: Golam B. M. Kibria, Ahmed N. Albatineh, Bashar Zogheib, Hafiz M. R. Khan
Year of publication: 2015
Subject:
Source: Communications in Statistics - Simulation and Computation, pp. 1-17
ISSN: 1532-4141, 0361-0918
DOI: 10.1080/03610918.2015.1082586
Description: This article investigates the effects of the number of clusters, cluster size, and correction for chance agreement on the distribution of two similarity indices, namely the Jaccard and Rand indices. Skewness and kurtosis are calculated for the two indices and their corrected forms and then compared with those of the normal distribution. Three clustering algorithms are implemented: complete linkage, Ward, and K-means. Data were randomly generated from bivariate normal distributions with specified means and variance-covariance matrices. A three-way ANOVA is performed to assess the significance of the design factors, using the skewness and kurtosis of the indices as responses. Test statistics for skewness and kurtosis, together with observed power, are calculated. Simulation results showed that, independent of the clustering algorithm or similarity index used, the cluster size × number of clusters interaction and the main effects of cluster size and number of clusters were always found to be significant for skewnes...
Database: OpenAIRE
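The abstract names the ingredients of the simulation (pair-counting similarity indices, a chance correction, bivariate normal data, clustering, and skewness and kurtosis of the resulting index values) without giving formulas. The Python sketch below shows one way these pieces can fit together. It is an illustration under assumptions, not the authors' code: the cluster means, the identity covariance matrix, the use of a plain Lloyd's K-means only (complete linkage and Ward are omitted), the Hubert-Arabie adjusted Rand index standing in for the "corrected form", and the large-sample standard errors sqrt(6/n) and sqrt(24/n) for the skewness and kurtosis z-statistics are all choices made here. Helper names such as `simulate_data` and `index_distribution` are hypothetical.

```python
import random
from collections import Counter
from math import comb, sqrt


def pair_counts(x, y):
    """Pair-counting summary for two labelings of the same n objects."""
    a = sum(comb(v, 2) for v in Counter(zip(x, y)).values())  # together in both
    n_x = sum(comb(v, 2) for v in Counter(x).values())  # together in x
    n_y = sum(comb(v, 2) for v in Counter(y).values())  # together in y
    return a, n_x, n_y, comb(len(x), 2)


def rand_index(x, y):
    a, n_x, n_y, total = pair_counts(x, y)
    d = total - n_x - n_y + a  # pairs separated in both labelings
    return (a + d) / total


def jaccard_index(x, y):
    a, n_x, n_y, _ = pair_counts(x, y)
    return a / (n_x + n_y - a)  # a / (a + b + c)


def adjusted_rand_index(x, y):
    """Hubert-Arabie correction: (index - expected) / (max - expected)."""
    a, n_x, n_y, total = pair_counts(x, y)
    expected = n_x * n_y / total
    maximum = (n_x + n_y) / 2
    return (a - expected) / (maximum - expected)


def skew_kurt(values):
    """Sample skewness g1, excess kurtosis g2 (both 0 for a normal), z-stats."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    # Assumed large-sample approximations: SE(g1) ~ sqrt(6/n), SE(g2) ~ sqrt(24/n).
    return g1, g2, g1 / sqrt(6 / n), g2 / sqrt(24 / n)


def simulate_data(k, size, rng, spread=2.0):
    """k bivariate normal clusters, identity covariance, `size` points each."""
    points, labels = [], []
    for j in range(k):
        cx, cy = spread * j, spread * (j % 2)  # hypothetical cluster means
        for _ in range(size):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(j)
    return points, labels


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm started from k randomly chosen data points."""
    centres = rng.sample(points, k)
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, assign) if lab == j]
            if members:
                new.append((sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members)))
            else:
                new.append(centres[j])  # keep the centre of an empty cluster
        if new == centres:
            break
        centres = new
    return assign


def index_distribution(k, size, reps=200, seed=1):
    """Index values over `reps` simulated data sets (true vs K-means labels)."""
    rng = random.Random(seed)
    out = {"rand": [], "adj_rand": [], "jaccard": []}
    for _ in range(reps):
        points, truth = simulate_data(k, size, rng)
        found = kmeans(points, k, rng)
        out["rand"].append(rand_index(truth, found))
        out["adj_rand"].append(adjusted_rand_index(truth, found))
        out["jaccard"].append(jaccard_index(truth, found))
    return out


if __name__ == "__main__":
    for name, vals in index_distribution(k=3, size=20).items():
        g1, g2, z1, z2 = skew_kurt(vals)
        print(f"{name:9s} skew={g1:+.3f} (z={z1:+.2f})  ex.kurt={g2:+.3f} (z={z2:+.2f})")
```

Repeating the run over a grid of `k` (number of clusters) and `size` (cluster size) yields skewness and kurtosis values of the kind that could serve as responses in a three-way ANOVA; the ANOVA step and the power calculations are not shown here.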