An extended visual methods to perform data cluster assessment in distributed data systems.

Authors: Subba Reddy, K., Rajendra Prasad, K., Kamatam, Govardhan Reddy, Ramanjaneya Reddy, N.
Source: Journal of Supercomputing; Apr2022, Vol. 78 Issue 6, p8810-8829, 20p
Abstract: The cluster tendency is one of the major problems in data clustering. Deriving the number of clusters for an unlabeled dataset is known as the cluster tendency problem. In this paper, the preclustering problem for important clustering methods, such as k-means, hierarchical clustering, etc., is considered. Existing preclustering methods, i.e., the visual assessment tendency (VAT), effectively solve the cluster tendency (i.e., k in the k-means). Enhanced methods, such as the improved VAT (iVAT) and other related visual methods, have greatly succeeded in determining the precluster tendency for complex and large datasets. Clustering using the improved visual assessment tendency (ClusiVAT) is a recent visual method and is widely used for large datasets. However, it focuses primarily on the amount of data rather than the dimensionality. Big data in real-time applications possess large sizes and higher dimensions. The ClusiVAT uses the sampling technique to handle the amount of original data; however, it is not focused on high-dimensional big data. Thus, the proposed method develops scalable visual methods using linear subspace learning (LSL) techniques to overcome the curse of dimensionality. Empirical analysis is performed to demonstrate the efficiency of the proposed LSL-based visual methods using benchmarked datasets. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
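
Illustrative note: the abstract describes combining a linear subspace projection with VAT-style reordering of a dissimilarity matrix. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes PCA as the linear subspace learning step (the abstract does not say which LSL techniques are used) and Euclidean distance as the dissimilarity; the function names `pca_project`, `pairwise_distances`, and `vat_order` are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal directions.
    PCA is an assumed stand-in for the paper's LSL step."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    D = np.sqrt(d2)
    np.fill_diagonal(D, 0.0)
    return D

def vat_order(D):
    """Return the VAT ordering of the objects in dissimilarity matrix D.
    Start at one end of the largest dissimilarity, then repeatedly add
    the unselected object closest to the already selected set."""
    n = D.shape[0]
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = np.ones(n, dtype=bool)
    remaining[i] = False
    min_dist = D[i].copy()           # distance from each object to the selected set
    for _ in range(n - 1):
        j = int(np.argmin(np.where(remaining, min_dist, np.inf)))
        order.append(j)
        remaining[j] = False
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)

def reordered_dissimilarity(X, n_components=2):
    """Project, compute distances, and reorder them for display."""
    Z = pca_project(X, n_components)
    D = pairwise_distances(Z)
    order = vat_order(D)
    return D[np.ix_(order, order)], order
```

Displaying the reordered matrix as a grayscale image typically shows dark square blocks along the diagonal, and the number of blocks suggests a value for k. The sampling step that the abstract attributes to ClusiVAT is not covered by this sketch.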