CPOCEDS-concept preserving online clustering for evolving data streams.

Authors: Jafseer, K. T., Shailesh, S., Sreekumar, A.
Subject:
Source: Cluster Computing; Jun 2024, Vol. 27 Issue 3, p2983-2998, 16p
Abstract: Clustering streaming data is challenging due to many temporal dynamics, such as concept drift, concept evolution, and feature evolution. Concept evolution is the most challenging of these. Due to concept evolution, new classes may emerge or existing classes may disappear, so it is crucial to process streaming data continuously. This paper proposes a novel online clustering method, specifically for streaming data with concept evolution. It consists of three phases: initialization, clustering, and outlier handling. To identify recurrences of previous data in streaming data, it is critical to preserve the sequential properties of data chunks. In the proposed model, representatives from previous windows are added to the current window, making it distinct from existing models. The detection and handling of outliers are very challenging tasks in streaming data analysis. Outliers are often the first instances of a new cluster. The proposed model stores the outliers from each data window. When the number of outliers exceeds a certain threshold, the representatives of outliers are added to the next window to identify new classes. The lack of datasets made it necessary for us to create a synthetic dataset with 22,020 data instances and test the model on both synthetic and real datasets. Using Silhouette Coefficient, Calinski–Harabasz index, and Davies–Bouldin index analysis, this model yielded the most favourable results. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
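
Note: The windowing idea described in the abstract (carrying cluster representatives into the next window and buffering outliers until they exceed a threshold) can be illustrated with a minimal sketch. This is not the authors' CPOCEDS implementation. The greedy leader clustering, the class name WindowedClusterer, and the parameters radius, min_size, and outlier_threshold are all hypothetical choices made for illustration. For simplicity, each buffered outlier serves as its own representative, and clusters that disappear are never retired.

```python
import math
from dataclasses import dataclass, field


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def leader_cluster(points, radius):
    """Greedy leader clustering (an assumed stand-in, not the paper's method).

    Each point joins the first cluster whose current centroid lies within
    `radius`; otherwise it starts a new cluster. Returns lists of indices.
    """
    clusters = []
    for i, p in enumerate(points):
        for members in clusters:
            if math.dist(p, centroid([points[j] for j in members])) <= radius:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


@dataclass
class WindowedClusterer:
    radius: float = 1.0          # hypothetical distance threshold
    min_size: int = 3            # smaller unanchored clusters count as outliers
    outlier_threshold: int = 1   # buffer must exceed this before re-injection
    representatives: list = field(default_factory=list)
    outlier_buffer: list = field(default_factory=list)

    def process(self, window):
        # Representatives of earlier windows are prepended to the current one.
        points = list(self.representatives)
        n_reps = len(points)
        # Once enough outliers have accumulated, re-inject them so that a
        # new cluster can form around them.
        if len(self.outlier_buffer) > self.outlier_threshold:
            points += self.outlier_buffer
            self.outlier_buffer = []
        points += [tuple(float(x) for x in p) for p in window]

        kept, new_reps = [], []
        for members in leader_cluster(points, self.radius):
            pts = [points[j] for j in members]
            anchored = any(j < n_reps for j in members)  # holds an old representative
            if anchored or len(pts) >= self.min_size:
                kept.append(pts)
                new_reps.append(centroid(pts))
            else:
                self.outlier_buffer.extend(pts)
        self.representatives = new_reps
        return kept


model = WindowedClusterer()
w1 = model.process([(0, 0), (0.1, 0), (0, 0.1), (5, 5)])  # one cluster, (5, 5) buffered
w2 = model.process([(0.2, 0.1), (5.1, 5.0)])              # still one cluster
w3 = model.process([(4.9, 5.1), (0, 0)])                  # outliers re-injected, second cluster forms
```

In this toy run, the points near (5, 5) are first held as outliers and only become a cluster in the third window, after the buffer exceeds the threshold and its contents are merged with new arrivals.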