Clustering Through Probability Distribution Analysis Along Eigenpaths
Author: | Wenming Yang, Xiang Sun, Changqing Hui, Qingmin Liao, Daren Sun |
---|---|
Year of publication: | 2021 |
Subject: | Computer science; Feature vector; 02 engineering and technology; 01 natural sciences; Computer Science Applications; Human-Computer Interaction; Data set; 010104 statistics & probability; Exploratory data analysis; Control and Systems Engineering; Outlier; 0202 electrical engineering, electronic engineering, information engineering; Probability distribution; 020201 artificial intelligence & image processing; Data mining; 0101 mathematics; Electrical and Electronic Engineering; Cluster analysis; Software; Curse of dimensionality |
Source: | IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, pp. 875-884 |
ISSN: | 2168-2232 2168-2216 |
DOI: | 10.1109/tsmc.2018.2884839 |
Description: | Data clustering is one of the most fundamental techniques in exploratory data analysis. It is widely used to determine underlying data structure, classify natural data, and compress data in engineering, business management, social statistics, computer science, and medicine. Under the assumption that clusters are high-density regions in the feature space separated by regions of relatively low density, a novel approach is proposed that models any high-dimensional clustering problem as a one-dimensional analysis of a probability distribution. First, a special path between two vertices, called an eigenpath, is defined to represent their close connection. Second, a connectedness index based on the eigenpath is proposed to quantify the connection between two vertices. Third, the connectedness index is applied to the candidate cluster centers to measure the connection between different candidates. An indicative curve can then be drawn from the connectedness index. The approach not only provides an effective indicative curve for unknown data sets but also partly alleviates the curse of dimensionality, correctly recognizes clusters of arbitrary shape, and automatically excludes outliers. Extensive experiments demonstrated the effectiveness and efficiency of the proposed approach. |
Database: | OpenAIRE |
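The description above names the steps (eigenpath, connectedness index, indicative curve) but does not define them. As a rough illustration of its premise only, namely that clusters are high-density regions separated by low-density ones and that connection can be judged from a one-dimensional density profile along a path, the sketch below assumes a simpler, swapped-in construction. Points are linked in a k-nearest-neighbor graph, each edge is scored by the lowest kernel density estimate sampled along the segment, and the connectedness of two points is the best bottleneck score over all graph paths (a widest-path search). This is not the authors' eigenpath or connectedness index. The functions `kde`, `edge_strength`, and `connectedness` and the parameters `k`, `h`, and `samples` are hypothetical.

```python
import heapq

import numpy as np


def kde(points, data, h):
    """Unnormalized Gaussian kernel density estimate of `data`, evaluated at `points`."""
    d2 = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * h * h)).mean(axis=1)


def edge_strength(x, y, data, h, samples=10):
    """Lowest estimated density on the straight segment from x to y.

    Hypothetical stand-in: the 1-D density profile along the segment is
    reduced to its minimum, so a low-density gap makes the edge weak.
    """
    t = np.linspace(0.0, 1.0, samples)[:, None]
    return kde(x + t * (y - x), data, h).min()


def connectedness(data, a, b, k=5, h=0.5):
    """Bottleneck (widest-path) density between samples a and b on a k-NN graph.

    Assumption: this max-min path is only an illustrative proxy for the
    paper's eigenpath and connectedness index, which are defined differently.
    """
    n = len(data)
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    neighbors = np.argsort(d2, axis=1)[:, 1:k + 1]  # skip each point itself

    # Symmetric k-NN adjacency; each edge strength is computed once.
    adj = {i: {} for i in range(n)}
    for i in range(n):
        for j in map(int, neighbors[i]):
            if j not in adj[i]:
                w = edge_strength(data[i], data[j], data, h)
                adj[i][j] = w
                adj[j][i] = w

    # Modified Dijkstra: maximize the minimum edge strength along the path.
    best = np.zeros(n)
    best[a] = np.inf
    heap = [(-np.inf, a)]
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best[u]:
            continue  # stale heap entry
        if u == b:
            return float(width)
        for v, w in adj[u].items():
            cand = min(width, w)
            if cand > best[v]:
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return 0.0  # a and b lie in different components of the k-NN graph


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blob_a = rng.normal(0.0, 0.3, size=(40, 2))
    blob_b = rng.normal([4.0, 0.0], 0.3, size=(40, 2))
    data = np.vstack([blob_a, blob_b])
    print("same blob:", connectedness(data, 0, 1, k=15))
    print("across gap:", connectedness(data, 0, 40, k=15))
```

Under this assumption, two points inside the same dense blob receive a clearly positive score, while points separated by a low-density gap receive a score near zero (exactly zero when the k-NN graph has no edge across the gap). The widest-path formulation is used here because a single weak link is enough to separate two regions, which mirrors the density-gap premise stated in the description.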