Dimensionality reduction for data of unknown cluster structure
Author: | Jacek Koronacki, Stan Lipovetsky, Ewa Nowakowska |
---|---|
Year of publication: | 2016 |
Subject: |
0209 industrial biotechnology;
Information Systems and Management; business.industry; Dimensionality reduction; Data transformation (statistics); Pattern recognition; 02 engineering and technology; Mixture model; Linear subspace; Computer Science Applications; Theoretical Computer Science; 020901 industrial engineering & automation; Artificial Intelligence; Control and Systems Engineering; Principal component analysis; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; Cluster analysis; Software; Subspace topology; Mathematics; Curse of dimensionality |
Source: | Information Sciences. 330:74-87 |
ISSN: | 0020-0255 |
Description: | Dimensionality reduction that preserves certain characteristics of the data is needed for numerous reasons. In this work we focus on data drawn from a mixture of Gaussian distributions and propose a method that preserves the distinctness of the clustering structure, even though this structure is assumed to be unknown. The rationale behind the method is as follows: (i) if the clusters (classes) within the data were known, one could facilitate further analysis and reduce the dimensionality of the space by projecting the data onto Fisher's linear subspace, which, by definition, best preserves the structure of the given classes; (ii) under some reasonable assumptions, this can be done, albeit approximately, without prior knowledge of the clusters (classes). In this paper, we show how this approach works. We present a preliminary data transformation that brings the directions of largest overall variability close to the directions of best between-class separation. Hence, for the transformed data, simple PCA provides an approximation to Fisher's subspace. We show that the transformation preserves the distinctness of the unknown structure in the data to a great extent. |
Database: | OpenAIRE |
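
The abstract's central claim, that a suitable preliminary transformation lets plain PCA approximate Fisher's subspace, can be illustrated with a small sketch. The abstract does not specify the authors' transformation, so the code below is not their method: it whitens the data with the within-class covariance computed from known labels, which the paper's setting does not assume. The synthetic data, the helper names (`scatter_matrices`, `fisher_basis`, `pca_basis`, `alignment`), and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Within-class (Sw) and between-class (Sb) scatter matrices."""
    mean = X.mean(axis=0)
    p = X.shape[1]
    Sw, Sb = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    return Sw, Sb

def fisher_basis(X, labels, dim):
    """Leading eigenvectors of Sw^{-1} Sb: a basis of Fisher's subspace."""
    Sw, Sb = scatter_matrices(X, labels)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    top = np.argsort(evals.real)[::-1][:dim]
    return evecs[:, top].real

def pca_basis(X, dim):
    """Leading eigenvectors of the sample covariance matrix."""
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    return evecs[:, np.argsort(evals)[::-1][:dim]]

def alignment(A, B):
    """Cosine of the largest principal angle between span(A) and span(B)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False).min()

# Hypothetical data: three Gaussian clusters in 5 dimensions, separated in
# the first two coordinates, with a high-variance nuisance fifth coordinate.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0, 0, 0], [3, 0, 0, 0, 0], [0, 3, 0, 0, 0]], dtype=float)
stds = np.array([1.0, 1.0, 1.0, 1.0, 10.0])
labels = np.repeat(np.arange(3), 300)
X = means[labels] + rng.normal(size=(900, 5)) * stds

# On the raw data, PCA is dominated by the nuisance coordinate.
raw_alignment = alignment(pca_basis(X, 2), fisher_basis(X, labels, 2))

# Assumption: whiten with the within-class covariance W (uses the labels).
Sw, _ = scatter_matrices(X, labels)
w, Q = np.linalg.eigh(Sw / len(X))
W_inv_sqrt = Q @ np.diag(w ** -0.5) @ Q.T
W_sqrt = Q @ np.diag(w ** 0.5) @ Q.T
Z = (X - X.mean(axis=0)) @ W_inv_sqrt

# A Fisher direction v in X-space corresponds to W^{1/2} v in Z-space.
fisher_in_Z = W_sqrt @ fisher_basis(X, labels, 2)
whitened_alignment = alignment(pca_basis(Z, 2), fisher_in_Z)
```

An alignment near 1 means the two subspaces nearly coincide. After this whitening the total covariance is, up to scaling, the identity plus the between-class term, so the leading principal directions are the Fisher directions. The method described in the abstract aims for an approximate version of this effect without access to the labels.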