Adaptive Dimensionality Reduction for Local Principal Component Analysis
Author: | Wolfram Schenck, Nico Migenda |
---|---|
Year of publication: | 2020 |
Subject: |
Neural gas; Training set; Concept drift; Data stream mining; Computer science; Dimensionality reduction; Mixture model; Principal component analysis; Anomaly detection; Data mining; Curse of dimensionality; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 020204 information systems |
Source: | ETFA |
Description: | The early detection of upcoming failures in technical systems, e.g. in factory automation, is crucial for worker safety and for avoiding downtime. Machine learning (ML) approaches are commonly used in this context to learn the standard behavior of technical systems and to identify anomalies by determining whether new data points lie outside the training data distribution. Reliable anomaly detection becomes even more challenging when ML models are learned continuously on data streams and must account for "concept drift". Gaussian mixture models (GMM) of the Neural Gas Principal Component Analysis (NGPCA) type are well suited for online learning and anomaly detection. NGPCA describes non-linear data distributions with a set of locally linear PCA models. These models offer better interpretability and require less computational effort than autoencoders, which are often used in this application field. The contribution of this work is to enhance NGPCA (and GMMs in general) by extending local PCA with an adaptive approach that adjusts the dimensionality of each local model individually. This improves the representation provided by the overall model and therefore its capability to detect anomalies. Because the dimensionality of each local PCA model is adapted continuously and individually, the proposed method is applicable to streaming data. To achieve this, natural characteristics of online PCA are exploited. The algorithm has low computational complexity, enabling its use in technical systems with limited resources. First, the mathematical foundation is explained and used to extend NGPCA. Afterwards, the new combined algorithm is numerically evaluated and its advantages over non-linear competitors are pointed out. |
Database: | OpenAIRE |
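
The idea in the description of giving each local PCA model its own dimensionality can be illustrated with a small batch sketch. The record does not state which criterion the authors use, nor how it is computed online, so the explained-variance threshold below is a swapped-in, commonly used rule and not the paper's method. The function names `choose_dimensionality` and `local_dimensionalities`, the `variance_threshold` parameter, and the assumption that each point already carries a label for its local unit are all hypothetical.

```python
import numpy as np


def choose_dimensionality(eigenvalues, variance_threshold=0.95):
    """Return the smallest m whose top-m eigenvalues reach the variance threshold.

    Assumption: an explained-variance rule stands in for the (unspecified)
    criterion of the paper.
    """
    ev = np.sort(np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None))[::-1]
    ratios = np.cumsum(ev) / ev.sum()
    # Index of the first cumulative ratio that is >= threshold, as a count.
    m = int(np.searchsorted(ratios, variance_threshold)) + 1
    # Guard against rounding pushing the index past the last component.
    return min(m, len(ev))


def local_dimensionalities(points, labels, variance_threshold=0.95):
    """Pick a separate PCA dimensionality for every local unit.

    Assumption: `labels` assigns each row of `points` to one local model,
    for example the winning unit of a vector quantizer.
    """
    dims = {}
    for unit in np.unique(labels):
        local = points[labels == unit]
        # Eigenvalues of the local covariance are the local PCA variances.
        eigenvalues = np.linalg.eigvalsh(np.cov(local, rowvar=False))
        dims[int(unit)] = choose_dimensionality(eigenvalues, variance_threshold)
    return dims
```

With eigenvalues `[6, 3, 0.5, 0.5]` and a threshold of `0.8`, the first component explains 60 % and the first two explain 90 % of the variance, so `choose_dimensionality` returns 2. In a streaming setting, the batch covariance would be replaced by eigenvalue estimates maintained by an online PCA rule, which this sketch does not attempt.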