Popis: |
The Internet of Things generates huge amounts of multidimensional sensor readings. Analyzing such high-dimensional data is challenging and has not been sufficiently addressed. In this thesis, methods to analyze such data are developed, evaluated and discussed. There are currently several open issues in stream analysis. One of them is adaptation to Concept Drift, a shift in the data distribution that classification algorithms need to handle. Furthermore, real-world data is often high-dimensional, whereas current research focuses mainly on low-dimensional problems that do not represent real applications well. In particular, more complex stream algorithms become very slow when operating in high-dimensional spaces. Finally, real-world data is often not linearly separable. For this scenario, the field of kernel learning exists, which makes it possible to transform the data into a kernel space in which it becomes linearly separable. In the streaming context, this field has not yet received much attention. We address these problems with three major contributions:

__A__ In the first part of this thesis, we introduce sparse prototype-based algorithms that adapt quickly to Concept Drift by using momentum-based gradient descent techniques. The algorithms outperform their base versions on synthetic and real-world problems and are competitive with other state-of-the-art algorithms, while offering the advantages of interpretability and sparsity. Furthermore, one of these algorithms is combined with a statistical test to handle a greater variety of drifts.

__B__ To reduce the complexity of high-dimensional data streams, the Random Projection technique is analyzed in non-stationary environments. It is shown that the Johnson-Lindenstrauss Lemma also holds for stream classification tasks.
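The abstract does not state which projection matrix or target dimension the thesis uses. As a hedged illustration only, the sketch below assumes a dense Gaussian random projection with entries drawn i.i.d. from N(0, 1/k), drawn once and reused for every sample of the stream, and sizes k with the Dasgupta-Gupta form of the Johnson-Lindenstrauss bound, k >= 4 ln(n) / (eps^2/2 - eps^3/3) (the same expression scikit-learn's `johnson_lindenstrauss_min_dim` evaluates). The names `jl_min_dim` and `GaussianProjection` are hypothetical.

```python
import math
import random


def jl_min_dim(n_samples: int, eps: float) -> int:
    """Smallest k allowed by the Dasgupta-Gupta form of the
    Johnson-Lindenstrauss bound: k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    denom = eps ** 2 / 2 - eps ** 3 / 3
    return math.ceil(4 * math.log(n_samples) / denom)


class GaussianProjection:
    """Hypothetical fixed d -> k linear map with i.i.d. N(0, 1/k) entries,
    so squared Euclidean norms are preserved in expectation."""

    def __init__(self, d: int, k: int, seed: int = 0):
        rng = random.Random(seed)
        std = 1.0 / math.sqrt(k)
        # Drawn once; every sample of the stream reuses the same matrix.
        self.matrix = [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(k)]

    def transform(self, x):
        """Project a single incoming sample (a list of d floats) to k floats."""
        return [sum(r * xi for r, xi in zip(row, x)) for row in self.matrix]
```

In a stream, one would construct `GaussianProjection(d, jl_min_dim(n, eps))` once and call `transform` on each arriving sample before passing it to the classifier. Note that n, the number of points whose pairwise distances should be preserved, is not known in advance for an unbounded stream, so fixing it is itself an assumption of this sketch.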
Further, performance comparisons of different classifiers in the projected and the original space are provided, and it is shown how Random Projection can help tackle problems of non-stationary environments. To this end, a method is proposed that transforms a problem of changing dimensionality into a distribution change, which can then be handled by Concept Drift detectors.

__C__ Besides prototype vectors, sparse representations can also be obtained by constructing coresets. Accordingly, the final chapter provides techniques that use coresets to maintain a Minimum Enclosing Ball in stream settings. These methods have the advantage of also working on non-linear data when a suitable kernel is chosen. Specifically, a stream coreset-based classifier is proposed, which performs well on a variety of tested streams. While this classifier has the downside of using a separate ball for each class, a combined multiclass Core Vector Machine for data streams is also provided, which performs better on multiclass problems. Finally, it is shown that coresets can be used to detect Concept Drift, outperforming many state-of-the-art algorithms at the cost of a higher runtime.
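The abstract does not detail the thesis's stream coreset construction. As an illustration only, the sketch below implements the classic batch Bădoiu-Clarkson farthest-point iteration for a (1+eps)-approximate Minimum Enclosing Ball, run in kernel feature space: the center is never formed explicitly but stored as weights `alpha` over input points, and the points that receive weight play the role of the coreset. This is a swapped-in textbook technique, not the thesis's streaming method; `kernel_meb`, `rbf` and its `gamma=0.5` default are hypothetical choices.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma=0.5 is an arbitrary illustrative default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def _sq_dists(points, alpha, kernel):
    """Squared feature-space distances ||phi(x) - c||^2 for every point x,
    where the center is c = sum_i alpha[i] * phi(points[i])."""
    # <c, c> depends only on the weighted (coreset) points, so compute it once.
    cc = sum(a * b * kernel(points[i], points[j])
             for i, a in alpha.items() for j, b in alpha.items())
    return [kernel(x, x)
            - 2 * sum(a * kernel(points[i], x) for i, a in alpha.items())
            + cc
            for x in points]


def kernel_meb(points, eps=0.1, kernel=rbf):
    """(1+eps)-approximate MEB via the Badoiu-Clarkson iteration in kernel space.

    Returns (alpha, radius): alpha maps coreset indices to center weights,
    radius is the smallest radius around that center enclosing all points.
    """
    alpha = {0: 1.0}  # c_1 = phi(points[0]), an arbitrary input point
    for t in range(1, math.ceil(1 / eps ** 2) + 1):
        dists = _sq_dists(points, alpha, kernel)
        far = max(range(len(points)), key=dists.__getitem__)
        # c_{t+1} = c_t + (phi(p) - c_t) / (t + 1): shrink old weights, add p.
        for i in alpha:
            alpha[i] *= t / (t + 1)
        alpha[far] = alpha.get(far, 0.0) + 1 / (t + 1)
    radius = math.sqrt(max(0.0, max(_sq_dists(points, alpha, kernel))))
    return alpha, radius
```

Because only kernel evaluations are needed, swapping `rbf` for another kernel changes the geometry without changing the algorithm, and the coreset size is bounded by the number of iterations plus one, independent of the number of points and their dimensionality.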