Toward Mining Capricious Data Streams: A Generative Approach
Authors: | Ege Beyazit, Baijun Wu, Yi He, Sheng Chen, Di Wu, Xindong Wu |
---|---|
Year of publication: | 2021 |
Subject: | Computer Networks and Communications; Computer science; Data stream mining; Feature vector; Process (computing); 02 engineering and technology; Machine learning; Computer Science Applications; Generative model; Artificial Intelligence; Feature (computer vision); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Graphical model; Software; Generative grammar |
Source: | IEEE Transactions on Neural Networks and Learning Systems, vol. 32, pp. 1228-1240 |
ISSN: | 2162-2388 (online); 2162-237X (print) |
DOI: | 10.1109/tnnls.2020.2981386 |
Abstract: | Learning with streaming data has received extensive attention during the past few years. Existing approaches assume that the feature space is fixed or changes by following explicit regularities, which limits their applicability in real-time applications. For example, in a smart healthcare platform, the feature space of the patient data varies when different medical service providers use nonidentical feature sets to describe the patients' symptoms. To fill the gap, in this article we propose a novel learning paradigm, namely, Generative Learning With Streaming Capricious (GLSC) data, which makes no assumption about the feature space dynamics. In other words, GLSC handles data streams with a varying feature space, where each arriving data instance can arbitrarily carry new features and/or stop carrying some of the old features. Specifically, GLSC trains a learner on a universal feature space that establishes relationships between old and new features, so that the patterns learned in the old feature space can be used in the new feature space. The universal feature space is constructed by leveraging the relatedness among features. We propose a generative graphical model to model this construction process, and show with theoretical guarantees that learning from the universal feature space can effectively improve performance. The experimental results demonstrate that GLSC achieves conspicuous performance on both synthetic and real data sets. |
Database: | OpenAIRE |
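The abstract describes projecting each arriving instance onto a universal feature space so that a learner trained on old features remains usable when features appear or vanish. The sketch below illustrates only that general idea under simplifying assumptions; it is not GLSC. The paper's generative graphical model is swapped for a hypothetical heuristic that imputes each missing feature from the single most correlated observed feature (a pairwise linear regression fitted on instances where both features were present), and the learner is plain online logistic regression. The class and method names (`UniversalFeatureSpaceLearner`, `learn_one`, `predict_proba`) are illustrative only.

```python
import math
from collections import defaultdict


class UniversalFeatureSpaceLearner:
    """Hypothetical sketch, NOT the GLSC model from the paper.

    Keeps a universal feature space (the union of every feature seen so far).
    Missing features of an arriving instance are imputed from the most
    correlated observed feature via a pairwise linear regression fitted on
    instances where both features were present; an online logistic
    regression is then trained on the completed instance.
    """

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = defaultdict(float)  # feature name -> weight
        self.bias = 0.0
        self.n = defaultdict(int)  # feature name -> times observed
        self.mean = defaultdict(float)  # feature name -> running mean
        # (i, j) -> [count, sum_i, sum_j, sum_ii, sum_jj, sum_ij] over co-observations
        self.pair = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])

    def _update_stats(self, x):
        for f, v in x.items():
            self.n[f] += 1
            self.mean[f] += (v - self.mean[f]) / self.n[f]
        for i, xi in x.items():
            for j, xj in x.items():
                if i != j:
                    s = self.pair[(i, j)]
                    s[0] += 1
                    s[1] += xi
                    s[2] += xj
                    s[3] += xi * xi
                    s[4] += xj * xj
                    s[5] += xi * xj

    def _impute(self, j, x):
        """Predict missing feature j from the observed feature most correlated with it."""
        best, best_r = None, 0.0
        for i, xi in x.items():
            s = self.pair.get((i, j))
            if s is None or s[0] < 2:
                continue
            c = s[0]
            mi, mj = s[1] / c, s[2] / c
            vi, vj = s[3] / c - mi * mi, s[4] / c - mj * mj
            if vi <= 1e-12 or vj <= 1e-12:
                continue
            cov = s[5] / c - mi * mj
            r = cov / math.sqrt(vi * vj)
            if abs(r) > abs(best_r):
                best_r, best = r, mj + (cov / vi) * (xi - mi)
        # Fall back to the running mean when no related feature is observed.
        return best if best is not None else self.mean[j]

    def _complete(self, x):
        """Project an instance onto the universal feature space."""
        full = dict(x)
        for j in self.n:
            if j not in full:
                full[j] = self._impute(j, x)
        return full

    def _score(self, full):
        z = self.bias + sum(self.weights.get(f, 0.0) * v for f, v in full.items())
        if z >= 0:  # numerically stable sigmoid
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def predict_proba(self, x):
        return self._score(self._complete(x))

    def learn_one(self, x, y):
        """Test-then-train on one instance x (dict) with label y in {0, 1}."""
        full = self._complete(x)
        p = self._score(full)
        g = p - y  # derivative of the log loss w.r.t. the logit
        for f, v in full.items():
            self.weights[f] -= self.lr * g * v
        self.bias -= self.lr * g
        self._update_stats(x)  # relatedness is learned from observed values only
        return p
```

On a stream where feature `a` is first joined by a correlated feature `b` and later stops arriving, the classifier can keep using the weight it learned for `a`, because `a` is reconstructed from `b`. Relying on a single best-correlated feature keeps the sketch short; the paper instead models the feature relationships with its generative graphical model.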