A grid density based framework for classifying streaming data in the presence of concept drift
Author: | Tegjyot Singh Sethi, Hanquing Hu, Mehmed Kantardzic |
---|---|
Year of publication: | 2015 |
Subject: | Concept drift; Computer Networks and Communications; Computer science; 02 engineering and technology; Machine learning; computer.software_genre; Artificial Intelligence; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Cluster analysis; business.industry; Data stream mining; Process (computing); Response time; Sampling (statistics); Statistical classification; ComputingMethodologies_PATTERNRECOGNITION; Hardware and Architecture; 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; State (computer science); business; computer; Software; Information Systems |
Source: | Journal of Intelligent Information Systems. 46:179-211 |
ISSN: | 1573-7675 0925-9902 |
DOI: | 10.1007/s10844-015-0358-3 |
Description: | Mining data streams is the process of extracting information from continuous, rapidly flowing data records to provide knowledge that is reliable and timely. Streaming data algorithms need to be one-pass and operate under strict limits on memory and response time. In addition, the classification of streaming data requires learning in an environment where the data characteristics might change constantly. Many of the classification algorithms presented in the literature assume a 100% labeling rate, which is impractical and expensive when data records are rapidly flowing in. In this paper, a new incremental grid density based learning framework, the GC3 framework, is proposed to perform classification of streaming data with concept drift and limited labeling. The proposed framework uses grid density clustering to detect changes in the input data space. It maintains an evolving ensemble of classifiers to learn and adapt to the model changes over time. The framework also uses a uniform grid density sampling mechanism to obtain a uniform subset of samples for better classification performance with a lower labeling rate. The entire framework is designed to be one-pass, incremental, and able to work with limited memory to perform any-time classification on demand. Experimental comparisons with state-of-the-art concept drift handling systems demonstrate the GC3 framework's ability to provide high classification performance, using fewer models in the ensemble and with only 4-6% of the samples labeled. The results show that the GC3 framework is effective and attractive for use in real-world data stream classification applications. |
Database: | OpenAIRE |
External link: |
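
The description mentions a uniform grid density sampling mechanism for lowering the labeling rate but does not spell out how it works. The following is only a rough sketch of the general idea, not the GC3 procedure from the paper: it assumes a fixed cell width and a fixed per-cell label budget, and the names `GridLabelSelector`, `should_label`, `cell_width`, and `per_cell` are hypothetical. Each incoming record is mapped to a grid cell in one pass, and a label is requested only while that cell is still under its budget, so dense regions do not dominate the labeled subset.

```python
from collections import defaultdict


class GridLabelSelector:
    """Illustrative one-pass selector (an assumption, not the paper's algorithm).

    Requests a label for a record only if fewer than `per_cell` records
    from the same grid cell have been labeled so far.
    """

    def __init__(self, cell_width, per_cell):
        self.cell_width = cell_width          # hypothetical: side length of a grid cell
        self.per_cell = per_cell              # hypothetical: label budget per cell
        self.labeled_per_cell = defaultdict(int)

    def cell_of(self, point):
        # Floor-divide each coordinate to get the integer cell index.
        return tuple(int(x // self.cell_width) for x in point)

    def should_label(self, point):
        cell = self.cell_of(point)
        if self.labeled_per_cell[cell] < self.per_cell:
            self.labeled_per_cell[cell] += 1
            return True
        return False


# Toy stream: 100 records crowded into one cell, then 2 records in a distant cell.
dense = [(0.1 * (i % 10), 0.05 * (i % 7)) for i in range(100)]
sparse = [(5.2, 5.7), (5.9, 5.1)]

selector = GridLabelSelector(cell_width=1.0, per_cell=2)
labeled = [p for p in dense + sparse if selector.should_label(p)]
print(len(labeled), "of", len(dense) + len(sparse), "records sent for labeling")
```

In this toy stream the dense cell and the sparse cell each contribute two labeled records, which is the sense in which the labeled subset is spread uniformly over the occupied grid rather than following the raw data density. Memory grows with the number of occupied cells, not with the length of the stream.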