Combining Diverse Meta-Features to Accurately Identify Recurring Concept Drift in Data Streams
Autor: | Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet |
---|---|
Rok vydání: | 2023 |
Předmět: | |
Zdroj: | ACM Transactions on Knowledge Discovery from Data. 17:1-36 |
ISSN: | 1556-472X 1556-4681 |
Popis: | Learning from streaming data is challenging as the distribution of incoming data may change over time, a phenomenon known as concept drift. The predictive patterns, or experience learned under one distribution may become irrelevant as conditions change under concept drift, but may become relevant once again when conditions reoccur. Adaptive learning methods adapt a classifier to concept drift by identifying which distribution, or concept , is currently present in order to determine which experience is relevant. Identifying a concept requires some representation to be stored for comparison, with the quality of the representation being key to accurate identification. Existing concept representations are based on meta-features, efficient univariate summaries of a concept. However, no single meta-feature can fully represent a concept, leading to severe accuracy loss when existing representations cannot describe concept drift. To avoid these failure cases, we propose the first general framework for combining a diverse range of meta-features into a single representation. We solve two main challenges, first presenting a method of efficiently computing, storing, and querying an arbitrary set of meta-features as a single representation, showing that a combination of meta-features may successfully avoid failure cases seen with existing methods. Second, we present the first method for dynamically learning which meta-features distinguish concepts in any given dataset, significantly improving performance. Our proposed approach enables state-of-the-art feature selection methods, such as mutual information, to be applied to concept representation meta-features for the first time. We investigate tradeoffs between memory budget and classification performance, observing accuracy increases of up to 16% by dynamically weighting the contribution of each meta-feature. |
Databáze: | OpenAIRE |
Externí odkaz: |