Evolving Gradient Boost: A Pruning Scheme Based on Loss Improvement Ratio for Learning Under Concept Drift

Authors: Li Xiong, Guangquan Zhang, Anjin Liu, Jie Lu, Kun Wang
Year of publication: 2023
Source: IEEE Transactions on Cybernetics, 53:2110-2123
ISSN: 2168-2267, 2168-2275
DOI: 10.1109/tcyb.2021.3109796
Abstract: In nonstationary environments, data distributions can change over time. This phenomenon, known as concept drift, forces models to adapt if they are to remain accurate. In gradient boosting (GB) ensembles, selecting which weak learners to keep or prune to maintain accuracy under concept drift is a nontrivial problem. Unlike models such as AdaBoost, whose weak learners can be compared directly by accuracy (a metric bounded in [0, 1]), the weak learners in GB are evaluated on differing loss scales. To address this scaling issue, we propose a novel criterion for evaluating weak learners in GB models, called the loss improvement ratio (LIR). Based on LIR, we develop two pruning strategies: 1) naive pruning (NP), which simply deletes every learner whose loss increases; and 2) statistical pruning (SP), which removes a learner only if its loss increase passes a significance threshold. We also devise a scheme that dynamically switches between NP and SP to achieve the best performance, and we implement it as a concept drift learning algorithm called evolving gradient boost (LIR-eGB). On average, LIR-eGB outperformed state-of-the-art methods on both stationary and nonstationary data.
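The abstract names the pipeline (score each weak learner by LIR, then prune by NP or SP, switching between them dynamically) but gives no formulas. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the relative-reduction form of LIR, the z-test heuristic in statistical_prune, and the val_loss_fn callback in the switching helper are all assumptions filled in for clarity.

```python
import numpy as np
from scipy import stats

def loss_improvement_ratio(losses):
    """Relative loss reduction attributed to each weak learner.

    losses[i] is the ensemble loss after adding learner i, with
    losses[0] the loss of the initial model. The paper's exact LIR
    formula is not given in this record; a relative-reduction form
    is assumed here.
    """
    prev = np.asarray(losses[:-1], dtype=float)
    curr = np.asarray(losses[1:], dtype=float)
    # Positive ratio: the learner reduced the loss; negative: it increased it.
    return (prev - curr) / np.abs(prev)

def naive_prune(lir):
    """NP: keep only learners whose LIR shows no loss increase."""
    return [i for i, r in enumerate(lir) if r >= 0]

def statistical_prune(lir, alpha=0.05):
    """SP: drop a learner only if its loss increase is statistically
    significant relative to the spread of all LIR values (a one-sided
    z-test heuristic is assumed; the paper's test is not specified here)."""
    lir = np.asarray(lir, dtype=float)
    mu, sigma = lir.mean(), lir.std(ddof=1) + 1e-12  # guard against zero spread
    z = (lir - mu) / sigma
    threshold = stats.norm.ppf(alpha)  # lower-tail cutoff, e.g. -1.645 at alpha=0.05
    return [i for i, zi in enumerate(z) if zi > threshold]

def prune(lir, val_loss_fn, alpha=0.05):
    """Dynamic switch: apply both strategies and keep whichever pruned
    ensemble scores better on held-out data. val_loss_fn(keep) is a
    hypothetical callback returning the validation loss of the ensemble
    restricted to the learners in keep; the paper's actual switching
    rule is not detailed in this record.
    """
    np_keep = naive_prune(lir)
    sp_keep = statistical_prune(lir, alpha)
    return min((np_keep, sp_keep), key=val_loss_fn)

# Quick check: losses = [1.0, 0.8, 0.85, 0.6]
# -> LIR ~= [0.20, -0.0625, 0.294]; naive_prune keeps learners 0 and 2.
```

Under these assumptions, NP is the aggressive option (any loss increase is grounds for removal), while SP tolerates small increases that may be noise; picking between them by held-out loss mirrors the abstract's goal of switching to whichever strategy performs best.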
Database: OpenAIRE