Dynamic swarm class rebalancing for the process mining of rare events
Authors: | Kok-Leong Ong, Jinyan Li, Kelvin K. L. Wong, Simon Fong, Yaoyang Wu, Raymond K. Wong, Victor W. Chu |
---|---|
Year of publication: | 2021 |
Subject: |
020203 distributed computing; Process (engineering); Computer science; Swarm behaviour; Process mining; Particle swarm optimization; 02 engineering and technology; Decision rule; Theoretical Computer Science; Hardware and Architecture; 0202 electrical engineering, electronic engineering, information engineering; Rare events; Data mining; Software; Information Systems |
Source: | The Journal of Supercomputing. 77:7549-7583 |
ISSN: | 1573-0484 0920-8542 |
DOI: | 10.1007/s11227-020-03500-x |
Abstract: | Process mining is becoming an indispensable method for reconstructing workflow models, offering insights into mission-critical systems. The efficacy of process mining depends on whether the underlying data mining algorithms can accurately classify or predict future events from process logs. However, exceptional events are scarce in most operational processes, so the process logs generated from these processes are highly imbalanced. A model learned from imbalanced data often tends to be overly generalized toward the normal classes but under-trained to recognize the rare classes. In this paper, we propose three methods to rectify this class imbalance problem, all founded upon a meta-heuristic swarm intelligence algorithm. The first method, which is also the base of the remaining two, is the Dynamic Multi-objective Rebalancing Algorithm; it considers both high accuracy and a high confidence level of classification in its objective function, and it draws upon the particle swarm optimization algorithm. The other two are hybrid methods that combine the base method with over-sampling and under-sampling techniques. Experiments are conducted using the three above-mentioned methods to rebalance the datasets, as well as using other classic resampling methods for comparison. According to the results, our proposed methods show satisfactory performance compared with the other methods, and we extracted meaningful decision rules from a rebalanced dataset in process mining. |
Database: | OpenAIRE |
External link: |
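
The abstract mentions classic over-sampling and under-sampling as both building blocks and comparison baselines. As a rough illustration only, the following minimal Python sketch shows plain random over-sampling and under-sampling on a toy label list. It is not the Dynamic Multi-objective Rebalancing Algorithm from the paper; the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    match the size of the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that all classes match the
    size of the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy imbalanced log: 95 "normal" events and 5 "rare" events (made-up data).
rows = list(range(100))
labels = ["normal"] * 95 + ["rare"] * 5

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels), Counter(under_labels))
```

Both helpers pick a fixed target size (the largest or the smallest class). By contrast, the abstract describes searching for a rebalancing with particle swarm optimization under an objective that weighs accuracy and classification confidence, which this sketch does not attempt.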