Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction
Authors: | Myoung-Jong Kim, Dae-Ki Kang, Hong Bae Kim |
---|---|
Year of publication: | 2015 |
Subject: |
Boosting (machine learning); Computer science; General Engineering; Word error rate; Machine learning; BrownBoost; Computer Science Applications; Artificial Intelligence; Bankruptcy; Bankruptcy prediction; Oversampling; Data mining; AdaBoost; Geometric mean |
Source: | Expert Systems with Applications. 42:1074-1082 |
ISSN: | 0957-4174 |
DOI: | 10.1016/j.eswa.2014.08.025 |
Description: | Highlights: We propose a geometric mean based boosting algorithm (GMBoost). GMBoost is designed to resolve the data imbalance problem. GMBoost considers the geometric mean of the error rates of the majority and minority classes. We evaluate GMBoost, AdaBoost, and cost-sensitive boosting on bankruptcy prediction. The comparative results show that GMBoost outperforms the other methods on both imbalanced and balanced data. Abstract: In classification or prediction tasks, the data imbalance problem is frequently observed when most instances belong to one majority class. The data imbalance problem has received considerable attention in the machine learning community because it is one of the main causes of degraded performance in classifiers and predictors. In this paper, we propose a geometric mean based boosting algorithm (GMBoost) to resolve the data imbalance problem. GMBoost enables learning that takes both the majority and minority classes into account because it uses the geometric mean of both classes in its error rate and accuracy calculations. To evaluate the performance of GMBoost, we applied it to a bankruptcy prediction task. The results, and their comparative analysis against AdaBoost and cost-sensitive boosting, indicate that GMBoost offers high prediction power and robust learning capability on imbalanced as well as balanced data distributions. |
Database: | OpenAIRE |
External link: |
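
The abstract's central idea, scoring a classifier by the geometric mean of its per-class rates rather than by overall accuracy, can be illustrated with a minimal Python sketch. This is not the GMBoost algorithm itself; the record does not give the paper's weight-update formulas, so none are reproduced here. The function names and the `minority_label` parameter are hypothetical, and binary labels (0 = majority, 1 = minority) are an assumption.

```python
import math


def class_error_rates(y_true, y_pred, minority_label=1):
    """Return (majority_error, minority_error) for binary labels.

    Hypothetical helper: every label other than `minority_label`
    is treated as the majority class.
    """
    maj_total = maj_wrong = min_total = min_wrong = 0
    for t, p in zip(y_true, y_pred):
        if t == minority_label:
            min_total += 1
            min_wrong += int(t != p)
        else:
            maj_total += 1
            maj_wrong += int(t != p)
    if maj_total == 0 or min_total == 0:
        raise ValueError("both classes must be present in y_true")
    return maj_wrong / maj_total, min_wrong / min_total


def geometric_mean_error(y_true, y_pred, minority_label=1):
    """Geometric mean of the majority and minority error rates."""
    maj_err, min_err = class_error_rates(y_true, y_pred, minority_label)
    return math.sqrt(maj_err * min_err)


def geometric_mean_accuracy(y_true, y_pred, minority_label=1):
    """Geometric mean of the majority and minority accuracies."""
    maj_err, min_err = class_error_rates(y_true, y_pred, minority_label)
    return math.sqrt((1.0 - maj_err) * (1.0 - min_err))


def overall_accuracy(y_true, y_pred):
    """Plain accuracy, shown for comparison."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative data only: 95 solvent firms (0) and 5 bankrupt firms (1),
# with a trivial classifier that always predicts "solvent".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(overall_accuracy(y_true, y_pred))
print(geometric_mean_accuracy(y_true, y_pred))
```

On this made-up data the always-majority classifier scores high on overall accuracy but zero on geometric mean accuracy, because it misses every minority instance. That is the kind of imbalance effect the abstract describes.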