LIUBoost: Locality Informed Under-Boosting for Imbalanced Data Classification
Authors: | Swakkhar Shatabda, Dewan Md. Farid, Md. Rafsan Jani, Farshid Rayhan, Asif Mahbub, Sajid Ahmed |
---|---|
Year of publication: | 2018 |
Subject: |
Boosting (machine learning)
Computer science; Supervised learning; Locality; 02 engineering and technology; Overfitting; Machine learning; Imbalanced data; Statistical classification; ComputingMethodologies_PATTERNRECOGNITION; Undersampling; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Oversampling; 020201 artificial intelligence & image processing; Artificial intelligence |
Source: | Advances in Intelligent Systems and Computing ISBN: 9789811314971 |
DOI: | 10.1007/978-981-13-1498-8_12 |
Description: | Class imbalance, together with class overlap, has become a major issue in supervised learning. Most classification algorithms assume that the classes under consideration have equal cardinality when optimising the cost function. This assumption does not hold for imbalanced datasets, which results in suboptimal classification. Various approaches, such as undersampling, oversampling, cost-sensitive learning and ensemble-based methods, have therefore been proposed for dealing with imbalanced datasets. However, undersampling suffers from information loss, oversampling suffers from increased runtime and potential overfitting, and cost-sensitive methods suffer from inadequately defined cost assignment schemes. In this paper, we propose a novel boosting-based method called Locality Informed Under-Boosting (LIUBoost). Like Random Undersampling with Boosting (RUSBoost), LIUBoost uses undersampling to balance the dataset in every boosting iteration. In addition, it incorporates a cost term for every instance, based on its hardness, into the weight update formula, which minimises the information loss introduced by undersampling. LIUBoost has been extensively evaluated on 18 imbalanced datasets, and the results indicate significant improvement over the existing best-performing method, RUSBoost. |
Database: | OpenAIRE |
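The abstract describes the method only at a high level, so the following Python sketch is an illustration of the general idea rather than the authors' algorithm. It assumes labels in {-1, +1} with +1 as the minority class, uses the fraction of opposite-class points among the k nearest neighbours as the "hardness" cost, uses decision stumps as weak learners, and adds the cost to the exponent of an AdaBoost-style update for misclassified instances. All function names and the exact form of the update are assumptions; the paper's formula may differ.

```python
import numpy as np

def knn_hardness(X, y, k=5):
    """Assumed cost: share of the k nearest neighbours with the other label."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    return (y[nn] != y[:, None]).mean(axis=1)

def undersample(y, rng, minority=1):
    """All minority indices plus an equally sized random majority subset."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
    return np.concatenate([min_idx, keep])

def fit_stump(X, y, w):
    """Weighted decision stump: (feature, threshold, sign) with lowest error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        for t in (vals[:-1] + vals[1:]) / 2:   # midpoints as thresholds
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, -s, s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best[1:]

def stump_predict(stump, X):
    j, t, s = stump
    return np.where(X[:, j] <= t, -s, s)

def boost(X, y, rounds=10, k=5, seed=0):
    rng = np.random.default_rng(seed)
    cost = knn_hardness(X, y, k)           # computed once, before boosting
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        idx = undersample(y, rng)          # balance classes in every round
        stump = fit_stump(X[idx], y[idx], w[idx] / w[idx].sum())
        pred = stump_predict(stump, X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # Assumed cost-aware update: hard misclassified points gain extra weight.
        w = w * np.exp(-alpha * y * pred + cost * (pred != y))
        w = w / w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.where(score >= 0, 1, -1)
```

Because the weights are kept for the full training set while each weak learner sees only a balanced subset, majority instances dropped in one round can still influence later rounds through their weights; the cost term is one way to steer that influence toward instances near the class boundary.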