FIUS: Fixed partitioning undersampling method

Authors: M. I. M. Wahab, Azam Dekamin, Karim Keshavjee, Aziz Guergachi
Publication year: 2021
Subject:
Source: Clinica Chimica Acta. 522:174-183
ISSN: 0009-8981
DOI: 10.1016/j.cca.2021.08.023
Description: Background and Objective: In the medical field, data techniques for predicting prevalent diseases and finding patterns in them are of increasing interest. Classification is one of the methods used to predict the future onset of type 2 diabetes in those at high risk of progressing from pre-diabetes to diabetes. When classification techniques are applied to real-world datasets, imbalanced class distribution is one of the most significant limitations, and it leads to the misclassification of patients. In this paper, we propose a novel balancing method to improve the prediction of type 2 diabetes mellitus from imbalanced electronic medical records (EMR). Methods: A novel undersampling method is proposed that uses a fixed partitioning distribution scheme on a regular grid. The proposed approach retains valuable information when the dataset is balanced. Results: Among the classifiers compared, the logistic regression (LR) classifier achieved the best AUC, 80%, on the EMR data when the proposed undersampling method was used to balance the data. The new method improved the performance of the LR classifier relative to existing undersampling methods used in the balancing stage. Conclusion: The results demonstrate the effectiveness and high performance of the proposed method for predicting diabetes in an imbalanced Canadian dataset. The methodology can be used in other areas to overcome the limitations of imbalanced class distributions.
Database: OpenAIRE
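
Illustrative note: The abstract describes undersampling with a fixed partitioning scheme on a regular grid but gives no algorithmic detail. The following is a minimal sketch of one way such an idea could look, not the authors' FIUS algorithm. It assumes numeric features, equal-width bins per feature, and a per-cell quota proportional to the number of majority samples in each cell. The function name `grid_undersample` and the parameters `bins` and `seed` are hypothetical.

```python
import numpy as np


def grid_undersample(X, y, majority_label, bins=4, seed=0):
    """Hypothetical sketch: thin the majority class cell by cell on a regular grid."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    target = len(min_idx)  # number of majority samples to keep
    if target == 0 or target >= len(maj_idx):
        return X, y  # nothing to balance

    # Assumption: fixed equal-width partition of each feature range into `bins` intervals.
    Xm = X[maj_idx]
    lo, hi = Xm.min(axis=0), Xm.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    cells = np.minimum(((Xm - lo) / span * bins).astype(int), bins - 1)

    # Map each occupied grid cell to an integer id and count its members.
    _, cell_id = np.unique(cells, axis=0, return_inverse=True)
    cell_id = cell_id.ravel()
    counts = np.bincount(cell_id)

    # Proportional quota per cell (largest-remainder rounding), so dense and
    # sparse regions of the majority class both stay represented.
    scaled = target * counts
    quota = scaled // counts.sum()
    leftover = target - quota.sum()
    order = np.argsort(scaled % counts.sum())[::-1]
    quota[order[:leftover]] += 1

    # Draw the quota from each cell without replacement.
    keep = np.concatenate([
        rng.choice(maj_idx[cell_id == c], size=q, replace=False)
        for c, q in enumerate(quota) if q > 0
    ])
    selected = np.sort(np.concatenate([min_idx, keep]))
    return X[selected], y[selected]
```

Compared with purely random undersampling, a per-cell quota of this kind keeps majority samples from every occupied region of the feature space, which is one plausible reading of "retains valuable information" in the abstract.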