FIUS: Fixed partitioning undersampling method
Authors: | M. I. M. Wahab, Azam Dekamin, Karim Keshavjee, Aziz Guergachi |
---|---|
Year of publication: | 2021 |
Subject: | Canada; Computer science; Clinical Biochemistry; 02 engineering and technology; Logistic regression; computer.software_genre; Biochemistry; Field (computer science); Regular grid; 03 medical and health sciences; 0302 clinical medicine; Classifier (linguistics); 0202 electrical engineering electronic engineering information engineering; Humans; 030212 general & internal medicine; Biochemistry (medical); General Medicine; Grid; Class (biology); 3. Good health; Logistic Models; Diabetes Mellitus Type 2; Research Design; Undersampling; Pre diabetes; 020201 artificial intelligence & image processing; Data mining; computer; Algorithms |
Source: | Clinica Chimica Acta. 522:174-183 |
ISSN: | 0009-8981 |
DOI: | 10.1016/j.cca.2021.08.023 |
Description: | Background and Objective: In the medical field, data techniques for predicting prevalent diseases and finding patterns in them are of increasing interest. Classification is one of the methods used to predict the future onset of type 2 diabetes in people at high risk of progressing from pre-diabetes to diabetes. When classification techniques are applied to real-world datasets, imbalanced class distribution is one of the most significant limitations and leads to misclassification of patients. In this paper, we propose a novel balancing method to improve the prediction of type 2 diabetes mellitus from imbalanced electronic medical records (EMR). Methods: A novel undersampling method is proposed that uses a fixed partitioning distribution scheme in a regular grid. The proposed approach retains valuable information when the dataset is balanced. Results: Among the classifiers compared, logistic regression (LR) achieved the best AUC, 80%, on the EMR data when our proposed undersampling method was used to balance the data. The new method improved the performance of the LR classifier compared with existing undersampling methods used in the balancing stage. Conclusion: The results demonstrate the effectiveness and high performance of the proposed method for predicting diabetes in an imbalanced Canadian dataset. Our methodology can be used in other areas to overcome the limitations of imbalanced class distributions. |
Database: | OpenAIRE |
External link: |
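
The abstract in this record describes undersampling over a fixed partitioning of a regular grid but gives no algorithmic detail. The sketch below is therefore only a generic illustration of grid-based undersampling, not the authors' FIUS procedure. The function name `grid_undersample`, the `bins` parameter, the equal-width binning, and the rule of keeping a proportional share of each occupied cell are all assumptions made for this example.

```python
import random
from collections import defaultdict


def grid_undersample(rows, labels, majority_label, bins=4, seed=0):
    """Thin the majority class cell by cell on a regular grid (illustrative only)."""
    rng = random.Random(seed)
    maj = [i for i, lab in enumerate(labels) if lab == majority_label]
    mino = [i for i, lab in enumerate(labels) if lab != majority_label]
    if len(maj) <= len(mino):
        return list(rows), list(labels)  # nothing to remove

    # Assumed partitioning: equal-width bins per feature over the observed range.
    dims = range(len(rows[0]))
    lo = [min(r[d] for r in rows) for d in dims]
    hi = [max(r[d] for r in rows) for d in dims]

    def cell_of(row):
        cell = []
        for d in dims:
            width = (hi[d] - lo[d]) or 1.0  # constant feature falls in one bin
            cell.append(min(int(bins * (row[d] - lo[d]) / width), bins - 1))
        return tuple(cell)

    # Group majority-class rows by the grid cell they fall into.
    cells = defaultdict(list)
    for i in maj:
        cells[cell_of(rows[i])].append(i)

    # Keep a proportional share of each occupied cell, and at least one row,
    # so sparse regions of the majority class are not dropped entirely.
    frac = len(mino) / len(maj)
    keep = list(mino)
    for members in cells.values():
        k = max(1, round(frac * len(members)))
        keep.extend(rng.sample(members, k))
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

Called as `grid_undersample(rows, labels, majority_label=0)`, this returns every minority-class row plus a thinned majority class, giving a roughly (not exactly) balanced dataset. Sampling per cell rather than uniformly at random is one way to preserve the spatial spread of the majority class, which is the kind of information retention the abstract refers to.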