Over-sampling strategies with data cleaning for handling imbalanced problems for diabetes prediction.

Authors: Nugraha, Wahyu, Maulana, Reza, Latifah, Rahayuningsih, Panny Agustia, Nurmalasari
Subject:
Source: AIP Conference Proceedings; 5/12/2023, Vol. 2714 Issue 1, p1-5, 5p
Abstract: In 2011 there were 347 million people with diabetes worldwide and 4.6 million deaths; the number of cases is expected to rise to 552 million by 2030. Early detection is an effective means of prevention. Machine learning can learn patterns from datasets and use them to predict diabetes. However, class imbalance is a common problem in diabetes data analysis. In this study we propose the SMOTE+Tomek link method to address class imbalance in the Pima Indians dataset, using the C5.0 decision tree classification algorithm. Tomek links are used to clean the noisy data produced by the SMOTE oversampling process. The experimental results show that SMOTE resampling with Tomek links performs better than SMOTE without Tomek links. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
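
Note: the abstract names two resampling steps, SMOTE oversampling followed by Tomek link cleaning. The sketch below illustrates the general idea in plain Python. It is not the authors' implementation: the function names (`nearest`, `smote`, `remove_tomek_links`) and the toy data are hypothetical, the Pima Indians dataset and the C5.0 classifier are not included, and a two-class problem is assumed. Removing both members of each Tomek link is one common variant and is an assumption here; the record does not say which variant the paper uses.

```python
import random
from math import dist


def nearest(idx, points, candidates, k=1):
    """Indices (drawn from candidates) of the k points closest to points[idx]."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: dist(points[idx], points[j]))[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until both classes have equal size.

    Each synthetic point is placed at a random position on the segment
    between a minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, min_idx, k))
        gap = rng.random()
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link.

    A Tomek link is a pair of samples from different classes that are
    each other's nearest neighbour.
    """
    everyone = range(len(X))
    nn = [nearest(i, X, everyone)[0] for i in everyone]
    linked = {i for i in everyone if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in everyone if i not in linked]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): 3 minority samples (label 1), 7 majority samples
# (label 0), one of which sits inside the minority cluster.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1),
     (1.0, 0.8), (0.8, 1.0), (0.15, 0.15)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = smote(X, y, minority=1, k=2)
X_clean, y_clean = remove_tomek_links(X_bal, y_bal)
print(len(X), len(X_bal), len(X_clean))
```

In practice a maintained library would normally be used instead of hand-written code; for example, the imbalanced-learn package provides a `SMOTETomek` class that combines the two steps.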