Abstract: |
Diabetes is a chronic disease that affects millions of people worldwide, and accurate, timely diagnosis is crucial for its effective treatment and management. Machine learning has shown promise in predicting the disease, but missing data, outliers, class imbalance, and classifier limitations can reduce accuracy. To address these challenges, we propose a novel machine learning approach that combines adaptive iterative imputation (AII) for missing-value imputation, a dynamic ensemble isolation forest (DE-IF) for outlier detection and removal, Iterated KMeans SMOTEENN (IKMSENN) for class imbalance, and an adaptive extra tree classifier (AETC) for classification. The approach is evaluated on the Pima Indian Diabetes Dataset (PIDD), a widely used benchmark in diabetes prediction. Experimental results show that it outperforms several state-of-the-art machine learning models in accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve (AUC-ROC). On PIDD, the approach achieved an accuracy of 98.58%, a precision of 0.986, a recall of 0.987, an F-measure of 0.985, and an AUC-ROC of 0.965. Our research contributes to diabetes prediction by introducing machine learning methods that address missing data, outliers, class imbalance, and classifier limitations. The approach has the potential to improve the accuracy and effectiveness of diabetes prediction, with implications for the diagnosis and management of the disease. This research work focuses on developing a machine learning approach to improve the accuracy of diabetes diagnosis. Diabetes is a chronic disease that affects millions of people globally, and timely diagnosis is essential for its effective treatment and management.
However, traditional diagnosis methods can be limited by missing data, outliers, class imbalance, and classifier limitations. To overcome these challenges, the proposed approach combines Adaptive Iterative Imputation (AII) for missing-value imputation, Dynamic Ensemble Isolation Forest (DE-IF) for outlier detection and removal, Iterated KMeans SMOTEENN (IKMSENN) for class imbalance, and an Adaptive Extra Tree Classifier (AETC) for classification. The approach is evaluated on the Pima Indian Diabetes Dataset (PIDD), a widely used benchmark in diabetes prediction. Overall, this research work provides a novel and effective machine learning approach for accurate diabetes diagnosis. [ABSTRACT FROM AUTHOR] |
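
The four stages named in the abstract (imputation, outlier removal, class rebalancing, classification) can be sketched roughly as follows. This is an illustration only, not the authors' method: the abstract does not specify how AII, DE-IF, IKMSENN, or AETC work, so the sketch assumes stock scikit-learn components (`IterativeImputer`, `IsolationForest`, `ExtraTreesClassifier`) and substitutes plain random oversampling for IKMSENN. The helper names (`make_toy_data`, `oversample_minority`, `run_pipeline`), the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesClassifier, IsolationForest
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def make_toy_data(n=400, seed=0):
    """Synthetic imbalanced data with missing values (a stand-in, not PIDD)."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n) < 0.35).astype(int)           # roughly 35% positive class
    X = rng.normal(size=(n, 4)) + 1.5 * y[:, None]   # positives are shifted
    X[rng.random(X.shape) < 0.05] = np.nan           # about 5% of entries missing
    return X, y


def oversample_minority(X, y, seed=0):
    """Random oversampling to equal class counts (assumed stand-in for IKMSENN)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, k in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - k, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def run_pipeline(X, y, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # 1. Missing values: fit the imputer on the training split only.
    imputer = IterativeImputer(max_iter=10, random_state=seed)
    X_tr = imputer.fit_transform(X_tr)
    X_te = imputer.transform(X_te)
    # 2. Outliers: drop training rows that the isolation forest flags (-1).
    inlier = IsolationForest(contamination=0.05, random_state=seed).fit_predict(X_tr) == 1
    X_tr, y_tr = X_tr[inlier], y_tr[inlier]
    # 3. Class imbalance: rebalance the training split; the test split is untouched.
    X_tr, y_tr = oversample_minority(X_tr, y_tr, seed)
    # 4. Classification with extremely randomized trees.
    clf = ExtraTreesClassifier(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    return {
        "accuracy": accuracy_score(y_te, clf.predict(X_te)),
        "auc_roc": roc_auc_score(y_te, proba),
        "train_counts": np.bincount(y_tr).tolist(),
    }


if __name__ == "__main__":
    X, y = make_toy_data()
    print(run_pipeline(X, y))
```

In this sketch, outlier removal and resampling touch only the training split, so the held-out metrics are computed on unmodified test data. The scores it prints describe the synthetic data only and say nothing about the results reported for PIDD.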