Description: |
The increasing prevalence of diabetes calls for effective early detection methods to mitigate its health impacts. This paper investigates how feature transformation and the choice of machine learning (ML) model affect the early detection of diabetes on a binary tabular classification dataset. We compare three preprocessing settings (no transformation, normalization, and min-max scaling) and assess their influence on the performance of twelve ML models, including both traditional algorithms and ensemble methods. The experiments use a publicly available dataset containing 768 samples and 8 features. The models are evaluated using accuracy, precision, recall, F1-score, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). Among the ML models, Light Gradient Boosting Machine (LGBM) achieved the highest accuracy, 82.91%, when min-max scaling was applied to the data. Our results show that the effectiveness of different combinations of feature transformation technique and ML model varies, and that ensemble models generally outperformed traditional ML models. These findings can inform preprocessing and model selection strategies in the development of robust early diabetes detection systems. |