Description: |
Objective: Diabetes is closely linked to blood glucose levels. Predicting blood glucose from routine blood test data can support diabetes risk assessment as an auxiliary diagnostic tool. However, physical examination datasets often suffer from high feature dimensionality and an uneven blood glucose distribution, both of which degrade the performance of machine learning models. Methods: This paper proposes a GA-KDE-GAN stacking model combined with feature engineering, referred to as the GKN framework. GKN integrates a genetic algorithm with random forest (GA-RF) for feature selection, kernel density estimation (KDE) for data smoothing and oversampling of under-represented samples, and a generative adversarial network (GAN) for expanding the training set. The framework uses GA-RF to select feature subsets, with LightGBM evaluation used to identify the globally optimal subset, and applies KDE and GAN to balance the dataset. The final model adopts a stacking strategy to improve the accuracy of blood glucose prediction. Results: With GKN feature engineering, the proposed model showed a significant performance improvement. Despite the high dimensionality and complexity of the data, the model achieved a mean squared error (MSE) of 1.529 and the highest R-squared. More importantly, it significantly improved diabetes classification, with accuracy (Acc) and precision (Pre) both exceeding 97%. Conclusion: This study addressed the problems of high feature dimensionality and uneven sample distribution in a physical examination dataset. Integrating GA-RF, KDE and GAN in the GKN framework proved effective in improving prediction performance. These findings are promising for glucose-based auxiliary diagnosis of diabetes, since the model predicts blood glucose levels from routine blood test data and can support diabetes risk assessment. |
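The abstract does not specify how KDE is used to oversample the under-represented blood glucose range, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes a Gaussian kernel with a per-dimension Scott-style bandwidth, and relies on the fact that sampling from such a KDE is equivalent to picking a stored sample at random and adding kernel-scaled Gaussian noise. The function name `kde_oversample`, the `threshold` parameter, and the toy values are hypothetical.

```python
import random
import statistics


def kde_oversample(rows, targets, threshold, n_new, seed=0):
    """Draw n_new synthetic (features, target) pairs from a Gaussian KDE
    fitted to the minority rows, i.e. those with target >= threshold.

    Hypothetical helper: the threshold and bandwidth rule are assumptions.
    """
    rng = random.Random(seed)
    # Treat the target as one more dimension so it is smoothed jointly.
    minority = [list(r) + [t] for r, t in zip(rows, targets) if t >= threshold]
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to fit a KDE")

    n, d = len(minority), len(minority[0])
    # Scott-style rule of thumb: h_j = sigma_j * n^(-1 / (d + 4)).
    factor = n ** (-1.0 / (d + 4))
    bandwidths = [statistics.stdev(col) * factor for col in zip(*minority)]

    synthetic = []
    for _ in range(n_new):
        # Sampling from the KDE: choose a kernel centre, then add kernel noise.
        centre = rng.choice(minority)
        point = [rng.gauss(c, h) for c, h in zip(centre, bandwidths)]
        synthetic.append((point[:-1], point[-1]))
    return synthetic


# Toy data: two features per row, with high glucose values under-represented.
rows = [[5.0, 1.0], [6.0, 1.2], [5.5, 0.9], [7.0, 2.0], [7.5, 2.2], [8.0, 2.1]]
targets = [5.0, 5.2, 4.9, 11.0, 12.5, 13.0]
extra = kde_oversample(rows, targets, threshold=11.0, n_new=4, seed=1)
```

In a pipeline like the one described, the synthetic pairs would be appended to the training set only, never to the evaluation data, so that reported metrics reflect real samples.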