Description: |
Coronary Artery Disease (CAD) is an increasingly prevalent disorder that significantly affects both longevity and quality of life, particularly among people aged 30 to 60. Lifestyle, genetics, nutrition, and stress all contribute to rising mortality rates. There is therefore a need for low-cost automated technology that can diagnose CAD early and help medical practitioners treat chronic illnesses effectively. However, machine learning methods are typically designed to perform well on large datasets and may not be well suited to smaller clinical datasets that contain categorical features. On such datasets, supporting steps such as data preprocessing (DP), feature selection, and hyperparameter tuning (HP) are needed to achieve good performance and accuracy. Data preprocessing in particular is crucial for accurate results because it eliminates noise and handles missing values and outliers. To address the challenges of feature selection and of manually selecting and optimizing hyperparameters, we have developed a novel model called BSOXGB (BorutaShap feature selection based Optuna hyperparameter tuning of eXtremely Gradient Boost). The proposed model achieves an accuracy of 97.70% on the publicly available Z-Alizadeh Sani dataset, outperforming other classifiers such as Random Forest (RF), AdaBoost (AB), CatBoost (CB), and ExtraTrees (ET). Using only 9 relevant features out of 56, BSOXGB attains the highest classification accuracy among the compared classifiers, demonstrating its potential as a practical solution for automatically detecting and diagnosing CAD in the real world. |