Heart Disease Prediction Using Stacking Model With Balancing Techniques and Dimensionality Reduction

Authors:	Ayesha Noor, Nadeem Javaid, Nabil Alrajeh, Babar Mansoor, Ali Khaqan, Safdar Hussain Bouk
Language:	English
Year of publication:	2023
Subject:	Data balancing; dimensionality reduction; heart disease; machine learning; prediction; stacking model; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol. 11, pp. 116026-116045 (2023)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3325681
Description:	Heart disease is a serious global health issue with wide-reaching effects. Since it is one of the leading causes of mortality worldwide, early detection is crucial. Emerging technologies such as Machine Learning (ML) are actively used in the biomedical, healthcare, and health prediction industries. This research proposes PaRSEL, a new stacking model that combines four classifiers at the base layer, Passive Aggressive Classifier (PAC), Ridge Classifier (RC), Stochastic Gradient Descent Classifier (SGDC), and eXtreme Gradient Boosting (XGBoost), and uses LogitBoost at the meta layer for the final predictions. Class imbalance and irrelevant features in the data increase the complexity of classification models, so dimensionality reduction and data balancing are considered very important for lowering costs and increasing model accuracy. In PaRSEL, three dimensionality reduction techniques, Recursive Feature Elimination (RFE), Linear Discriminant Analysis (LDA), and Factor Analysis (FA), are used to reduce dimensionality and select the features most relevant to heart disease diagnosis. Furthermore, eight balancing techniques, Proximity Weighted Random Affine Shadowsampling (ProWRAS), Localized Randomized Affine Shadowsampling (LoRAS), Random Over Sampling (ROS), Adaptive Synthetic (ADASYN), Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE (B-SMOTE), Majority Weighted Minority Oversampling Technique (MWMOTE), and Random Walk Oversampling (RWOS), are used to handle the imbalanced nature of the dataset. The performance of PaRSEL is compared with standalone classifiers using accuracy, F1-score, precision, recall, and AUC-ROC score. The proposed model achieves 97% accuracy, an 80% F1-score, precision above 90%, 67% recall, and a 98% AUC-ROC score. This shows that PaRSEL outperforms the standalone classifiers in heart disease prediction. Additionally, we apply SHapley Additive exPlanations (SHAP) to the proposed model to help explain its internal workings; SHAP illustrates how much influence each classifier has on the final prediction.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/4380b46dcd414356843622635549e274
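The two-layer stacking design described in the abstract can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses synthetic imbalanced data from `make_classification` instead of the heart disease dataset, substitutes `GradientBoostingClassifier` for XGBoost and `LogisticRegression` for the LogitBoost meta-learner (scikit-learn ships neither XGBoost nor LogitBoost), and shows only one of the three dimensionality reduction techniques (RFE) and one of the eight balancing techniques (ROS, written by hand as the hypothetical helper `random_oversample`). The value `n_features_to_select=8` is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import (
    LogisticRegression,
    PassiveAggressiveClassifier,
    RidgeClassifier,
    SGDClassifier,
)
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def random_oversample(X, y, seed=0):
    """Hypothetical helper: Random Over Sampling (ROS).

    Duplicates randomly chosen rows of each smaller class until every
    class has as many rows as the largest one.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - count, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.vstack(X_parts), np.concatenate(y_parts)


# Assumption: synthetic stand-in for a heart disease dataset (about 10% positives).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Balance only the training split so test metrics keep the original class ratio.
X_bal, y_bal = random_oversample(X_train, y_train)

# Base layer: PAC, RC, SGDC, plus GradientBoostingClassifier standing in for XGBoost.
base_learners = [
    ("pac", PassiveAggressiveClassifier(random_state=0)),
    ("rc", RidgeClassifier()),
    ("sgdc", SGDClassifier(random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]
# Meta layer: LogisticRegression standing in for LogitBoost.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(), cv=5)

model = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=8),  # keep 8 features
    stack,
)
model.fit(X_bal, y_bal)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # meta-learner's positive-class probability
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)
print(f"accuracy={acc:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Because the base classifiers without `predict_proba` (PAC, RC, SGDC) expose `decision_function`, `StackingClassifier` feeds their decision scores to the meta-learner. One caveat of this sketch: oversampling before fitting lets duplicated minority rows fall into both the training and validation folds of the stacker's internal cross-validation; a stricter setup would resample inside each fold, for example with imbalanced-learn's `Pipeline`.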