Identification of noteworthy features and data mining techniques for heart disease prediction.

Authors: Kumar, Parvathaneni Rajendra; Ravichandran, Suban; Narayana, S.
Subject:
Source: International Journal of Modeling, Simulation & Scientific Computing; Oct 2024, Vol. 15, Issue 5, p1-24, 24p
Abstract: Heart disease is considered the deadliest disease in the world. Many factors affect the heart's structure or function, and in most cases it is hard for doctors to make a diagnosis accurately and quickly. It is therefore essential to use automated technologies to diagnose heart disease as early as possible. The objective of this study is to identify the critical features and data mining methods that can improve the accuracy of heart disease prediction. To this end, a new heart disease prediction technique is developed that comprises four phases: (a) pre-processing, (b) feature extraction, (c) feature selection and (d) classification. In the pre-processing phase, redundant and missing values are removed from the incoming data. Then, statistical and higher-order statistical features, chi-squared features and symmetrical uncertainty attributes are derived from the pre-processed data. However, with a large number of features, the curse of dimensionality becomes a severe issue. Hence, an optimal subset of features is selected from the overall feature set, using a novel Hybrid Bull and Elephant Algorithm (HB-EA). The selected features are then passed to an ensemble model that contains Naïve Bayes (NB), Decision Tree (DT), Neural Network (NN), Support Vector Machine (SVM), Optimized Recurrent Neural Network (RNN) and Linear Regression (LR) classifiers. In the final step, the outputs obtained from the group of classifiers are combined to determine the outcome. The RNN weights are optimally tuned by the proposed HB-EA technique to boost the system's accuracy. The proposed model is finally evaluated against existing techniques to determine its superiority. 
On dataset 1, the proposed technique achieved a maximum accuracy of 0.916, which is 15.24%, 8.76%, 7.56%, 4.09% and 1.89% better than conventional schemes such as Random Forest (RF), Deep Belief Network (DBN), SVM, K-Nearest Neighbor (KNN) and Elephant Herding Optimization (EHO) models, respectively. [ABSTRACT FROM AUTHOR]
Database: Complementary Index