Heart Disease Risk Prediction Using Machine Learning Classifiers with Attribute Evaluators
Authors: | S. Pranavanand, Karna Vishnu Vardhana Reddy, Hui Na Chua, Irraivan Elamvazuthi, Azrina Abd Aziz, Sivajothi Paramasivam |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: |
attribute evaluation; Technology; Computer science; QH301-705.5; QC1-999; heart disease; Machine learning; computer.software_genre; Logistic regression; Set (abstract data type); Bayes' theorem; General Materials Science; Biology (General); Instrumentation; QD1-999; Fluid Flow and Transfer Processes; Hyperparameter; Heart disease risk; hyperparameter tuning; business.industry; Process Chemistry and Technology; Physics; General Engineering; Engineering (General). Civil engineering (General); machine learning classifiers; Computer Science Applications; Chemistry; Sequential minimal optimization; Data pre-processing; Artificial intelligence; TA1-2040; business; computer; Classifier (UML); data pre-processing |
Source: | Applied Sciences, Vol 11, Iss 18, p 8352 (2021) |
ISSN: | 2076-3417 |
DOI: | 10.3390/app11188352 |
Description: | Cardiovascular diseases (CVDs) kill about 20.5 million people every year. Early prediction can help people change their lifestyles and, where necessary, obtain proper medical treatment. In this research, ten machine learning (ML) classifiers from different categories (Bayes, functions, lazy, meta, rules, and trees) were trained for heart disease risk prediction using both the full set of attributes of the Cleveland heart dataset and the optimal attribute sets obtained from three attribute evaluators. The performance of the algorithms was assessed using 10-fold cross-validation. Finally, we tuned the number of nearest neighbors, the hyperparameter ‘k’, of the instance-based (IBk) classifier. The sequential minimal optimization (SMO) classifier achieved an accuracy of 85.148% using the full set of attributes and the highest accuracy overall, 86.468%, using the optimal attribute set obtained from the chi-squared attribute evaluator. Meanwhile, the meta classifier bagging with logistic regression (LR) provided the highest ROC area, 0.91, using both the full attribute set and the optimal attribute set obtained from the ReliefF attribute evaluator. Overall, the SMO classifier was the best prediction method among the techniques compared, and IBk achieved an 8.25% accuracy improvement when ‘k’ was tuned to 9 with the chi-squared attribute set. |
Database: | OpenAIRE |
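
The description mentions tuning the number of nearest neighbors ‘k’ under 10-fold cross-validation. As a rough illustration of that idea only, the following minimal sketch scores a plain k-nearest-neighbor classifier for several values of k. It is not the authors' implementation: the synthetic two-class data, the candidate values of k, and all function names (`knn_predict`, `cross_val_accuracy`, `make_synthetic`) are assumptions made for this example, and no attribute selection is performed.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=10, seed=0):
    """Accuracy of k-NN pooled over all held-out folds of an n_folds cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # disjoint, near-equal folds
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
    return correct / len(X)


def make_synthetic(n=200, seed=1):
    """Two noisy Gaussian classes in 2-D; placeholder data, NOT the Cleveland dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(label * 1.5, 1.0), rng.gauss(label * 1.5, 1.0)])
        y.append(label)
    return X, y


X, y = make_synthetic()
candidate_ks = [1, 3, 5, 7, 9, 11]  # assumed search grid, odd values avoid binary ties
scores = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)
for k in candidate_ks:
    print(f"k={k:2d}  10-fold accuracy={scores[k]:.3f}")
print("best k on this synthetic data:", best_k)
```

The same shuffled fold assignment is reused for every k, so the candidates are compared on identical splits. The value of k selected here depends entirely on the synthetic data and says nothing about the value reported in the description.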