Popis: |
Liver disease is difficult to detect at an early stage because it presents few symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD), and their results, with and without feature selection, are compared with each other and with existing studies. In addition, a two-level ensemble stacking model, built on several meta-ensemble classifiers and the feature selection technique, is applied to optimize the accuracy of the ensemble classifiers. Several data preprocessing techniques are employed to optimize the accuracy of the proposed work: data encoding, data cleaning, data scaling, skewness transformation, data balancing, and feature selection. The single ML models are logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), and multilayer perceptron (MLP). The ensemble ML models are extra trees classifier, random forest (RF), gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and an ensemble stacking classifier. Among the ensemble models, the ensemble stacking model achieved the highest accuracies under 10-fold cross-validation: 93.88% without and 94.12% with feature selection. The two-level ensemble stacking model achieved the highest performance when trained with feature selection: accuracy 94.01%, precision 94.44%, recall 94.25%, F1-score 94.01%, and area under the ROC curve 94.25%. These results indicate that the proposed technique yields a high-performing prediction model for liver disease. |
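
The workflow described in the abstract (scaling, feature selection, a stacking ensemble, and 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the data are synthetic (the ILPD is not loaded here), and the choice of base learners, the logistic regression meta-learner, `SelectKBest` with `k=6`, and all hyperparameters are assumptions made for the example. It builds a single stacking level only; the abstract does not specify how the two-level model is assembled.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the real dataset
# (assumption: the ILPD itself is not used in this sketch).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Base learners of the stacking ensemble (illustrative choice).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Scaling and feature selection sit inside the pipeline, so they are
# re-fitted on the training part of every fold (no leakage into test folds).
model = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif, k=6)),
        ("stack", stack),
    ]
)

# 10-fold stratified cross-validation, scored by accuracy.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"Mean 10-fold accuracy: {scores.mean():.4f}")
```

Because the data are synthetic, the printed accuracy says nothing about the figures reported in the abstract. Dropping the `select` step from the pipeline gives the corresponding run without feature selection.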