Statistical Machine Learning Approaches to Liver Disease Prediction
Authors: | Easin Hasan, Morgan Williamson, Hafiz Khan, Fahad B. Mostafa |
---|---|
Year of publication: | 2021 |
Subject: |
Medicine (General)
demographic variables; Artificial neural network; Computer science; prognostic/biochemical variables; statistical learning for variable selection and classification; Overfitting; Machine learning; Missing data; liver disease; Random forest; Support vector machine; Data set; R5-920; Binary classification; Artificial intelligence; Medical diagnosis |
Source: | Livers, Volume 1, Issue 4, Pages 294-312 (2021) |
ISSN: | 2673-4389 |
DOI: | 10.3390/livers1040023 |
Description: | Medical diagnoses have important implications for patient care, research, and policy. To reach a diagnosis, health professionals apply a range of pathological methods and interpret medical reports in light of each patient's condition. Clinicians have recently been actively engaged in improving medical diagnoses, and the use of artificial intelligence and machine learning in combination with clinical findings has further improved disease detection. Modern computing makes it possible to collect data and to visualize patterns that would otherwise stay hidden, such as the extent of missing data in medical research. Statistical machine learning algorithms tailored to a specific problem can support decision-making, and data-driven machine learning (ML) algorithms can be used to validate existing methods and help researchers reach new decisions. The purpose of this study was to extract significant predictors of liver disease from the medical analysis of 615 individuals using ML algorithms. Data visualization was used to reveal notable features of the data, such as missing values. Multiple imputation by chained equations (MICE) was applied to impute the missing data points, and principal component analysis (PCA) was used to reduce the dimensionality. Variable importance ranking using the Gini index was implemented to verify the significant predictors obtained from the PCA. The data were split into a training set (n_train = 399) and a testing set (n_test = 216) for the ML classifiers. The study compared binary classifiers (an artificial neural network, a random forest (RF), and a support vector machine) on a published liver disease data set to classify individuals with liver disease, with the aim of helping health professionals make better diagnoses.
The synthetic minority oversampling technique (SMOTE) was applied to oversample the minority class and to control overfitting. The RF achieved a significantly higher accuracy (98.14%; p<0.001) than the other methods. These results suggest that ML methods can predict liver disease by incorporating risk factors, which may improve the inference-based diagnosis of patients. |
Database: | OpenAIRE |
External link: |
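
Two of the steps named in the description, variable ranking with the Gini index and synthetic minority oversampling, can be illustrated with a minimal Python sketch that uses only the standard library. This is not the authors' code. The function names (`gini_impurity`, `gini_decrease`, `smote_like`) and the toy data are hypothetical, the oversampler is a simplified SMOTE-style interpolation, and a random forest would average the impurity decrease over many trees and splits rather than score a single threshold as done here.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) * gini_impurity(left)
                + len(right) * gini_impurity(right)) / n
    return gini_impurity(labels) - weighted


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # All other minority points, sorted by squared distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: one biochemical marker and a binary disease label.
marker = [1.0, 2.0, 3.0, 4.0]
disease = [0, 0, 1, 1]
print(gini_decrease(marker, disease, threshold=2.0))  # 0.5: a perfect split

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
print(smote_like(minority, n_new=3))
```

A larger impurity decrease marks a more informative predictor, which is the idea behind Gini-based importance ranking. Because each synthetic sample lies on a segment between two existing minority samples, the oversampled class stays inside the region the real minority data already occupy.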