Predictive Analysis of Vector-Borne Diseases through Tabular Classification of Epidemiological Data.

Authors: Iparraguirre-Villanueva, Orlando; Cabanillas-Carbonell, Michael
Source: International Journal of Online & Biomedical Engineering; 2024, Vol. 20 Issue 13, p103-117, 15p
Abstract: Vector-borne diseases (VBDs) are major threats to human health. They are estimated to cause more than 700,000 deaths each year. This presents serious health problems for CBD. In recent years, the incidence of VBDs has increased globally, affecting approximately one billion people and accounting for 17% of all infectious diseases. Disease rates have risen at an alarming pace worldwide, with more than 3.9 billion people at risk of infection. It is therefore essential to find approaches to detect these diseases; this is where machine learning (ML) models come into play. The purpose of this study was to predict VBDs using tabular epidemiological data. To this end, a set of ML models was used: support vector classifier (SVC), extreme gradient boosting (XGBoost), LightGBM, CatBoost, random forest (RF), and balanced random forest (BRF). A dataset consisting of 65 features and 1262 records was used during the training stage. The results highlighted the successful integration of the different models (SVC, XGBoost, LightGBM, CatBoost, BRF, and RF), with weights of 0.49959 ± 0.27112, 0.58496 ± 0.22619, 0.48482 ± 0.29971, 0.54992 ± 0.27982, 0.24924 ± 0.22654, and 0.45592 ± 0.25849, respectively. In addition, the BRF model stood out for having the lowest log loss, evaluated through the ensemble log-loss metric, with a mean of 0.24924 and a standard deviation of 0.22654. [ABSTRACT FROM AUTHOR]
Database: Complementary Index