Using machine learning models to classify stroke risk level based on national screening data
Author: | Xuemeng Li, Huajian Mao, Di Bian, Dongsheng Zhao, Jinghui Yu, Mei Li |
---|---|
Year of publication: | 2020 |
Subject: | Gerontology; Adult; China; Population; Decision tree; Psychological intervention; Cardiovascular system & hematology; Overweight; Logistic regression; Machine Learning; Medical and health sciences; Clinical medicine; Risk Factors; Atrial Fibrillation; Medicine; Humans; Cardiovascular diseases; Education; Stroke; Random forest; Ischemic Attack, Transient; Neurology & neurosurgery; Decision tree model |
Source: | EMBC |
ISSN: | 2694-0604 |
Description: | Stroke has high incidence, prevalence and mortality, and it places a heavy burden on families and society in China. In 2009, the Ministry of Health of China launched the China national stroke screening and intervention program, which screens for stroke risk factors and conducts interventions for high-risk individuals among people over 40 years old throughout China. In this program, the stroke risk factors are hypertension, diabetes, dyslipidemia, atrial fibrillation, smoking, lack of exercise, being markedly overweight or obese, and a family history of stroke. People with more than two risk factors, or with a history of stroke or transient ischemic attack (TIA), are considered high-risk. However, this criterion cannot assign a stroke risk level to people whose risk factor fields contain "unknown" values. The missing risk levels reduce the efficiency of stroke interventions and introduce inaccuracies into national-level statistics. In this paper, we first construct the training and test sets and handle the imbalanced training set using oversampling and undersampling methods. We then develop logistic regression, decision tree, neural network and random forest models for stroke risk classification, and evaluate them by recall and precision. The results show that the random forest model achieves the best performance in terms of recall and precision. The models constructed in this paper can improve screening efficiency and avoid unnecessary rescreening and intervention expenditures. |
Database: | OpenAIRE |
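
The screening criterion and the evaluation metrics named in the description can be illustrated with a minimal Python sketch. It is not the authors' code: the field names, the use of `None` to stand for an "unknown" value, and both function names are assumptions made for illustration. The rule follows the description's wording (more than two risk factors, or a history of stroke or TIA), and any unknown field is treated as leaving the risk level undetermined.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical field names for the eight risk factors listed in the description.
RISK_FACTORS = (
    "hypertension", "diabetes", "dyslipidemia", "atrial_fibrillation",
    "smoking", "lack_of_exercise", "overweight_or_obese",
    "family_history_of_stroke",
)
HISTORY_FIELD = "history_of_stroke_or_tia"  # hypothetical field name


def rule_based_risk(record: dict) -> Optional[bool]:
    """Return True (high-risk), False (not high-risk) or None (undetermined).

    Assumption: a missing key or a None value represents an "unknown" entry,
    and any unknown entry leaves the risk level undetermined.
    """
    fields = RISK_FACTORS + (HISTORY_FIELD,)
    if any(record.get(name) is None for name in fields):
        return None
    factor_count = sum(bool(record[name]) for name in RISK_FACTORS)
    return bool(record[HISTORY_FIELD]) or factor_count > 2


def precision_recall(y_true: Sequence[bool],
                     y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Compute precision and recall for the positive (high-risk) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def make_record(positive: Sequence[str], history: Optional[bool] = False) -> dict:
    """Build a fully known record in which only `positive` factors are present."""
    record = {name: name in positive for name in RISK_FACTORS}
    record[HISTORY_FIELD] = history
    return record
```

Records for which `rule_based_risk` returns `None` are the cases the description says the criterion cannot classify; these are the cases where a trained classifier would be used instead, with `precision_recall` as one way to compare candidate models on a labelled test set.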