Using gradient boosting with stability selection on health insurance claims data to identify disease trajectories in chronic obstructive pulmonary disease
Autor: | Jochen Walker, Philipp Drewe-Boss, Tina Ploner, Marcus Grum, Steffen Heß |
---|---|
Rok vydání: | 2020 |
Předmět: |
Statistics and Probability
medicine.medical_specialty Databases Factual Epidemiology Stability (learning theory) Pulmonary disease Disease 01 natural sciences Severity of Illness Index 010104 statistics & probability 03 medical and health sciences Pulmonary Disease Chronic Obstructive 0302 clinical medicine Health Information Management Risk Factors Claims data medicine Health insurance Humans 030212 general & internal medicine 0101 mathematics Intensive care medicine Selection (genetic algorithm) Insurance Health business.industry Disease progression Gradient boosting business |
Zdroj: | Statistical methods in medical research. 29(12) |
ISSN: | 1477-0334 |
Popis: | Objective We propose a data-driven method to detect temporal patterns of disease progression in high-dimensional claims data based on gradient boosting with stability selection. Materials and methods We identified patients with chronic obstructive pulmonary disease in a German health insurance claims database with 6.5 million individuals and divided them into a group of patients with the highest disease severity and a group of control patients with lower severity. We then used gradient boosting with stability selection to determine variables correlating with a chronic obstructive pulmonary disease diagnosis of highest severity and subsequently model the temporal progression of the disease using the selected variables. Results We identified a network of 20 diagnoses (e.g. respiratory failure), medications (e.g. anticholinergic drugs) and procedures associated with a subsequent chronic obstructive pulmonary disease diagnosis of highest severity. Furthermore, the network successfully captured temporal patterns, such as disease progressions from lower to higher severity grades. Discussion The temporal trajectories identified by our data-driven approach are compatible with existing knowledge about chronic obstructive pulmonary disease showing that the method can reliably select relevant variables in a high-dimensional context. Conclusion We provide a generalizable approach for the automatic detection of disease trajectories in claims data. This could help to diagnose diseases early, identify unknown risk factors and optimize treatment plans. |
Databáze: | OpenAIRE |
Externí odkaz: |