Individual risk prediction: Comparing random forests with Cox proportional‐hazards model by a simulation study.

Authors: Baralou, Valia; Kalpourtzi, Natasa; Touloumi, Giota
Source: Biometrical Journal; Aug 2023, Vol. 65, Issue 6, pp. 1-13 (13 pages)
Abstract: With big data becoming widely available in healthcare, machine learning algorithms such as random forest (RF), which ignores time‐to‐event information, and random survival forest (RSF), which handles right‐censored data, are used for individual risk prediction as alternatives to the Cox proportional hazards (Cox‐PH) model. We aimed to systematically compare RF and RSF with Cox‐PH. RSF with three split criteria [log‐rank (RSF‐LR), log‐rank score (RSF‐LRS), maximally selected rank statistics (RSF‐MSR)], RF, Cox‐PH, and Cox‐PH with splines (Cox‐S) were evaluated through a simulation study based on real data. One hundred eighty scenarios were investigated, assuming different associations between the predictors and the outcome (linear; linear with interactions; nonlinear; nonlinear with interactions), training sample sizes (500/1000/5000), censoring rates (50%/75%/93%), hazard functions (increasing/decreasing/constant), and numbers of predictors (seven, or 15 including noise variables). Performance was evaluated with the time‐dependent area under the curve and the integrated Brier score. In all scenarios, RF had the worst performance. In scenarios with a low number of events (≤70), Cox‐PH was at least noninferior to RSF, whereas under the linearity assumption it outperformed RSF. In the presence of interactions, RSF performed better than Cox‐PH as the number of events increased, whereas Cox‐S performed at least similarly to RSF under nonlinear effects. RSF‐LRS performed slightly worse than RSF‐LR and RSF‐MSR when noise variables and interaction effects were included. When applied to real data, models incorporating survival time performed better. Although RSF algorithms are a promising alternative to the conventional Cox‐PH model as data complexity increases, they require a higher number of events for training. In time‐to‐event analysis, algorithms that consider survival time should be used. [ABSTRACT FROM AUTHOR]
Database: Complementary Index