Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases
Author: | Alejandro Schuler, Blanca Gallego, Alison Callahan, Thierry Wendling, Kenneth Jung, Nigam H. Shah |
---|---|
Publication year: | 2018 |
Subject: |
Statistics and Probability
Databases, Factual; Epidemiology; Computer science; Biostatistics; 01 natural sciences; Statistics, Nonparametric; Machine Learning; 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Outcome Assessment, Health Care; Covariate; Health care; Humans; Computer Simulation; 030212 general & internal medicine; 0101 mathematics; Propensity Score; Models, Statistical; Multivariate adaptive regression splines; Database; Confounding; Bayes Theorem; Causality; Observational Studies as Topic; Treatment Outcome; Estimand; Causal inference; Propensity score matching; Regression Analysis; Observational study |
Source: | Statistics in Medicine. 37:3309-3324 |
ISSN: | 0277-6715 |
DOI: | 10.1002/sim.7820 |
Description: | There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. |
Database: | OpenAIRE |
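
The simulation design summarized in the description (covariates and treatment assignment held fixed, only outcomes simulated so that the true conditional risk difference is known) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the covariate and treatment arrays are synthetic stand-ins for real database records, the outcome risks `p0` and `p1` are hand-written rather than fitted nonparametrically to real outcomes, and `stratified_crd` is a hypothetical, deliberately simple estimator rather than any of the methods compared in the paper.

```python
import numpy as np


def simulate_outcomes(p0, p1, w, rng):
    """Draw binary outcomes from known risks under control (p0) and treatment (p1)."""
    risk = np.where(w == 1, p1, p0)
    return rng.binomial(1, risk)


def stratified_crd(x_bin, w, y):
    """Risk difference (treated minus control) within each level of a discrete covariate."""
    est = {}
    for level in np.unique(x_bin):
        in_stratum = x_bin == level
        treated = y[in_stratum & (w == 1)].mean()
        control = y[in_stratum & (w == 0)].mean()
        est[int(level)] = treated - control
    return est


rng = np.random.default_rng(0)
n = 200_000

# Synthetic stand-ins (assumption): in the study, covariates and treatment
# assignment are taken from real health care database records.
x_bin = rng.integers(0, 2, size=n)
w = rng.binomial(1, np.where(x_bin == 1, 0.6, 0.3))  # treatment depends on x

# Hand-written outcome model (assumption), so the true effect is known exactly.
p0 = np.where(x_bin == 1, 0.10, 0.05)  # risk if untreated
p1 = np.where(x_bin == 1, 0.06, 0.04)  # risk if treated
true_crd = {0: 0.04 - 0.05, 1: 0.06 - 0.10}

# Only the outcomes are simulated; x_bin and w stay fixed.
y = simulate_outcomes(p0, p1, w, rng)

est = stratified_crd(x_bin, w, y)
bias = {level: est[level] - true_crd[level] for level in true_crd}
print(bias)
```

Because the outcome risks are specified by hand, the bias of any estimator's conditional risk difference can be read off directly per covariate stratum, which is the property the simulation design relies on.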