Abstract: |
Neural networks are a contender among data mining procedures for estimating propensity scores because of their robustness to non-normal residual distributions, their ability to detect complex nonlinear relationships between treatments and confounding variables, their freedom from explicit model specification, and their capacity to be trained on observed events. In this study, we develop artificial neural network architectures to estimate propensity scores for categorical treatments. For comparison, we also estimated propensity scores with more popular techniques: logistic regression, multinomial logistic regression, and generalized boosted logistic regression using regression trees (GBM). Previous studies found that GBM had lower prediction error than alternative methods and demonstrated that it does not require model specification, yet they also reported several cases of overfitting. We used Monte Carlo simulations that manipulated sample coefficients, model specifications, and fixed sample sizes to compare the generalization error of the trained machine-learning algorithms on never-before-seen data. Neural networks yielded higher correlations between true and estimated propensity scores. Other performance measures, such as cross-entropy values, also suggest that artificial neural networks may estimate propensity scores more accurately than the more popular methods. [ABSTRACT FROM AUTHOR] |