Description: |
Regression analyses in epidemiological and medical research typically begin with a model selection process, followed by inference under the assumption that the selected model generated the data at hand. It is well known that this two-step procedure can yield biased estimates and invalid confidence intervals for model coefficients, because it ignores the uncertainty associated with model selection. To account for this uncertainty, multiple models may be selected as a basis for inference. This method, commonly referred to as model-averaging, is increasingly becoming a viable approach in practice. Previous research has demonstrated the advantage of model-averaging in reducing the bias of parameter estimates. However, there is a lack of methods for constructing confidence intervals around parameter estimates using model-averaging. In the context of multiple logistic regression models, we propose and evaluate new confidence interval estimation approaches for regression coefficients. Specifically, we study the properties of confidence intervals constructed by averaging the tail errors arising from confidence limits obtained from all models included in model-averaging for parameter estimation. We propose model-averaging confidence intervals based on the score test. For selection of the models to be averaged, we propose the bootstrap inclusion fractions method. We evaluate the performance of our proposed methods in simulation studies, comparing them with model-averaging interval procedures based on likelihood ratio and Wald tests, traditional stepwise procedures, the bootstrap approach, penalized regression, and the Bayesian model-averaging approach. Methods with good performance have been implemented in the 'mataci' R package and are illustrated using data from a low birth weight study. |
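The idea of averaging tail errors across models can be made concrete with a minimal sketch. It uses the Wald-type variant, which the abstract lists only as a comparison method, because that needs nothing beyond each model's coefficient estimate and standard error; it is not the score-based interval proposed here and does not reproduce the 'mataci' package. The function names, the model weights, and the numeric inputs are all hypothetical. Each confidence limit is found as the value at which the weighted average of the per-model one-sided tail errors equals alpha/2.

```python
import math


def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def averaged_tail_error(limit, estimates, std_errors, weights, upper):
    """Weighted average of per-model Wald tail errors at a candidate limit."""
    total = 0.0
    for est, se, w in zip(estimates, std_errors, weights):
        z = (est - limit) / se
        # Upper limit: error probability is Phi(z); lower limit: 1 - Phi(z).
        tail = normal_cdf(z) if upper else 1.0 - normal_cdf(z)
        total += w * tail
    return total


def solve_limit(estimates, std_errors, weights, alpha, upper):
    """Bisection for the limit whose averaged tail error equals alpha / 2."""
    target = alpha / 2.0
    spread = 10.0 * max(std_errors)
    lo = min(estimates) - spread
    hi = max(estimates) + spread
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        err = averaged_tail_error(mid, estimates, std_errors, weights, upper)
        # The lower-limit error grows with the limit; the upper-limit error shrinks.
        if (err < target) != upper:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mata_wald_interval(estimates, std_errors, weights, alpha=0.05):
    """Model-averaged tail-area Wald interval for one coefficient (sketch)."""
    if abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("model weights must sum to 1")
    lower = solve_limit(estimates, std_errors, weights, alpha, upper=False)
    upper = solve_limit(estimates, std_errors, weights, alpha, upper=True)
    return lower, upper


# Hypothetical log-odds-ratio estimates for one covariate from three
# candidate logistic models, with made-up standard errors and weights.
example_interval = mata_wald_interval(
    estimates=[0.80, 0.65, 0.90],
    std_errors=[0.30, 0.28, 0.35],
    weights=[0.5, 0.3, 0.2],
)
print(example_interval)
```

With a single model of weight 1 the interval reduces to the usual Wald interval, which gives a quick sanity check. How the weights are obtained (for example from information criteria or bootstrap inclusion fractions) is left open in this sketch.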