Popis: |
  Background: Epidemiological studies often have missing data, and multiple imputation (MI) is a commonly used strategy for handling it. MI guidelines for structuring the imputation model have focused on compatibility with the analysis model, but not on the need for the (compatible) imputation model(s) to be correctly specified. Standard (default) MI procedures use simple linear functions. We examine the bias this causes and the performance of methods for identifying problematic imputation models, providing practical guidance for researchers.
  Methods: Using simulation and real-data analysis, we investigated how imputation model mis-specification affected MI performance, comparing results with complete records analysis (CRA). We considered scenarios in which imputation model mis-specification occurred because (i) the analysis model was mis-specified, or (ii) the relationship between exposure and confounder was mis-specified.
  Results: Mis-specification of the relationship between outcome and exposure, or between exposure and confounder, in the imputation model for the exposure could result in substantial bias in CRA and MI estimates (in addition to any bias in the full-data estimate due to analysis model mis-specification). MI by predictive mean matching could mitigate model mis-specification. Model mis-specification tests were effective in identifying mis-specified relationships, and could be easily applied in any setting in which CRA was, in principle, valid and data were missing at random (MAR).
  Conclusion: When using MI methods that assume data are MAR, compatibility between the analysis and imputation models is necessary but not sufficient to avoid bias. We propose an easy-to-follow, step-by-step procedure for identifying and correcting mis-specification of imputation models.