MICE vs PPCA: Missing data imputation in healthcare

Authors: Harshad Hegde, Neel Shimpi, Aloksagar Panny, Ingrid Glurich, Pamela Christie, Amit Acharya
Language: English
Year of publication: 2019
Subject:
Source: Informatics in Medicine Unlocked, Vol 17, Iss , Pp - (2019)
Document type: article
ISSN: 2352-9148
DOI: 10.1016/j.imu.2019.100275
Description: Retrospective analyses of real-world clinical data face challenges owing to the absence of some data elements. Historically, missing data were addressed by first classifying the missingness into one of three categories: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). Imputation techniques continue to be developed and tested to gauge their capacity to mitigate the negative impact of each type of missing data on analyses and their results. This study compared two data imputation techniques: probabilistic principal component analysis (PPCA) and multiple imputation using chained equations (MICE). Retrospective data from 41,543 unique patients, including both medical and dental variables (n = 116), were mined from the institutional research data warehouse, which captures data through an integrated medical and dental electronic health record (iEHR). A subset with complete data on all variables of interest was sampled. "Missing data" were then created artificially by randomly removing data elements from this subset. PPCA and MICE were applied to test the capacity of each technique to create an accurate imputed dataset. Each imputed dataset was compared with the sampled subset to investigate which technique more closely reproduced the true data. PPCA outperformed MICE, achieving an overall correct imputation percentage (accuracy) of approximately 65% and a root mean square error (RMSE) of approximately 0.29, compared with approximately 38% accuracy and an RMSE of 0.83 for MICE. Overall, this study concluded that PPCA demonstrated a higher capacity than MICE to impute MCAR data. Keywords: Imputation, Probabilistic principal component analysis, PPCA, Multiple imputations using chained equations, MICE, Medical dental data, Dental informatics
Database: Directory of Open Access Journals
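
The evaluation design summarized in the description (start from complete data, remove entries at random, impute, then compare against the known truth) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the data are synthetic, the function names are hypothetical, a column-mean imputer stands in for PPCA or MICE, and "accuracy" is taken to mean that the rounded imputed value equals the true value, since the record does not say how the authors computed it.

```python
import numpy as np


def mask_mcar(data, missing_rate, rng):
    """Hide entries completely at random; return the masked copy and the mask."""
    mask = rng.random(data.shape) < missing_rate
    masked = data.astype(float).copy()
    masked[mask] = np.nan
    return masked, mask


def impute_column_mean(masked):
    """Stand-in imputer (NOT PPCA or MICE): fill gaps with the observed column mean."""
    col_means = np.nanmean(masked, axis=0)
    filled = masked.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


def score_imputation(truth, imputed, mask):
    """Score only the hidden entries: assumed accuracy definition and RMSE."""
    errors = imputed[mask] - truth[mask]
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    # Assumption: an imputation is "correct" if it rounds to the true value.
    accuracy = float(np.mean(np.round(imputed[mask]) == truth[mask]))
    return accuracy, rmse


rng = np.random.default_rng(0)
# Synthetic complete dataset: 200 rows, 5 integer-coded variables (values 0..2).
truth = rng.integers(0, 3, size=(200, 5)).astype(float)

masked, mask = mask_mcar(truth, missing_rate=0.2, rng=rng)
imputed = impute_column_mean(masked)
accuracy, rmse = score_imputation(truth, imputed, mask)

print(f"accuracy={accuracy:.2f}, rmse={rmse:.2f}")
```

To compare two techniques in this way, each imputer would be run on the same masked array and scored against the same mask, so that differences in accuracy and RMSE reflect the imputers rather than the pattern of removed entries.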