Multicollinearity Applied Stepwise Stochastic Imputation: A Large Dataset Imputation through Correlation-based Regression

Authors: Benjamin D. Leiby, Darryl K. Ahner
Year of publication: 2022
DOI: 10.21203/rs.3.rs-1894388/v1
Description: This paper presents a stochastic imputation approach for large datasets using a correlation selection methodology when preferred commercial packages struggle to iterate due to numerical problems. A variable range-based guard rail modification is proposed that benefits the convergence rate of data elements while simultaneously providing increased confidence in the plausibility of the imputations. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The Multicollinearity Applied Stepwise Stochastic imputation methodology (MASS-impute) capitalizes on correlation between variables within the dataset and uses model residuals to estimate unknown values. Examination of the methodology provides insight toward choosing linear or nonlinear modeling terms. Tailorable tolerances exploit residual information to fit each data element. The methodology evaluation includes observing computation time, model fit, and the comparison of known values to replaced values created through imputation. Overall, the methodology provides useable and defendable results in imputing missing elements of a country conflict dataset.
Database: OpenAIRE