Abstract: |
We conduct a factor importance analysis of the insurance transition behavior of public health insurance enrollees in the U.S., based on the Medical Expenditure Panel Survey (2-year longitudinal files, 2011–2018). The factors in the dataset include both numerical and categorical data, which makes it challenging to compare their importance. To measure the effects of these mixed-type factors, we introduce two relatively "model-free" ranking scores: vote count and average partial dependence ranking score. Unlike traditional ranking scores, these novel scores apply to any regression or classification method and to both numerical and categorical factors. A voting ensemble is designed to obtain these two ranking scores, with four competitive base learners: forward stepwise subset selection, backward stepwise subset selection, LASSO, and random forest. The top five driving factors selected by our voting ensemble are number of physician office visits, family size, chronic condition, age, and family income. A predictive model based on the top-ranked factors is provided. According to our model validation results, this model is competitive with other popular prediction methods. [ABSTRACT FROM AUTHOR] |