Popis: |
A wide range of analyses in disciplines such as land-use planning, natural hazards, environmental risk assessment and infection control requires information on the spatial distribution of population. This information is traditionally obtained through censuses and, for reasons of privacy, aggregated to the level of administrative units. Since this generalisation masks the spatial distribution of the population within these units, dasymetric mapping techniques were developed to disaggregate population to a finer spatial level using ancillary data. To arrive at useful population estimates through dasymetric mapping, the adopted strategy should include not only a) the collection of relevant ancillary information on the actual population distribution and b) an appropriate calibration procedure, but also c) an adequate assessment of the model’s accuracy. For the latter, many studies rely on error measures that are too sensitive to outliers (root mean squared error (RMSE), mean error (ME), mean absolute error (MAE)) and therefore do not guarantee a trustworthy accuracy assessment, especially since population data and derived information typically do not follow a Gaussian distribution. Furthermore, other measures such as the coefficient of determination R² appear regularly in the evaluation of population models, even though they are only appropriate for assessing explanatory rather than predictive power. Indeed, a common misconception across a wide range of disciplines aimed at predicting outcomes (such as population estimation) is that high explanatory performance automatically implies high predictive performance.
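The outlier sensitivity of the mean-based measures can be illustrated with a minimal Python sketch. The error values below are invented for illustration and are not taken from the study; the quartiles are computed with the standard library's `statistics.quantiles` using linear ("inclusive") interpolation, which is one of several common quartile conventions.

```python
import math
import statistics


def mean_based_measures(errors):
    """Mean error (ME), mean absolute error (MAE) and root mean squared error (RMSE)."""
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return me, mae, rmse


def robust_measures(errors):
    """Median and interquartile range (IQR) of the error distribution."""
    q1, _, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return statistics.median(errors), q3 - q1


# Hypothetical prediction errors (estimated minus observed count) for five units;
# the second list differs from the first only in one extreme value.
clean = [-2, -1, 0, 1, 2]
with_outlier = [-2, -1, 0, 1, 50]

for label, errors in (("clean", clean), ("with outlier", with_outlier)):
    me, mae, rmse = mean_based_measures(errors)
    median, iqr = robust_measures(errors)
    print(f"{label:>12}: ME={me:.2f} MAE={mae:.2f} RMSE={rmse:.2f} "
          f"median={median} IQR={iqr}")
```

In this toy case the single extreme value moves ME from 0 to 9.6, MAE from 1.2 to 10.8 and RMSE from about 1.41 to about 22.39, while the median (0) and IQR (2) do not change at all.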
Pursuing explanatory power and pursuing predictive power are, however, two distinct goals (see Shmueli, 2010; Shmueli & Koppius, 2011): whereas explanatory modelling focuses on interpretability, a high goodness of fit and minimal bias, predictive modelling strives for a high level of accuracy, minimising both bias and variance. The general lack of appropriate error measures and of correct predictive modelling practice underlines the need for a transparent and reliable framework to assess the predictive performance of population disaggregation techniques. Therefore, this study presents a box-plot-based prediction error analysis approach that involves (i) the definition of robust measures to describe the error distribution (the median and the interquartile range (IQR)) and (ii) the mapping of outliers to investigate where population estimation fails. Bias and variability are evaluated by means of the proportional prediction error and the absolute proportional prediction error, respectively. The approach is demonstrated for the Flanders and Brussels regions (Belgium), where address type and household size information are used to disaggregate population to a fine-grained spatial level. The combined evaluation of the box-plot elements is found to provide a multi-faceted view of accuracy, whereas a single measure such as RMSE conceals important aspects of the error distribution and may prompt one-sided or even misguided conclusions. References: Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25(3), 289-310. Shmueli, G., & Koppius, O. R. (2011). Predictive Analytics in Information Systems Research. MIS Quarterly, 35(3), 553-572. |
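A minimal Python sketch of the box-plot elements described above. It assumes that the proportional prediction error (PPE) is defined as (estimated - observed) / observed and the absolute proportional prediction error (APPE) as its absolute value, and it flags outliers with Tukey's conventional 1.5 × IQR whiskers; none of these definitions is stated in this abstract. The counts are invented, and the returned outlier indices stand in for the spatial units that would be mapped.

```python
import statistics
from dataclasses import dataclass


@dataclass
class BoxPlotSummary:
    median: float
    q1: float
    q3: float
    iqr: float
    lower_fence: float
    upper_fence: float
    outliers: list  # indices of units whose error lies outside the fences


def proportional_prediction_errors(estimated, observed):
    """Assumed definition: PPE_i = (estimated_i - observed_i) / observed_i.

    Units with zero observed population would need separate handling.
    """
    return [(e - o) / o for e, o in zip(estimated, observed)]


def box_plot_summary(errors, whisker=1.5):
    """Median, quartiles, IQR and Tukey fences; outliers are returned as indices."""
    q1, _, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [i for i, e in enumerate(errors) if e < lower or e > upper]
    return BoxPlotSummary(statistics.median(errors), q1, q3, iqr, lower, upper, outliers)


# Hypothetical observed and estimated counts for six spatial units.
observed = [100, 80, 120, 50, 200, 90]
estimated = [110, 76, 120, 95, 190, 99]

ppe = proportional_prediction_errors(estimated, observed)
appe = [abs(p) for p in ppe]

bias = box_plot_summary(ppe)          # median PPE: systematic over- or underestimation
variability = box_plot_summary(appe)  # median and IQR of APPE: magnitude and spread of errors
print(f"median PPE={bias.median:.3f}, median APPE={variability.median:.3f}, "
      f"IQR APPE={variability.iqr:.3f}, outlying units={variability.outliers}")
```

Here unit 3 (estimated 95 against an observed 50) is flagged as an outlier in both distributions; in a real application these indices would be joined back to the spatial units so that the locations where estimation fails can be mapped.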