Spatial Random Forest (S-RF): A random forest approach for spatially interpolating missing land-cover data with multiple classes
Authors: | Jacinta Holloway-Brown, Kate J. Helmstedt, Kerrie Mengersen |
---|---|
Year of publication: | 2021 |
Subject: | 010504 meteorology & atmospheric sciences; Environmental change; Computer science; 0211 other engineering and technologies; 02 engineering and technology; Land cover; 01 natural sciences; Random forest; Key (cryptography); General Earth and Planetary Sciences; Satellite imagery; Cartography; 021101 geological & geomatics engineering; 0105 earth and related environmental sciences |
DOI: | 10.6084/m9.figshare.14034617.v1 |
Description: | Land-cover maps are important tools for monitoring large-scale environmental change and can be regularly updated using free satellite imagery. A key challenge in constructing these maps is missing data in the satellite images on which they are based. To address this challenge, we created a Spatial Random Forest (S-RF) model that can accurately interpolate missing data in satellite images based on a modest training set of observed data in the image of interest. We demonstrate that this approach can be effective with only a minimal number of spatial covariates, namely latitude and longitude. The motivation for using only latitude and longitude is that these covariates are available for all images, whether the data are observed or missing due to cloud cover. The S-RF model can flexibly partition these covariates to provide accurate estimates, and additional covariates can easily be incorporated to improve estimation when they are available. The effectiveness of our approach has previously been demonstrated for predicting two land-cover classes in an Australian case study. In this paper, we extend the method to more than two classes. We demonstrate the performance of the S-RF method in interpolating multiple land-cover classes using a case study from South America. The results show that the method is best at predicting three land-cover classes, compared with five or ten classes, and that other information is needed to improve performance as the number of classes grows, particularly if the classes are imbalanced. We explore two issues through a sensitivity analysis: the influence of the amount of missing data in the image, and the influence of the amount of training data on model development and performance. The results show that the amount of missing data due to cloud cover influences model performance for multiple classes. We also found that increasing the amount of training data beyond 100,000 observations had minimal impact on model accuracy. Hence, only a relatively small amount of observed data is required to train the model, which reduces computation time. |
Database: | OpenAIRE |
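The general idea in the description (train a random forest on the observed pixels using only latitude and longitude, then predict class labels for the cloud-covered pixels) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' S-RF implementation. The 20 x 20 grid, the three-class pattern, the rectangular cloud mask, the 25 trees and all helper names (`gini`, `best_split`, `build_tree`, `fit_forest`, `predict_forest`, `true_class`) are hypothetical. The split search uses a common CART-style Gini criterion, with one randomly chosen coordinate tried first at each node.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return (feature, threshold, weighted child Gini) for the best split, or None."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, rng, max_features=1):
    """Grow an unpruned tree; leaves are class labels, internal nodes are tuples."""
    parent = gini(labels)
    majority = Counter(labels).most_common(1)[0][0]
    if parent == 0.0:
        return majority
    order = rng.sample(range(len(rows[0])), len(rows[0]))  # random feature order
    split = best_split(rows, labels, order[:max_features])
    if split is None or split[2] >= parent - 1e-12:
        split = best_split(rows, labels, order)  # fall back to all features
    if split is None or split[2] >= parent - 1e-12:
        return majority  # no split reduces impurity
    f, t, _ = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], rng, max_features),
        build_tree([rows[i] for i in right], [labels[i] for i in right], rng, max_features),
    )


def predict_tree(node, row):
    """Follow threshold tests down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the observed pixels."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def true_class(lat, lon):
    """Hypothetical three-class land-cover pattern on a 20 x 20 grid."""
    if lon < 7:
        return 0
    return 1 if lat < 10 else 2


# Pixels are (latitude, longitude) pairs; the only covariates are the coordinates.
pixels = [(lat, lon) for lat in range(20) for lon in range(20)]
cloud = {(lat, lon) for lat in range(13, 18) for lon in range(12, 17)}  # missing block
train = [p for p in pixels if p not in cloud]
forest = fit_forest(train, [true_class(*p) for p in train])
filled = {p: predict_forest(forest, p) for p in cloud}
accuracy = sum(filled[p] == true_class(*p) for p in cloud) / len(cloud)
print(f"accuracy on cloud-masked pixels: {accuracy:.2f}")
```

With only two covariates, each tree tries one randomly chosen coordinate per split and falls back to the other when the first gives no reduction in impurity, so much of the diversity between trees comes from the bootstrap samples. For real imagery, a library implementation such as scikit-learn's `RandomForestClassifier` would be the more usual choice.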