Geostatistical Learning: Challenges and Opportunities
Author: | Maciel Zortea, Bianca Zadrozny, Breno de Carvalho, Júlio Hoffimann |
---|---|
Language: | English |
Year of publication: | 2021 |
Subject: |
FOS: Computer and information sciences
Statistics and Probability; Independent and identically distributed random variables; Computer Science - Machine Learning; Geospatial analysis; Computer science; Machine Learning (stat.ML); Context (language use); transfer learning; Machine learning; computer.software_genre; 01 natural sciences; QA273-280; Machine Learning (cs.LG); Domain (software engineering); 010104 statistics & probability; density ratio estimation; Statistics - Machine Learning; 0103 physical sciences; Covariate; 0101 mathematics; 010303 astronomy & astrophysics; geospatial; T57-57.97; Applied mathematics. Quantitative methods; business.industry; Applied Mathematics; Model selection; covariate shift; Statistical learning theory; Artificial intelligence; geostatistical learning; Transfer of learning; business; computer; Probabilities. Mathematical statistics; importance weighted cross-validation |
Source: | Frontiers in Applied Mathematics and Statistics, Vol 7 (2021) |
ISSN: | 2297-4687 |
DOI: | 10.3389/fams.2021.689393/full |
Description: | Statistical learning theory provides the foundation for applied machine learning and its many successful applications in computer vision, natural language processing, and other scientific domains. The theory, however, does not account for the unique challenges of statistical learning in geospatial settings. For instance, it is well known that model errors cannot be assumed to be independent and identically distributed for geospatial (a.k.a. regionalized) variables because of spatial correlation, and that trends caused by geophysical processes lead to covariate shifts between the domain where the model was trained and the domain where it will be applied, which in turn undermine classical learning methodologies that rely on random samples of the data. In this work, we introduce the geostatistical (transfer) learning problem and illustrate the challenges of learning from geospatial data by assessing widely used methods for estimating the generalization error of learning models under covariate shift and spatial correlation. Experiments with synthetic Gaussian process data as well as with real data from geophysical surveys in New Zealand indicate that none of the methods are adequate for model selection in a geospatial context. We provide general guidelines regarding the choice of these methods in practice while new methods are being actively researched. |
Database: | OpenAIRE |