Predicting the future is hard and other lessons from a population time series data science competition
Authors: | Peter Bull, Ambarish Ganguly, Thomas Bolton, Heather J. Lynch, Benjamin Carrion, Grant R. W. Humphries, Christian Che-Castaldo, Greg Lipstein, Aharon Ravia |
---|---|
Language: | English |
Year of publication: | 2018 |
Subject: |
0106 biological sciences; 010504 meteorology & atmospheric sciences; Computer science; Population; Bayesian inference; Machine learning; computer.software_genre; 010603 evolutionary biology; 01 natural sciences; Domain (software engineering); Component (UML); Time series; education; Ecology, Evolution, Behavior and Systematics; 0105 earth and related environmental sciences; Structure (mathematical logic); education.field_of_study; Ecology; Ensemble forecasting; business.industry; Applied Mathematics; Ecological Modeling; Computer Science Applications; Subject-matter expert; Computational Theory and Mathematics; Modeling and Simulation; Artificial intelligence; business; computer |
Source: | Ecological Informatics. 48 |
ISSN: | 1878-0512 1574-9541 |
Description: | Population forecasting, in which past dynamics are used to make predictions of future state, has many real-world applications. While time series of animal abundance are often modelled in ways that aim to capture the underlying biological processes involved, doing so is neither necessary nor sufficient for making good predictions. Here we report on a data science competition focused on modelling time series of Antarctic penguin abundance. We describe the best-performing submitted models, compare them to a Bayesian model previously developed by domain experts, and build an ensemble model that outperforms the individual component models in prediction accuracy. The top-performing models varied tremendously in complexity, ranging from very simple forward extrapolations of average growth rate to ensembles of models integrating recently developed machine learning techniques. Despite the short time frame for the competition, four of the submitted models outperformed the model previously created by the team of domain experts. We discuss the structure of the best-performing models and the components therein that might be useful for other ecological applications, the benefit of creating ensembles of models for ecological prediction, and the costs and benefits of including detailed domain expertise in ecological modelling. Additionally, we discuss the benefits of data science competitions, among which are increased visibility for challenging science questions, the generation of new techniques not yet adopted within the ecological community, and the ability to generate ensemble model forecasts that directly address model uncertainty. |
Database: | OpenAIRE |
External link: |