Effects of the Training Dataset Characteristics on the Performance of Nine Species Distribution Models: Application to Diabrotica virgifera virgifera

Authors: Jan Pergl, Philippe Reynaud, Dominic Eyre, Richard Baker, Maxime Dupin, Vojtěch Jarošík, Sarah Brunel, David Makowski
Contributors: Unité de recherche Zoologie Forestière (URZF), Institut National de la Recherche Agronomique (INRA), Peuplements végétaux et bioagresseurs en milieu tropical (UMR PVBMT), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Institut de Recherche pour le Développement (IRD)-Institut National de la Recherche Agronomique (INRA)-Université de La Réunion (UR), Lab Sante Vegetaux, Stn Angers, Agence nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES), Fac Sci, Dept Ecol, Charles University [Prague] (CU), Inst Bot, Biology Centre of the ASCR, Food & Environm Res Agcy, EPPO OEPP, Inst Ecol & Evolut, University of Bern, Agronomie, Institut National de la Recherche Agronomique (INRA)-AgroParisTech, European Commission [212459], Czech Science Foundation [206/09/0563], Ministry of Education, Youth and Sports of the Czech Republic [MSM0021620828, AV0Z60050516, LC06073], Unité de recherche Zoologie Forestière (UZF), Institut de Recherche pour le Développement (IRD)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Université de La Réunion (UR)-Institut National de la Recherche Agronomique (INRA), Charles University, AgroParisTech-Institut National de la Recherche Agronomique (INRA)
Language: English
Year of publication: 2011
Subject:
0106 biological sciences
Calibration (statistics)
ENVELOPE MODELS
Systems Engineering
ACCURACY
[SDV]Life Sciences [q-bio]
Species distribution
lcsh:Medicine
Plant Science
01 natural sciences
Engineering
Statistics
WESTERN CORN-ROOTWORM
lcsh:Science
TEMPERATURE
Mathematics
Principal Component Analysis
Plant Pests
Multidisciplinary
CLIMATE-CHANGE
Geography
Ecology
biology
Agriculture
Research Assessment
Europe
Community Ecology
COLEOPTERA
Risk Analysis
Research Article
SAMPLE-SIZE
GEOGRAPHICAL-DISTRIBUTION
BIOLOGICAL INVASIONS
CHRYSOMELIDAE
Science Policy
Cereals
Crops
Ecological Risk
Zea mays
010603 evolutionary biology
Model Organisms
Plant and Algal Models
Plant-Environment Interactions
Animals
Biology
Receiver operating characteristic
Plant Ecology
010604 marine biology & hydrobiology
lcsh:R
Training (meteorology)
Plant Pathology
biology.organism_classification
Maize
Support vector machine
Western corn rootworm
Sample size determination
North America
lcsh:Q
Pest Control
Source: PLoS ONE
PLoS ONE, Public Library of Science, 2011, 6 (6), ⟨10.1371/journal.pone.0020957⟩
PLoS ONE, Vol 6, Iss 6, p e20957 (2011)
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0020957
Description: Many distribution models developed to predict the presence/absence of invasive alien species need to be fitted to a training dataset before practical use. The training dataset is characterized by the number of recorded presences/absences and by their geographical locations. The aim of this paper is to study the effect of the training dataset characteristics on model performance and to compare the relative importance of three factors influencing model predictive capability: size of training dataset, stage of the biological invasion, and choice of input variables. Nine models were assessed for their ability to predict the distribution of the western corn rootworm, Diabrotica virgifera virgifera, a major pest of corn in North America that has recently invaded Europe. Twenty-six training datasets of various sizes (from 10 to 428 presence records) corresponding to two different stages of invasion (1955 and 1980) and three sets of input bioclimatic variables (19 variables, six variables selected using information on insect biology, and three linear combinations of 19 variables derived from Principal Component Analysis) were considered. The models were fitted to each training dataset in turn and their performance was assessed using independent data from North America and Europe. The models were ranked according to the area under the Receiver Operating Characteristic curve and the likelihood ratio. Model performance was highly sensitive to the geographical area used for calibration; most of the models performed poorly when fitted to a restricted area corresponding to an early stage of the invasion. Our results also showed that Principal Component Analysis was useful in reducing the number of model input variables for the models that performed poorly with 19 input variables. DOMAIN, Environmental Distance, MAXENT, and Envelope Score were the most accurate models, but all the models tested in this study led to a substantial rate of misclassification.
Database: OpenAIRE