Model selection with multiple regression on distance matrices leads to incorrect inferences

Authors: Marie-Josée Fortin, Erin L. Landguth, Ryan P. Franckowiak, Ian S. Acuña-Rodríguez, Karl J. Jarvis, Helene H. Wagner, Michael Panasci
Language: English
Year of publication: 2017
Subject:
0106 biological sciences
0301 basic medicine
Gene Flow
Computer and Information Sciences
Research Validity
Heredity
Statistical methods
Bayesian probability
lcsh:Medicine
Statistics (mathematics)
Research and Analysis Methods
010603 evolutionary biology
01 natural sciences
Models, Biological
03 medical and health sciences
Bayesian information criterion
Statistics
Geoinformatics
Genetics
Statistics::Methodology
lcsh:Science
Mathematics
Evolutionary Biology
Multidisciplinary
Population Biology
Geography
Model selection
Simulation and Modeling
lcsh:R
Biology and Life Sciences
Regression analysis
Random Variables
Research Assessment
Probability Theory
Spatial Autocorrelation
Deviance information criterion
Monte Carlo method
030104 developmental biology
Sample size determination
Physical Sciences
Earth Sciences
Mathematical and statistical techniques
lcsh:Q
Akaike information criterion
Distance matrices in phylogeny
Population Genetics
Research Article
Source: PLoS ONE
PLoS ONE, Vol 12, Iss 4, p e0175194 (2017)
ISSN: 1932-6203
Description: In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
Database: OpenAIRE
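
Note: The description attributes the failure partly to the inflated sample size that MRM hands to the criteria. The sketch below is a minimal illustration of that one mechanism, not the authors' simulation. It assumes the standard least-squares forms AIC = N ln(RSS/N) + 2k, AICc = AIC + 2k(k+1)/(N-k-1), and BIC = N ln(RSS/N) + k ln N. The function names, the parameter counts, and the 1% reduction in residual sum of squares are all hypothetical values chosen for illustration.

```python
import math


def n_pairs(n_individuals: int) -> int:
    """Number of pairwise distances among n sampled individuals."""
    return n_individuals * (n_individuals - 1) // 2


def information_criteria(rss: float, n_obs: int, k: int) -> dict:
    """Least-squares AIC, AICc and BIC (assumed standard forms).

    rss   : residual sum of squares of the fitted model
    n_obs : sample size given to the criterion
    k     : number of estimated parameters (counting convention is the caller's)
    """
    if rss <= 0 or n_obs - k - 1 <= 0:
        raise ValueError("need rss > 0 and n_obs > k + 1")
    fit_term = n_obs * math.log(rss / n_obs)
    aic = fit_term + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n_obs - k - 1)
    bic = fit_term + k * math.log(n_obs)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def delta(rss_simple, rss_complex, n_obs, k_simple, k_complex):
    """Criterion of the complex model minus that of the simple model.

    Negative values mean the complex model is ranked better.
    """
    simple = information_criteria(rss_simple, n_obs, k_simple)
    complex_ = information_criteria(rss_complex, n_obs, k_complex)
    return {name: complex_[name] - simple[name] for name in simple}


if __name__ == "__main__":
    n_ind = 50                      # hypothetical number of sampled individuals
    n_dist = n_pairs(n_ind)         # pairwise distances entering the regression
    # Hypothetical: one spurious predictor lowers RSS by only 1%.
    rss_simple, rss_complex = 100.0, 99.0
    for label, n_obs in (("individuals", n_ind), ("pairs", n_dist)):
        d = delta(rss_simple, rss_complex, n_obs, k_simple=2, k_complex=3)
        print(label, n_obs, {name: round(v, 2) for name, v in d.items()})
```

With the same trivial improvement in fit, the penalty term outweighs the gain when N is the number of individuals, but the fit term dominates once N is the number of pairwise distances, so the model with the spurious predictor is ranked first. This covers only the sample-size effect; the description also points to the different sum-of-squares partitioned by MRM, which this sketch does not model.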