Model selection with multiple regression on distance matrices leads to incorrect inferences

Author:	Marie-Josée Fortin, Erin L. Landguth, Ryan P. Franckowiak, Ian S. Acuña-Rodríguez, Karl J. Jarvis, Helene H. Wagner, Michael Panasci
Language:	English
Publication year:	2017
Subject:	0106 biological sciences; 0301 basic medicine; Gene Flow; Computer and Information Sciences; Research Validity; Heredity; Statistical methods; Bayesian probability; lcsh:Medicine; Statistics (mathematics); Research and Analysis Methods; 010603 evolutionary biology; 01 natural sciences; Models Biological; 03 medical and health sciences; Bayesian information criterion; Statistics; Geoinformatics; Genetics; Statistics::Methodology; lcsh:Science; Mathematics; Evolutionary Biology; Multidisciplinary; Population Biology; Geography; Model selection; Simulation and Modeling; lcsh:R; Biology and Life Sciences; Regression analysis; Random Variables; Research Assessment; Probability Theory; Spatial Autocorrelation; Deviance information criterion; Monte Carlo method; 030104 developmental biology; Sample size determination; Physical Sciences; Earth Sciences; Mathematical and statistical techniques; Regression Analysis; lcsh:Q; Akaike information criterion; Distance matrices in phylogeny; Population Genetics; Research Article
Source:	PLoS ONE, Vol 12, Iss 4, p e0175194 (2017)
ISSN:	1932-6203
Description:	In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b6d13cdf5ef2eefd87a3bae503ba9cfd http://europepmc.org/articles/PMC5390996