Manipulating the alpha level cannot cure significance testing – comments on 'Redefine statistical significance'

Authors: Armina Janyan, Marion Tegethoff, Martin E. Guerrero-Gimenez, Rubén Daniel Ledesma, Fränzi Korner-Nievergelt, James W. Grice, Yuki Yamada, Corson N. Areshenkoff, Carlos Barrera-Causil, Michiel R. de Boer, Michael J. Marks, Ivan Vankov, Felipe Carlos Martín Zoppino, Martin Lachmair, Susana Ruiz-Fernández, Koji Kosugi, Isabel Suarez, Marian Grendar, Tonghui Wang, Juan José Rahona, Subhra Sankar Dhar, Valentin Amrhein, Jose D Perezgonzalez, Ladislas Nalborczyk, David A. Rodriguez-Medina, Eric J. Beh, Juan Carlos Correa, M. T. Bradley, Fernando Marmolejo-Ramos, William M. Briggs, Rens van de Schoot, Marco Tullio Liuzza, Daniel R. Ciocca, Roberto Limongi, Rosaria Lombardo, Juana Gómez-Benito, Igor Dolgov, Andrés Gutiérrez, Roland Pfister, Yusuf K. Bilgic, Héctor A. Cepeda-Freyre, Ali Karimnezhad, Tania B. Huedo-Medina, David Trafimow, Gunther Meinlschmidt, Denis Cousineau, Sergio E. Chaigneau, Raydonal Ospina, Roser Bono, Xavier Romão, Klaus Jaffe, Santiago Velasco-Forero, Mauricio Tejo, Hung T. Nguyen
Language: English
Year of publication: 2017
Subject:
DOI: 10.7287/peerj.preprints.3411v1
Description: We argue that relying on p-values to reject null hypotheses, including a recent call to change the canonical alpha level for statistical significance from .05 to .005, is detrimental to making new discoveries and to the progress of science. Because both blanket and variable criterion levels are problematic, it is sensible to dispense with significance testing altogether. There are alternatives that address study design and sample-size determination much more directly than significance testing does, but no statistical tool should replace significance testing as a new magic method that gives clear-cut, mechanical answers. Inference should not be based on single studies at all, but on cumulative evidence from multiple independent studies. When evaluating the strength of the evidence, we should consider, for example, auxiliary assumptions, the strength of the experimental design, or implications for applications. Boiling all this down to a binary decision based on a p-value threshold of .05, .01, .005, or anything else is not acceptable.
Database: OpenAIRE