statistics will avoid approaches especially prone to serious
abuse. In this regard, we join others in singling out the
degradation of P values into “significant” and “nonsignificant”
as an especially pernicious statistical practice.
Acknowledgments SJS receives funding from the IDEAL project
supported by the European Union’s Seventh Framework Programme
for research, technological development and demonstration under
Grant Agreement No. 602552. We thank Stuart Hurlbert, Deborah
Mayo, Keith O’Rourke, and Andreas Stang for helpful comments, and
Ron Wasserstein for his invaluable encouragement on this project.
Open Access This article is distributed under the terms of the Creative
Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use,
distribution, and reproduction in any medium, provided you give appropriate
credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
1. Lang JM, Rothman KJ, Cann CI. That confounded P-value.
2. Trafimow D, Marks M. Editorial. Basic Appl Soc Psychol.
3. Ashworth A. Veto on the use of null hypothesis testing and p
intervals: right or wrong? Taylor & Francis Editor. 2015.
Resources online, http://editorresources.taylorandfrancisgroup.
right-or-wrong/. Accessed 27 Feb 2016.
4. Flanagan O. Journal’s ban on null hypothesis significance testing:
reactions from the statistical arena. 2015. Stats Life online,
Accessed 27 Feb 2016.
5. Altman DG, Machin D, Bryant TN, Gardner MJ, editors. Statistics
with confidence. 2nd ed. London: BMJ Books; 2000.
6. Atkins L, Jarrett D. The significance of “significance tests”. In:
Irvine J, Miles I, Evans J, editors. Demystifying social statistics.
London: Pluto Press; 1979.
7. Cox DR. The role of significance tests (with discussion). Scand J
8. Cox DR. Statistical significance tests. Br J Clin Pharmacol.
9. Cox DR, Hinkley DV. Theoretical statistics. New York: Chapman
and Hall; 1974.
10. Freedman DA, Pisani R, Purves R. Statistics. 4th ed. New York:
11. Gigerenzer G, Swijtink Z, Porter T, Daston L, Beatty J, Kruger
L. The empire of chance: how probability changed science and
everyday life. New York: Cambridge University Press; 1990.
12. Harlow LL, Mulaik SA, Steiger JH. What if there were no
significance tests? New York: Psychology Press; 1997.
13. Hogben L. Statistical theory. London: Allen and Unwin; 1957.
14. Kaye DH, Freedman DA. Reference guide on statistics. In:
Reference manual on scientific evidence, 3rd ed. Washington,
DC: Federal Judicial Center; 2011. p. 211–302.
15. Morrison DE, Henkel RE, editors. The significance test
controversy. Chicago: Aldine; 1970.
16. Oakes M. Statistical inference: a commentary for the social and
behavioural sciences. Chichester: Wiley; 1986.
17. Pratt JW. Bayesian interpretation of standard inference
statements. J Roy Stat Soc B. 1965;27:169–203.
18. Rothman KJ, Greenland S, Lash TL. Modern epidemiology. 3rd
ed. Philadelphia: Lippincott-Wolters-Kluwer; 2008.
19. Ware JH, Mosteller F, Ingelfinger JA. p-Values. In: Bailar JC,
Hoaglin DC, editors. Ch. 8. Medical uses of statistics. 3rd ed.
Hoboken, NJ: Wiley; 2009. p. 175–94.
20. Ziliak ST, McCloskey DN. The cult of statistical significance:
how the standard error costs us jobs, justice and lives. Ann
Arbor: U Michigan Press; 2008.
21. Altman DG, Bland JM. Absence of evidence is not evidence of
absence. Br Med J. 1995;311:485.
22. Anscombe FJ. The summarizing of clinical experiments by
significance levels. Stat Med. 1990;9:703–8.
23. Bakan D. The test of significance in psychological research.
Psychol Bull. 1966;66:423–37.
24. Bandt CL, Boen JR. A prevalent misconception about sample
size, statistical significance, and clinical importance. J Peri-
25. Berkson J. Tests of significance considered as evidence. J Am
Stat Assoc. 1942;37:325–35.
26. Bland JM, Altman DG. Best (but oft forgotten) practices: testing
for treatment effects in randomized trials by separate analyses of
changes from baseline in each group is a misleading approach.
Am J Clin Nutr. 2015;102:991–4.
27. Chia KS. “Significant-itis” – an obsession with the P-value.
Scand J Work Environ Health. 1997;23:152–4.
28. Cohen J. The earth is round (p < 0.05). Am Psychol.
29. Evans SJW, Mills P, Dawson J. The end of the P-value? Br
Heart J. 1988;60:177–80.
30. Fidler F, Loftus GR. Why figures with error bars should replace
p values: some conceptual arguments and empirical
demonstrations. J Psychol. 2009;217:27–37.
31. Gardner MJ, Altman DG. Confidence intervals rather than P
values: estimation rather than hypothesis testing. Br Med J.
32. Gelman A. P-values and statistical practice. Epidemiology.
33. Gelman A, Loken E. The statistical crisis in science:
Data-dependent analysis – a “garden of forking paths” – explains why
many statistically significant comparisons don’t hold up. Am
Sci. 2014;102:460–465. Erratum at
http://andrewgelman.com/2014/10/14/didnt-say-part-2/. Accessed 27 Feb 2016.
34. Gelman A, Stern HS. The difference between “significant” and
“not significant” is not itself statistically significant. Am Stat.
35. Gigerenzer G. Mindless statistics. J Socioecon.
36. Gigerenzer G, Marewski JN. Surrogate science: the idol of a
universal method for scientific inference. J Manag. 2015;41:
37. Goodman SN. A comment on replication, p-values and
evidence. Stat Med. 1992;11:875–9.
38. Goodman SN. P-values, hypothesis tests and likelihood:
implications for epidemiology of a neglected historical debate. Am J
39. Goodman SN. Towards evidence-based medical statistics, I: the
P-value fallacy. Ann Intern Med. 1999;130:995–1004.
40. Goodman SN. A dirty dozen: twelve P-value misconceptions.
Semin Hematol. 2008;45:135–40.
41. Greenland S. Null misinterpretation in statistical testing and
its impact on health risk assessment. Prev Med. 2011;53:
42. Greenland S. Nonsignificance plus high power does not imply
support for the null over the alternative. Ann Epidemiol.