Monday, March 24, 2014

Debating the p-value in Ecology

It is interesting that p-values still garner so much ink: it says something about how engrained and yet embattled they are. This month’s Ecology issue explores the p-value problem with a forum of 10 new short papers* on the strengths and weaknesses, defenses and critiques, and various alternatives to “the probability (p) of obtaining a statistic at least as extreme as the observed statistic, given that the null hypothesis is true”.

The defense for p-values is led by Paul Murtaugh, who provides the opening and closing arguments. Murtaugh, who has written a number of good papers about ecological data analysis and statistics, takes a pragmatic approach to null hypothesis testing and p-values. He argues p-values are not flawed so much as they are regularly and egregiously misused and misinterpreted. In particular, he demonstrates mathematically that alternative approaches to the p-value, particularly the use of confidence intervals or information theoretic criteria (e.g. AIC), simply present the same information as p-values in slightly different fashions. This is a point that the contribution by Perry de Valpine supports, noting that all of these approaches are simply different ways of displaying likelihood ratios, and that the argument that one is inherently superior ignores their close relationship. In addition, although acknowledging that cutoff values for significant p-values are logically problematic (why is a p-value of 0.049 so much more important than one of 0.051?), Murtaugh notes that cutoffs reflect decisions about acceptable error rates and so are not inherently meaningless. Further, he argues that ΔAIC cutoffs for model selection are no less arbitrary.
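To make the "same information" point concrete, here is a minimal sketch of how a ΔAIC value maps onto a p-value for two nested models. It assumes the models differ by a single parameter (k = 1), that ΔAIC is defined as AIC(reduced) minus AIC(full), and that the usual large-sample chi-square approximation for the likelihood ratio statistic holds. The function name is hypothetical and the code is an illustration, not taken from the forum papers.

```python
import math


def p_value_from_delta_aic(delta_aic: float) -> float:
    """Approximate p-value of the likelihood ratio test implied by a given
    delta AIC = AIC(reduced) - AIC(full), for nested models differing by
    ONE parameter (assumption: k = 1, chi-square approximation holds).
    """
    k = 1  # number of extra parameters in the full model (assumed)
    # AIC = -2 log L + 2 * (number of parameters), so the likelihood
    # ratio statistic is delta AIC plus 2k.
    lrt = delta_aic + 2 * k
    if lrt <= 0:
        return 1.0
    # Upper tail of a chi-square distribution with 1 degree of freedom:
    # P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt / 2.0))


if __name__ == "__main__":
    for delta in (0.0, 2.0, 4.0, 7.0):
        print(f"delta AIC = {delta:4.1f}  ->  p = {p_value_from_delta_aic(delta):.4f}")
```

Under these assumptions, the common "ΔAIC of 2" rule of thumb corresponds to a p-value of roughly 0.046, which sits very close to the conventional 0.05 cutoff. The two criteria rank the same evidence on different scales.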

The remainder of the forum is a back and forth argument about Murtaugh’s particular points and about the merits of the other authors’ chosen methods (Bayesian, information theoretic and other approaches are represented). It’s quite entertaining, and this forum is really a great idea that I hope will feature in future issues. Some of the arguments are philosophical – are p-values really “evidence” and is it possible to “accept” or “reject” a hypothesis using their values? Murtaugh does downplay the well-known problem that p-values summarize the strength of evidence against the null hypothesis, and do not assess the strength of evidence supporting a hypothesis or model. This can make them prone to misinterpretation (most students in intro stats want to say “the p-value supports the alternate hypothesis”) or else interpretation in stilted statements.

Not surprisingly, Murtaugh receives the most flak for defending p-values from researchers working in alternate worldviews like Bayesian and information-theoretic approaches. (His interactions with K. Burnham and D. Anderson (AIC) seem downright testy. Burnham and Anderson in fact start their paper “We were surprised to see a paper defending P values and significance testing at this time in history”...) But having this diversity of authors plays a useful role, in that it highlights that each approach has its own merits and best applications. Null hypothesis testing with p-values may be most appropriate for testing the effects of treatments in randomized experiments, while AIC values are useful when we are comparing multi-parameter, non-nested models. Bayesian methods may similarly be better suited to some problems than others. The focus on a “one true path” to statistics may be behind some of the current problems with p-values: they were used as a standard to help make ecology more rigorous, but the focus on p-values came at the expense of reporting effect sizes, making predictions, or careful experimental design.

Even at this point in history, it seems like there is still lots to say about p-values.

*But not open-access, which is really too bad. 
