The EEB & Flow: significance

Why do ecologists often get different answers to the same question? Depending on the study, for example, the relationship between biodiversity and ecosystem function could be positive, negative, or absent (e.g. Cardinale et al. 2012). Ecologists explain this in many ways - experimental issues and differences, context dependence. However, it may also be due to an even simpler issue, that of the statistical implications of small effect sizes.

This is the point that Lemoine et al. make in an interesting new report in Ecology. Experimental data from natural systems (e.g. for warming experiments, BEF experiments) is often highly variable, has low replication, and effect sizes are frequently small. Perhaps it is not surprising we see contradictory outcomes, because data with small true effect sizes are prone to high Type S (reflect the chance of obtaining the wrong sign for an effect) and Type M (the amount by with an effect size must be overestimated in order to be significant). Contradictory results arise from these statistical issues, combined with the idea that papers that do get published early on may simply have found significant effects by chance (the Winner's Curse).

Power reflects the chance of failing to correctly reject the null hypothesis (Ho). The power of ecological experiments increases with sample size (N), since uncertainty in data decreases with increasing N. However, if your true effect size is small, studies with low power have to significantly overestimate the effect size to have a significant p-value. This is the result of the fact that if the variation in your data is large and your effect size is small, the critical value for a significant z-score is quite large. Thus for your results to be significant, you need to observe an effect larger than this critical value, which will be much larger than the true effect size. It's a catch-22 for small effect sizes: if your result is correct, it very well may not be significant; if you have a significant result, you may be overestimating the effect size.

From Lemoine et al. 2016.

The solution to this issue is clearly a difficult one, but the authors make some useful suggestions. First, it's really the variability of your data, more than the sample size, that raises the Type M error. So if your data is small but beautifully behaved, this may not be a huge issue for you (but you must be working in a highly atypical system). If you can increase your replication, this is the obvious solution. But the other solutions they see are cultural shifts when we publish statistical results. As with many other, the authors suggest we move away from reliance on p-values as a pass/fail tool for results. In addition to reporting p-values, they suggest we report effect sizes and their error rates. Further, that this be done for all variables regardless of whether the results are significant. Type M error and power analyses can be reported in a fashion meant to inform interpretation of results: “However, low power (0.10) and high Type M error (2.0) suggest that this effect size is likely an overestimate. Attempts to replicate these findings will likely fail.”

Lemoine, N. P., Hoffman, A., Felton, A. J., Baur, L., Chaves, F., Gray, J., Yu, Q. and Smith, M. D. (2016), Underappreciated problems of low replication in ecological field studies. Ecology. doi: 10.1002/ecy.1506

Tuesday, September 20, 2016

The problematic effect of small effects