This is the point that Lemoine et al. make in an interesting new report in Ecology. Experimental data from natural systems (e.g. warming experiments, BEF experiments) are often highly variable and poorly replicated, and effect sizes are frequently small. Perhaps it is not surprising that we see contradictory outcomes, because studies of small true effects are prone to high Type S error (the chance of obtaining the wrong sign for an effect) and Type M error (the amount by which an effect size must be overestimated in order to be significant). Contradictory results arise from these statistical issues, combined with the idea that papers that do get published early on may simply have found significant effects by chance (the Winner's Curse).
Power is the probability of correctly rejecting the null hypothesis (H0) when it is false. The power of ecological experiments increases with sample size (N), since uncertainty in the estimated effect decreases with increasing N. However, if your true effect size is small, studies with low power have to substantially overestimate the effect size to obtain a significant p-value. This is because when the variation in your data is large and N is small, the standard error is large, and so is the smallest observed effect that yields a significant z-score. Thus for your results to be significant, you need to observe an effect larger than this critical value, which may be much larger than the true effect size. It's a catch-22 for small effect sizes: if your result is correct, it very well may not be significant; if you have a significant result, you may be overestimating the effect size.
From Lemoine et al. 2016.
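The catch-22 described above can be checked with a quick simulation. What follows is a minimal sketch, not the analysis from Lemoine et al.: it assumes a two-group comparison of means with a known, equal standard deviation (so a z-test applies), and the function name `retrodesign`, the true effect of 0.2, and the sample sizes are all illustrative choices of mine. It draws many hypothetical estimates around the true effect, keeps only the significant ones, and reports the power, the Type S rate (the share of significant results with the wrong sign), and the Type M ratio (the average significant estimate divided by the true effect).

```python
import math
import random
from statistics import NormalDist


def retrodesign(true_effect, sd, n_per_group, alpha=0.05, n_sims=20000, seed=1):
    """Estimate power, Type S rate, and Type M ratio by simulation.

    Assumes a two-group comparison of means with known, equal standard
    deviation `sd`, so the estimated difference is normally distributed
    with standard error sd * sqrt(2 / n_per_group). `true_effect` must
    be nonzero. All names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    se = sd * math.sqrt(2.0 / n_per_group)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical z
    threshold = z_crit * se  # smallest |estimate| that reaches significance

    # Draw hypothetical study estimates and keep the significant ones.
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > threshold:
            significant.append(estimate)

    if not significant:
        raise ValueError("no significant results; increase n_sims")

    power = len(significant) / n_sims
    wrong_sign = sum(1 for e in significant if e * true_effect < 0)
    type_s = wrong_sign / len(significant)
    mean_abs = sum(abs(e) for e in significant) / len(significant)
    type_m = mean_abs / abs(true_effect)
    return {"power": power, "type_s": type_s, "type_m": type_m,
            "threshold": threshold}


if __name__ == "__main__":
    for n in (5, 20, 100, 500):
        r = retrodesign(true_effect=0.2, sd=1.0, n_per_group=n)
        print(f"n={n:4d}  power={r['power']:.2f}  "
              f"type_s={r['type_s']:.3f}  type_m={r['type_m']:.2f}")
```

Under these assumptions, with five replicates per group the significance threshold is about 1.24, more than six times the assumed true effect of 0.2, so every significant estimate is a large overestimate. As the number of replicates grows, the threshold shrinks toward the true effect and the Type M ratio approaches 1.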
Lemoine, N. P., Hoffman, A., Felton, A. J., Baur, L., Chaves, F., Gray, J., Yu, Q. and Smith, M. D. (2016), Underappreciated problems of low replication in ecological field studies. Ecology. doi: 10.1002/ecy.1506