
Friday, January 20, 2017

True, False, or Neither? Hypothesis testing in ecology.

How science is done is the outcome of many things, from training (both institutional and lab-specific), reviewers’ critiques and requests, historical practices, and subdiscipline culture and paradigms, to practicalities such as time, money, and trends in grant awards. ‘Ecology’ is the emergent property of thousands of people pursuing paths driven by their own combination of these and other motivators. Not surprisingly, the path of ecology sways and stalls, and in response papers pop up continuing the decades-old discussion about philosophy and best practices for ecological research.

A new paper from Betini et al. in Royal Society Open Science contributes to this discussion by asking why ecologists don’t test multiple competing hypotheses (allowing efficient falsification à la Popper, or “strong inference”). Ecologists rarely test multiple competing hypotheses: Betini et al. found that only 21 of 100 randomly selected papers tested 2 hypotheses, and only 8 tested more than 2. Multiple hypothesis testing is a key component of strong inference, and the authors hearken back to Platt’s 1964 paper “Strong Inference” as to why ecologists should adopt it. 
From Platt: “Science is now an everyday business. Equipment, calculations, lectures become ends in themselves. How many of us write down our alternatives and crucial experiments every day, focusing on the exclusion of a hypothesis? We may write our scientific papers so that it looks as if we had steps 1, 2, and 3 in mind all along. But in between, we do busywork. We become "method-oriented" rather than "problem-oriented." We say we prefer to "feel our way" toward generalizations.”
[An aside to say that Platt was a brutally honest critic of the state of science and his grumpy complaints would not be out of place today. This makes reading his 1964 paper especially fun. E.g. “We can see from the external symptoms that there is something scientifically wrong. The Frozen Method. The Eternal Surveyor. The Never Finished. The Great Man With a Single Hypothesis. The Little Club of Dependents. The Vendetta. The All-Encompassing Theory Which Can Never Be Falsified.”]
Betini et al. list a number of common intellectual and practical biases that likely prevent researchers from using multiple hypothesis testing and strong inference. These range from confirmation bias and pattern-seeking to the fallacy of factorial design (which leads to unreasonably high replication requirements, including for uninformative treatment combinations). But the authors are surprisingly unquestioning about the utility of strong inference and multiple hypothesis testing for ecology. For example, Brian McGill has a great post highlighting the importance and difficulties of multi-causality in ecology - many non-trivial processes drive ecological systems (see also). 

Another salient point is that falsification of hypotheses, which is central to strong inference, is especially unserviceable in ecology. There are many reasons that an experimental result could be negative and yet not result in falsification of a hypothesis. Data may be faulty in many ways outside of our control, due to inappropriate scales of analyses, or because of limitations of human perception and technology. The data may be incomplete (for example, from a community that has not reached equilibrium); it may rely inappropriately on proxies, or there could be key variables that are difficult to control (see John A. Wiens' chapter for details). Even in highly controlled microcosms, variation arises and failures occur that are 'inexplicable' given our current ability to perceive and control the system.

Or the data might be accurate, but there are statistical issues to worry about, given that many effect sizes in ecology are small and replication can be difficult or limited. Other statistical issues can also make falsification questionable – for example, the use of p-values as the ‘falsify/don’t falsify’ determinant, or the confounding of AIC model selection with true multiple hypothesis testing.
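
To make the replication point concrete, here is a toy power simulation in Python (my own sketch, not an analysis from any of the papers discussed; the effect size, sample size, and significance threshold are arbitrary choices). With a small but real effect and modest replication, most experiments return a non-significant result, which is negative evidence but hardly a falsification.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

true_effect = 0.2      # small standardized effect; purely illustrative
n_per_group = 15       # limited replication
n_simulations = 5000
alpha = 0.05

rejections = 0
for _ in range(n_simulations):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    if p < alpha:
        rejections += 1

# Most runs give p > alpha even though the effect is real: a negative
# result, but not a falsification of the hypothesis.
print(f"Power to detect the (real) effect: {rejections / n_simulations:.2f}")
```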

Instead, I think it can be argued that ecologists have relied more on verification – accumulating multiple results supporting a hypothesis. This is slower, logically weaker, and undoubtedly results in mistakes too. Verification is most convincing when effect sizes are large – e.g. David Schindler’s Lake 226 experiment, which provided a single, principal example of phosphorus supplementation causing eutrophication. Unfortunately, small effect sizes are common in ecology. There also isn’t a clear process for dealing with negative results when a field has relied on verification - how much negative evidence is required to remove a hypothesis from use, versus just leading to caveats or modifications?

Perhaps one reason Bayesian methods are so attractive to many ecologists is that they reflect the modified approach we already use - developing priors based on our assessment of evidence in the literature, particularly verifications but also evidence that falsifies (for a better discussion of this mixed approach, see Andrew Gelman's writing). This is exactly where Betini et al.'s paper is especially relevant – intellectual biases and practical limitations matter even more outside the strict rules of strong inference. It seems important for ecologists to address these biases as much as possible. In particular, we need better training in philosophical, ethical and methodological practices; priors, which are often amorphous and internal, should be externalized through meta-analyses and reviews that express the state of knowledge in an unbiased fashion; and we should strive to formulate specific hypotheses and to identify their implicit assumptions.
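
To make that updating process concrete, here is a minimal sketch of a conjugate normal-normal update in Python (my own, entirely hypothetical numbers), treating a literature-derived prior for an effect size as explicit quantities rather than an amorphous internal impression.

```python
# Prior from a (hypothetical) meta-analysis vs. a new study's estimate.
prior_mean, prior_sd = 0.5, 0.3    # accumulated evidence from the literature
data_mean, data_se = 0.1, 0.2      # estimate and standard error from a new study

prior_prec = 1 / prior_sd**2
data_prec = 1 / data_se**2

post_prec = prior_prec + data_prec
post_mean = (prior_mean * prior_prec + data_mean * data_prec) / post_prec
post_sd = post_prec ** -0.5

# One weakly supportive result shifts, but does not erase, the accumulated
# weight of earlier verifications.
print(f"posterior effect: {post_mean:.2f} +/- {post_sd:.2f}")
```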

Monday, August 25, 2014

Researching ecological research

Benjamin Haller. 2014. "Theoretical and Empirical Perspectives in Ecology and Evolution: A Survey". BioScience; doi:10.1093/biosci/biu131.

Etienne Low-Décarie, Corey Chivers, and Monica Granados. 2014. "Rising complexity and falling explanatory power in ecology". Front Ecol Environ 2014; doi:10.1890/130230.

A little navel-gazing is good for ecology. Although it may sometimes seem otherwise, ecology spends far less time evaluating its approach than it does simply doing research. Obviously we can't spend all of our time navel-gazing, but the field as a whole would benefit greatly from ongoing conversations about its strengths and weaknesses. 

For example, the issue of theory vs. empirical research. Although this issue has received attention and arguments ad nauseam over the years (including here, 1, 2, 3), it never completely goes away. And even though there are arguments that it's not an issue anymore – that everyone recognizes the need for both – if you look closely, the tension continues to exist in subtle ways. If you have participated in a mixed reading group, did the common complaint “do we have to read so many math-y papers?” ever arise, or equally “do we have to read so many system-specific papers and just critique the methods?” Theory and empirical research don't see eye to eye as closely as we might want to believe.

The good news? Now there is some data. Ben Haller did a survey on this topic that just came out in BioScience. The paper does the probably necessary task of gathering some real data on the theory/data debate, beyond the philosophical arguments. Firstly, he defines empirical research as involving the gathering and analysis of real-world data, while theoretical research does not gather or analyze real-world data and instead involves mathematical models, numerical simulations, and other such work. The survey included 614 scientists from ecology, evolutionary biology, and related fields, representing a global (rather than just North American) perspective.

The conclusions are short, sweet and pretty interesting: "(1) Substantial mistrust and tension exists between theorists and empiricists, but despite this, (2) there is an almost universal desire among ecologists and evolutionary biologists for closer interactions between theoretical and empirical work; however, (3) institutions such as journals, funding agencies, and universities often hinder such increased interactions, which points to a need for institutional reforms."
 
For interpreting the plots: the empirical group represents respondents whose research is completely or primarily empirical; the theoretical group's research is mostly or completely theoretical; and the middle group does work that falls equally into both types. Maybe the results don't surprise anyone – scientists still read papers, collaborate, and coauthor papers mostly with others of the same group. What is surprising is that this trend is particularly strong for the empirical group. For example, nearly 80% of theorists have coauthored a paper with someone in the empirical group, while only 42% of empiricists have coauthored at least one paper with a theorist. Before we start throwing things at empiricists, it should be noted that this could reflect a relative scarcity of theoretical ecologists, rather than insularity on the part of the empiricists. However, it is interesting that while responses to the question “how should theory and empiricism coexist together?” most often agreed, across all groups, that “theoretical work and empirical work would coexist tightly, driving each other in a continuing feedback loop”, empirical scientists were significantly more likely to say “work would primarily be data-driven; theory would be developed in response to questions raised by empirical findings.”

Most important, and maybe concerning, is that the survey found no real effect of age, stage or gender – i.e. existing attitudes are deeply ingrained and show no sign of changing.

Why is it so important that we reconcile the theoretical/empirical issue? The paper “Rising complexity and falling explanatory power in ecology” offers a pretty compelling reason in its title. Ecological research is getting harder, and we need to marshal all the resources available to us to continue to progress. 

The paper suggests that ecological research is experiencing falling mean R2 values: values in published papers have fallen from above 0.75 prior to 1950 to below 0.5 today.

The worrying thing is that, as a discipline progresses and improves, you might predict an improving ability to explain ecological phenomena. For comparison, criminology showed no decline in R2 values as that field matured through time. Why don’t we have that? 

During the same period, however, it is notable that the average complexity of ecological studies also increased – the number of reported p-values is 10x larger on average today than in the early years (when usually only a single p-value, relating to a single question, was reported). 

The fall in R2 values and the rise in reported p-values could mean a number of things, some worse for ecology than others. The authors suggest that R2 values may be declining as a result of exhaustion of “easy” questions (“low hanging fruit”), increased effort in experiments, or a change in publication bias, for example. The low hanging fruit hypothesis may have some merit – after all, studies from before the 1950s were mostly population biology with a focus on a single species in a single place over a single time period. Questions have grown increasingly more complex, involving assemblages of species over a greater range of spatial and temporal scales. For complex sciences, this fits a common pattern of diminishing returns: “For example, large planets, large mammals, and more stable elements were discovered first”.
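
As a rough sketch of that low-hanging-fruit logic (mine, not a calculation from Low-Décarie et al.; the slopes are invented), if the typical true effect size under study has shrunk over time, mean R2 falls even when the methods stay exactly the same:

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_r_squared(true_slope, n_studies=500, n_obs=100):
    """Average R^2 across simulated single-predictor studies."""
    r2 = []
    for _ in range(n_studies):
        x = rng.normal(size=n_obs)
        y = true_slope * x + rng.normal(size=n_obs)
        r = np.corrcoef(x, y)[0, 1]
        r2.append(r**2)
    return float(np.mean(r2))

print(f"large effects ('easy' questions): mean R^2 = {mean_r_squared(2.0):.2f}")
print(f"small effects (harder questions): mean R^2 = {mean_r_squared(0.9):.2f}")
```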

In some ways, ecologists lack a clear definition of success. No one would argue that ecology is less effective now than it was in the 1920s, for example, and yet a simplistic measure of success (R2) might suggest that ecology is in decline. Any bias between theorists and empiricists is obviously misplaced, in that any definition of success for ecology will require both.  

Monday, March 24, 2014

Debating the p-value in Ecology

It is interesting that p-values still garner so much ink: it says something about how engrained and yet embattled they are. This month’s Ecology issue explores the p-value problem with a forum of 10 new short papers* on the strengths and weaknesses, defenses and critiques, and various alternatives to “the probability (p) of obtaining a statistic at least as extreme as the observed statistic, given that the null hypothesis is true”.
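
(As a refresher on that definition, a permutation test makes “at least as extreme, given that the null hypothesis is true” concrete by shuffling group labels. The toy Python example below is mine, not something from the forum.)

```python
import numpy as np

rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, 20)
treatment = rng.normal(0.5, 1.0, 20)

observed = treatment.mean() - control.mean()
pooled = np.concatenate([control, treatment])

n_perm = 10_000
as_extreme = 0
for _ in range(n_perm):
    rng.shuffle(pooled)                      # relabel at random: the null world
    diff = pooled[20:].mean() - pooled[:20].mean()
    if abs(diff) >= abs(observed):           # at least as extreme as observed
        as_extreme += 1

print(f"permutation p-value: {as_extreme / n_perm:.3f}")
```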

The defense of p-values is led by Paul Murtaugh, who provides the opening and closing arguments. Murtaugh, who has written a number of good papers about ecological data analysis and statistics, takes a pragmatic approach to null hypothesis testing and p-values. He argues that p-values are not flawed so much as they are regularly and egregiously misused and misinterpreted. In particular, he demonstrates mathematically that alternative approaches to the p-value, particularly the use of confidence intervals or information-theoretic criteria (e.g. AIC), simply present the same information as p-values in slightly different fashions. This is a point that the contribution by Perry de Valpine supports, noting that all of these approaches are simply different ways of displaying likelihood ratios, and that the argument that one is inherently superior ignores their close relationship. In addition, although acknowledging that cutoff values for significant p-values are logically problematic (why is a p-value of 0.049 so much more important than one of 0.051?), Murtaugh notes that cutoffs reflect decisions about acceptable error rates and so are not inherently meaningless. Further, he argues that ΔAIC cutoffs for model selection are no less arbitrary. 
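
That equivalence is easy to check numerically. For nested models differing by one parameter, ΔAIC equals the likelihood-ratio statistic minus 2, so the likelihood-ratio p-value can be recovered from ΔAIC alone; the Python sketch below is my own (with made-up data), not code from the forum papers.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)                # weak, invented effect

null_fit = sm.OLS(y, np.ones(n)).fit()          # intercept-only model
full_fit = sm.OLS(y, sm.add_constant(x)).fit()  # intercept + slope

lrt = 2 * (full_fit.llf - null_fit.llf)         # likelihood-ratio statistic
delta_aic = null_fit.aic - full_fit.aic         # equals lrt - 2 here

p_lrt = stats.chi2.sf(lrt, df=1)
p_from_aic = stats.chi2.sf(delta_aic + 2, df=1)

# The two p-values coincide: for nested models differing by one parameter,
# the p-value and delta-AIC are two readouts of the same likelihood ratio.
print(f"delta AIC = {delta_aic:.2f}")
print(f"p from LRT = {p_lrt:.4f}, p recovered from delta AIC = {p_from_aic:.4f}")
```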

The remainder of the forum is a back-and-forth argument about Murtaugh’s particular points and about the merits of the other authors’ chosen methods (Bayesian, information-theoretic and other approaches are represented). It’s quite entertaining, and this forum is really a great idea that I hope will feature in future issues. Some of the arguments are philosophical – are p-values really “evidence”, and is it possible to “accept” or “reject” a hypothesis using their values? Murtaugh does downplay the well-known problem that p-values summarize the strength of evidence against the null hypothesis, and do not assess the strength of evidence supporting a hypothesis or model. This can make them prone to misinterpretation (most students in intro stats want to say “the p-value supports the alternative hypothesis”) or else to interpretation via stilted statements.

Not surprisingly, Murtaugh receives the most flak for defending p-values from researchers working within alternative frameworks like Bayesian and information-theoretic approaches. (His interactions with K. Burnham and D. Anderson, of AIC fame, seem downright testy; Burnham and Anderson in fact start their paper “We were surprised to see a paper defending P values and significance testing at this time in history”...) But having this diversity of authors plays a useful role, in that it highlights that each approach has its own merits and best applications. Null hypothesis testing with p-values may be most appropriate for testing the effects of treatments in randomized experiments, while AIC values are useful when comparing multi-parameter, non-nested models. Bayesian methods, similarly, may be better suited to some problems than others. This focus on the “one true path” to statistics may be behind some of the current problems with p-values: they were used as a standard to help make ecology more rigorous, but the focus on p-values came at the expense of reporting effect sizes, making predictions, and careful experimental design.

Even at this point in history, it seems like there is still lots to say about p-values.

*But not open-access, which is really too bad.