Friday, January 20, 2017

True, False, or Neither? Hypothesis testing in ecology.

How science is done is the outcome of many things, from training (both institutional and lab-specific), reviewers’ critiques and requests, historical practices, and subdiscipline culture and paradigms, to practicalities such as time, money, and trends in grant awards. ‘Ecology’ is the emergent property of thousands of people pursuing paths driven by their own combination of these and other motivators. Not surprisingly, the path of ecology sways and stalls, and in response papers pop up continuing the decades-old discussion about philosophy and best practices for ecological research.

A new paper from Betini et al. in Royal Society Open Science contributes to this discussion by asking why ecologists don’t test multiple competing hypotheses (allowing efficient falsification, or “strong inference” à la Popper). Ecologists rarely do: Betini et al. found that only 21 of 100 randomly selected papers tested 2 hypotheses, and only 8 tested more than 2. Multiple hypothesis testing is a key component of strong inference, and the authors hearken back to Platt’s 1964 paper “Strong Inference” as to why ecologists should adopt it. 
From Platt: “Science is now an everyday business. Equipment, calculations, lectures become ends in themselves. How many of us write down our alternatives and crucial experiments every day, focusing on the exclusion of a hypothesis? We may write our scientific papers so that it looks as if we had steps 1, 2, and 3 in mind all along. But in between, we do busywork. We become "method-oriented" rather than "problem-oriented." We say we prefer to "feel our way" toward generalizations.”
[An aside to say that Platt was a brutally honest critic of the state of science and his grumpy complaints would not be out of place today. This makes reading his 1964 paper especially fun. E.g. “We can see from the external symptoms that there is something scientifically wrong. The Frozen Method. The Eternal Surveyor. The Never Finished. The Great Man With a Single Hypothesis. The Little Club of Dependents. The Vendetta. The All-Encompassing Theory Which Can Never Be Falsified.”]
Betini et al. list a number of common intellectual and practical biases that likely prevent researchers from using multiple hypothesis testing and strong inference. These range from confirmation bias and pattern-seeking to the fallacy of factorial design (which leads to unreasonably high replication requirements, including of uninformative combinations). But the authors are surprisingly unquestioning about the utility of strong inference and multiple hypothesis testing for ecology. For example, Brian McGill has a great post highlighting the importance and difficulties of multi-causality in ecology - many non-trivial processes drive ecological systems (see also). 

Another salient point is that falsification of hypotheses, which is central to strong inference, is especially unserviceable in ecology. There are many reasons that an experimental result could be negative and yet not result in falsification of a hypothesis. Data may be faulty in many ways outside of our control, due to inappropriate scales of analyses, or because of limitations of human perception and technology. The data may be incomplete (for example, from a community that has not reached equilibrium); it may rely inappropriately on proxies, or there could be key variables that are difficult to control (see John A. Wiens' chapter for details). Even in highly controlled microcosms, variation arises and failures occur that are 'inexplicable' given our current ability to perceive and control the system.

Or the data might be accurate, but statistical issues remain: many effect sizes are small, and replication can be difficult or limited. Other statistical issues can also make falsification questionable – for example, the use of p-values as the ‘falsify/don’t falsify’ determinant, or the confounding of AIC model selection with true multiple hypothesis testing.
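To illustrate that last point, here is a minimal sketch (my own hypothetical example, not from Betini et al.): AIC ranks candidate models by relative fit, so the “best” model is only best among those offered — a low AIC neither falsifies the losers nor confirms the winner.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(0, 4, size=x.size)  # noisy, truly linear process

def aic_ls(y, y_hat, k):
    """AIC for a least-squares fit with k estimated parameters (incl. variance)."""
    n = y.size
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

# Two candidate "hypotheses": linear vs quadratic mean structure.
lin = np.polyval(np.polyfit(x, y, 1), x)
quad = np.polyval(np.polyfit(x, y, 2), x)

aic_lin = aic_ls(y, lin, k=3)   # slope, intercept, sigma
aic_quad = aic_ls(y, quad, k=4)

# The lower-AIC model is preferred *relative to this set* only;
# a better hypothesis left off the list would never be detected.
print(f"AIC linear: {aic_lin:.1f}, AIC quadratic: {aic_quad:.1f}")
```

The comparison says nothing about whether either model is adequate in an absolute sense, which is why treating model selection as strong inference is misleading.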

Instead, I think it can be argued that ecologists have relied more on verification – accumulating multiple results supporting a hypothesis. This is slower, logically weaker, and undoubtedly results in mistakes too. Verification is most convincing when effect sizes are large – e.g. David Schindler’s Lake 226, which provided a single, decisive example of phosphorus supplementation causing eutrophication. Unfortunately, small effect sizes are common in ecology. There also isn’t a clear process for dealing with negative results when a field has relied on verification - how much negative evidence is required to remove a hypothesis from use, versus merely leading to caveats or modifications?

Perhaps one reason Bayesian methods are so attractive to many ecologists is that they reflect the modified approach we already use - developing priors based on our assessment of evidence in the literature, particularly verifications but also evidence that falsifies (for a better discussion of this mixed approach, see Andrew Gelman's writing). This is exactly where Betini et al.'s paper is especially relevant – intellectual biases and practical limitations are even more important outside of the strict rules of strong inference. It seems important as ecologists to address these biases as much as possible. In particular, we should pursue better training in philosophical, ethical, and methodological practices; priors, which are frequently amorphous and internal, should be externalized through meta-analyses and reviews that express the state of knowledge in an unbiased fashion; and we should strive to formulate specific hypotheses and to identify their implicit assumptions.
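As a toy illustration of that Bayesian framing (the numbers here are invented for illustration): prior evidence from the literature can be encoded as a Beta prior on, say, the proportion of studies supporting a hypothesis, and a new study's results then shift the posterior rather than flatly falsifying or verifying.

```python
# Conjugate Beta-Binomial update: prior from the literature, data from a new study.
def update_beta(prior_a, prior_b, successes, failures):
    """Return the posterior Beta parameters (a, b) after binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior: 8 of 10 earlier studies supported the hypothesis.
a0, b0 = 8, 2
# New study: 2 supportive results, 3 negative ones.
a1, b1 = update_beta(a0, b0, successes=2, failures=3)

print(f"prior mean support: {beta_mean(a0, b0):.2f}")      # 0.80
print(f"posterior mean support: {beta_mean(a1, b1):.2f}")  # 0.67
```

Negative evidence lowers the posterior smoothly, which matches how the field actually treats contrary results: as reasons for caveats and modification rather than immediate rejection.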


BenK said...

Some of Platt's complaints are simply a misunderstanding of scale.

"The Frozen Method. The Eternal Surveyor. The Never Finished. The Great Man With a Single Hypothesis. The Little Club of Dependents. The Vendetta. The All-Encompassing Theory Which Can Never Be Falsified."

Of these, how many are really something wrong with science, and how many are simply the result of science being done on a larger time scale than individual careers, or in a degree of specialization that requires more than one person to accomplish a scientific goal?

Caroline Tucker said...

Maybe a little of A and a little of B? I think that Platt had a pretty single-minded vision of what scientists 'should' look like, and he definitely ignores scale in his criticisms. For example, he highlights Louis Pasteur as this ideal scientist who solved multiple problems across multiple fields with his impeccable logic. But as you say, Pasteur built on longer-term work and was able to bring several areas of expertise together (and it seems that Pasteur was not entirely without his own dogmatism).

And maybe a lifetime using a single method proves to be useful for the field. But it would be nice to think that this happened because the frozen methodologist evaluated and recognized their contribution in the larger context of the field, and wasn't simply unwilling to adapt as new methods made theirs obsolete. I think this quote is focused on the problem of human biases, but perhaps I'm wrong.

Jeremy Fox said...

Re: the possibility of science being done on a larger scale than individual papers, that's sometimes the case. I'm thinking for instance of the classic work of Ed McCauley and colleagues, testing 5 different hypotheses for absence of "paradox of enrichment" cycles in Daphnia across several papers over a number of years. But honestly, I think that sort of thing is the exception rather than the rule. Think for instance of research on character displacement and bet hedging--two topics on which there are published, widely-accepted checklists of what needs to be shown in order to demonstrate the phenomenon and rule out alternative possibilities. For both topics, the easiest boxes to check off have been checked off many times by many researchers in many systems. The hardest boxes to check off have hardly ever been checked off. Further, in the case of character displacement research, the rate at which researchers check off the more difficult boxes hasn't even increased since the original checklist was published. Even though the whole point of the checklist was to make clear where the knowledge gaps were.

My admittedly cynical (and deliberately provocative) inference is that ecologists are mostly system-first researchers. They're happy to collect some easy-to-collect evidence consistent with some general hypothesis or question like "character displacement" if they can get a paper out of it. But then rather than keep going and really nail whether character displacement or bet hedging or whatever is occurring in their system, they'd rather move on and go pick some other low hanging fruit instead. Do something easy to get evidence consistent with some other general hypothesis or concept. Something to do with functional trait diversity, say. And then when everybody's picked that low-hanging fruit in their own system and that bandwagon is played out, they'll all go and jump on the next one. Lather, rinse, repeat.

I'm very glad to see this paper because I was literally just about to embark on the same exercise as background research for my book. Now I don't have to! :-) As a follow-up, I'm going to look at how often Mercer Award-winning papers have multiple working hypotheses and how many predictions per hypothesis they test (also how many assumptions per hypothesis). I bet they will score better on average than the papers reviewed by Betini et al. I may also try to do the same exercise in leading evolution journals, as it's my anecdotal sense that in evolutionary biology papers tightly linked to theory, and testing theory very thoroughly, are more common than in ecology. But perhaps my anecdotal sense is wrong; anecdotal senses often are...

Caroline Tucker said...

Jeremy - I completely agree that we repeat lots of easy tests and leave the hard things undone.

But, luckily, I don't think that the discipline is just systems-people jumping on bandwagons. That's certainly not uncommon, but there are also people who were there when the bandwagon started, who stay on after the bandwagon has crashed and continue to do the thoughtful work and work on the difficult questions.

Perhaps more focus on formulating 'check-lists' for demonstrating phenomena would be particularly useful for ecologists.

Brian said...

Although ecologists all love Platt when they read him, I think it is especially telling that Platt took most of his examples from molecular and cell biology in its early days. The questions that he highlights as having decisive tests are really very basic (is DNA the carrier of genetic information?). Even Platt's favorite field of molecular/cellular biology has increasingly had to work in a multicausal world as the field has advanced. And multiple hypotheses and falsification don't work as well there now either. I am constantly reminded of a colleague at Maine who works on developmental biology in zebrafish. Apparently the fish at Maine are unusually happy/well-cared for, and there are many results published from other labs on the same zebrafish that they cannot reproduce. Lakatos, Popper's graduate student, argued that science is full of apparent falsifications where people plow on despite the falsification and are proved correct eventually. H. pylori as the cause of ulcers is a famous one - early published experiments were all negative, yet the result is now so well established that it won a Nobel Prize and has fundamentally changed the practice of doctors all over the world.

I do think your idea of checklists is a good one. When I do a philosophy of science unit at the start of my grad stats class I always talk about Koch's Postulates (test for confirming a specific microbe causes a specific disease). We have developed a few such checklists in ecology, but not many.

Jeremy Fox said...

So, does this mean you've changed your mind about ecologists only being able to come up with such checklists after they've already been pursuing a research program for a while and making a lot of mistakes? (I'm recalling this old post and its comments.) Or are you just suggesting that we more often need to follow up the early false starts and blind alleys of a research program by converting that hard-won wisdom into checklists?

Anyway, if we're all agreed that Koch's postulates-style checklists are really useful, how come we only have them for character displacement and bet hedging, and how come even the checklists we have don't seem to prompt more people to tick all the boxes? It's puzzling. I mean, a checklist like the one for character displacement or bet hedging is basically step-by-step instructions for doing a high-profile paper reporting a future textbook example. There would be a *lot* of professional reward for coming up with a new example of, say, character displacement that checked all the boxes--but apparently that incentive isn't sufficient? Is that because the rewards for checking only some of the easy-to-check boxes also are reasonably high? Is it just because some of the boxes are all but impossible to check? Or what?

Jeremy Fox said...

Still on checklists: there's also the Bradford Hill criteria for showing causality in public health epidemiology. It would be interesting to translate them to ecology.