Thursday, September 7, 2017

Why is prediction not a priority in ecology?

When we learn about the scientific method, the focus is usually on hypothesis testing and deductive reasoning. Less time is spent on considering the various outcomes of scientific research, specifically: description, understanding, and prediction. Description involves parsimoniously capturing data structure, and may use statistical methods such as PCA to reduce data complexity and identify important axes of variation. Understanding involves the explanation of phenomena by identifying causal relationships (such as via parameter estimation in models). Finally, prediction involves estimating the values of new or future observations. Naturally, some approaches in ecology orient more closely toward one of these outcomes than others, and some areas of research have historically valued one outcome over the others. For example, applied approaches such as fisheries population models emphasize predictive accuracy (though even there, there are worries about limits on prediction). On the other hand, studies of biotic interactions or trophic structure typically emphasize identifying causal relationships. The focus in different subdisciplines no doubt owes something to culture and historical priority effects.
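To make the "description" outcome a little more concrete, here is a minimal sketch of what PCA does, assuming just two hypothetical, correlated environmental variables (the names temp and moisture and all values are invented for illustration). The first principal component is the leading eigenvector of the covariance matrix; for two variables that has a closed form, so only the Python standard library is needed. A real analysis would use an established PCA routine rather than this hand-rolled version.

```python
import math
import random


def pca_2d(xs, ys):
    """Return (share of variance on PC1, unit vector along PC1) for two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix (lam1 >= lam2)
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Eigenvector for lam1; the axis-aligned cases handle b == 0
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    length = math.hypot(vx, vy)
    return lam1 / (lam1 + lam2), (vx / length, vy / length)


random.seed(1)
temp = [random.gauss(15, 3) for _ in range(200)]
# Hypothetical moisture variable that tracks temperature closely
moisture = [0.8 * t + random.gauss(0, 1) for t in temp]
share, direction = pca_2d(temp, moisture)
print(f"PC1 holds {share:.0%} of the variance, along {direction}")
```

When PC1 carries most of the variance, the two variables can be summarized by a single axis, which is the kind of parsimonious description of data structure meant here.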

In various ways these outcomes feed back on each other – description can inform explanatory models, and explanatory models can be evaluated based on their predictions. In a recent paper in Oikos, Houlahan et al. discuss the tendency of many ecological fields to under-emphasize predictive approaches and instead focus on explanatory statistical models. They note that prediction is rarely at the centre of ecological research and that this may be limiting ecological progress. There are lots of interesting questions that ecologists should be asking, such as: over what predictive horizons (spatial and temporal scales) does predictive accuracy decay? Currently, we don't even know what a typical upper limit on model predictive ability is in ecology.
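One way to picture a predictive horizon is with a toy model. The sketch below assumes a chaotic discrete logistic population model (abundance scaled to a fraction of carrying capacity; r = 3.9 is an arbitrary illustrative choice, not something taken from Houlahan et al.) and forecasts with the exact same model, but from a starting abundance that is off by one part in a million. The forecast error grows with each step until forecasts carry essentially no information about the true trajectory.

```python
def logistic_map(x, r=3.9):
    """One generation of the discrete logistic model (chaotic for r = 3.9)."""
    return r * x * (1 - x)


def forecast_errors(x0, measurement_error=1e-6, horizon=40):
    """Absolute forecast error at each step when the correct model is used
    but the initial abundance is known only to within measurement_error."""
    truth, forecast = x0, x0 + measurement_error
    errors = []
    for _ in range(horizon):
        truth, forecast = logistic_map(truth), logistic_map(forecast)
        errors.append(abs(truth - forecast))
    return errors


errs = forecast_errors(0.2)
for step in (1, 10, 20, 30, 40):
    print(f"step {step:2d}: error {errs[step - 1]:.1e}")
```

Here the limit comes from sensitivity to initial conditions rather than from a wrong model; in real systems, model error and process noise would shorten the horizon further.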

Although the authors argue for the primacy of prediction ["Prediction is the only way to demonstrate scientific understanding", and "any potentially useful model must make predictions about some unknown state of the natural world"], I think there is some nuance to be gained by recognizing that understanding and prediction are separate outcomes and that their relationship is not always straightforward (for a thorough discussion see Shmueli 2010). Ideally, a mutually informative feedback between explanation and prediction should exist, but it is also true that prediction can be useful and worthy for reasons that are not dependent on explanation and vice versa. Further, to understand why and where prediction is limited or difficult, and what is required to correct this, it is useful to consider it separately from explanation.

Understanding/explanation can be valuable and inspire further research, even if prediction is impossible. The goal of explanatory models is to have the model [e.g., f(x)] match the actual mechanism [F(x)] as closely as possible. A divergence between understanding and prediction can naturally occur when there is a gap between concepts or theoretical constructs and our ability to measure them. In physics, theories explaining phenomena may arise many years before they can actually be tested (e.g. gravitational waves). Even if useful causal models are available, limitations on prediction can be present: in quantum mechanics, the Heisenberg uncertainty principle limits the precision with which you can simultaneously know both the position and the momentum of a particle. In ecology, a major limitation to prediction may simply be data availability. In meteorology, a field with similar challenges in which many processes are important and nonlinearities are common, predictions require massive data inputs (frequently collected in near-continuous time) and models that can only be run on supercomputers. We rarely collect biotic data at those scales in ecology. We can still gain understanding if predictions are impossible, and hopefully the desire to make predictions will eventually motivate the development of new methods or data collection. In many ecological fields, it might be worth thinking about what can be done in the future to enable predictions, even if they aren't really possible right now.

Approaches that emphasize prediction frequently improve understanding, but this is not necessarily true either. Statistically, understanding can come at the cost of predictive ability. Further, a predictive model may provide accurate predictions, but do so using collinear or synthetic variables that are hard to interpret. For example, a macroecological relationship between temperature and diversity may effectively predict diversity in a new habitat, and yet do little on its own to identify specific mechanisms. Prediction does not require interpretability or explanatory ability, as is clear from papers such as "Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data". So it's worth being wary of the idea that a predictive model is necessarily 'better'.
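The collinearity point can be illustrated with a small simulation. This is a sketch under invented assumptions: two hypothetical predictors where x2 is nearly a copy of x1, and a response that depends equally on both. Fitting ordinary least squares to several small samples gives individual coefficients that swing widely from sample to sample, while predictions for a separate held-out set stay about as accurate as the noise allows.

```python
import random


def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2, solved with the
    2x2 normal equations on centred data (Cramer's rule)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def simulate(n, rng):
    """Hypothetical data: x2 is almost a copy of x1 (strong collinearity)."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [a + rng.gauss(0, 0.05) for a in x1]
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return x1, x2, y


def rmse(coefs, x1, x2, y):
    """Root-mean-square prediction error of y = b0 + b1*x1 + b2*x2."""
    b0, b1, b2 = coefs
    sq = [(b0 + b1 * a + b2 * b - v) ** 2 for a, b, v in zip(x1, x2, y)]
    return (sum(sq) / len(sq)) ** 0.5


rng = random.Random(42)
held_out = simulate(1000, rng)  # a separate, larger test set
for replicate in range(3):
    coefs = fit_two_predictors(*simulate(50, rng))
    _, b1, b2 = coefs
    print(f"b1={b1:6.2f}  b2={b2:6.2f}  b1+b2={b1 + b2:5.2f}  "
          f"held-out RMSE={rmse(coefs, *held_out):.2f}")
```

The sum b1 + b2 stays near 2 across replicates, and that sum is what the predictions depend on; the split between b1 and b2, which is what an explanatory reading would focus on, is poorly determined.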

With this difference between prediction and understanding in mind, it is perhaps easier to see why ecologists have lagged in prediction. For a long time, the statistical approaches used in ecology were biased toward those meant to improve understanding, such as regression models, where parameters estimate the strength and direction of a relationship. This is partially responsible for our obsession with p-values and R^2 values. What Houlahan et al. do a great job of emphasizing is that by ignoring prediction as a goal, researchers often limit their ability to confirm their understanding. Predictions that are derived from explanatory models… Some approaches in ecology have already moved naturally towards emphasizing prediction, especially species distribution models (SDMs) and ecological niche models. Their practitioners recognized that it was not enough to describe species-environment relationships; testing predictions allowed them to determine how universal and mechanistic these relationships actually were. A number of macroecological models fit nicely with predictive statistical approaches, and could adopt Houlahan et al.'s suggestions quite readily (e.g. reporting measures of predictive ability and testing models on withheld data). But for some approaches, the search for mechanism is so deeply integrated into how they approach science that the shift will take longer and be more difficult (but not impossible)*. Even for these areas, prediction is a worthy goal, just not necessarily an easy one. 
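As a rough illustration of testing models on withheld data, the sketch below compares an ordinary in-sample R^2 with a k-fold cross-validated R^2, in which every prediction comes from a model that never saw that observation. The data (a hypothetical species richness vs. temperature relationship) and the choice of k = 5 are assumptions for the example, not recommendations from Houlahan et al.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(obs, preds):
    """1 - SSE/SST; on withheld data this can even be negative."""
    mean = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, preds))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1 - sse / sst


def k_fold_r_squared(xs, ys, k=5, seed=0):
    """Predict each fold from a model fitted to the other folds, then score
    the pooled out-of-sample predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for fold in range(k):
        test = set(idx[fold::k])
        train = [i for i in idx if i not in test]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a + b * xs[i]
    return r_squared(ys, preds)


rng = random.Random(7)
temp = [rng.uniform(0, 30) for _ in range(60)]
richness = [5 + 0.8 * t + rng.gauss(0, 6) for t in temp]
a, b = fit_line(temp, richness)
in_sample = r_squared(richness, [a + b * t for t in temp])
print(f"in-sample R^2 = {in_sample:.2f}, "
      f"5-fold CV R^2 = {k_fold_r_squared(temp, richness):.2f}")
```

For a simple, correctly specified model like this one the two numbers are usually similar; the gap tends to widen as models gain parameters relative to the amount of data, which is where reporting predictive ability on withheld data is most informative.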

*I was asked for examples of 'unpredictable' areas of ecology. This may be pessimistic, but I think that something like accurately predicting the composition (both species' abundance and identity) of diverse communities at small spatial scales might always be difficult, especially given the temporal dynamics. But I could be wrong! 

...if the Simpsons could predict Trump, I suppose there's hope for ecologists too...
**This has been edited to correctly spell the author's name.


Jeff Houlahan said...

Hi Caroline, just responding to your post about prediction and understanding. You make a couple of points that I think are absolutely true and important to emphasize. First, that there are models that make good predictions but add little to our understanding. The example of collinearity is a good one but another is the use of deep learning approaches for modeling - this is going to allow for models that make very good predictions but that will be very difficult/impossible for us to interpret. We end up in a situation where the model understands the world very well but we don't understand the model. Here, prediction may be valuable without contributing much to understanding. Second, that prediction and understanding are not the same thing - causal links for models that make good predictions are going to have to be demonstrated (with controlled experiments, structural equation modeling à la Judea Pearl, or from first principles) for us to be confident we understand the world.

But, a key point that often goes underemphasized is that, though prediction and understanding may be different, the only way to demonstrate understanding is through prediction. The statement “We can still gain understanding if predictions are impossible…” is one that I am convinced isn’t true. If you can’t show me that your model captures how the world works through its predictive ability (and I don’t know any other way that you could show that you have captured how the world works) then why would I believe your claim over any other claim that comes without empirical evidence? That doesn’t mean scientists can’t develop theoretical models that may not be able to be tested until sometime in the future – but we shouldn’t make the mistake of believing those models have increased our understanding of how the world works. We’ll only be able to assess that when we can test their predictions against observations. The inability to demonstrate understanding is, in my opinion, the same as not having it.

Sure, there may be very good reasons why we can’t make good predictions (lack of data, complexity of processes etc.) but to the extent that these issues limit our predictive ability, they also limit our ability to demonstrate our understanding. When we say that lack of data or complexity of processes limits our ability to make good predictions we are actually saying that lack of data or complexity of processes are limiting our ability to demonstrate we understand the natural world. And if we can’t demonstrate understanding why would we believe we have it? So, in a nutshell,

1. Predictive ability doesn’t necessarily imply understanding (for that you need causal links and you need to understand your model) but,
2. Understanding absolutely implies good predictive ability

I would say that anybody who tells us they understand some natural process very well but can’t make good predictions is either fooling themselves or trying to fool us. Thanks for keeping the conversation going…I think it’s an important one. Best, Jeff.

Geoff Legault said...

Jeff, if I'm interpreting you correctly, you're arguing understanding requires good predictive ability. I am inclined to believe that good predictive ability is absolutely something we can use to assess whether there is understanding (perhaps it is the best way to assess this), but I'm not sure it's necessary or, in some cases, even possible.

Three thoughts/ideas to support this:
- If understanding requires good prediction, then it is impossible to "understand" something that isn't a continuing process. For example, there would be no way a historian - however talented and with whatever resources they needed - could understand the factors that led to World War I. Of course one could say that "understanding history" is different than "understanding science", but I'm not sure historians would agree and, moreover, one would have to do a lot of work to explain why understanding a past process is fundamentally different than understanding a continuing process.

- Hume's unsolved problem of induction. In short, there is (currently) no argument you can make for why the future should resemble the past. As a result, why should we expect the physics or biology of the past to tell us anything meaningful about the future?

- "Good predictive ability" likely has different meanings for different processes and things start to get really tricky when you factor that with the idea of randomness or stochasticity in biology.
For example, if I was predicting the number of offspring produced by a single Daphnia, I could assess the quality of my prediction by plotting a Poisson distribution with my predicted mean and seeing how well it compared to the actual number of offspring laid. Presumably if the actual number of offspring laid was close-ish to the mean of my Poisson distribution, I'd feel comfortable that I made a good prediction (and if I was doing this formally, I'd probably calculate a likelihood).
But if I was trying to predict, say, the number of Daphnia in a metapopulation after many births, deaths, and migration events, not only would it take quite a lot of work and replicates to figure out what synthetic or compound distributions to use to test the quality of my prediction, but even if I had all those things already, the likelihood of my prediction given the real data would be quite low and perhaps not even that much different than other, less informed, predictions.
In short: (1) It might be very difficult, if not impossible, to assess prediction quality if the process you're trying to predict is highly stochastic (either on its own or because it's the outcome of a combination of stochastic processes); and (2) If it could be assessed, the likelihood of a "good" prediction given even a modest amount of real data could be very low (e.g. 1%) and, furthermore, the difference in the likelihood of a "good" and "bad" prediction could be on the order of a few percentage points (or much lower).
All this to say: the phrase "good predictive ability" may not be especially meaningful if the world is stochastic.
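Geoff's second point can be checked with a few lines of arithmetic. The sketch below assumes, as in his single-Daphnia example, that a count is Poisson with the predicted mean, and (a simplifying assumption for illustration) treats a large population count the same way, although a real metapopulation count would follow a wider compound distribution. The brood size of 12 and the population count of 2500 are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """P(K = k) for a Poisson distribution, computed on the log scale
    so that large counts do not overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


# Single Daphnia: hypothetical observed brood of 12 offspring
good, bad = poisson_pmf(12, 12.0), poisson_pmf(12, 10.0)
print(f"brood:      P(12 | mean 12) = {good:.3f}, P(12 | mean 10) = {bad:.3f}")

# Large population: hypothetical observed count of 2500 individuals
good_big, bad_big = poisson_pmf(2500, 2500.0), poisson_pmf(2500, 2450.0)
print(f"population: P(2500 | mean 2500) = {good_big:.4f}, "
      f"P(2500 | mean 2450) = {bad_big:.4f}")
```

For the large count, even a prediction that exactly equals the observed value has a likelihood below 1%, and a prediction that is 2% too low is only a fraction of a percentage point less likely in absolute terms, which is the pattern Geoff describes.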

Good to hear your perspective!



Jeff Houlahan said...

Hi Geoff, I’ll take a run at these one at a time - and let me begin by saying you’re pushing me to think about this more deeply.

So, there are different forms of understanding and, here, I mean understanding of how the world works. So, yes, it’s understanding of ongoing processes. That said, let’s think about understanding what caused WW 1 – I would say that there is no way to demonstrate that you understand what caused a single event that occurred in the past. On the other hand, having a theory that makes predictions about what events cause wars is something we could test and if the predictions from the “theory of wars” are consistent with the explanation for what caused WW 1, it would be evidence in favour of the explanation. I actually don’t think it’s hard to explain why understanding of a single event from the past (WW 1) is different from an ongoing process (wars) – there is no way to test a hypothesis about what caused WW 1 if the hypothesis is generated entirely from information about WW 1. Tests of hypotheses are only legitimate if the hypothesis is created a priori – when you come up with a hypothesis after the event, you can’t use the past event to test the hypothesis. When you develop a hypothesis about what caused WW 1 based on information about WW 1 you can’t test it against WW 1.

Your second point actually implies that all the events we see around us are caused by processes that are constantly changing in entirely unpredictable ways. In my opinion, this is support for our point of view – if Hume is right, we can’t make good predictions because we have no understanding of how the world works. Put simply, if the past tells us nothing about the future then there is no way to understand the world because the only way we understand the world is through human experience – by definition, the past telling us about the future. Understanding relies on the past telling us about the future.
And let’s face it – in a bunch of cases we know with certainty that the past tells us about the future. Hydroelectric dams work because gravity affects water the same now as it did in the past. But the key is – if the past doesn’t tell us about the future there is no way to understand how the world works and our inability to make good predictions will arise from this lack of understanding.

The issue of stochasticity is both a philosophical concern and a practical one but neither, in my opinion, detracts from the central point that prediction is the only way to demonstrate understanding. First, there are lots of smart people who don’t believe that true stochasticity exists – that what ecologists call stochasticity is simply the sum of all the things we don’t understand. The only empirical evidence for ‘true stochasticity’ that I’ve seen is Heisenberg’s principle and I don’t think anybody’s provided evidence that quantum-level stochasticity results in true stochasticity at the organismal or population level. In your example about the Daphnia, you may simply be describing a highly complex system, but not necessarily a truly stochastic one. True stochasticity, to me, implies that two systems that begin at exactly the same point and experience the exact same conditions for their entire existence can end up in different places. And we can’t make good predictions because we don’t understand it very well. It may be that it is impossible to ever understand it very well because it is so complex. But that doesn’t take away from the central point – we can’t make good predictions because we don’t understand it.

But let’s imagine that there is true stochasticity…that doesn’t take away from the central point – it simply implies that perfect predictions will be impossible. It will set a ceiling on predictive ability. But, it doesn’t change the fact that the only way to demonstrate understanding is with prediction. And the only way to measure understanding is with predictive ability.
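The idea that true stochasticity would set a ceiling on predictive ability can be made concrete. In the sketch below, counts at each hypothetical site are Poisson around a known true mean, and the "model" is simply that true mean, so there is no model error at all. Its R^2 still falls well short of 1, because the Poisson noise itself cannot be predicted. The uniform range of site means (2 to 10) and the number of sites are arbitrary assumptions; Python's random module has no Poisson sampler, so a simple one based on Knuth's multiplication method is included.

```python
import math
import random


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(3)
# Hypothetical true mean counts for 5000 sites, spread between 2 and 10
true_means = [rng.uniform(2, 10) for _ in range(5000)]
counts = [poisson_draw(m, rng) for m in true_means]

# R^2 of a "perfect" model whose prediction is the true mean at each site
mean_count = sum(counts) / len(counts)
sse = sum((c - m) ** 2 for c, m in zip(counts, true_means))
sst = sum((c - mean_count) ** 2 for c in counts)
ceiling_r2 = 1 - sse / sst
print(f"R^2 of the true model: {ceiling_r2:.2f}")
```

Under these assumptions the expected ceiling is Var(μ) / (Var(μ) + E[μ]) = (64/12) / (64/12 + 6), roughly 0.47, and the simulated value should land close to that.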

So, I’m still convinced our assertion is true – but, it’s been fun having to think hard about it.

Best, Jeff H

Geoff Legault said...

Yes, fun to talk and think about. Appreciate your thoughtful response!