The EEB & Flow: prediction

When we learn about the scientific method, the focus is usually on hypothesis testing and deductive reasoning. Less time is spent on considering the various outcomes of scientific research, specifically: description, understanding, and prediction. Description involves parsimoniously capturing data structure, and may use statistical methods such as PCA to reduce data complexity and identify important axes of variation. Understanding involves the explanation of phenomena by identifying causal relationships (such as via parameter estimation in models). Finally, prediction involves estimating the values of new or future observations. Naturally, some approaches in ecology orient more closely toward one of these outcomes than others, and some areas of research have historically valued one outcome over others. For example, applied approaches such as fisheries population models emphasize predictive accuracy (but even there, there are worries about limits on prediction). On the other hand, studies of biotic interactions or trophic structure typically emphasize identifying causal relationships. The focus in different subdisciplines no doubt owes something to culture and historical priority effects.

In various ways these outcomes feed back on each other: description can inform explanatory models, and explanatory models can be evaluated based on their predictions. In a recent paper in Oikos, Houlahan et al. discuss the tendency of many ecological fields to under-emphasize predictive approaches and instead focus on explanatory statistical models. They note that prediction is rarely at the centre of ecological research and that this may be limiting ecological progress. There are lots of interesting questions that ecologists should be asking, including: what are the predictive horizons (the spatial and temporal scales) over which predictive accuracy decays? Currently, we don't even know what a typical upper limit on model predictive ability is in ecology.

Although the authors argue for the primacy of prediction ["Prediction is the only way to demonstrate scientific understanding", and "any potentially useful model must make predictions about some unknown state of the natural world"], I think there is some nuance to be gained by recognizing that understanding and prediction are separate outcomes and that their relationship is not always straightforward (for a thorough discussion see Shmueli 2010). Ideally, a mutually informative feedback between explanation and prediction should exist, but it is also true that prediction can be useful and worthy for reasons that are not dependent on explanation and vice versa. Further, to understand why and where prediction is limited or difficult, and what is required to correct this, it is useful to consider it separately from explanation.

Understanding/explanation can be valuable and inspire further research, even if prediction is impossible. The goal of explanatory models is to have the model [e.g., f(x)] match as closely as possible the actual mechanism [F(x)]. A divergence between understanding and prediction can naturally occur when there is a difference between concepts or theoretical constructs and our ability to measure them. In physics, theories explaining phenomena may arise many years before they can actually be tested (e.g. gravitational waves). Even if useful causal models are available, limitations on prediction can be present: in particle physics, the Heisenberg uncertainty principle identifies limits on the precision with which you can know both the position of a particle and its momentum. In ecology, a major limitation to prediction may simply be data availability. In a similar field (meteorology) in which many processes are important and nonlinearities are common, predictions require massive data inputs (frequently collected over near-continuous time) and models that can be evaluated only via supercomputers. We rarely collect biotic data at those scales in ecology. We can still gain understanding if predictions are impossible, and hopefully the desire to make predictions will eventually motivate the development of new methods or data collection. In many ecological fields, it might be worth thinking about what can be done in the future to enable predictions, even if they aren't really possible right now.

Approaches that emphasize prediction frequently improve understanding, but this is not necessarily true either. Statistically, understanding can come at the cost of predictive ability. Further, a predictive model may provide accurate predictions, but do so using collinear or synthetic variables that are hard to interpret. For example, a macroecological relationship between temperature and diversity may effectively predict diversity in a new habitat, and yet do little on its own to identify specific mechanisms. Prediction does not require interpretability or explanatory ability, as is clear from papers such as "Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data". So it's worth being wary of the idea that a predictive model is necessarily 'better'.
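To make the point about prediction without explanation concrete, here is a toy sketch in Python. It is not the method used in the paper quoted above; it is only an assumed, minimal nearest-neighbour forecast. The "ecosystem" is a logistic map with a growth parameter of 3.9 (my choice, purely for illustration), and the forecaster never sees that equation: it predicts the next value by finding the most similar past state and copying what happened next.

```python
def logistic_map(x, r=3.9):
    """Hypothetical 'true' dynamics; the forecaster below never uses this."""
    return r * x * (1 - x)

# Simulate a time series from the assumed dynamics.
series = [0.2]
for _ in range(599):
    series.append(logistic_map(series[-1]))

library = series[:500]   # past observations available to the forecaster
test = series[500:]      # later observations, withheld for evaluation

def nn_forecast(x, library):
    """Predict the next value by copying the successor of the nearest past state."""
    # Exclude the last library point because its successor is not in the library.
    best = min(range(len(library) - 1), key=lambda i: abs(library[i] - x))
    return library[best + 1]

preds = [nn_forecast(x, library) for x in test[:-1]]
actual = test[1:]

# Mean absolute error of the model-free forecast.
mae = sum(abs(p - a) for p, a in zip(preds, actual)) / len(actual)

# Baseline: always predict the long-term mean of the library.
mean_value = sum(library) / len(library)
baseline_mae = sum(abs(mean_value - a) for a in actual) / len(actual)

print(f"nearest-neighbour MAE: {mae:.4f}")
print(f"mean-baseline MAE:     {baseline_mae:.4f}")
```

The forecast can be accurate one step ahead, yet it contains no parameters to interpret and says nothing about mechanism, which is exactly why predictive skill alone is not evidence of understanding.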

With this difference between prediction and understanding in mind, it is perhaps easier to understand why ecologists have lagged in prediction. For a long time, statistical approaches used in ecology were biased toward those meant to improve understanding, such as regression models, where parameters estimate the strength and direction of a relationship. This is partially responsible for our obsession with p-values and R^2 terms. What Houlahan et al. do a great job of emphasizing is that by ignoring prediction as a goal, researchers are often limiting their ability to confirm their understanding, since explanatory models can be evaluated based on their predictions. Some approaches in ecology have already moved naturally towards emphasizing prediction, especially species distribution models (SDMs) and ecological niche models. Researchers using them recognized that it was not enough to describe species-environment relationships; testing predictions allowed them to determine how universal and mechanistic these relationships actually were. A number of macroecological models fit nicely with predictive statistical approaches, and could adopt Houlahan et al.'s suggestions quite readily (e.g. reporting measures of predictive ability and testing models on withheld data). But for some approaches, the search for mechanism is so deeply integrated into how they approach science that it will take longer and be more difficult (but not impossible)*. Even for these areas, prediction is a worthy goal, just not necessarily an easy one.
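As a rough illustration of what "reporting measures of predictive ability and testing models on withheld data" could look like in practice, here is a minimal Python sketch. Everything in it is assumed for illustration: the data are simulated (a made-up linear temperature-richness relationship with noise), the 150/50 split is arbitrary, and out-of-sample R^2 and RMSE are just two of many possible measures.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(ys, preds):
    """1 - SS_res / SS_tot, computed on whatever data are passed in."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def rmse(ys, preds):
    """Root mean squared prediction error."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5

# Simulated (hypothetical) sites: richness increases with temperature, plus noise.
rng = random.Random(42)
temps = [rng.uniform(0, 30) for _ in range(200)]
richness = [5 + 2 * t + rng.gauss(0, 3) for t in temps]

# Withhold the last 50 sites; the model never sees them during fitting.
train_x, test_x = temps[:150], temps[150:]
train_y, test_y = richness[:150], richness[150:]

intercept, slope = fit_line(train_x, train_y)

# In-sample fit (what is usually reported) vs. out-of-sample predictive ability.
r2_in = r_squared(train_y, [intercept + slope * x for x in train_x])
test_preds = [intercept + slope * x for x in test_x]
r2_out = r_squared(test_y, test_preds)
rmse_out = rmse(test_y, test_preds)

print(f"slope estimate:       {slope:.2f}")
print(f"in-sample R^2:        {r2_in:.3f}")
print(f"out-of-sample R^2:    {r2_out:.3f}")
print(f"out-of-sample RMSE:   {rmse_out:.2f}")
```

In this simulated case the two R^2 values will be similar because the withheld sites come from the same process as the training sites. With real data, a large gap between them is the warning sign that an explanatory model does not generalize.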

*I was asked for examples of 'unpredictable' areas of ecology. This may be pessimistic, but I think that something like accurately predicting the composition (both species' abundance and identity) of diverse communities at small spatial scales might always be difficult, especially given the temporal dynamics. But I could be wrong!

...if the Simpsons could predict Trump, I suppose there's hope for ecologists too...

**This has been edited to correctly spell the author's name.

There is an ongoing debate about the role of wolves in altering ecosystem dynamics in Yellowstone, which has stimulated a number of recent papers, and apparently inspired an editorial in Nature. Entitled “An Elegant Chaos”, the editorial reads a bit like an apology for ecology’s failure at prediction, suggesting that we should embrace ecology’s lack of universal laws and recognize that “Ecological complexity, which may seem like an impenetrable thicket of nuance, is also the source of much of our pleasure in nature”.

Most of the time, I also fall squarely into the pessimistic “ecological complexity limits predictability” camp. And concerns about prediction in ecology are widespread and understandable. But there is also something frustrating about the way we so often approach ecological prediction. Statements such as “It would be useful to have broad patterns and commonalities in ecology” feel incomplete. Is it that we really lack “broad patterns and commonalities in ecology”, or has ecology adopted a rather precise and self-excoriating definition for “prediction”?

We are fixated on achieving particular forms of prediction (either robust universal relationships, or else precise and specific numerical outputs), and perhaps we are failing at achieving these. But on the other hand, ecology is relatively successful in understanding and predicting qualitative relationships, especially at large spatial and temporal scales. At the broadest scales, ecologists can predict the relationships between species numbers and area, between precipitation, temperature and habitat type, between habitat types and the traits of species found within, between productivity and the general number of trophic levels supported. Not only do we ignore this foundation of large-scale predictable relationships, but we ignore the fact that prediction is full of tradeoffs. As a paper with the excellent title, “The good, the bad, and the ugly of predictive science” states, any predictive model is still limited by tradeoffs between: “robustness-to-uncertainty, fidelity-to-data, and confidence-in-prediction…. [H]igh-fidelity models cannot…be made robust to uncertainty and lack-of-knowledge. Similarly, equally robust models do not provide consistent predictions, hence reducing confidence-in-prediction. The conclusion of the theoretical investigation is that, in assessing the predictive accuracy of numerical models, one should never focus on a single aspect.” Different types of predictions have different limitations. But sometimes it seems that ecologists want to make predictions in the purest, trade-off free sense - robustness-to-uncertainty, fidelity-to-data, and confidence-in-prediction - all at once.

In relation to this, ecological processes tend to be easier to represent in a probabilistic fashion, something that we seem rather uncomfortable with. Ecology is predictive in the way medicine is predictive – we understand the important cause and effect relationships, many of the interactions that can occur, and we can even estimate the likelihood of particular outcomes (of smoking causing lung cancer, of warming climate decreasing diversity), but predicting how a human body or ecosystem will change is always inexact. The complexity of multiple independent species, populations, genes, traits, all interacting with similarly changing abiotic conditions makes precise quantitative predictions at small scales of space or time pretty intractable. So maybe that shouldn’t be our bar for success. The analogous problem for an evolutionary biologist would be to predict not only a change in population genetic structure but also the resulting phenotypes, accounting for epigenetics and plasticity too. I think that would be considered unrealistic, so why is that where we place the bar for ecology?

In part the bar for prediction is set so high because the demand for ecological knowledge, given habitat destruction, climate change, extinction, and a myriad of other changes, is so great. But in attempting to fulfill that need, it may be worth acknowledging that predictions in ecology fall along a hierarchy, from relationships at the broadest scales, about which we can be most certain, down to the finest scale of interactions and traits and genes, where we may be less certain. If we see events as occurring with different probabilities, and our knowledge of those probability distributions declining the farther down that hierarchy we travel, then our predictive ability will decline as well. New research can fill in relationships that are missing or poorly characterized, but at the finest scales, prediction may always be limited.

Thursday, September 7, 2017

Why is prediction not a priority in ecology?

Monday, March 17, 2014

How are we defining prediction in ecology?