Tuesday, September 26, 2017

When do descriptive methods exceed the sum of their points?

The last post here mused on the connection between (but also distinctness of) the scientific goals of "understanding" and "prediction". An additional goal of science is "description": the attempt to define and classify phenomena. Much as understanding and prediction are distinct but interconnected, it can be difficult to separate research activities into description and understanding. Descriptive research is frequently considered preliminary or incomplete on its own, meant to be an initial step prior to further analysis. (On the other hand, the decline of more descriptive approaches such as natural history is often bemoaned.) With that in mind, it was interesting to see several recent papers in high-impact journals that rely primarily on descriptive methods (especially ordinations) to provide generalizations. It's fairly uncommon to see ordination plots as the key figure in journals like Nature or The American Naturalist, and it raises the question: when do descriptive methods exceed description and provide new insights and understanding?

For example, Díaz et al.'s 2016 Nature paper took advantage of a massive database of trait data (from ~46,000 species) to explore the inter-relationships between six ecologically relevant plant traits. The resulting PCA plot (figure below) illustrates, across many species, that well-known tradeoffs between a) organ size and scaling and b) the tissue economic spectrum appear fairly universal. Variation in plant form and function may be huge, but the Díaz et al. ordination highlights that it is still relatively constrained, and that many strategies (trait combinations) are apparently untenable.

From Díaz et al. 2016.
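As a rough illustration of the kind of computation behind such a figure, a PCA on a standardized species-by-trait matrix can be sketched in a few lines. Everything here is hypothetical: the trait matrix is synthetic, and the two latent "strategy axes" are invented for illustration; this is not Díaz et al.'s actual pipeline.

```python
import numpy as np

def pca(X, n_components=2):
    """Principal components analysis via SVD on centered, standardized data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize each trait
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # species positions on the PCs
    explained = s**2 / np.sum(s**2)                  # proportion of variance per PC
    return scores, explained[:n_components]

# Hypothetical trait matrix: 200 species x 6 traits, generated from
# two underlying "strategy" axes plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))                   # two latent strategy axes
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))

scores, explained = pca(X)
```

Because the synthetic traits are built from only two latent axes, the first two components absorb nearly all the variance; the empirical point of Díaz et al. is that something similar holds for real plant traits.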
Similarly, a new paper in The American Naturalist relies on ordination methods to try to identify 'a periodic table of niches' of lizards (Winemiller et al. 2015 first presented this idea) – i.e. a classification framework capturing the minimal, clarifying set of universal positions taken by a set of taxa. Using the data and expert knowledge on lizard species collected over a lifetime of research by E. Pianka and L. Vitt, Pianka et al. (2017) first determine the most important niche axes -- habitat, diet, life history, metabolism, and defense attributes. They use PCoA to calculate the position of each of 134 species along each of the five axes, and then combine the separate axes into a single ordination (see figure below). This ordination highlights that niche convergence (distant relatives occupying very similar niche space) and niche conservatism (close relatives occupying very similar niche space) are both common outcomes of evolution. (For more discussion, this piece from Jonathan Losos is a great read.) Their results are less clarifying than those in Díaz et al. (2016): a key reason may simply be the smaller size of Pianka et al.'s data set and its greater reliance on descriptive (rather than quantitative) traits.

From Pianka et al. 2017
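For readers unfamiliar with PCoA, the core computation (classical metric multidimensional scaling) takes any distance matrix among species and recovers coordinate axes from it. A minimal sketch, using a toy Euclidean distance matrix among four hypothetical species rather than Pianka et al.'s actual niche data:

```python
import numpy as np

def pcoa(D, n_axes=2):
    """Classical PCoA: Gower double-centering of squared distances,
    then an eigendecomposition to extract ordination axes."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D**2) @ J                 # double-centered Gower matrix
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1]           # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    # coordinates = eigenvectors scaled by sqrt of (non-negative) eigenvalues
    coords = evecs[:, :n_axes] * np.sqrt(np.maximum(evals[:n_axes], 0))
    return coords

# toy configuration: four "species" at the corners of a unit square
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

coords = pcoa(D)
```

For a Euclidean distance matrix, PCoA recovers the original configuration up to rotation and reflection, so the pairwise distances among the recovered coordinates reproduce the input matrix. One ordination like this per niche axis, then a further ordination over those axes, is roughly the shape of Pianka et al.'s analysis.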

Finally, a new TREE paper from Daru et al. (in press) attempts to identify some of the processes underlying the formation of regional assemblages (what they call phylogenetic regionalization, i.e. distinct, phylogenetically delimited biogeographic units). They similarly rely on ordinations: measurements of phylogenetic turnover are ordinated to identify clusters of phylogenetically similar sites. Daru et al.'s paper is slightly different, in that rather than presenting insights from descriptive methods, it provides a descriptive method that they feel will lead to such insights.

Part of this blip of descriptive results and methods may be related to a general return to the concept of the multidimensional or hypervolume niche (e.g. 1, 2). Models are much more difficult in this context, and so description is a reasonable starting point. In addition, the most useful descriptive approaches are like those seen here, where new data, a lot of data, or new techniques that can transform existing data are available. In these cases, they provide a route to identifying generalizations. (This also leads to an interesting question: are these kinds of analyses simply brute-force solutions to generalization? Or do descriptive results sometimes exceed the sum of their individual data points?)

Díaz S, Kattge J, Cornelissen JH, Wright IJ, Lavorel S, Dray S, Reu B, Kleyer M, Wirth C, Prentice IC, Garnier E. (2016). The global spectrum of plant form and function. Nature. 529(7585):167.

Pianka ER, Vitt LJ, Pelegrin N, Fitzgerald DB, Winemiller KO. (2017). Toward a Periodic Table of Niches, or Exploring the Lizard Niche Hypervolume. The American Naturalist. https://doi.org/10.1086/693781

Daru BH, Elliott TL, Park DS, Davies TJ. (In press). Understanding the Processes Underpinning Patterns of Phylogenetic Regionalization. TREE. http://dx.doi.org/10.1016/j.tree.2017.08.013

Thursday, September 7, 2017

Why is prediction not a priority in ecology?

When we learn about the scientific method, the focus is usually on hypothesis testing and deductive reasoning. Less time is spent considering the various outcomes of scientific research, specifically: description, understanding, and prediction. Description involves parsimoniously capturing data structure, and may use statistical methods such as PCA to reduce data complexity and identify important axes of variation. Understanding involves the explanation of phenomena by identifying causal relationships (such as via parameter estimation in models). Finally, prediction involves estimating the values of new or future observations. Naturally, some approaches in ecology orient more closely toward one of these outcomes than others, and some areas of research have historically valued one outcome over others. For example, applied approaches such as fisheries population models emphasize predictive accuracy (though even there, there are worries about limits on prediction). On the other hand, studies of biotic interactions or trophic structure typically emphasize identifying causal relationships. The focus in different subdisciplines no doubt owes something to culture and historical priority effects.

In various ways these outcomes feed back on each other – description can inform explanatory models, and explanatory models can be evaluated based on their predictions. In a recent paper in Oikos, Houlahan et al. discuss the tendency of many ecological fields to under-emphasize predictive approaches and instead focus on explanatory statistical models. They note that prediction is rarely at the centre of ecological research, and that this may be limiting ecological progress. There are lots of interesting questions that ecologists should be asking, including: what are the predictive horizons (spatial and temporal scales) over which predictive accuracy decays? Currently, we don't even know what a typical upper limit on model predictive ability is in ecology.

Although the authors argue for the primacy of prediction ["Prediction is the only way to demonstrate scientific understanding", and "any potentially useful model must make predictions about some unknown state of the natural world"], I think there is some nuance to be gained by recognizing that understanding and prediction are separate outcomes and that their relationship is not always straightforward (for a thorough discussion see Shmueli 2010). Ideally, a mutually informative feedback between explanation and prediction should exist, but it is also true that prediction can be useful and worthy for reasons that are not dependent on explanation and vice versa. Further, to understand why and where prediction is limited or difficult, and what is required to correct this, it is useful to consider it separately from explanation.

Understanding/explanation can be valuable and inspire further research, even if prediction is impossible. The goal of explanatory models is to have the model [e.g., f(x)] match as closely as possible the actual mechanism [F(x)]. A divergence between understanding and prediction can naturally occur when there is a difference between concepts or theoretical constructs and our ability to measure them. In physics, theories explaining phenomena may arise many years before they can actually be tested (e.g. gravitational waves). Even when useful causal models are available, limits on prediction can remain: in particle physics, the Heisenberg uncertainty principle sets a limit on the precision with which you can simultaneously know a particle's position and its momentum. In ecology, a major limitation on prediction may simply be data availability. In a similarly complex field (meteorology), where many processes are important and nonlinearities common, predictions require massive data inputs (frequently collected over near-continuous time) and models that can be evaluated only via supercomputers. We rarely collect biotic data at those scales in ecology. We can still gain understanding when predictions are impossible, and hopefully the desire to make predictions will eventually motivate the development of new methods or data collection. In many ecological fields, it might be worth thinking about what can be done in the future to enable predictions, even if they aren't really possible right now.

Approaches that emphasize prediction frequently improve understanding, but this is not guaranteed either. Statistically, understanding can come at the cost of predictive ability. Further, a predictive model may provide accurate predictions but do so using collinear or synthetic variables that are hard to interpret. For example, a macroecological relationship between temperature and diversity may effectively predict diversity in a new habitat, and yet do little on its own to identify specific mechanisms. Prediction does not require interpretability or explanatory ability, as is clear from papers such as "Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data". So it's worth being wary of the idea that a predictive model is necessarily 'better'.

With this difference between prediction and understanding in mind, it is perhaps easier to understand why ecologists have lagged in prediction. For a long time, the statistical approaches used in ecology were biased toward those meant to improve understanding, such as regression models, where parameters estimate the strength and direction of a relationship. This is partially responsible for our obsession with p-values and R^2 terms. What Houlahan et al. do a great job of emphasizing is that by ignoring prediction as a goal, researchers are often limiting their ability to confirm their understanding: predictions derived from explanatory models provide a direct test of that understanding. Some approaches in ecology have already moved naturally towards emphasizing prediction, especially SDMs/ecological niche models. They recognized that it was not enough to describe species-environment relationships; testing predictions allowed them to determine how universal and mechanistic these relationships actually were. A number of macroecological models fit nicely with predictive statistical approaches, and could adopt Houlahan et al.'s suggestions quite readily (e.g. reporting measures of predictive ability and testing models on withheld data). But for some approaches, the search for mechanism is so deeply integrated into how they approach science that it will take longer and be more difficult (but not impossible)*. Even for these areas, prediction is a worthy goal, just not necessarily an easy one.
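What "testing models on withheld data" looks like in practice can be sketched very simply. This toy example uses an invented temperature-diversity relationship (the slope, noise level, and sample sizes are all made up for illustration): fit a regression on a set of training sites only, then report R^2 on the withheld sites rather than on the data used for fitting.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical macroecological data: diversity increases with temperature,
# plus substantial site-to-site noise (all values synthetic)
temp = rng.uniform(0, 30, size=200)
diversity = 5 + 2.0 * temp + rng.normal(scale=8.0, size=200)

# withhold 50 of the 200 sites for evaluation
idx = rng.permutation(200)
train, test = idx[:150], idx[150:]

# fit a simple linear model (intercept + slope) on training sites only
X = np.column_stack([np.ones(train.size), temp[train]])
beta, *_ = np.linalg.lstsq(X, diversity[train], rcond=None)

# evaluate predictive skill on the withheld sites
pred = beta[0] + beta[1] * temp[test]
ss_res = np.sum((diversity[test] - pred) ** 2)
ss_tot = np.sum((diversity[test] - diversity[test].mean()) ** 2)
r2_withheld = 1 - ss_res / ss_tot
```

The point of reporting `r2_withheld` rather than the in-sample R^2 is that it measures how the model performs on observations it has never seen, which is closer to Houlahan et al.'s sense of prediction; the two values diverge most when a model is overfit.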

*I was asked for examples of 'unpredictable' areas of ecology. This may be pessimistic, but I think that something like accurately predicting the composition (both species' abundance and identity) of diverse communities at small spatial scales might always be difficult, especially given the temporal dynamics. But I could be wrong! 

...if the Simpsons could predict Trump, I suppose there's hope for ecologists too...
**This has been edited to correctly spell the author's name.