Showing posts with label models. Show all posts

Wednesday, August 26, 2015

Science is a maze

If you want to truly understand how scientific progress works, I suggest fitting mathematical models to dynamical data (i.e. population or community time series) for a few days.
map for science?

You were probably told sometime early on about the map for science: the scientific method. It was probably displayed for your high school class as a tidy flowchart showing how a hypothetico-deductive approach allows scientists to solve problems. Scientists make observations about the natural world, gather data, and come up with a possible explanation or hypothesis. They then deduce the predictions that follow, and design experiments to test those predictions. If you falsify the predictions you then circle back and refine, alter, or eventually reject the hypothesis. Scientific progress arises from this process. Sure, you might adjust your hypothesis a few times, but progress is direct and straightforward. Scientists aren’t shown getting lost.

Then, once you actively do research, you realize that the formulation-reformulation process dominates. But because for most applications the formulation-reformulation process is slow – that is, each component takes time (e.g. weeks or months to redo experiments and analyses and work through reviews) – you only go through that loop a few times. So you usually still feel like you are making progress and moving forward.

But if you want to remind yourself just how twisting and meandering science actually is, spend some time fitting dynamic models. Thanks to Ben Bolker’s indispensable book, this also comes with a map, which shows how closely the process of model fitting mirrors the scientific method. The modeller has some question they wish to address, and experimental or observational data they hope to use to answer it. By fitting or selecting the best model for their data, they can obtain estimates for different parameters and so hopefully test predictions from their hypothesis. Or so one naively imagines.
From Bolker's Ecological Models and Data in R, a map for model selection.
The reality, however, is much more byzantine. This is captured well in Vellend (2010):
“Consider the number of different models that can be constructed from the simple Lotka-Volterra formulation of interactions between two species, layering on realistic complexities one by one. First, there are at least three qualitatively distinct kinds of interaction (competition, predation, mutualism). For each of these we can have either an implicit accounting of basal resources (as in the Lotka-Volterra model) or we can add an explicit accounting in one particular way. That gives six different models so far. We can then add spatial heterogeneity or not (x2), temporal heterogeneity or not (x2), stochasticity or not (x2), immigration or not (x2), at least three kinds of functional relationship between species (e.g., predator functional responses, x3), age/size structure or not (x2), a third species or not (x2), and three ways the new species interacts with one of the existing species (x3 for the models with a third species). Having barely scratched the surface of potentially important factors, we have 2304 different models. Many of them would likely yield the same predictions, but after consolidation I suspect there still might be hundreds that differ in ecologically important ways.”
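The arithmetic behind that figure is easy to lose track of, so here is a minimal sketch in Python that enumerates the combinations as I read them from the quote. The factor names and labels are my own placeholders, and the three kinds of third-species interaction are an assumption, since the quote does not name them. The one subtlety is that the final x3 applies only to models that include a third species, so the total is 576 two-species models plus 3 x 576 three-species models.

```python
from itertools import product

# Choices that apply to every model, in the order the quote lists them.
# All labels are illustrative placeholders, not terms from Vellend (2010).
BASE_FACTORS = {
    "interaction": ["competition", "predation", "mutualism"],
    "resources": ["implicit", "explicit"],
    "spatial_heterogeneity": [False, True],
    "temporal_heterogeneity": [False, True],
    "stochasticity": [False, True],
    "immigration": [False, True],
    "functional_relationship": ["form_1", "form_2", "form_3"],
    "age_size_structure": [False, True],
}

# Assumed labels for the three ways a third species could interact.
THIRD_SPECIES_INTERACTIONS = ["competition", "predation", "mutualism"]


def enumerate_models():
    """Yield one dict per distinct model structure."""
    names = list(BASE_FACTORS)
    for combo in product(*BASE_FACTORS.values()):
        model = dict(zip(names, combo))
        # Version of the model without a third species.
        yield {**model, "third_species": None}
        # Versions with a third species, one per interaction type.
        for kind in THIRD_SPECIES_INTERACTIONS:
            yield {**model, "third_species": kind}


models = list(enumerate_models())
two_species = sum(m["third_species"] is None for m in models)
three_species = len(models) - two_species

print(two_species, three_species, len(models))
```

Listing the models explicitly like this, rather than just multiplying the numbers, also gives you a ready-made checklist of which structures you have and haven't tried.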
Model fitting/selection can actually be (speaking for myself, at least) repetitive and frustrating and filled with wrong turns and dead ends. And because you can make so many loops between formulation and reformulation, and the time penalty is relatively low, you experience just how many possible paths forward there are to be explored. It’s easy to get lost and forget which models you’ve already looked at, and keeping detailed notes/logs/version control is fundamental. And since time and money aren’t (as) limiting, it is hard to know/decide when to stop - no model is perfect. When it’s possible to so fully explore the path from question to data, you get to suffer through realizing just how complicated and uncertain that path actually is.
What model fitting feels like?

Bolker hints at this (but without the angst):
“modeling is an iterative process. You may have answered your questions with a single pass through steps 1–5, but it is far more likely that estimating parameters and confidence limits will force you to redefine your models (changing their form or complexity or the ecological covariates they take into account) or even to redefine your original ecological questions.”
I bet there are other processes that have similar aspects of endless, frustrating ability to consider every possible connection between question and data (building a phylogenetic tree, designing a simulation?). And I think that is what science is like on a large temporal and spatial scale too. For any question or hypothesis, there are multiple labs contributing bits and pieces and manipulating slightly different combinations of variables, and pushing and pulling the direction of science back and forth, trying to find a path forward.

(As you may have guessed, I spent far too much time this summer fitting models…)

Friday, December 12, 2014

A changing world: Themes from the 2014 BES-SFE meeting in Lille #BESSfe

I attended the joint British Ecological Society/Société Française d’Ecologie (BES/SFE) meeting held in Lille, France, Dec. 9-12. I quite enjoy BES meetings, but this one felt just a little more dynamic and exciting. The meeting did a great job of bringing people together who otherwise might not attend the same meetings. The overall quality of talks was excellent and the impression was that labs were presenting their best, most exciting results. One thing that always fascinates me about meetings is the fact that emergent themes arise that reflect what people are currently excited about. Over the three days of talks, I felt that four emergent themes seemed particularly strong among the talks I attended:

1) Pollinators in a changing world

Photo by Marc Cadotte
There were a surprising number of talks focusing on how human-caused changes to landscapes affect pollinator abundance and diversity. I am an Editor of a British journal (Journal of Applied Ecology) and work on pollinator diversity has always been stronger in the UK, but there were just so many talks that it is obvious that this is an important issue for many people in the UK and Europe. Nick Isaac examined whether butterfly abundance was related to the abundance of host plants, which should be a measure of habitat quality. Plants that serve as hosts for caterpillars were more important than those that supply nectar to adults, presumably because the adults can better find resources. And specialist species were especially sensitive to host plant diversity.

Adriana De Palma gave a great talk on reanalyzing global patterns of bee responses to land-use and showed that biases in where research is done are influencing generalities. Bee communities in some well-studied regions appear more sensitive to land-use change, and those regions with many bumblebees mask effects on other types of bees. Bill Kunin examined patterns at a regional scale (UK) where a pollinator crisis was identified in the late 2000s and causes have been attributed to everything from land-use change to pesticide use to cell phones to the second coming of Jesus. Habitat quality and floral resources do not seem to be that important at large scales, but there seems to be a strong effect of pesticide use. But at a smaller landscape scale, Florence Hecq showed that habitat heterogeneity within agricultural landscapes and the size of semi-natural grasslands were important for maintaining pollinator diversity. Changes in pollinator diversity have consequences for crop yield, as shown nicely by Colin Fontaine.
Photo by Marc Cadotte
In a really interesting study, Olivia Norfolk showed that traditional agricultural practices by Bedouin minorities in Egypt enhanced pollinator abundance. Because their agricultural practices support high plant diversity, of both wild and domestic plant species, pollinators fare better than in intensive agriculture. Moreover, one of the most important crops, almonds, sees higher yield with higher plant diversity, though this effect is lost when there are a lot of introduced honeybees.

2) Effects of land-use on biodiversity

A number of other talks examined how human-caused changes influence biodiversity patterns and resulting functions across a number of taxa. Jonathan Tonkin examined a number of different types of species (plants, beetles, spiders, etc.) that occur along riparian habitats and showed that there weren’t concordant changes in richness, but there were simultaneous shifts in composition. Human stressors caused multiple communities to shift to very nonrandom community types. In agricultural systems, Colette Bertrand showed that agriculture that changed frequently (e.g., crop rotation) supported more beetle species than systems where the same crops are planted year after year.

Human deforestation greatly changes many biodiversity patterns and we need to better understand these changes to make sound conservation decisions. Cecile Albert examined land-use change and fragmentation in southern Quebec and showed that we can determine the importance of forest patches in human-dominated landscapes for the ability of species to move between large forested areas. Using her model she can identify where conservation and habitat protection should be focused. Nicolas Labriere studied how different forest changes influenced the delivery of ecosystem services, including carbon storage, diversity and soil retention. He showed that only intact forests were able to maximally deliver all ecosystem services.
From WWF

3) Species differences and dynamics at different scales

A major theme is how species differences are important for ecological processes, ecosystem function and conservation. I’ve argued elsewhere that we are heading into a paradigm shift in ecology, where we've moved from counting species to accounting for species. Wilfried Thuiller asked how well European reserves conserve different forms of biodiversity, namely functional and phylogenetic diversity. He prioritized species by their distinctiveness and range size so that the most important were functionally or phylogenetically unique and have a small range. Distinct mammals tend to not be well protected and the modern reserve system does not maximally protect biodiversity. This is most acute in eastern Europe where there is an order of magnitude less protected area than in western Europe.

Georges Kunstler argued that trait approaches to understanding competition are valuable because they can reduce the dimensionality of studies, from all pairwise species interactions to relatively simple measures of trait differences. He showed, using an impressive global forest dataset, that competition appears stronger when neighbour trees are more similar in their traits.

A number of talks examined whether measures of species differences can explain biodiversity patterns. At very large scales, Kyle Dexter showed that phylogenetic diversity does not explain where species are across the neotropics. In some places species are in the same habitat as a close relative and sometimes with a distant relative. At smaller scales, talks explored trait or phylogenetic patterns. Andros Gianuca, Anne Pilière and Lars Götzenberger all assessed the relative contributions of trait and phylogenetic differences to explain community patterns and all showed that phylogeny may be a stronger explanation than the traits they measured.

4) Species dynamics, coexistence and ecosystem function

Understanding tree growth and dispersal is key to predicting how forests will respond to environmental change and to successfully managing and conserving them. Sean MacMahon showed that the seasonality of tree growth is critical to modelling carbon flux in forests. He developed an ingenious set of modelling approaches to analyze daily tree diameter change and showed that growth is highly concentrated in the middle of the growing season, which is at odds with traditional conceptual models where tree growth is constant from spring to fall. Noelle Beckman examined tree dispersal and the consequence of losing vertebrate seed dispersers. She showed that reducing the number of seed dispersers results in low seedling survival because seedlings are locally very dense, instead of being dispersed, and seed predators and other enemies have an easier time finding them.

The mechanism most often cited by plant community ecologists is competition, but Christian Damgaard states that this simple mechanism is almost never tested. Further, models of competition are often based on numbers of individuals, but plants make such counts notoriously difficult. Instead he developed a very elegant model showing how plant height and horizontal cover feed back to competition. What he calls vertical density is a predictor of the following season’s horizontal cover. Competition is also key to observing a relationship between species richness and ecosystem function. Rudolf Rohr showed, using a series of Lotka-Volterra models, that randomly assembling communities always results in a positive relationship between richness and function, which is why experiments often support this pattern. In natural communities, this relationship often disappears, and he showed that simulations with competitive sorting break this relationship.

Finally, Florian Altermatt examined whether the physical structure of stream networks influences the distribution of diversity in streams using protozoan and bacterial communities in a series of connected tubes that look like a branch, and compared these to linear tubes. He found that diversity is highest in the interior branches (see image to the left), much like real rivers, and the linear system had no such pattern of diversity. He attributed part of this diversity gradient to competitive differences among species and differences in movement of the organisms.

Thursday, April 24, 2014

Data merging: are we moving forward or dealing with Frankenstein's monster

I’m sitting in the Sydney airport waiting for my delayed flight –which gives me some time to ruminate about the mini-conference I am leaving. The conference, hosted by the Centre for Biodiversity Analysis (CBA) and CSIRO in Australia, on "Understanding biodiversity dynamics using diverse data sources", brought together several fascinating thinkers working on disparate areas including ecology, macroecology, evolution, genomics, and computer science. The goal of the conference was to see if merging different forms of data could lead to greater insights into biodiversity patterns and processes. 

Happy integration

On the surface, it seems uncontroversial to say that bringing together different forms of data really does promote new insights into nature. However, this only really works if the data we combine meaningfully complement one another. When researchers bring together data, there are under-appreciated risks, and the effort can end up combining data that make weird bedfellows.
Weird bedfellows

The risks include data that mismatch in the scale of observation, resulting in meaningful variation being missed. Data are often generated according to certain models with specific assumptions, and these data-generation steps can be misunderstood by end-users, resulting in inappropriate uses of data. Further, different data may be combined in standard statistical models, but the linkages between data types are much more subtle and nuanced, requiring alternative models.

These are issues because researchers now have unprecedented access to numerous large data sets. Whether these are large trait data sets, spatial locations, spatial environmental data, genomes, or historical data, they are all built with specific underlying uses, limitations and assumptions.

Regardless of these concerns, the opportunity and power to address new questions is greatly enhanced by multiple types of data. One thing I gained from this meeting is that there is a new world of biodiversity analysis and understanding emerging from smart people doing smart things with multiple types of data. We will soon live in a world where the data and analytical tools allow research to truly combine multiple processes to predict species' distributions, or to move from evolutionary events in deep history to modern day ecological patterns.