Thursday, March 9, 2017

Data management for complete beginners

Bill Michener is a longtime advocate of data management and archiving practices for ecologists, and I was lucky to catch him giving a talk on the topic this week. It clarified for me the value of formalizing data management plans for institutions and lab groups, but also the gap between recommended best practices in data management and the reality in many labs.

Michener started his talk with two contrasting points. First, we are currently deluged by data: there is more data available to scientists now than ever before, perhaps 45,000 exabytes by 2020. On the other hand, scientific data is constantly lost. The longer it has been since a paper was published, the less likely its data can be recovered (one study he cited found that data have a half-life of 20 years). There are many causes of data loss, some technological, some due to changes in sharing and publishing norms. The rate at which data are lost may be declining, though. We're in the middle of a paradigm shift in how scientists see our data. Our vocabulary now includes concepts like 'open access', 'metadata', and 'data sharing'. Many related initiatives (e.g. GenBank, Dryad, GitHub, GBIF) are fairly familiar to most ecologists. Journal policies increasingly ask for data to be deposited in publicly available repositories, computer code is increasingly submitted during the review process, and many funding agencies now require statements about data management practices.

This has produced huge changes in typical research workflows over the past 25 years. But data management practices have advanced so quickly that there's a danger some researchers will begin to feel they are unattainable, given the time, expertise, or effort involved. I feel that data management is sometimes presented as a series of unfamiliar (and often changing) tools and platforms, and this can make it seem hard to opt in. It's important to emphasize that good data management is possible without particular expertise, and in the absence of cutting-edge practices and tools. What I liked about Michener's talk is that it presented practices as modular ('if you do nothing else, do this') and incremental. Further, I think the message was that this paradigm shift is really about moving from a mindset in which data management is done post hoc ('I have a bunch of data, what should I do with it?') to considering how to treat data from the beginning of the research process.
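To make that concrete, here is a minimal sketch of the kind of low-tech habit that already counts as good data management (the file names and columns are invented for illustration): keep data in a plain-text, non-proprietary format and write a short data dictionary alongside it.

```r
# A minimal, hypothetical example: plain-text data plus a human-readable
# data dictionary, written from R.

surveys <- data.frame(
  site      = c("A", "A", "B"),
  date      = as.Date(c("2016-06-01", "2016-07-01", "2016-06-01")),
  abundance = c(12, 8, 23)
)

# CSV is non-proprietary and will still be readable in 20 years
write.csv(surveys, "surveys_2016.csv", row.names = FALSE)

# One line per column, with units and collection details
writeLines(c(
  "surveys_2016.csv -- plant abundance surveys, June-July 2016",
  "site:      site identifier (A, B, ...)",
  "date:      survey date (YYYY-MM-DD)",
  "abundance: number of flowering individuals per 1 m2 quadrat"
), "README_surveys_2016.txt")
```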

Hierarchy of data management needs.

Once you make it to 'Share and archive data', you can follow some of these great references.

Hart EM, Barmby P, LeBauer D, Michonneau F, Mount S, Mulrooney P, et al. (2016) Ten Simple Rules for Digital Data Storage. PLoS Comput Biol 12(10): e1005097. doi:10.1371/journal.pcbi.1005097

James A. Mills, et al. Archiving Primary Data: Solutions for Long-Term Studies, Trends in Ecology & Evolution, Volume 30, Issue 10, October 2015, Pages 581-589, ISSN 0169-5347.

https://software-carpentry.org//blog/2016/11/reproducibility-reading-list.html (lots of references on reproducibility)

K.A.S. Mislan, Jeffrey M. Heer, Ethan P. White, Elevating The Status of Code in Ecology, Trends in Ecology & Evolution, Volume 31, Issue 1, January 2016, Pages 4-7, ISSN 0169-5347.


Thanks to Matthias Grenié for discussion on this topic.

Monday, February 27, 2017

Archiving the genomes of all species

There is so much bad news about global biodiversity that it is nice to hear about new undertakings and approaches. One of these is the 'Earth BioGenome Project', which proposes to sequence the genomes of the entirety of life on Earth. Given that sequencing has never been more affordable or more available to scientists, this is, though ambitious, without question a feasible undertaking. Still, with perhaps 9 million eukaryotes on the planet, a rough prediction suggests it could take 10 years and several billion dollars to achieve.

The cost suggests a certain agony of choice - what is the best use of that amount of money (in the dream world where money can be freely moved between projects)? Direct application to conservation and management activities, or a catalog of diversity which may be the only way to save some of these species? 
Leonard Eisenberg's tree of life (https://www.evogeneao.com).

Friday, February 3, 2017

When is the same trait not the same?

Different clades and traits yield similar grassland functional responses. 2017. Elisabeth J. Forrestel, Michael J. Donoghue, Erika J. Edwards, Walter Jetz, Justin C. O. du Toit, and Melinda D. Smith. Proceedings of the National Academy of Sciences, vol. 114, no. 4, 705-710. doi: 10.1073/pnas.1612909114

A potential benefit of trait-centric approaches is that they may provide a path to generality in community ecology. Functional traits affect growth, reproduction, and survival, and so, indirectly, should determine an organism's fitness; differences in functional traits may delineate niche differences. Since fitness depends on the environment, it is generally predicted that there should be strong and consistent trait–environment relationships: species with drought-tolerant traits should be most dominant in low-precipitation regions, and so on. Since productivity should also relate to fitness, there should likewise be strong and consistent trait–ecosystem functioning relationships.

There are also quite general descriptions of species traits and the life histories they confer (e.g. the leaf economic spectrum), implying again that traits can yield general predictions about an organism's ecology. Still, as McIntyre et al. (1999) pointed out, "A significant advance in functional trait analysis could be achieved if individual studies provide explicit descriptions of their evolutionary and ecological context from a global perspective."

A new(ish) paper does a good job of illustrating this need. In Forrestel et al., the authors compare functional trait values across two different grassland systems that share very similar environmental gradients and grass families, but have entirely different geological and evolutionary histories. The North American and South African grasslands share similar growing season temperatures and the same precipitation gradient, which should allow comparison between the regions. They differ in grass species richness (62 grass species in SA and 35 in NA) and species identity (no overlapping species), but contain the same major lineages (figure below).
From Forrestel et al. Phylogenetic turnover for major lineages along a precipitation gradient differed between the two regions.
Mean annual precipitation (MAP) is well established as an important selective factor, and many studies show relationships between community trait values and MAP. The authors measured a long list of relevant traits, and also determined the aboveground net primary productivity (ANPP) for sites in each grassland. When they calculated the community weighted mean (CWM) value of traits along the precipitation gradient, region was a significant covariate for 6 of the 11 traits measured (figure below). The context (region) determined the response of those traits to precipitation.
From Forrestel et al.
Further, different sets of traits were the best predictors of ANPP in NA versus SA. In SA, specific leaf area and stomatal pore index were the best predictors of ANPP, while in NA height and leaf area were. The upside was that for both regions, models of ANPP explained reasonable amounts of variation (48% for SA, 60% for NA).
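As a reminder of what these quantities are (the numbers and model below are invented for illustration, not taken from Forrestel et al.), a community weighted mean is just the abundance-weighted average of a trait across the species in a community, which can then be modelled against MAP with region as a covariate:

```r
# Toy example: community weighted mean (CWM) of specific leaf area (SLA)
abundance <- c(sp1 = 10, sp2 = 5, sp3 = 1)   # abundances in one community
sla       <- c(sp1 = 12, sp2 = 20, sp3 = 30) # SLA values for each species

cwm_sla <- sum(abundance / sum(abundance) * sla)
cwm_sla  # 15.625

# With one CWM value per site, a region-dependent trait-MAP relationship could
# then be tested with something like (hypothetical data frame 'site_data'):
# lm(cwm_sla ~ MAP * region, data = site_data)
```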

It's an important message: plant traits matter, but how they matter is not necessarily straightforward or general without further context. The authors note, "Instead, even within a single grass clade, there are multiple evolutionary trajectories that can lead to alternative functional syndromes under a given precipitation regime."

Tuesday, January 24, 2017

The removal of the predatory journal list means the loss of necessary information for scholars.

We at EEB & Flow periodically post about trends and issues in scholarly publishing, and one issue that we keep coming back to is the existence of predatory Open Access journals. These are journals that abuse a valid publishing model to make a quick buck, operating with clearly substandard practices that subvert the normal scholarly publishing pipeline (for example, see: here, here and here). In identifying those journals that, through their publishing model and activities, are predatory, we have relied heavily on Beall's list of predatory journals. This list was created by Jeffrey Beall with the goal of providing scholars with the information needed to make informed decisions about which journals to publish in, and to avoid those that likely take advantage of authors.

As of a few days ago, the predatory journal list has been taken down and is no longer available online. Rumour has it that Jeffrey Beall removed the list in response to threats of lawsuits. This is really unfortunate, and I hope that someone who is dedicated to scholarly publishing will assume the mantle.

However, for those who still wish to consult the list, an archive of it still exists online - found here.

Friday, January 20, 2017

True, False, or Neither? Hypothesis testing in ecology.

How science is done is the outcome of many things, from training (both institutional and lab-specific), reviewers' critiques and requests, historical practices, and subdiscipline culture and paradigms, to practicalities such as time, money, and trends in grant awards. 'Ecology' is the emergent property of thousands of people pursuing paths driven by their own combination of these and other motivators. Not surprisingly, the path of ecology sways and stalls, and in response papers pop up continuing the decades-old discussion about philosophy and best practices for ecological research.

A new paper from Betini et al. in Royal Society Open Science contributes to this discussion by asking why ecologists don't test multiple competing hypotheses (allowing efficient falsification or "strong inference" a la Popper). Ecologists rarely test multiple competing hypotheses: Betini et al. found that only 21 of 100 randomly selected papers tested 2 hypotheses, and only 8 tested more than 2. Multiple hypothesis testing is a key component of strong inference, and the authors hearken back to Platt's 1964 paper "Strong Inference" to argue why ecologists should adopt it.
From Platt: "Science is now an everyday business. Equipment, calculations, lectures become ends in themselves. How many of us write down our alternatives and crucial experiments every day, focusing on the exclusion of a hypothesis? We may write our scientific papers so that it looks as if we had steps 1, 2, and 3 in mind all along. But in between, we do busywork. We become "method-oriented" rather than "problem-oriented." We say we prefer to "feel our way" toward generalizations."
[An aside to say that Platt was a brutally honest critic of the state of science and his grumpy complaints would not be out of place today. This makes reading his 1964 paper especially fun. E.g. “We can see from the external symptoms that there is something scientifically wrong. The Frozen Method. The Eternal Surveyor. The Never Finished. The Great Man With a Single Hypothesis. The Little Club of Dependents. The Vendetta. The All-Encompassing Theory Which Can Never Be Falsified.”]
Betini et al. list a number of common intellectual and practical biases that likely prevent researchers from using multiple hypothesis testing and strong inference. These range from confirmation bias and pattern-seeking to the fallacy of factorial design (which leads to unreasonably high replication requirements, including for uninformative combinations of factors). But the authors are surprisingly unquestioning about the utility of strong inference and multiple hypothesis testing for ecology. For example, Brian McGill has a great post highlighting the importance and difficulties of multi-causality in ecology - many non-trivial processes drive ecological systems (see also).

Another salient point is that falsification of hypotheses, which is central to strong inference, is especially unserviceable in ecology. There are many reasons that an experimental result could be negative and yet not result in falsification of a hypothesis. Data may be faulty in many ways outside of our control, due to inappropriate scales of analyses, or because of limitations of human perception and technology. The data may be incomplete (for example, from a community that has not reached equilibrium); it may rely inappropriately on proxies, or there could be key variables that are difficult to control (see John A. Wiens' chapter for details). Even in highly controlled microcosms, variation arises and failures occur that are 'inexplicable' given our current ability to perceive and control the system.

Or the data might be accurate but there are statistical issues to be concerned about, given many effect sizes are small and replication can be difficult or limited. Other statistical issues can also make falsification questionable – for example, the use of p-values as the ‘falsify/don’t falsify’ determinant, or the confounding of AIC model selection with true multiple hypothesis testing.

Instead, I think it can be argued that ecologists have relied more on verification – accumulating multiple results supporting a hypothesis. This is slower, logically weaker, and undoubtedly results in mistakes too. Verification is most convincing when effect sizes are large – e.g. David Schindler's Lake 226 experiment, which provided a single, decisive demonstration that phosphorus supplementation causes eutrophication. Unfortunately, small effect sizes are common in ecology. There also isn't a clear process for dealing with negative results when a field has relied on verification – how much negative evidence is required to remove a hypothesis from use, rather than simply leading to caveats or modifications?

Perhaps one reason Bayesian methods are so attractive to many ecologists is that they reflect the modified approach we already use – developing priors based on our assessment of evidence in the literature, particularly verifications but also evidence that falsifies (for a better discussion of this mixed approach, see Andrew Gelman's writing). This is exactly where Betini et al.'s paper is especially relevant – intellectual biases and practical limitations matter even more outside the strict rules of strong inference. It seems important for ecologists to address these biases as much as possible: we need better training in philosophical, ethical, and methodological practices; priors, which may frequently be amorphous and internal, should be externalized using meta-analyses and reviews that express the state of knowledge in an unbiased fashion; and we should strive to formulate hypotheses that are specific and to identify their implicit assumptions.

Friday, January 13, 2017

87 years ago, in ecology

Louis Emberger was an important French plant ecologist in the first half of the last century, known for his work on the assemblages of plants in the Mediterranean.

For example, the plot below is his published diagram showing minimum temperature of the coolest month versus a 'pluviometric quotient' capturing several aspects of temperature and precipitation, from:

Emberger, L. La végétation de la région méditerranéenne. Rev. Gén. Bot., 42 (1930)
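For reference (the post itself doesn't give the formula), one commonly cited form of Emberger's pluviothermic quotient combines annual precipitation with the annual thermal range:

```latex
Q = \frac{2000\,P}{M^{2} - m^{2}}
```

where P is mean annual precipitation (mm), and M and m are the mean maximum temperature of the hottest month and the mean minimum temperature of the coldest month (in kelvin); plotting Q against m gives the kind of climate diagram shown here.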

Note this wasn't an unappreciated or ignored paper – it has received a couple hundred citations, up to the present day. Further, updated versions of the diagram have appeared in more recent years (see bottom).

So it's fascinating to see the eraser marks and crossed out lines, this visualisation of scientific uncertainty. The final message from this probably depends on your perspective and personality:
  • Does it show that plant-environment modelling has changed a lot, or that it is still asking about the same underlying processes in similar ways?
  • Does this highlight the value of expert knowledge (still cited) or the limitations of expert knowledge (eraser marks)? 
It's certainly a reminder of how lucky we are to have modern graphical software :)

E.g. updated in Hobbs, Richard J., D. M. Richardson, and G. W. Davis. "Mediterranean-type ecosystems: opportunities and constraints for studying the function of biodiversity." Mediterranean-Type Ecosystems. Springer Berlin Heidelberg, 1995. 1-42.

Thanks to Eric Garnier for finding and sharing the original Emberger diagram and the more recent versions.

Monday, December 19, 2016

2016 holiday caRd

Once more, 'tis the season! Hope you had an excellent year of science and R coding. This card requires the igraph library - it (loosely) relies on an infection (S-I model) moving through a network :-)

To view season's greetings from 2016:
Go to the gist and download the file directly ("download gist") or hit "raw" and copy/paste. Or, copy and paste the code below.

Users of RStudio will not be able to see the animation, so base R is highly recommended.
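For anyone curious about the general idea before grabbing the gist, here is a minimal sketch (not the actual card code) of an S-I process spreading over a random igraph network, animated with base R graphics:

```r
library(igraph)

set.seed(2016)
g <- sample_gnp(40, 0.08)                  # a small random network
lay <- layout_with_fr(g)                   # fix the layout so the animation is stable
status <- rep("S", vcount(g))              # everyone starts susceptible
status[sample(vcount(g), 1)] <- "I"        # one initial infection

for (step in 1:15) {
  for (v in which(status == "I")) {
    nb <- as.integer(neighbors(g, v))      # neighbours of an infected node
    newly <- nb[status[nb] == "S" & runif(length(nb)) < 0.5]
    status[newly] <- "I"                   # infect each susceptible neighbour with prob. 0.5
  }
  plot(g, layout = lay, vertex.label = NA,
       vertex.color = ifelse(status == "I", "red", "forestgreen"),
       main = paste("Step", step))
  Sys.sleep(0.3)                           # crude animation in the base R plot window
}
```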

For those not able or willing to run the card, you can view it and the past years' cards here!

Tuesday, December 13, 2016

150 years of 'ecology'

The word 'ecology' was coined 150 years ago by Ernst Haeckel in his book Generelle Morphologie der Organismen, published in 1866. Mike Begon gave a fascinating talk at the British Ecological Society meeting in Liverpool on what ecology has meant over these past 150 years and what it should mean in the future. The description of ecology that follows is largely taken from Begon's remarks.

Ernst Haeckel, 1860
Haeckel defined ecology as 'the science of the relations of the organism to its surrounding outside world (environment)', which is in obvious contrast to the then-burgeoning science of physiology, which was concerned with the world inside an organism. Interestingly, the first 50 years of this new field of ecology were dominated by the study of plants. In America, Clements, and in the UK, Tansley, both saw ecology as the description of patterns of plants in relation to the outside world. In many ways, this conception of ecology was what Haeckel had envisioned.

Frederic Clements

However, by the 1960s, the domain of ecology began to grow rapidly. Ecologists like Odum used ‘ecology’ to mean the structure and function of ecosystems, while others focussed on the abundance and distribution of species. By this time ecology had grown to encapsulate all aspects of organismal patterns and functions in nature.

The post-60s period saw another expansion, namely in the value of ecology. While Begon points out that textbooks, including his, focussed on the science of ecology in its pure form, many were ignoring the fact that ecology has important repercussions for how humanity will need to deal with the massive environmental impacts we've had on Earth's natural systems. That is, the science of ecology can provide the foundation on which applied management solutions can be built. I personally believe that applied ecology has only just begun its ascension to being the most important element of ecological science (but I'm biased, being the Executive Editor of the Journal of Applied Ecology). Just as human physiology has become problem-oriented, often focussed on human disease, ecology too will become more problem-oriented and focus on our sick patients.


Begon went on to say what ecology should be in the near future. He juxtaposed the fact- and truth-based necessity of science with the post-truth Brexit/Trump era we now find ourselves in. If ecologists and scientists are to engage the public and alter self-destructive behaviours, it cannot be with logic and evidence alone. He argued that we need to message like the post-truthers do: use metaphors and simple messages that are repeated, repeated, and repeated.

Friday, November 25, 2016

Can coexistence theories coexist?

These days, the term 'niche' manages to cover both incredibly vague and incredibly specific ideas. All the many ways of thinking about an organism's niche fill the literature, with various degrees of inter-connection and non-independence. The two dominant descriptions in modern ecology (the last 30 years or so) are from 'contemporary niche theory' and 'modern coexistence theory'. Contemporary niche theory is developed from consumer-resource theory, in which organisms' interactions occur via usage of shared resources (though it has expanded to incorporate predators, mutualists, etc.). Analytical tools such as ZNGIs and R* values can be used to predict the likelihood of coexistence (e.g. Tilman 1981, Chase & Leibold 2003). Modern coexistence theory is rooted in Peter Chesson's 2000 ARES review (and earlier work), and describes coexistence in terms of fitness and niche components that allow positive population growth.

On the surface these two theories share many conceptual similarities, particularly the focus on measuring niche overlap for coexistence. [Chesson's original work explicitly connects the R* values from Tilman's work to species' fitnesses in his framework as well.] But as a new article in Ecological Monographs points out, the two theories are separated in the literature and in practice. The divergence started with their theoretical foundations: niche theory relied on consumer-resource models and an explicit, mechanistic understanding of organisms' resource usage, while coexistence theory was presented in terms of Lotka-Volterra competition models and so is phenomenological (e.g. the mechanisms determining the values of the competition coefficients are not directly measured). The authors note, "This trade-off between mechanistic precision (e.g. which resources are regulating coexistence?) and phenomenological accuracy (e.g. can they coexist?) has been inherited by the two frameworks…."

There are strengths and weaknesses to both approaches, and both have been used in important ecological studies. So it's surprising that they are rarely mentioned in the same breath. Letten et al. answer an important question: when directly compared, can we translate the concepts and terms of contemporary niche theory into modern coexistence theory, and vice versa?

Background - when is coexistence expected? 
Contemporary niche theory (CNT), for the simplest case of two limiting resources: for each species, you must know the consumption or impact it has on each resource, the ratio at which the two resources are supplied, and the ZNGIs (zero net growth isoclines, which delimit the resource conditions in which a species can grow). Coexistence occurs when the species are better competitors for different resources, when each species has a greater impact on its more limiting resource, and when the supply ratio of the two resources doesn't favour one species over the other. (Simple!)
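For concreteness, a generic MacArthur-style consumer-resource model in which these quantities live might look like the following (my notation, not necessarily that of Letten et al.):

```latex
\frac{dN_i}{dt} = N_i\!\left(\sum_{l} w_l\, c_{il}\, R_l - m_i\right),
\qquad
\frac{dR_l}{dt} = r_l R_l\!\left(1 - \frac{R_l}{K_l}\right) - \sum_i c_{il}\, N_i\, R_l
```

Here c_il is consumer i's per-capita consumption rate of resource l, w_l is the value of that resource, and m_i is the consumer's mortality rate. Species i's ZNGI is then simply the set of resource levels at which the gain term balances mortality (sum over l of w_l c_il R_l = m_i).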

For modern coexistence theory (MCT), stable coexistence occurs when the combination of fitness differences and niche differences between species allows both species to maintain positive per capita growth rates when rare. As niche overlap increases, coexistence requires increasingly small fitness differences.
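The post doesn't write the criterion out, but in Chesson's framework it has a compact form: with rho denoting niche overlap and kappa1/kappa2 the fitness ratio, stable coexistence requires

```latex
\rho \;<\; \frac{\kappa_1}{\kappa_2} \;<\; \frac{1}{\rho}
```

so the smaller the overlap rho, the larger the fitness difference that can be tolerated without one species excluding the other.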

Fig 1, from Letten et al. The criteria for coexistence under modern coexistence theory (a) and contemporary niche theory (b).  In (a), f1 and f2 reflect species' fitnesses. In (b) "coexistence of two species competing for two substitutable resources depends on three criteria: intersecting ZNGIs (solid red and blue lines connecting the x- and y-axes); each species having a greater impact on the resource from which it most benefits (impact vectors denoted by the red and blue arrows); and a resource supply ratio that is intermediate to the inverse of the impact vectors (dashed red and blue lines)."

So how do these two descriptions of coexistence relate to each other? Letten et al. demonstrate that:
1) Changing the supply rates of resources (for CNT) impacts the fitness ratio (equalizing term in MCT). This is a nice illustration of how the environment affects the fitness ratios of species in MCT.

2) Increasing overlap of the impact niche between two species under CNT corresponds to increasing niche overlap under modern coexistence theory. When two species have similar impacts on their resources, there should be very high niche overlap (a weak stabilizing term) under MCT too.

3) When two species' ZNGIs converge (i.e. the conditions necessary for positive growth rates become more similar), both the stabilizing and equalizing terms in MCT are affected. However, this has little net effect on coexistence (since niche overlap increases, but fitness differences decrease as well).

This is a helpful advance because Letten et al. make these two frameworks speak the same (mathematical) language. Further, it connects a phenomenological framework with a (more) mechanistic one. The stabilizing-equalizing framework (MCT) has been incredibly useful as a way of understanding why we see coexistence, but it is not meant to predict coexistence in new environments or with new combinations of species. On the other hand, contemporary niche theory can be predictive, but is unwieldy and information-intensive. Reconciling the similarities in how both frameworks think about coexistence may be one way forward.
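As a toy illustration of the MCT side of that translation (invented numbers, using the standard Lotka-Volterra parameterization rather than anything from the paper), the niche overlap and fitness ratio can be computed directly from competition coefficients and checked against the coexistence criterion:

```r
# alpha[i, j] = per-capita effect of species j on species i (Lotka-Volterra)
alpha <- matrix(c(1.0, 0.6,
                  0.7, 1.2), nrow = 2, byrow = TRUE)

rho           <- sqrt((alpha[1, 2] * alpha[2, 1]) / (alpha[1, 1] * alpha[2, 2]))  # niche overlap
fitness_ratio <- sqrt((alpha[2, 1] * alpha[2, 2]) / (alpha[1, 1] * alpha[1, 2]))  # kappa1 / kappa2

# Mutual invasibility (stable coexistence) requires rho < kappa1/kappa2 < 1/rho
rho < fitness_ratio & fitness_ratio < 1 / rho   # TRUE for these values
```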

Letten, Andrew D., Ke, Po-Ju, and Fukami, Tadashi. 2016. Linking modern coexistence theory and contemporary niche theory. Ecological Monographs. http://dx.doi.org/10.1002/ecm.1242
(This is a monograph for a reason, so I am just covering the major points Letten et al provide in the paper. It's definitely worth a careful read as well!).

Wednesday, November 16, 2016

The value of ecology through metaphor

The romanticized view of an untouched, pristine ecosystem is unrealistic; we now live in a world where every major ecosystem has been impacted by human activities. From pollution and deforestation to the introduction of non-native species, our activity has influenced natural systems around the globe. At the same time, ecologists have largely focused on 'intact' or 'natural' systems in order to uncover the fundamental operations of nature. Ecological theory abounds with explanations for ecological patterns and processes. However, given that the world is increasingly human-dominated and urbanized, we need a better understanding of how biodiversity and ecosystem function can be sustained in the presence of human domination. If our ecological theories provide powerful insights into ecological systems, then human-dominated landscapes are where they are most desperately needed to solve problems.
From the Spectator

This demand to solve problems is not unique to ecology; other scientific disciplines measure their value in terms of direct contributions to human well-being. The most obvious is human biology. Human biology has transitioned from gross morphology, to physiology, to the molecular mechanisms controlling cellular function, and all of these tools provide powerful insights into how humans are put together and how our bodies function. Yet, as much as these tools are used to understand how healthy people function, human biologists often stay focussed on how to cure sick people. That is, the proximate value ascribed to human biology research is in its ability to cure disease and improve people's lives.


In ecology, our sick patients are heavily impacted and urbanized landscapes. Understanding how natural systems function can provide insights into strategies to improve degraded ecosystems. This value of ecological science manifests itself in shifts in funding and publishing. We now have synthesis centres that focus on the human-environment interaction (e.g., SESYNC). Journals that publish papers providing applied solutions to ecological and environmental problems (e.g., Journal of Applied Ecology, Frontiers in Ecology and the Environment, etc.) have gained in prominence over the past decade. But more can be done.


We should keep the 'sick patient' metaphor in the back of our minds at all times and ask how our scientific endeavours can help improve the health of ecosystems. I was once a graduate student who pursued purely theoretical tests of how ecosystems are put together, and now I am the executive editor of an applied journal. I think that ecologists should feel that they can develop solutions to environmental problems, and that their underlying science gives them a unique perspective on improving the quality of life of our sick patients.