Monday, November 23, 2015

Challenges for microbial ecology

It is common in ecology for promising new areas of research to grow rapidly in terms of funding, students, and papers. Sometimes, such growth outpaces supporting development. This can lead to criticisms, which, when properly dealt with, can help such burgeoning subfields to mature. These are challenges currently facing microbial ecology as well. [Note I use the term microbial ecology here to refer to the ecology of microbes, not simply ecology that happens to use microbes as a study organism (e.g. Graham Bell or Lin Jiang’s experimental work).]

Microbes are fascinating. They are a very large and important group that has been underappreciated in ecological research until recently. Now, thanks to ever-improving molecular methods, the ecology of microbes is increasingly accessible. It has formed the basis of some great citizen science and public outreach (microbes in space, your home, your cat). And scientifically, work from this emerging subfield is often excellent, with broad implications for other areas of ecology (just as a couple of cool examples). Microbes are different from other taxa for all sorts of cool reasons (horizontal transfer of genes, tiny genomes, and immense functional plasticity), and this makes for fascinating discoveries.

However, the newness of this subfield is apparent as it attempts to mesh microbiology with the existing body of ecological knowledge and approaches. The result, at times, is that existing ecological theory and methods are applied unquestioningly to microbial datasets even though they may not be appropriate. Unfortunately, the assumptions behind such analyses and their limitations with respect to microbial datasets aren’t always recognized, leading to questionable interpretations. There is sometimes also an over-reliance on “pipeline” approaches to microbial research; for example: collect samples, extract DNA, sequence, run through the QIIME pipeline, and present descriptive analyses, particularly beta-diversity metrics (UniFrac), PCoA or NMDS plots, and permutation-based statistical tests (e.g. ANOSIM) to determine whether assemblages of interest differ in composition. These pipelines originally arose because of the difficulties in handling such datasets and the need for specific software for analyses.

Of course, it is important to keep in mind that microbial ecology is in an early phase, where accumulating data and cataloging diversity is a priority. Mostly, issues arise when major questions in ecology are posed but perhaps without quite having appropriate methods or data to answer them. To provide an example, I sometimes see microbial ecology papers attempting to differentiate between niche and neutral processes as the drivers of microbial community assembly. Microbes are often thought of as lacking meaningful dispersal limitation (‘everything is everywhere; the environment decides’ is a common heuristic). As a result, it may be that communities assemble in a highly stochastic fashion (random arrival) or perhaps environmental filters and interactions do matter. But the issue of “niche” versus “neutrality” is a difficult question to answer using observational data in any system. It requires considering the many assumptions that underlie “niche” and “neutral”, making predictions about the patterns that would arise from these mechanisms, and then being able to differentiate these patterns from others that you might observe. This is a tall order for any observational data set, and I think that is especially true for microbial data sets.

Below I have listed in more detail the challenges arising when attempting to integrate ecology and microbiology. These relate to all sorts of ecological questions and analyses, including but not limited to “niche versus neutral”.

a) True measures of abundance are not typically available, and 16S copy number is incorrectly used as a measure of abundance. The 16S ribosomal RNA gene is the typical target of studies of bacterial ecology. However, counts of 16S copies per taxon are not equivalent to abundances (as, say, counts of individuals in macro-systems are), because different taxa can carry different numbers of copies. Where one taxon might have 2 copies, another might have 10.

Despite this, it is common to see 16S counts used as a proxy for abundances; for example, to calculate beta-diversity measures such as Bray-Curtis. Since neutrality predicts patterns related to species' abundance distributions and changes in diversity through time, conclusions that rely on 16S-based ‘abundance’ metrics are suspect. Some attempts are being made to address this: for example, this paper from Steve Kembel et al. (2012) recognizes that copy number is a conserved trait and so could be controlled for in a phylogenetically-informed way. qPCR can also be used to measure true abundances in samples. (See comments).
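To make the copy-number problem concrete, here is a minimal sketch in Python. All read counts, taxon names, and copy numbers are invented for illustration, and the correction assumes that copy numbers are known exactly and that reads are proportional to 16S gene copies (no PCR or primer bias), neither of which is guaranteed for real samples.

```python
# Hypothetical illustration: every number and taxon name below is invented.

def relative_abundance(counts):
    """Convert counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def copy_number_corrected(read_counts, copies_per_genome):
    """Divide each taxon's 16S read count by its assumed 16S copy number."""
    return {t: read_counts[t] / copies_per_genome[t] for t in read_counts}

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as dicts."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator

# Assumed copy numbers, matching the "2 versus 10" example in the text.
copies_per_genome = {"taxon_A": 2, "taxon_B": 10}

# Invented 16S read counts for two samples.
sample_1 = {"taxon_A": 200, "taxon_B": 1000}
sample_2 = {"taxon_A": 600, "taxon_B": 1000}

corrected_1 = copy_number_corrected(sample_1, copies_per_genome)
corrected_2 = copy_number_corrected(sample_2, copies_per_genome)

print("raw proportions, sample 1:      ", relative_abundance(sample_1))
print("corrected proportions, sample 1:", relative_abundance(corrected_1))
print("Bray-Curtis on raw reads:       ", bray_curtis(sample_1, sample_2))
print("Bray-Curtis after correction:   ", bray_curtis(corrected_1, corrected_2))
```

In this toy case the raw reads suggest taxon_B dominates sample 1 five to one, while the corrected counts imply the two taxa are equally abundant, and the Bray-Curtis dissimilarity between the samples more than doubles after correction.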

b) What spatial scale is relevant to microbes? Bacteria are very small (of course). However, sampling methods often involve fairly large samples in relation to bacterial body size. 1 g of soil, although tiny compared to many ecological samples, is a massive amount of material in the context of bacteria. There can be 10^8 cells/g of soil, and by one estimate the interaction distance between individuals is ~20 µm, so it is not likely that a 1 g sample is equivalent in scale to a “community”.

If a typical observational sample is not representative of a community, community ecology theory, which is dependent on assumptions about local interactions and environmental filters at particular spatial scales, may not be relevant. Scale issues are an ongoing problem in ecology, and defining the ‘community’ is a thorn in our sides. It is understandable that this is a problem for a new field. Thinking about the kind of data collected as relating to macroecology may be a fruitful approach (see this paper for similar ideas on the topic).
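A back-of-the-envelope calculation gives a feel for the mismatch in scale. The sketch below uses the two figures quoted above (10^8 cells/g and a ~20 µm interaction distance); the soil bulk density of 1.3 g/cm^3 is an assumption added here, as is the simplification that cells are spread evenly through the soil volume, which real soils certainly violate.

```python
import math

CELLS_PER_GRAM = 1e8             # figure quoted in the text
INTERACTION_DISTANCE_UM = 20.0   # figure quoted in the text
BULK_DENSITY_G_PER_CM3 = 1.3     # ASSUMPTION: not from the text

def sphere_volume_cm3(radius_um):
    """Volume in cm^3 of a sphere whose radius is given in micrometres."""
    radius_cm = radius_um * 1e-4  # 1 um = 1e-4 cm
    return (4.0 / 3.0) * math.pi * radius_cm ** 3

# Volume within which one cell could plausibly interact with others.
neighbourhood_cm3 = sphere_volume_cm3(INTERACTION_DISTANCE_UM)

# Volume occupied by a 1 g sample, and cell density per unit volume.
sample_volume_cm3 = 1.0 / BULK_DENSITY_G_PER_CM3
cells_per_cm3 = CELLS_PER_GRAM * BULK_DENSITY_G_PER_CM3

# Assumes cells are distributed uniformly, which is a simplification.
cells_per_neighbourhood = cells_per_cm3 * neighbourhood_cm3
neighbourhoods_per_gram = sample_volume_cm3 / neighbourhood_cm3

print(f"cells within one interaction neighbourhood: {cells_per_neighbourhood:.1f}")
print(f"non-overlapping neighbourhoods in 1 g:      {neighbourhoods_per_gram:.2e}")
```

Under these assumptions a cell shares its interaction neighbourhood with only a handful of others, while a single 1 g sample pools tens of millions of such neighbourhoods.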

c) Temporal scale has similar issues. Unlike those of macro-scale systems, microbial time scales are very short, with approximately 100-300 generations per year (with some variation between taxa). The scale of environmental variation that affects these communities should be finer as well. This is a benefit and a difficulty of the system. For example, one can potentially observe a community assemble to equilibrium in a bacterial system. But describing changes in bacterial composition observed over 1 year as succession and placing them in the context of ecological literature on plant succession seems imprecise. The scale of observation is of particular importance.

d) There can be issues in differentiating between active and inactive taxa, since microbes may be present in a sample but dormant. Methods exist to differentiate between these taxa, but when they are not applied, an apparently rare taxon in an assemblage may actually be an inactive taxon.

e) Sampling artifacts and other biases can arise between labs and runs, including biases related to PCR, primers, DNA extraction, storage, rarefaction, and more. This is an issue equivalent to limitations in methodological approaches in every field, and one that is actively being worked on (for example, developing standardized approaches). Further, the existing technology is pretty amazing.

f) Limitations of the current null models and statistical methods being applied. Null models are still a work in progress for ecology, and need to continue to be developed and perfected. But I think that there are specific issues that need to be considered in applying some of these methods to microbial data in particular, and there is a need for concerted research on developing statistical methods for such massive datasets.

In particular, I suspect there is a problem with heightened Type I error rates and with inadequately randomizing very large data sets. Ulrich and Gotelli (2012) hint at some of these possible issues:
“null model analysis may not be well-suited to such large data sets. The general statistical problem is that with very large data sets, the null hypothesis will always be rejected unless the data were actually generated by the null model process itself. So, large data sets may often deviate significantly from null models in which row and column sums are fixed, regardless of whether species occurrences are random or not (Fayle and Manica 2010). This was not a problem in the early history of null model analysis, when ecologists worried that apparent patterns in relatively small data sets might reflect random processes”
There is not enough time here to delve into most of these issues in detail, but permutation tests/Mantel test type analyses have a number of important limitations, and their assumptions must be tested before they are used (from Pierre Legendre). From the ANOSIM website:
“Recent work…has shown distance-based methods (e.g., ANOSIM, Mantel Test, BIOENV, BEST) are inappropriate for analyzing Beta diversity because they do not correctly partition the variation in the data and do not provide the correct Type-I error rates.” 
If Type I error rates are frequently high in past analyses, or inappropriate statistical models were used, data can be re-analysed as better procedures arise. But we should also recognize that there is uncertainty in past results (particularly weak or barely significant patterns). This also suggests that we have yet to gain a true understanding of which patterns and relationships in microbial ecology are significant.

Microbial research produces some of the largest and most complex datasets that ecology has ever had to deal with. As a result, developing specific theory and appropriate methods for these data should be a priority alongside discovery-focused research. Fortunately, this creates opportunities for ecologists to develop methods for complex systems, which should be beneficial for the entire ecological discipline. And many people are already attempting to fill these knowledge gaps, so this is not to underplay their accomplishments. Hopefully there will continue to be developments in microbe-specific theory, with appropriate assumptions regarding temporal and spatial scale. Microbial ecologists can do better than co-opt standard ecological approaches; they can improve on them (e.g. Coyte et al. (2015)).


erik Verbruggen said...

Great piece. Indeed there are many difficulties, but I agree with you there are also great prospects. I did want to comment on your recommendation of qPCR on 16S. The way it is stated now, it appears as though the multi-copy nature of the gene wouldn't pose a challenge to assessing abundance of taxa with this method, but it does still. Unless you mean using it as a means to calculate the ratio to a known single-copy gene, but that requires a lot of knowledge typically not available. Of course qPCR does improve quantitation of the gene over normal PCR for a given taxon, and should therefore be preferred, as you state.

Caroline Tucker said...

Hi Erik - thanks for the clarification. I was thinking of methods where, as you say, qPCR is used to compare to a single copy gene for a small number of taxa; that probably isn't very useful for environmental samples.