To be clear, I am not a virologist, nor am I a public health expert. But I do know how to analyze patterns of evolutionary diversity. Research into the SARS-CoV-2 virus that has given rise to the COVID-19 pandemic has greatly enhanced our understanding of global disease dynamics, mRNA vaccines and public health responses to a global crisis. But the COVID-19 pandemic also has the potential to provide fundamental insights into basic ecological and evolutionary processes.
While a lot has been written about how COVID-19 lockdowns have had noticeable repercussions on air quality and wildlife in cities, the virus itself also serves as a microcosm of the dynamics of the natural world. SARS-CoV-2 is now the most studied non-human organism on Earth. We've witnessed its spread across the globe (which provides insights into invasion biology), we've seen it spread exponentially in populations at times (showcasing the power of models to predict spread), and we've watched its rapid diversification, which is evolution in real time.
Understanding how SARS-CoV-2 strain diversity is generated is of fundamental importance for public health policies, and the virus is indeed evolving and diversifying. In Ontario, Canada, we have a wonderful resource from Public Health Ontario, which publishes data on the evolution of strain diversity and provides an excellent graphical interface. This interface focuses on the SARS-CoV-2 phylogeny (that is, the evolutionary family tree connecting strains to their ancestors) in Ontario.
[Figure: An example phylogeny]
Using their open data, I addressed a simple question: is the evolutionary diversity (measured by the distances separating strains) increasing over time?
To test this, I calculated a statistical measure called the standardized effect size of the mean pairwise distances (SES.MPD). It quantifies the average distance separating strains, standardized against random permutations (in this case 500 randomizations). A SES.MPD value of 0 means that the evolutionary diversity of a group of strains is no different from that of the same number of strains randomly selected from the phylogeny. Negative values mean that strains are more closely related on the phylogeny than you would expect by chance (referred to as under-dispersed), and positive values mean strains are more distantly related (over-dispersed). I did these calculations for each month since the pandemic hit Ontario (March 2020) and for each of the seven regions of Ontario.
What I found was that early on in the pandemic, the strains were under-dispersed, meaning that they were more closely related and genetically similar than expected by chance. But over time the dissimilarity between strains increased, and by May 2021 (the last data in the graphs), many of Ontario's regions had significantly over-dispersed strains. This means that strains found in the populations in May 2021 were generally more dissimilar from one another than those found early on.
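For readers who want to see the arithmetic, here is a minimal sketch of an SES.MPD calculation in Python. It is not the code used for this analysis (the post does not depend on any particular software), and the function names, the toy distance matrix and the fixed random seed are all illustrative assumptions. It takes a matrix of pairwise distances between strains, computes the observed mean pairwise distance for one group, and compares it with 500 groups of the same size drawn at random from the whole phylogeny.

```python
import numpy as np


def mean_pairwise_distance(dist, idx):
    """Mean of the distances between every pair of strains listed in idx."""
    idx = np.asarray(idx)
    sub = dist[np.ix_(idx, idx)]             # distances among the chosen strains
    upper = np.triu_indices(len(idx), k=1)   # each pair once, diagonal excluded
    return sub[upper].mean()


def ses_mpd(dist, idx, n_rand=500, seed=0):
    """Standardized effect size of the mean pairwise distance.

    Null model (an assumption of this sketch): draw the same number of
    strains at random, without replacement, from the whole phylogeny.
    """
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_distance(dist, idx)
    n_total = dist.shape[0]
    null = np.array([
        mean_pairwise_distance(
            dist, rng.choice(n_total, size=len(idx), replace=False)
        )
        for _ in range(n_rand)
    ])
    return (observed - null.mean()) / null.std(ddof=1)


# Toy data (hypothetical): eight strains forming two tight clusters that sit
# far apart, represented as positions on a line instead of a real phylogeny.
positions = np.array([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3])
dist = np.abs(positions[:, None] - positions[None, :])

clustered = ses_mpd(dist, [0, 1, 2, 3])   # all from one cluster: negative
spread = ses_mpd(dist, [0, 1, 4, 5])      # two from each cluster: positive
print(clustered, spread)
```

In a real analysis the distance matrix would come from the branch lengths of the phylogeny, and the calculation would be repeated for each month and each region.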
This matters because vaccines and other treatments are typically developed against a single strain or from samples collected at a specific time point. If strains are relatively similar genetically, then it is highly probable that treatments will be successful across the strains. However, as strains diversify and become more dissimilar, treatments might become less effective overall.
Had the spreading infection been dominated by single strains, with very few newer strains replacing older ones, we would expect the SES.MPD values to remain below zero, which would make it easier to track strains and adapt treatments.
These patterns are also valuable for insights into ecology and evolution. We often look at SES.MPD values to interpret how different processes structure diversity (like competition, predation, pollution, etc.), but we often don't have good evidence of how historical evolutionary processes can drive SES.MPD differences. The plots above show that rapid evolutionary diversification results in linearly increasing SES.MPD values.