Developing novel statistics and coalescent approaches for the improved study of virus evolution
Abstract
Many features of virus populations make them ideal candidates for population genetic study, including a
very high rate of mutation, high levels of nucleotide diversity, exceptionally large census population sizes,
and frequent positive selection. However, these attributes also mean that special care must be taken in
population genetic inference. For example, highly skewed progeny distributions, frequent and severe
population bottleneck events associated with infection and compartmentalization, and strong selection all
affect the distribution of genetic variation but are generally not taken into account in such inference. Thus, improved
inference for viral populations will require not only theoretical development, but also the
implementation of that theory in statistical inference tools capable of analyzing thousands of
viral genomes in a computationally efficient manner. Here, I propose these necessary developments (Aims
1-2) and present an application to two exceptionally deep datasets to which we have unique access
via our consortium affiliations (Aim 3). In total, this proposal not only represents a significant step in
advancing our understanding of population genetics in these extreme parameter spaces, but will also
provide valuable clinical insights that are expected to improve future patient treatment strategies.