Hyper-diverse antigenic variation and resilience to transmission-reducing intervention in falciparum malaria

Stochastic agent-based modelWe extend a previous computational model28 whose key assumptions and processes are summarized below (see the supplementary Methods for further description). Model parameters and symbols are listed and explained in Supplementary Data 1. The model is an agent-based (individual-based), discrete-event, and continuous-time stochastic system in which all known possible future events are stored in a single event queue along with their putative times, which may be fixed or drawn from a probability distribution with a certain rate. We choose values for the rates of these events based on the literature of malaria epidemiology, related field studies, and in vitro or in vivo values (see supplementary Methods and Supplementary Data 1 for further details). When an event happens, it can cause the addition or removal of future events in the queue, or the modification of their rates, resulting in a recalculation of putative times. This approach is implemented with the next-reaction method77, which optimizes the Gillespie first-reaction method78.Individual human hosts die and are replaced with infants with no immunity. The age structure of the human host population follows a truncated exponential distribution with a mean age of 30 years and a maximum age of 80 years. Individual infections and the immune history of individual human hosts are tracked, and evolutionary mechanisms including mitotic/ectopic recombination and mutation are modeled explicitly. We simulate a local host population of a given size which can assume different spatial configurations (see section “Seasonality and spatial configuration of the local transmission system” below). At the beginning of each simulation, a small number of hosts are randomly selected and infected with distinct parasite genomes from the random assembly of var genes from a “regional” pool to initialize local transmission.Each parasite genome is represented as a specific combination of 45 (non-upsA) var genes, and each var gene is considered a linear combination of two epitopes (alleles) based on the empirical description of two hypervariable regions in the var tag region of the DBLα domain57. Mosquito vectors are not explicitly represented as agents in the model. Instead, we consider an effective contact rate (hereafter, the transmission rate, which under some assumptions is effectively equivalent to vectorial capacity) which determines the times of local transmission events. Donor and recipient hosts are selected at random, and transmission occurs if the former carries active blood-stage infections, and the latter has not reached the carrying capacity of its liver-stage. We model ectopic recombination among genes within the same genome during the asexual stage inside the human host. We model meiotic recombination between genomes as occurring at the time of a transmission event, as this process happens in nature during sexual replication within the mosquito vector. See section “Meiotic recombination (or outcrossing) during the sexual stage of infection” in the supplementary Methods for further details. The expression of genes in the repertoire is sequential and the infection ends when the whole repertoire is depleted. Hosts acquire transitory immune memory toward the product of expressed epitopes, and such memory precludes expression of such epitopes in future infections. The total duration of infection of a particular repertoire is therefore proportional to the number of unseen epitopes by the infected host across all individual var genes of the given repertoire. See supplementary Methods for further details of the agent-based model (ABM).Extensions of the original computational model28 implemented for this work concern the regionally-open spatial configuration, ectopic recombination events, and within-host dynamics. In particular, the regional pool for the regionally-open system now updates its genes at a rate reflecting the substitution rate of genes in individual local parasite populations within the region. Ectopic recombination events can now generate truly novel alleles instead of simply shuffling the alleles of the two parental genes. The current formulation of the model further allows ectopic recombination events to be associated with a load such that there is a certain probability of generating non-functional offspring genes which replace functional parental genes. These non-functional genes will not be expressed and hence reduce infection duration of repertoires within infected hosts.Seasonality and spatial configuration of the local transmission systemWe simulate both constant and seasonal transmission dynamics. We implemented seasonality by multiplying a scaling constant by a temporal vector of 360 days, containing the daily number of mosquitoes over a full year. In other words, the temporal vector represents the mosquito abundance, and the scaling constant encapsulates all other parameters involved in vectorial capacity. The product of both gives the effective contact rate. To obtain this temporal vector, Pilosof et al. used79 a deterministic model80 for mosquito population dynamics. The model80 was originally developed for Anopheles gambiae and consists of a set of ODEs describing the dynamics of 4 mosquito stages: eggs, larvae, pupae, and adults. Seasonality is implemented via density dependence at the egg and larva stages as a function of rainfall (availability of breeding sites). We adopt either constant effective contact rates or seasonal ones, which result in EIR or FOI values typical of high-transmission endemic regions81 (see section “Repertoire transmission” and “Within-host dynamics” in supplementary Methods).We simulate transmission dynamics for closed, semi-open, and regionally-open systems. For closed systems, after the initial seeding of local transmission from a pool of var genes, migration is discontinued. Thus, var diversity is generated intrinsically by the dynamics of the ABM for the local population. For semi-open systems, two individual populations coupled via migration are explicitly simulated. They represent empirical transmission systems which are mainly coupled to one or a few neighboring sites. Regionally-open systems represent a local transmission system that is connected with a few to many individual neighboring parasite populations within a broader regional scale. Instead of simulating those individual parasite populations explicitly, we let the focal population receive migration from a regional pool of var genes of a certain size. This regional pool acts as a proxy for regional parasite diversity, i.e., diversity of the aggregate of individual local parasite populations in the region. Because each parasite genome is a repertoire of a given number of var genes, migrant genomes are assembled from random sampling of var genes from the regional pool. Estimation of key parameters concerning the regional pool is described in the following sections.The temporal course of a simulation and the series of IRS interventionsEach simulation follows either two or three stages (Fig. 1a, b): (i) a pre-IRS period during which the local parasite community is assembled and the transmission system reaches a (semi-) stationary state before the intervention; (ii) an IRS period of either 10 or 2 years during which transmission is decreased (referred to as sustained and transient IRS, respectively); (iii) a post-IRS period when transmission rates recover to pre-IRS levels (only applicable to the transient IRS simulation, since the sustained one extends for the full decade).We simulate a series of IRS intervention efforts, which reduce transmission rates to levels that are log-linear along the gradient of transmission intensity (Fig. 1c). The lowest and highest coverage of the sustained IRS intervention efforts correspond to a reduction of 20% and slightly more than 90%, respectively. The highest reduction for transient IRS interventions can reach ~96% for certain scenarios.The IRS intervention efforts are assumed to be regional for open systems. In empirical settings, regional intervention efforts imply that similar efforts are applied to both the local parasite population of interest and the neighboring parasite populations which exchange genomes with the local population via either short-distance or long-range dispersal of mosquitoes or human hosts. All these parasite populations then experience a similar level of reduction in transmission and consequently in their prevalence. As a result, during IRS, the importation of migrant genomes from these neighboring locations also decreases. To implement regional interventions for regionally-open systems, we let migration rates be proportional to the local transmission rate and local prevalence level, both being surrogates of transmission rates and prevalence levels of neighboring populations.We assume the size of the regional pool remains unaffected during and post-intervention because the aggregate of var genes across individual parasite populations should remain high, even when intervention is regional and of high coverage. We do so because this work uses the simulated “regionally-open systems” to represent high-transmission endemic regions in empirical settings, where immigration is known to represent an important challenge to control and elimination efforts. Here, regional interventions rarely eliminate the majority of individual parasite populations.Estimation of the baseline migration rate for the implementation of open systemsWe rely on empirical data on var genes from Bongo District, a high-transmission region in northern Ghana to infer the rate of gene exchange between populations, and to obtain a reasonable estimate of migration rates to implement the open transmission systems. The fast-evolving var genes under immune selection provide a higher resolution than neutral loci for recent migration events. Following the approach by He et al.82, we calculate Jost’s D83, a measure of population divergence for highly diverse genetic markers, to quantify var gene differentiation within and between two proximal catchment areas (i.e., Vea/Gowrie and Soe) in Bongo District for which molecular sequences were previously obtained at the end of the wet/high-transmission season prior to the IRS intervention21,29. Furthermore, under the finite-island model with infinite alleles, the divergence between two populations, D, is proportional to the local innovation (primarily ectopic recombination) rate, and inversely proportional to the migration rate between the two populations83. These relationships are intuitive because local innovation events generate new genes unique to the specific local population and hence promote genetic differentiation and isolation between populations, whereas migration events promote genetic exchange and hence reduce divergence between populations. We estimate m, the percentage of contacts leading to infection due to migration relative to local transmission events, by dividing the local innovation rate, i.e., the ectopic recombination rate of the var genes, by D. This approach was initially proposed for single-locus genes with multiple alleles83. Here we apply it to the var multigene family, effectively treating it as if it were single-locus, and different variants of var as if they were different alleles of this locus. We consider a range of ectopic recombination rates representative of those in nature, although values of the ectopic recombination rate have been measured only in vitro25. We consider the daily rate of ectopic recombination, and the percentage of migration rate per day relative to daily local transmission events. The estimated baseline migration rate is around 0.0026, which translates to a percentage of migrant contacts relative to local ones of around 0.26%. We simulate transmission dynamics for both semi-open and regionally-open systems with migration rates equal to, or an order of magnitude higher than, this estimated baseline.Estimation of the baseline rate at which genes are substituted in the regional poolWe estimate a value for the rate at which genes in the regional pool are substituted. We refer to this rate as the substitution rate hereafter. As the regional pool represents an aggregate of diversity circulating in individual local populations within the region, this rate can be approximated by the substitution rate of genes in individual local populations. Specifically, we can view a regional pool as the aggregate of diversity of a certain number of equally sized individual parasite populations. Note that we consider regional pools consisting of diversity in the form of the effective number of equifrequent genes and we do not specify a frequency distribution of genes in these regional pools. The collection of this effective number of equifrequent genes, in the ideal scenario, would exploit hosts’ immune space equivalently relative to the collection of genes at their original “true” frequency. We use the inverse Simpson diversity index (or, one over the expected homozygosity)84, \(\frac{1}{{\sum}_{i}{{x}_{i}}^{2}}\), to determine the effective number of genes in individual local parasite populations of our simulated closed and semi-open systems, which is at the order of one to two thousand. Moreover, connected individual local parasite populations of these simulated semi-open systems share a significant fraction of their genes when ranked by frequency, around 50% or more. Therefore, we let the number of genes in the regional pool span from several thousand to over tens of thousands, to mimic from a few to about ten to twenty individual local parasite populations. We specifically consider two setups for the regional pool representing roughly the two extremes, referred to as “medium” versus “large” pool scenarios, respectively.The substitution rate of common genes in a local population depends on the local innovation rate, i.e., the rate at which new variants are generated from ectopic recombination and mutation, and the fixation probability, or the invasion probability, of these new variants85,86. Fixation of a new gene variant in this context refers to a single copy of the variant present at a given time point having descendants in the population after a very large amount of time has passed, which typically entails that the frequency of the new variant reaches a certain threshold and becomes sufficiently common so that it is much less susceptible to genetic drift. When the system is at, or close to, a stationary state, the fixation of a new gene variant would imply the loss or replacement of an older common gene variant, hence we term this process and its rate as substitution of genes and substitution rate respectively. The rate of ectopic recombination and mutation has been measured in vitro25. The magnitude of the fixation or invasion probability can be derived on the basis of population genetic models and the simulation output from our ABM. From the two together, we can obtain an expectation for the substitution rate as follows.We write the invasion probability of new variants, or genes, or alleles in the system. At the stationary state, malaria transmission can be described by birth-death processes according to a Moran model with selection. The invasion probability of a low-frequency variant is determined by its fitness advantage relative to that of others85.$${p}_{{inv}}=\frac{{1-(\frac{W}{{W}_{{new}}})}^{n}}{{1-(\frac{W}{{W}_{{new}}})}^{N}}=\frac{{1-(\frac{1}{1+s})}^{n}}{{1-(\frac{1}{1+s})}^{N}}$$
(1)
where n denotes the number of copies of a new gene, N, the parasite population size, \({W}_{{new}}\) and W, respectively the fitness of the low-frequency gene versus that of other genes circulating in the population. Conventionally, the fitness of other genes is normalized to be 1, and that of the new gene is denoted as 1+s. We assume that there is a large enough number of alleles so that any mutation or ectopic recombination would lead to a different allele and the probability of back mutation to the original allele would be low enough to be negligible. Hence, new variants, generated via mutation or ectopic recombination events, always start with a single copy and n = 1. We consider parasite populations in high-transmission endemic regions, hence \(N \, \gg \, 1\). The above equation simplifies to:$${p}_{{inv}} \, \approx \, 1-\frac{1}{1+s}=\frac{s}{1+s}$$
(2)
As described by He et al.87, in the context of malaria transmission, the fitness of a gene is essentially given by its reproductive number, i.e., the number of offspring genes it produces, R. The reproductive number depends on the transmission rate (b), the transmissibility of the given gene (T), and the typical infection duration (τ) of parasites that carry the given gene.$$s=\frac{{R}_{{new}}}{R}-1=\frac{{b}_{{new}}{T}_{{new}}{{{\rm{\tau }}}}_{{new}}}{{bT}{{\rm{\tau }}}}-1=\frac{{{{\rm{\tau }}}}_{{new}}}{{{\rm{\tau }}}}-1$$
(3)
$${p}_{{inv}}=\frac{\frac{{{{\rm{\tau }}}}_{{new}}}{{{\rm{\tau }}}}-1}{\frac{{{{\rm{\tau }}}}_{{new}}}{{{\rm{\tau }}}}}$$
(4)
We consider the infection duration of a strain constituted by “average” genes versus the infection duration of a strain constituted by 1 new gene and g-1 average genes where g is the genome size (i.e., the number of var genes per individual parasite or repertoire). An average gene is a conceptual entity, and the proportion of susceptible hosts to an average gene (\(\bar{S}\)) is the average of the proportion of susceptible hosts to all circulating genes. Because transmission success of a new gene depends on the total infection duration of the strain that carries it, its invasion probability depends on the overall ensemble of genes. We write here the expected invasion probability for a new gene across the different genome backgrounds it may arise in, by considering that this background consists of average genes. Let d be the duration of expression of a gene in a naïve host. Since the proportion of susceptible hosts to any new gene is 1, we have the following:$${p}_{{inv}}=\frac{\frac{{S}_{{new}}d+(g-1)\bar{S}d}{g\bar{S}d}-1}{\frac{{S}_{{new}}d+(g-1)\bar{S}d}{g\bar{S}d}}=\frac{\frac{1-\bar{S}}{g\bar{S}}}{\frac{1+(g-1)\bar{S}}{g\bar{S}}}=\frac{1-\bar{S}}{1+(g-1)\bar{S}}$$
(5)
Note that the expression of \({p}_{{inv}}\) derived here is slightly different from, but more rigorous than, the one in He et al.87 The two expressions are close under certain conditions, for example, when g is large.The values for \(\bar{S}\) typical of high-transmission endemic regions are ~0.4 for the low end and around ~0.8 for the high end, based on our ABM simulation outputs. We can use these ranges of \(\bar{S}\) to calculate the invasion probability of a new gene in an individual local parasite population, which together with the innovation rate (primarily through ectopic recombination) and the number of individual local parasite populations (a few for “medium” pools and more than ten for “large” pools) gives the baseline rate at which common genes are substituted in a regional pool with its specific configuration.For details on the parameterization of the ABM, see supplementary Methods (the section “Selection of values for other parameters”) and Supplementary Data 1.Sampling schemes, measurement error, and curative drug treatment information of empirical dataEmpirical surveys are carried out at a specific sampling depth and under sparse sampling schemes. When sampling individual hosts and their var types in our simulations, we follow the sampling depth implemented in the empirical surveys from Bongo District, namely around 15% of the total human population size21. We further set the sampling period to be once a year for both constant and seasonal transmission (with the former occurring at the end of the year and the latter occurring at the end of the wet/high-transmission season), to emulate the field sampling of hosts primarily at that time of the year, although a few additional surveys were also conducted at the end of the dry (low transmission) season as a baseline for comparison.The collection of empirical data includes a measurement error, specifically concerning the distribution of the number of non-upsA DBLα sequenced and typed per monoclonal infection, accounting for certain potential sampling errors or imperfect detection of var genes21,52. We incorporate this measurement error when sampling from the simulation output by sub-sampling the number of var genes per strain based on this empirical distribution.Moreover, only individuals with microscopy-positive P. falciparum infections (i.e., isolate) were included for sequencing of var genes in the empirical surveys from Bongo District. A subset of the longitudinal surveys from Bongo District includes in addition the submicroscopic infections detected by the more sensitive method of PCR (the 18 S rRNA PCR), which resulted in a considerably higher fraction35. Using surveys for which both microscopy and PCR detection were used, we estimated a conversion factor between the proportion of hosts that are microscopy-positive and those that are PCR-positive of 57%. To address the robustness of our results on molecular indicators to such missing data, we performed the analysis for 57% of all infected hosts originally sampled at the 15% depth.Individuals can also seek and get antimalarial treatment in response to symptoms or the perception of transmission risk. Field questionnaires were conducted along with each Ghana survey, and participants were asked whether they had received an antimalarial treatment in the previous two weeks (i.e., participants that reported they were sick, sought treatment, and were provided with an antimalarial treatment) prior to their blood samples being collected35. Such curative drug treatment can potentially impact the infection status of treated individuals, as well as their MOI and pairwise-type sharing score or PTS between isolates. The precise effects are difficult to disentangle from the expected changes, given the complex biology of the parasite. These treated individuals can be excluded from the analysis when calculating the distributions for molecular indicators, which reduces sample size. The fraction of treated individuals can be high, reaching ~25–50% across different age groups with children exhibiting the highest fraction for the pre-IRS phase21,35. However, our relatively large sample size still leaves us with enough individuals pre-IRS. During and immediately after the IRS intervention, this fraction can be much lower, reaching ~5–20% across different age groups, again with children exhibiting the highest fraction for our site21,35.Shannon Diversity IndexWe calculate and plot var gene diversity with both richness and the Shannon Diversity Index42,43. The Shannon Diversity Index is defined as follows:$${H}^{{\prime} }=-{\sum }_{i=1}^{R}{p}_{i}{\mathrm{ln}}({p}_{i})$$
(6)
where pi is the proportion of the ith type and R is the total number of types. Its expression takes the evenness of different gene types into account. It increases either by more unique gene types, or by a greater gene type evenness. Both simulated and empirical var gene frequency distributions demonstrate that most genes are rare with a few copies circulating in the host population, and a small fraction of genes are more common with more copies21. The Shannon Diversity Index considers this feature when quantifying diversity levels, whereas richness plainly counts the number of genes.Molecular indicatorsA 96% DNA sequence identity cutoff was used to define different var gene types38. Sequence reads from either the same or different isolates with ≤4% sequence differences were grouped into a single consensus or a single type21. We consider the following quantities computed on the basis of var gene types and evaluate how these relate to the rebound dynamics of prevalence across the series of simulations for the sustained interventions.Pairwise type sharing (PTS)The pairwise type sharing (PTS) index describes the degree of shared DBLα types between any two isolates, and is analogous to the Jaccard and Sørensen indices in Ecology38,45,46. We use a directional version of this statistic28 defined as the following:$${{PTS}}_{{ij}}=\frac{{n}_{{ij}}}{{n}_{i}}$$
(7)
$${{PTS}}_{{ji}}=\frac{{n}_{{ij}}}{{n}_{j}}$$
(8)
where \({n}_{{ij}}\) is the total number of types shared by isolate i and isolate j, and \({n}_{i}\) and \({n}_{j}\) are the number of unique types in isolate i and j respectively. This directionality in the formula means that the sharing of a given number of types is relative to the length of the isolate under consideration. A PTS score ranges between 0 and 1, where a score of 0 signifies no shared DBLα types, and a score of 1 indicates complete overlap between the two isolates. We consider several representative quantiles of the PTS distribution (0.01, 0.31, 0.6, and 0.9) to examine the similarity of isolates that are circulating in the population at the same time or across different times. For comparison of PTS distributions across age groups, we limit the calculation to within the same MOI groups and then aggregate across all MOI values to obtain the final distribution. In other words, within each age group, we compute the PTS between individuals of MOI equal to 1, 2, and so on, and then pool the resulting PTS values together to obtain the final PTS distribution. We do so to minimize the effect of different MOI values on PTS.Multiplicity of infection (MOI)Because Plasmodium parasites reproduce asexually as haploid stages within human hosts, signatures of polymorphic genotypes are evidence of multiclonal infection. While any polymorphic marker should thus be suitable in theory for estimating MOI, the typically high number of multiclonal infections in high-transmission endemic regions limits our ability to accurately do so in practice. This is mainly because the diversity of the polymorphic markers can be limited, with two parasites sharing the same marker but carrying distinct sets of antigen-encoding genes. The var multigene family provides one solution. Because of the large number of different genes in local populations and the effect of immune selection (negative frequency-dependent selection), repertoires of var genes in individual genomes exhibit very limited overlap23,28. As a result, a constant repertoire size or number of non-upsA DBLα types in a parasite genome can be used to convert the number of types sequenced in an isolate to an estimate of MOI29,35. In its simplest form of considering a typical repertoire length per parasite, the approach neglects the measurement error introduced by targeted PCR and amplicon sequencing of var genes in an infection. Recently, the method was extended to a Bayesian formulation that does consider this error and provides a posterior distribution of different MOI values for each sampled individual21. We have previously documented the major steps of the Bayesian approach, compared two ways of obtaining a MOI distribution at the population level (by either pooling the maximum a posteriori MOI estimates or calculating a mixture distribution)21, and examined the impact of different priors on the final MOI distribution at the population level. Here, we obtain the estimated MOI distribution at the population level from pooling the maximum a posteriori MOI estimates and from using a uniform prior for individuals.Due to the commonness of multi-genomic infections in high-transmission regions, host prevalence levels are not a good indicator of parasite population size. Summing MOI estimates from this improved “varcoding” approach over sampled individuals with infections, enables the calculation of census population size for the parasite or the total number of infections of relevance to transmission events21.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles