Plasmodium falciparum population dynamics in East Africa and genomic surveillance along the Kenya-Uganda border

SNPs, genome-wide population data and multi-clonality
To assess the population structure of East African P. falciparum populations, SNP variants were identified from the WGS data of 599 P. falciparum isolates. These isolates were collected from Bungoma county in Western Kenya (n = 30/38), multiple points around the Lake Victoria basin (n = 160), Kenya (n = 74), Tanzania (n = 323), and Uganda (n = 12) (Supplementary Table S1; Supplementary Table S2). A total of 710,552 high-quality SNPs were called from non-hypervariable regions of the P. falciparum genome. The Fws metric, or mean inbreeding coefficient, was calculated for each East African subpopulation to determine the proportion of complex infections amongst isolates and to assess their within-host population diversity or assumed risk of out-crossing/inbreeding. Monoclonal P. falciparum isolates exhibit “high” Fws estimates (≥ 0.95). Samples from Western Kenya (n = 30) had a mean Fws coefficient of 0.851, with only 6 isolates exhibiting “high” Fws estimates (Supplementary Figure S1). Low mean Fws estimates are generally associated with higher proportions of complex infections and a high degree of panmixis within the population, which is common in Kenyan P. falciparum isolates from high transmission regions. In general, subpopulations across East Africa had mean Fws coefficients ranging from 0.738 to 0.907, with the lowest value (0.738) observed for isolates along the Kenyan Lake Victoria mainland, where transmission is high and stable throughout much of the year.
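The Fws classification described above can be illustrated with a minimal sketch. It assumes the commonly used definition Fws = 1 − Hw/Hs, where Hw is within-host heterozygosity and Hs is heterozygosity in the local population, and it averages over SNPs rather than binning by allele frequency. The function names and input values are hypothetical and are not taken from the study's pipeline; only the 0.95 threshold comes from the text.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fws(within_host_freqs, population_freqs):
    """Simplified Fws = 1 - mean(Hw) / mean(Hs) across SNPs.

    within_host_freqs: per-SNP allele frequencies inside one isolate
                       (e.g. derived from read counts).
    population_freqs:  per-SNP allele frequencies in the subpopulation.
    """
    if len(within_host_freqs) != len(population_freqs):
        raise ValueError("both inputs must cover the same SNPs")
    if not within_host_freqs:
        raise ValueError("at least one SNP is required")
    hw = [heterozygosity(p) for p in within_host_freqs]
    hs = [heterozygosity(p) for p in population_freqs]
    mean_hs = sum(hs) / len(hs)
    if mean_hs == 0.0:
        raise ValueError("population heterozygosity is zero")
    return 1.0 - (sum(hw) / len(hw)) / mean_hs


def is_monoclonal(fws_value, threshold=0.95):
    """Apply the Fws >= 0.95 cut-off quoted in the text."""
    return fws_value >= threshold
```

An isolate whose within-host allele frequencies are all 0 or 1 has Hw = 0 and therefore Fws = 1, whereas mixed within-host frequencies pull Fws down towards 0.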
P. falciparum isolates from highland epidemic outbreaks form distinct transmission clusters
Across the isolates of Bungoma county (n = 30), we identified 168,559 high-quality SNPs in non-hypervariable regions of the P. falciparum genome. A SNP-based principal component analysis (PCA) and maximum likelihood tree of these populations revealed one general population cluster consisting of 19 isolates (Cluster 1) and two distinct clusters comprising 7 isolates (Cluster 2) and 4 isolates (Cluster 3) (Fig. 1A, B). These observations suggest that the P. falciparum isolates in Clusters 2 and 3 may be highly genetically related or clones of each other.
Figure 1. Population structure of P. falciparum isolates from highland epidemic outbreaks in Western Kenya. (A) Maximum likelihood tree of 30 isolates from Western Kenya (284,667 genome-wide SNPs). (B) PCA of the genetic pairwise matrix used to generate the maximum likelihood tree, with clusters coloured accordingly. (C) Pairwise identity-by-descent (IBD) connectivity plots between clusters, highlighting high levels of IBD (IBD > 47.5%) between clusters in Western Kenya.
Bungoma county spans both lake-endemic and highland epidemic zones, based on malaria endemicity classifications. Highland epidemic zones are prone to malaria outbreaks that can result in genetically similar P. falciparum transmission clusters [16]. To assess the relatedness of the 30 isolates, pairwise identity-by-descent (IBD) was calculated to measure the proportion of pairs identical by descent at each SNP. Parasites sharing high proportions of long, unbroken segments of their genome, or demonstrating high levels of IBD, are generally classified as clones (IBD > 99%) or siblings (IBD > 47.5%). In contrast, more distantly related parasites share very few, shorter genome fragments [17,18]. Out of the 30 isolates, 25 passed quality thresholds required for IBD analysis.
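The relatedness classes defined above can be expressed as a short sketch. The two thresholds (IBD > 99% for clones, IBD > 47.5% for siblings) come from the text; the function names, the "distant" label, and the example isolate identifiers and IBD fractions are hypothetical and serve only as illustration.

```python
CLONE_THRESHOLD = 0.99     # IBD > 99%, as stated in the text
SIBLING_THRESHOLD = 0.475  # IBD > 47.5%, as stated in the text


def classify_pair(ibd_fraction):
    """Label a pair of isolates from its genome-wide IBD fraction (0 to 1)."""
    if not 0.0 <= ibd_fraction <= 1.0:
        raise ValueError("IBD fraction must lie between 0 and 1")
    if ibd_fraction > CLONE_THRESHOLD:
        return "clone"
    if ibd_fraction > SIBLING_THRESHOLD:
        return "sibling"
    return "distant"


def related_edges(pairwise_ibd, min_ibd=SIBLING_THRESHOLD):
    """Return the isolate pairs that would be drawn as edges in a
    connectivity plot, i.e. pairs whose IBD exceeds min_ibd."""
    return sorted(pair for pair, frac in pairwise_ibd.items() if frac > min_ibd)


# Hypothetical pairwise IBD fractions, for illustration only.
example = {
    ("iso1", "iso2"): 0.995,
    ("iso1", "iso3"): 0.52,
    ("iso2", "iso4"): 0.02,
}
```

Here `related_edges(example)` keeps only the first two pairs, which is the kind of filtering that produces tightly connected clusters in an IBD connectivity plot.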
Parasites in the main population group (n = 14) demonstrated low levels of pairwise IBD (IBD < 2.97%), consistent with complex infections typically observed in high transmission regions. In contrast, isolates in Cluster 2 (n = 7) and Cluster 3 (n = 4) exhibited high IBD (IBD > 47.5%) with one another, classifying them as siblings (Fig. 1C).
Drug resistance
Non-synonymous mutations in genes associated with resistance to antimalarial drugs were analysed. The maximum Fst value for each SNP was calculated by comparing Western Kenyan isolates (n = 38) with other East African ones (n = 460) (Supplementary Table S3). Compared to the genome-wide analysis (n = 30), 3 to 8 additional isolates were potentially included in the analysis of candidate drug resistance loci, depending on whether each variant position was covered at least fivefold by sequence data. Variant frequencies from Western Kenyan isolates were also compared with those from the Kenyan region of Lake Victoria (n = 109) and West Africa (n = 50). A non-synonymous SNP in the PfK13 gene resulting in the variant V568I was identified in 1 out of 37 Western Kenyan isolates. While the V568G variant has been identified as an in vitro candidate marker of reduced susceptibility to artemisinin, the impact of the V568I variant on artemisinin tolerance has not yet been characterised. An in-silico protein model of PfK13 was generated, including the wild-type position V568, the WHO candidate variant V568G, and the V568I variant observed within the Western Kenyan isolate. This analysis suggested that V568I could be a variant of concern (Supplementary Figure S2), but functional characterisation using experimental approaches is required to confirm this.
Markers associated with chloroquine resistance were significantly less frequent in Western Kenyan isolates (n = 38) than in other East African isolates (n = 460).
Specifically, only 1 out of 36 Western Kenyan isolates (2.8%) contained the primary Pfcrt biomarker for resistance, K76T, whereas 14.1% of East African isolates (65/460) contained this variant. Regarding Pfmdr1, the N86Y variant was absent in Western Kenyan isolates but had a minor allele frequency of 5.9% in East African isolates. It is hypothesised that the reference N86 is selected for by the use of lumefantrine and may reduce susceptibility to lumefantrine, piperaquine, and mefloquine [19]. Additionally, the Y184F and D1246Y variants were observed at frequencies of 54.5% (18/33) and 10.8% (4/37) in Western Kenyan isolates, respectively, compared to 37.7% (173/460) and 5% (23/460) in East African isolates. Although the Y184F variant is not significantly associated with reduced susceptibility to lumefantrine, it is believed to be genetically correlated with the acquisition of a drug-resistance phenotype [19].
Variants associated with resistance to sulphadoxine and pyrimethamine on Pfdhps and Pfdhfr, respectively, were identified at high frequencies, consistent with other East African populations [20]. Variants N51I, C59R, and S108N on Pfdhfr were observed in 100% of isolates from Western Kenya, while the I164L variant was identified in only 1 out of 38 isolates (2.2%). Variants on Pfdhps were also observed at high frequencies within Western Kenyan isolates, with S436H, G437A, and K540E occurring in 21.6%, 100%, and 87.1% of isolates, respectively. Haplotype analysis was performed for variants on Pfdhfr (N51I, C59R, S108N, and I164L) and Pfdhps (S436H, G437A, K540E, and A581G) in parasites with a read depth of at least fivefold at each position (n = 31). The wild-type haplotype, NCSISGKA, was not observed in any of the screened isolates. The quintuple mutant IRNISAEA accounted for 74.2% of isolates (23/31). The sextuple mutant IRNIHAEA was observed in 7 isolates (7/31), while another sextuple mutant, IRNLSAEA, was observed in a single isolate (1/31).
East Africa has P. falciparum subpopulations with distinct genetic structure
A SNP-based maximum likelihood tree of 599 isolates from East Africa (Fig. 2A) revealed several subpopulations within the larger East African P. falciparum population (Fig. 2B). Isolates from the Lake Victoria region in Kenya (i.e., Kisumu, Mfangano island, Ngodhe island, and Suba district) (n = 109) clustered with isolates from Western Kenya (n = 30) and Uganda (n = 12), forming a distinct group separate from other East African subpopulations. Isolates from the Tanzanian region of Lake Victoria and Lake Tanganyika grouped more closely with isolates from North East Tanzania than with those from the Kenyan region of Lake Victoria. This population structure, identified by the maximum likelihood tree, was supported by a PCA, which also showed the separation of Kenyan Lake Victoria isolates, along with those from Western Kenya and Uganda, from other East African populations (Fig. 2C, D, E). There were no strong temporal trends to the clustering (Chi-squared P > 0.05), though it is important to note the limitations of using aggregated data in this context [15].
Figure 2. Genomic structure of P. falciparum subpopulations in East Africa. (A) Heatmap of P. falciparum incidence rates in 2020 across Kenya, Tanzania and Uganda with sampling sites and artemisinin resistance locations annotated (generated using malariaAtlas R-software). (B) A maximum likelihood tree for 587 isolates from Central Uganda, Eastern Kenya, Lake Victoria, Lake Tanganyika, North East Tanzania, South East Tanzania, and Western Kenya, based on 710,552 high-quality genome-wide SNPs. (C, D, and E) Principal component analysis (PCA) of East African subpopulations, showing the separation of isolates in PCs 1, 2, and 3.
Ancestral admixture analysis reveals diverse ancestral origins of East African subpopulations
To evaluate the ancestral origins of East African P. falciparum subpopulations, genome-wide SNPs from isolates collected across the African continent (640,596 SNPs; n = 365 isolates) were analysed to infer ancestral genotype frequencies. These frequencies were combined with geographical coordinates to produce spatial models of allele sharing. Isolates were sourced from East Africa (Kenya, Tanzania, and Uganda; n = 218), West Africa (Guinea and The Gambia; n = 47), the Horn of Africa (Ethiopia; n = 25), Central Africa (Cameroon; n = 25), South Central Africa (Democratic Republic of Congo; n = 25), and Southern Africa (Malawi; n = 25) (Supplementary Table S2). With the optimum number of ancestral populations (K value) estimated to be 6 (K1–K6), the resulting admixture analysis revealed that isolates from East African subpopulations have distinct ancestral origins from one another, including two ancestral populations that appear independent from the wider African populations (Fig. 3). A maximum likelihood tree, PCA, and IBD connectivity plot were generated using the same genome-wide SNPs incorporated into the ancestral admixture analysis, supporting the identified population structure (Fig. 3B; Fig. 4; Supplementary Figure S3).
Figure 3. Genome-wide ancestral admixture analysis of East African P. falciparum subpopulations and regional parasite populations from across Africa. (A) Geographic map displaying ancestry coefficients, where K is estimated to represent 6 distinct ancestral populations across Africa. (B) Maximum likelihood tree of 363 isolates, based on 640,596 genome-wide SNPs, coloured according to their predominant K proportion. (C) Barplot showing ancestry proportions for each isolate (rows) within each subpopulation (columns).
Figure 4. Connectivity between African P. falciparum ancestral populations. (A, B) Principal component analysis (PCA) generated based on pairwise genetic distance matrices of 640,596 high-quality genome-wide SNPs from 363 P. falciparum isolates.
(C) Pairwise identity-by-descent (IBD) connectivity plots for isolates with an Fws value > 0.90 (n = 293).
The K6 ancestral population accounts for a high proportion of ancestry among isolates from Western Kenya (K6 proportion: 69.7%), Central Uganda (53.2%), and Lake Victoria isolates from Kenya (islands = 69.6%; mainland = 69.7%). In contrast, the K4 ancestral population, which is also observed in isolates from South Central Africa (e.g., DRC), differentiates Tanzanian Lake Victoria isolates (K4 proportion: 59.4%) from Kenyan Lake Victoria isolates (islands = 2.6%; mainland = 2.6%), Western Kenya (2.4%), and Uganda (3.0%) (Supplementary Figure S4). The K5 ancestral population links isolates from Eastern Kenya (K5 proportion: 47.5%), North East Tanzania (55.5%), and Lake Tanganyika Tanzania (62.23%), distinguishing them from South East Tanzania (4.4%) and the Tanzanian Lake Victoria isolates (2.7%). The K2 ancestral population is prevalent in Southern Africa (86.4%) and is also present in South East Tanzanian isolates (25.6%) and Eastern Kenyan isolates (20.5%). South East Tanzanian isolates show a more mixed ancestry, with proportions of K1 (10.8%), K2 (25.6%), K3 (1.8%), K4 (4.4%), and K6 (8.4%). The Horn of Africa displays distinct ancestral origins from other African regions, with K3 representing the highest proportion of ancestry (91.6%).
IBD analysis of ancestral populations reveals similar regions of homology across Africa
IBD statistics were calculated to characterise the structure of East African subpopulations at the chromosome level, by measuring the proportion of pairs identical by descent at each SNP across all isolates within the population. IBD was assessed according to the ancestral populations (K1–K6) identified through admixture analysis, due to the strong correlation between geographical proximity and genetic structure.
As expected, isolates assigned to the K3 ancestral population (Horn of Africa) exhibited the highest fraction of pairwise IBD across the genome (mean = 0.08568, range = 0–0.85013), reflecting high genetic relatedness and genomic conservation among these isolates (Supplementary Table S4). In contrast, isolates from K6 (Western Kenya, Lake Victoria Kenya, and Central Uganda), K5 (Eastern Kenya, Lake Tanganyika Tanzania, and Lake Victoria Mainland Tanzania), and K4 (Lake Victoria Tanzania, South East Tanzania, and South Central Africa) displayed the lowest fractions of IBD, indicating lower genetic relatedness or reduced conservation of genomic regions (K6: mean = 0.00899, range = 0–0.12167; K5: mean = 0.01065, range = 0–0.09221; K4: mean = 0.00674, range = 0–0.01646). Genome-wide chromosome-level IBD values for ancestral populations (K1–K6) are presented in Supplementary Figure S5.
The top 5% of IBD positions for isolates from the K6 ancestral population were distributed across 16 regions on 4 chromosomes. For the K5 population, IBD positions were spread across 24 regions on 5 chromosomes, and for the K4 population, they were found in 7 positions on 3 chromosomes (Supplementary Table S5). Notably, within the K4 population, one of the IBD regions on chromosome 7 encompassed the Pfcrt gene, which is associated with partial resistance to chloroquine, a drug no longer prescribed as treatment for P. falciparum. In the K5 population, a region on chromosome 8 included the Pfdhps gene, which confers partial resistance to sulphadoxine-pyrimethamine (SP).
Selection between P. falciparum subpopulations in East Africa
To identify variants under positive directional selection between subpopulations within East Africa, we analysed the genome-wide haplotype structure of isolates to pinpoint regions of high local homozygosity relative to neutral expectations.
The integrated haplotype score (iHS) test statistic was used to detect regions with high local homozygosity, indicating positive selection within a single population (Supplementary Figure S6 A-C; Supplementary Table S6). Cross-population selection pressure was assessed using the Rsb metric, which compares extended haplotype homozygosity between populations (Supplementary Figure S6 D-F; Supplementary Table S7).
As commonly observed, SNPs under within-population positive selection in parasite populations were often in genes associated with host immune response and parasite immune evasion. Within K5-associated isolates (i.e. North East Tanzania, Lake Tanganyika, and Eastern Kenya), 5 genes of interest were identified with SNPs exhibiting significant iHS values (−log10[1 − 2|Φ(iHS) − 0.5|] > 4.0). These included the heat shock protein 40 (HSP40), which is believed to play a role in parasite pathogenicity, and CX3CL1-binding protein 1, which is linked to the cytoadherence of infected erythrocytes (Supplementary Table S6). Cross-population selection analysis revealed common loci in comparisons across ancestral groups. Notably, a region on chromosome 6 containing a putative BFR1 domain-containing protein (PF3D7_0617200) and the AP-2 complex subunit alpha gene (PF3D7_0617100) showed significant differences between isolates from K1, K2, K4, K5 and K6 (number of SNPs: range 84–105; mean Rsb values: range 0.785–1.490) (Supplementary Table S7). Comparisons of K4 and K5 with K6 also identified positive selection in regions on the merozoite surface protein 3 encoding gene (Pfmsp3), known to be an important mediator of antibody responses and a candidate for malaria vaccines. Within the K4 population, 367 markers on the Pfubp1 gene were identified with a mean Rsb of 0.419 when compared with K5, while 320 markers were identified within the K4 population when compared with K6, with a mean Rsb of 0.385.
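The iHS significance criterion quoted above, −log10[1 − 2|Φ(iHS) − 0.5|] > 4.0, can be evaluated with a minimal sketch, where Φ is the standard normal cumulative distribution function. The function names are hypothetical, and the input is assumed to be an already standardised iHS value. The sketch uses the identity 1 − 2|Φ(x) − 0.5| = erfc(|x|/√2), which avoids loss of precision in the tails.

```python
import math


def ihs_log_p(ihs):
    """Return -log10(1 - 2|Phi(iHS) - 0.5|) for a standardised iHS value.

    The two-sided tail probability 1 - 2|Phi(x) - 0.5| equals
    erfc(|x| / sqrt(2)), which is computed directly for stability.
    """
    p = math.erfc(abs(ihs) / math.sqrt(2.0))
    if p == 0.0:
        # The tail probability underflowed; report an unbounded score.
        return math.inf
    return -math.log10(p)


def is_significant(ihs, threshold=4.0):
    """Apply the > 4.0 cut-off quoted in the text."""
    return ihs_log_p(ihs) > threshold
```

Under this transformation a threshold of 4.0 corresponds to a two-sided p-value below 10^-4, so a standardised iHS of 4.0 passes while a value of 3.5 does not.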
