Inferring the natural history of HPV from global cancer registries: insights from a multi-country calibration

Country scopeFor our study, we include the 30 most populous countries in Sub-Saharan Africa: Nigeria, Ethiopia, Democratic Republic of the Congo (DRC), Tanzania, South Africa, Kenya, Uganda, Angola, Mozambique, Ghana, Madagascar, Cameroon, Côte d’Ivoire, Niger, Burkina Faso, Mali, Malawi, Zambia, Senegal, Chad, Somalia, Zimbabwe, Guinea, Rwanda, Benin, Burundi, South Sudan, Togo, Sierra Leone, and the Republic of Congo. Collectively, these countries comprise 1.1 billion people and experienced ~100,000 cases of cervical cancer in 2020, which equates to ~ 95% of the population and 90% of the estimated total burden of cervical cancer within Sub-Saharan Africa. The life expectancy at birth ranges from 53 (Nigeria) to 68 (Senegal), and HIV prevalence among women aged 15–49 ranges from 0.1% (Somalia) to 24.5% (South Africa). The burden of cervical cancer also varies significantly across these countries, with age-standardized incidence rates (ASIR) per 100,000 women ranging from 10.4 in Niger to 67.9 in Malawi (GLOBOCAN, 2020). Table S1 summarizes these and other key features of the modeled countries. Since our study is a simulation study using publicly available data, informed consent was not required.Model framework and timeframeWe use HPVsim14,15, a flexible software package that we developed for creating agent-based models of HPV transmission and cervical disease progression. A model created using HPVsim consists of a population of agents who interact with each other through sexual networks, over which multiple co-circulating HPV genotypes can be transmitted. Persistent HPV infections can progress to cervical dysplasia, and subsequently regress spontaneously or proceed to invasive cervical cancer. HPVsim can also capture the effects of interventions, including vaccination, screening, and treatment. Using HPVsim, we create models for each of the 30 countries over the period 1960–2020. In the following sections we provide more details on the specific components of the framework and how they have been customized to create models for this study.Modeling populations and networksHPVsim includes default values for all of the parameters needed to create a population of agents and the sexual networks that connect them. However, many of these parameters must be overwritten with context-specific values in order to create a model for a given country. Location-specific demographic parameters for each of the 30 countries are included as part of the standard installation of the HPVsim package, with crude birth rates by year from the World Bank17, age- and sex-specific mortality rates from the UN, and overall population size from the WPP’s mid-fertility projections18. These demographic data allow us to simulate populations of agents over time, which we can then validate against data on population age pyramids (Figure S1). We then customize each country’s sexual network parameters. Specifically, we use the DHS Statcompiler19 to extract aggregated survey data on the age of sexual debut, the proportion of the population who report being married or cohabiting, the proportion of the population who report sex with a non-marital partner, and reported age differences between partners. For each country, we adjust the mean degree distribution of the sexual network for both males and females, and partner age differences. Further parameters are adjusted during the model calibrations, with more details contained in the section on the calibration approach. Figures S2–S6 illustrate the models’ fidelity to these sexual behavior data.Modeling HPV transmissionFor this study, we simulate 4 groups of HPV genotypes. HPV16 and HPV18 are each modeled individually, and in addition we include a pooled group of high-risk types included in the nonavalent vaccine (31, 33, 45, 52, and 58) which we term “Hi5”, and a pooled collection of other high-risk oncogenic types (35, 39, 51, 56, and 59) which we term “OHR”.Modeling interventionsProphylactic vaccination, cervical screening, treatment of precancerous lesions, and treatment of cancer are all established interventions that aim to reduce the burden of HPV and cervical disease. We searched for data on coverage levels of all of these interventions for our models.The proportion of women aged 30–49 in Sub-Saharan Africa who had ever received a cervical screen has been estimated at 15%20,21,22. Across the four countries that had published survey data on precancer treatment, coverage was estimated at around 80%21. Substantial variation in both the coverage levels and the quality of screening programs has been reported. In light of the fact that data availability is so inconsistent, we make the decision not to include any screening or treatment effects in our models. This means that, for any country in which screening and treatment have been effectively operating at scale, we will overestimate the burden of cancers, since some of these cancers would not have occurred if they had been detected and removed early.Several countries within Sub-Saharan Africa had initiated routine HPV vaccination programs by 2020, including Côte d’Ivoire, Ethiopia, Kenya, Malawi, Rwanda, Senegal, South Africa, Tanzania, Uganda, Zambia, and Zimbabwe23. Overall coverage within the countries with vaccination programs was estimated at 20%. Given that these vaccination programs were introduced relatively recently, it will not affect our estimates of cervical disease over the study period (1960–2020), and for simplicity we do not include these programs in our models. This is consistent with the approach taken in other large modeling studies24.Modeling disease progressionBecause we specifically wish to understand the degree to which HPV’s natural history might vary from one country to another, our natural history model must be constructed using parameters that can be informed by country-specific data and not rely on estimates derived from other contexts. The most comprehensive source of cervical cancer data is GLOBOCAN, which publishes IARC-certified cancer registry data where available and produces estimates for countries without registry data25,26. Data on HPV prevalence are also collated by the ICO/IARC Information Centre on HPV and Cancer. Table 1 summarizes data availability by country.
Table 1 Data availability from GLOBOCAN25,26 and the ICO/IARC Information Centre on HPV and Cancer.In the settings that we are modeling, data on the rate of progression of precancerous lesions are scarce, mostly because cervical screening programs are not operating at large scales. As a result, we cannot produce reliable, context-specific, data-driven estimates of the probabilities of progressing from productive infection to low- or high-grade lesions within a given timeframe. We can, however, propose a simple yet biologically-motivated model for the probability of developing cervical cancer over time. Given that we are modeling a binary outcome (cancer/no cancer) over time which is known to depend on the extent and persistence of dysplasia in the cervical cells, we can adopt a standard geometric distribution to approximate the probability of cancer developing over time, i.e.:$$prob(cancer\; begins\; T\; years \;after \;infection|p,\; dys{p}_{T,g}) = 1-(1-{p}_{g}{)}^{dys{p}_{T,g}}$$
(1)
Here, pg can be considered as the (time-constant) probability of a given dysplastic cell becoming cancerous at any point in time, while the exponent dyspT,g represents the cumulative dysplastic cell time (i.e., the sum, at time T, of the number of years each cell has been infected with genotype g). We then need a way to approximate dyspT,g. Before proposing a model for this, we start by considering the general properties that it should have. We know the number of cells with a non-zero probability of becoming cancerous should be close to zero for the first few years of infection to allow time for precancerous lesions to develop, and that as T increases, dyspT,g should be monotonically increasing. A simple ramp function, similar to that depicted in Schiffman et al.27, meets these requirements. We use a smooth approximation of a ramp function, namely a modified softplus function, to approximate the lateral expansion of cells with a non-zero probability of becoming cancerous at T years:$${dysp}_{T,g} = \frac{2log(exp({k}_{g}T)+1)}{{k}_{g}}-\frac{2\text{log}\left(2\right)}{{k}_{g}}-T.$$
(2)
Here, kg is an ‘aggressiveness’ parameter, which varies by genotype and describes how quickly the infection progresses to a potentially oncogenic state within an individual woman. The softplus function described in Eq. 2 has the convenient property that its derivative is a logistic function. This means that our approximation of the growth rate of cells with oncogenic potential can also be considered as the cumulative severity of a lesion whose lateral expansion evolves over time according to:$${dysp}_{t,g} = \frac{2}{1+exp(-{k}_{g}t)}-1, t\in [0,T]$$
(3)
Given that the depth of a lesion is frequently taken as one of the proxy markers for clinical severity, this model of lesion depth is consistent with lesions that progress relatively quickly at first, and then more slowly.Our model so far does not account for individual variations in women’s immunity, which cause cervical disease to progress faster/slower in some individuals than others. We therefore incorporate an “immunocompetence” parameter, which does not vary by genotype but does vary between individuals. We will assume that this acts as a scale factor on the kg terms in Eqs. 2–3. Initially, we will assume that within a given setting, this immunocompetence parameter, which we refer to as relsevi for individuals i = 1…N in a population, is drawn from a normal distribution, i.e.:$$relse{v}_{i,c}\sim N({\nu }_{c}, 0.1)$$
(4)
where the mean νc varies by country c. We set νc to 1 by default, but we estimate it during calibrations later (see section below).In order to generate the durations of infection (T) for each woman, we draw samples from a log-normal distribution whose mean and variance depend on the infecting genotype and are determined by fitting to the expected clearance rates of infection over time3. At the point of infection, we calculate the probabilities given in Eq. 1 and apply them to determine the women who acquire cancer. The remaining women are then scheduled to clear their infections at the conclusion of the duration they were initially assigned.All in all, our natural history model consists of three unknown parameters: pg, kg, and νc. Because Eq. 1 is overdetermined, we fix pg and estimate kg.Calibration approachWe carry out three separate calibrations, differentiated by the number of natural history parameters that we allow to vary. Firstly, we run a constrained calibration in which a single set of the four kg parameters (one per genotype) is estimated for all 30 countries. Secondly, we run an ‘immuno-varying’ calibration, in which we fix the kg parameters to be the same for all 30 countries, but this time allow the mean relative severity (vc in Eq. 4) to vary, implying 30 parameters instead of 4 for the natural history component. Finally, we run an unconstrained calibration in which the natural history parameters are estimated separately for each country. In all three calibrations, we vary 4 sexual behavior parameters per country: the proportion of the population who have extra-marital relationships (prior for males: U(0.1, 0.7), prior for females: U(0.05,0.5)) and the proportion of the population who participate in casual relationships when not married (prior for males: U(0.1, 0.6), prior for females: U(0.01,0.6)). Our prior distributions for HPV16 and HPV18 are kg ~ U(0.25,0.45), and for Hi5 and OHR we use kg ~ U(0.15, 0.4). A given parameter set consists of a single value for each of these parameters, and will produce a given set of model outputs. We calculate the overall mismatch produced by a given parameter set as the sum of the normalized absolute differences between the outputs of the model using that parameter set and the data listed in Table 2.
Table 2 Calibration methods and the number of parameters varied for each.We use a hyperparameter optimization algorithm, as implemented in the Optuna package28, to search the parameter space using a tree-structured Parzen estimator, to find the parameter sets that minimize the overall mismatch. For the immuno-varying and constrained calibrations, we run 20,000 trials and calculate an overall mismatch as the sum of the mismatches for each country. For the unconstrained calibration, we run 3000 trials per country.

Hot Topics

Related Articles