Overcoming bias in estimating epidemiological parameters with realistic history-dependent disease spread dynamics

Development of an SEIR model incorporating realistic gamma distributions of latent and infectious periodsAmong various epidemiological compartmental models that can incorporate the latent and infectious periods of infectious diseases, an SEIR model is one of the most widely- employed models32,33,34,35. It consists of four compartments: individuals susceptible to infection (S); individuals exposed to infection (E), i.e., individuals who have been infected but do not exhibit infectiousness; individuals who exhibit infectiousness (I); and individuals who have been removed from the system (R), i.e., no longer exhibit infectiousness due to, for example, recovery or isolation. The latent period is the time from exposure to infectious pathogens to becoming infectious (i.e., from E to I), and the infectious period is the time from becoming infectious to no longer being infectious (i.e., from I to R). See Methods for a detailed interpretation of the infectious period in this work.To estimate the epidemiological parameters of the SEIR model, we fit the model with observed cumulative confirmed cases using a Bayesian MCMC method based on the Poisson likelihood function (see Supplementary Note 4). This allows the estimation of transmission rate (\(\beta\)), latent period (\({\tau }_{L}\)), infectious period (\({\tau }_{I}\)), and \({{\mathscr{R}}}\). The \({{\mathscr{R}}}\) value is given by \(\beta \times {\mu }_{{\tau }_{I}}\) where \({\mu }_{{\tau }_{I}}\) is the mean of \({\tau }_{I}\). This conventional approach relies on the assumption that \({\tau }_{L}\) and \({\tau }_{I}\) are exponentially distributed since the ODEs are the law of large numbers limits of the stochastic SEIR model with the exponentially distributed \({\tau }_{L}\) and \({\tau }_{I}\)36. Thus, we refer to this approach as the exponential model (EM) approach (Fig. 1b, top).The exponentially distributed latent period implies history-independent transitions between compartments (Fig. 1a(i)). In other words, the chance of becoming infectious for an exposed individual remains unchanged even when the time since exposure increases. However, this assumption of the exponentially distributed latent period is unrealistic because, in reality, the chance of becoming infectious after exposure increases as time since exposure increases (i.e., history-dependent transition) (Fig. 1a(ii)). Thus, it is preferable to assume that the latent period follows a gamma distribution, which implies history-dependent transitions37. Indeed, the latent periods of various diseases follow gamma distributions whose shape parameter is greater than one (Fig. 1a(iii)). Similarly, the infectious periods of various diseases also follow gamma distributions9.To adopt the more realistic assumption of gamma-distributed latent and infectious periods, we developed a new framework, called the gamma model (GM) approach (Fig. 1b, bottom). First, we formulated the DDEs that describe the SEIR model with gamma-distributed latent and infectious periods (see Methods and Supplementary Note 1 for details). Note that this is distinguished from previous works using the fixed time delay rather than distributed time delay38,39,40. Then, we fit observed data using simulated data from the DDEs and estimate \(\beta\), \({\tau }_{I}\), and \({{\mathscr{R}}}\). For parameter estimation, we used the Bayesian MCMC method as we did in the EM approach. To facilitate the use of this approach, we developed a user-friendly computational package, IONISE31, that implements the GM approach, along with a step-by-step manual (see Supplementary Note 2).During the initial phase of disease spread, the GM approach accurately estimates the infectious period distribution and reproduction numberTo compare the estimation results based on the EM and GM approaches, we used the cumulative confirmed COVID-19 cases data in Seoul, South Korea, from 20 January 2020 to 25 November 2020 (Fig. 2). Specifically, we examined three periods where noticeable increases of cases were observed: the initial phase (20 January 2020 to 3 March 2020), the second wave (1 August 2020 to 19 August 2020), and the third wave (1 November 2020 to 25 November 2020). The increases in infections during the initial phase, second wave, and third wave were caused by early disease transmission, a mass gathering on Korean Liberation Day, and a reduced social distancing measure, respectively.Fig. 2: Cumulative confirmed COVID-19 cases in Seoul, South Korea.From 20 January 2020 to 25 November 2020, there were three different periods where noticeable increases in cases were observed: the initial phase (20 January 2020 to 3 March 2020), the second wave (1 August 2020 to 19 August 2020), and the third wave (1 November 2020 to 25 November 2020).During the initial phase (Fig. 3a), contact tracing was intensively conducted to minimize under-reporting of cases. We have used this to estimate the empirical reproduction number, which represents the mean of secondary infections across all primary cases during a specified time period, following a previous study41 (see Supplementary Note 3 for details). These data enable the calculation of the average \({{\mathscr{R}}}\) value, which is 2.7 (Fig. 3b).Fig. 3: During the initial phase of the spread of infectious disease, the gamma model (GM) approach more accurately estimates \({{{\boldsymbol{\tau }}}}_{{{\boldsymbol{I}}}}\) and \({{\mathscr{R}}}\) than the exponential model (EM) approach.a Both the EM and GM approaches fit well to the cumulative confirmed cases of COVID-19 in Seoul during the period when the first confirmed case was reported. b A histogram of secondary cases obtained from the contact tracing data. The average of secondary cases, which represents \({{\mathscr{R}}}\), is 2.7 (triangle). c Boxplots of the posterior samples of β, \({\mu }_{{\tau }_{I}}\) (i.e., the mean of \({\tau }_{I}\)), and \({{\mathscr{R}}}\) estimated by the EM (red) and GM (blue) approaches. Symbols with hats (  ̂ ) represent posterior samples. The posterior mean of \({{\mathscr{R}}}\) from the GM approach is close to 2.7, which was estimated from the contact tracing data (dashed lines). d The distribution of \({\tau }_{I}\) estimated by the GM approach. The blue line represents the mean of distributions with posterior samples, and the gray region represents the 95% confidence interval of estimated distributions of \({\tau }_{I}\). e Both approaches fit well to the simulation data generated from \({\tau }_{L} \sim \Gamma (4.06,\;1.35)\) and \({\tau }_{I} \sim \Gamma (30,\;0.2)\) (inset) with varying \({{\mathscr{R}}}\) (1, 2, and 4). f Boxplots of the posterior samples of \(\beta\), \({\mu }_{{\tau }_{I}}\), and \({{\mathscr{R}}}\). The GM approach accurately estimated \(\beta\), \({\mu }_{{\tau }_{I}}\), and \({{\mathscr{R}}}\), whereas the EM approach overestimated \({\tau }_{I}\) and \({{\mathscr{R}}}\). Here, for each of \(\beta\), \({\mu }_{{\tau }_{I}}\), and \({{\mathscr{R}}}\), the posterior samples were normalized by their true values that were used to generate the simulation data. g The GM approach accurately estimated the distribution of \({\tau }_{I}\) used for generating the data. The mean of distributions with posterior samples (blue line) is close to the true distribution (black line), and the 95% confidence interval (gray region) contains the true distribution. h The same fitting to Fig. 3e was performed with additional information of \({\tau }_{I}\). i When both \({\tau }_{L}\) and \({\tau }_{I}\) are known, both approaches yielded accurate estimations for \(\beta\) and \({{\mathscr{R}}}\). Here, all box plots were depicted using 1000 posterior samples. The whiskers, boxes, and center line represent the 0th, 25th, 50th, 75th, and 100th percentiles.To assess how accurately the EM and GM approaches estimate \({{\mathscr{R}}}\) value and \({\tau }_{I}\) distribution, we fit the data using the two approaches and estimated \(\beta\) and \({\tau }_{I}\). During the estimation, \({\tau }_{L}\) was set to be known because it can be separately estimated from previously reported data16. Specifically, as \({\tau }_{L} \sim \Gamma (4.06,\;1.35)\) is previously reported, we used for the GM approach (Fig. 3a, inset). On the other hand, as the gamma distribution cannot be incorporated in the EM approach, we used \({\tau }_{L} \sim {{\rm{Exp}}}({\left(4.06\times 1.35\right)}^{-1})\), which has the same mean (=5.5 days) to the gamma distribution, as done in previous studies42,43. For the initial values of the compartments S, E, I, and R, we set them to represent that the entire population of Seoul (\(N\approx {10}^{7}\)) is susceptible except for one infectious individual, consistent with the data.While both approaches fit well to observed confirmed cases (Fig. 3a), the estimated parameters were different. The EM approach resulted in smaller estimates of \(\beta\) and larger estimates of \({\mu }_{{\tau }_{I}}\) and \({{\mathscr{R}}}\), compared with the GM approach (Fig. 3c). In particular, the \({\mu }_{{\tau }_{I}}\) value estimated by the GM approach was 5.3 days (95% confidence interval, 4.6–6.4). This estimate closely aligns with the mean \({\tau }_{I}\) value of 5.4 days (4.6–6.3) reported in a previous study that also analyzes the same phase and region44 while the EM approach provided an unrealistically high \({\mu }_{{\tau }_{I}}\) value of 14.4 days (11.8–17.2). Notably, the \({{\mathscr{R}}}\) value estimated by the GM approach, but not the EM approach, was similar to the \({{\mathscr{R}}}\) value calculated from the contact tracing data, 2.7 (Fig. 3c, right, dashed line). This significant difference in the estimations between the EM and GM approaches implies that assuming exponential distributions for the latent and infectious periods can lead to considerable bias. This overestimation of the reproduction number is consistent with previously reported results where assuming exponential distributions led to overestimated reproduction numbers9,45.Furthermore, the GM approach can estimate the distribution of \({\tau }_{I}\) with a more realistic shape compared to the EM approach. The estimated distribution is noticeably different from an exponential distribution (i.e., shape parameter \(\gg 1\)) (Fig. 3d), indicating that the distribution cannot be captured by the EM approach. The distribution of \({\tau }_{I}\) can provide valuable information for understanding the dynamics of infectious diseases, guiding public health responses, and developing effective strategies to control and mitigate their impact on populations46,47. For example, since the density is nearly zero after 10 days, it could be used to set the length of the isolation period to prevent the spread of disease.To systematically analyze the results obtained from the real-world data, we performed data fitting on simulation data generated with various \({{\mathscr{R}}}\) values (1, 2, and 4) (Fig. 3e). For data generation, we used \({\tau }_{L} \sim \Gamma ({\mathrm{4.06,\;1.35}})\), \({\tau }_{I} \sim \Gamma ({\mathrm{30,\;0.2}})\), and initial conditions that mimic the initial phase (Fig. 3a). We estimated the parameters when \({\tau }_{L}\) is known as we did with the real-world data (Fig. 3a–d). While the GM approach yielded accurate estimates, the EM approach underestimated \(\beta\) and considerably overestimated both \({\mu}_{{\tau }_{I}}\) and \({{\mathscr{R}}}\) (Fig. 3f), consistent with the estimation results obtained with the real-world data (Fig. 3c). Importantly, the GM approach accurately estimated the distribution of \({\tau }_{I}\) (Fig. 3g). In particular, the mean of distributions with posterior samples (Fig. 3g, blue line) was close to the true distribution of \({\tau }_{I}\) used to generate the data (Fig. 3g, black line). The true distribution was contained in the 95% confidence interval of estimated distributions of \({\tau }_{I}\) (Fig. 3g, gray region).We investigated whether the overestimations of \({{\mathscr{R}}}\) by the EM approach could be resolved with additional information, specifically knowing the infectious period \({\tau }_{I}\) (Fig. 3h). With this additional information, the EM approach provided accurate estimates of \(\beta\) and \({{\mathscr{R}}}\), similar to the GM approach (Fig. 3i). However, it is rare to have access to information on \({\tau }_{I}\) as it can vary across time, regions, and non-pharmaceutical intervention measures unlike \({\tau }_{L}\)26,27,28,29,30, which can be separately calculated from past contact tracing data.During the middle of the disease spread, \({{\mathscr{R}}}\) estimation with the GM approach is robust to the initial number of exposed individualsAt the beginning of the initial phase, the entire population is susceptible except for one infectious individual. However, during the middle phase of disease spread, the numbers of exposed individuals (\(E(0)\)) and infectious individuals (\(I(0)\)) at the beginning need to be specified. To compare the performance of the EM and GM approaches for estimating epidemiological parameters during the middle of the disease spread, we applied them to the second wave data from Seoul, South Korea (1 August 2020 to 19 August 2020) (Fig. 2). In these data, \(I\left(0\right)=25\) can be determined by the number of individuals who are symptomatic but not confirmed in the contact tracing data. \(R\left(0\right)=1607\) can also be determined by the number of cumulative confirmed cases at the beginning of the wave. However, determining \(E(0)\) is challenging as it requires data on the contact date. Thus, previous studies used various choices of \(E(0)\) based on reasonable assumptions and examined the sensitivity of results from varying \(E\left(0\right)\)48,49,50,51,52. Here, we approximated \(E(0)\) by the product of the mean latent period, 5.5 days, and the number of individuals who became newly infectious at the first day of the period, which is 10. We performed estimation and sensitivity analysis by perturbing the approximated initial value, \(E(0)\), considering the difficulty in knowing the accurate value of \(E(0)\) during the middle period of disease spread. If sensitivity analysis yields robust estimates to perturbed initial numbers of exposed individuals, it signifies the reliability of those estimates despite the uncertainty of the initial numbers.When using varying initial numbers of exposed individuals (i.e., \(0.5E\left(0\right)\), \(E(0)\), and \(1.5E(0)\)) to perform estimations, both approaches fit the data well regardless of the choice of the initial number of exposed individuals (Fig. 4a). However, the estimation results of the EM and GM approaches were different (Fig. 4b). Specifically, the EM approach yielded estimates of \(\beta\) and \({\mu }_{{\tau }_{I}}\) that were smaller and larger respectively than those from the GM approach. For \({{\mathscr{R}}}\), the EM approach yielded estimates (7.5–10.9) that were higher than those estimated by the GM approach (6.2–6.4). Both estimates were comparable to the \({{\mathscr{R}}}\) values (4.7–11.4) reported in super-spreading situations53. Indeed, there had been rapid transmission due to the gathering of more than 20,000 people in Seoul on Korean Liberation Day during the second wave period we analyzed54.Fig. 4: Druing the middle of the spread of the infectious disease, the gamma model (GM) approach, but not the exponential model (EM) approach, provided estimates of \({{\mathscr{R}}}\) robust to the initial number of exposed individuals.a Both the EM and GM approaches fit well to the cumulative confirmed cases data from the second wave of COVID-19 in Seoul when varying initial numbers of exposed individuals were used. b Boxplots of the posterior samples of \(\beta,\;{\mu }_{{\tau }_{I}}\), and \({{\mathscr{R}}}\) estimated by the EM (red) and GM (blue) approaches with three initial numbers of exposed individuals: \(0.5E(0)\), \(E(0)\), and \(1.5E(0)\). The numbers in red and blue are the relative sensitivities (\({S}_{{{\rm{rel}}}}\)) of the estimated values with respect to the varying initial numbers of exposed individuals. c The \({\tau }_{I}\) distributions estimated by the GM approaches have a shape far from exponential distributions (i.e., shape parameters \(\gg 1\)). d The same fitting to Fig. 4a was performed with the simulation data that mimic the real-world data (a). The shape parameter of the \({\tau }_{I}\) distribution (\({k}_{{\tau }_{I}}\)) was set to be 5 for the data generation. e The EM approach underestimated \(\beta\) and overestimated \({\mu }_{{\tau }_{I}}\) and \({{\mathscr{R}}}\), but the GM approach accurately estimated them especially when the initial number of exposed individuals used for the estimation matched the one used for the data generation, \(E(0)\). f The \({\tau }_{I}\) distributions estimated by the GM approach (blue line) accurately captured the true distribution (black line) when \(E(0)\) was used for the estimation. Here, all box plots were depicted using 1000 posterior samples. The whiskers, boxes, and center line represent the 0th, 25th, 50th, 75th, and 100th percentiles.For the sensitivity analysis, we calculated the relative sensitivities of estimates, \({S}_{{{\rm{rel}}}}\), with respect to the initial number of exposed individuals. A \({S}_{{{\rm{rel}}}}\) value of 0 represents complete robustness (i.e., an estimate is independent of the initial number of exposed individuals), and a value of 1 signifies that an estimate is doubled when the initial number of exposed individuals is doubled. Thus, a higher \({S}_{{{\rm{rel}}}}\) value indicates a more sensitive estimate. The estimates of \(\beta\) and \({\mu }_{{\tau }_{I}}\) from both approaches were sensitive to the initial number of exposed individuals (Fig. 4b). Despite such sensitive estimates of \(\beta\) and \({\mu }_{{\tau }_{I}}\), the GM approach provided more robust estimates of \({{\mathscr{R}}}\) (Srel = 0.04) than the EM approach (Srel = 0.37). Furthermore, although the \({\mu }_{{\tau }_{I}}\) values estimated by the GM approach were sensitive to the initial number of exposed individuals, the estimated distributions of \({\tau }_{I}\) consistently showed shape parameters much higher than one, clearly differentiating them from exponential distributions (Fig. 4c). This explains why the EM approach assuming the exponential distribution of \({\tau }_{I}\) leads to different estimations than the GM approach.To systematically analyze the results obtained from the real-world data (Fig. 4b), we performed estimations on simulation data generated from \({\tau }_{I}\) with the shape parameter of 5, which differentiates \({\tau }_{I}\) from an exponential distribution (Fig. 4d). For data generation, we used the DDEs with \({\tau }_{L} \sim \Gamma ({\mathrm{4.06,\;1.35}})\) and \({\tau }_{I} \sim \Gamma ({\mathrm{5,\;1.2}})\) (Fig. 4d, inset) and the initial values that represent the second wave data (Fig. 2): \(E\left(0\right)=55\), \(I\left(0\right)=25\), \(R\left(0\right)=0\), \(S\left(0\right)=N-E\left(0\right)-I\left(0\right)-R(0)\) where \(N={10}^{7}\). Although \(E\left(0\right)=55\) was used for data generation, we set the initial number of exposed individuals as \(0.5E\left(0\right)\), \(E(0)\), and \(1.5E(0)\) for the estimations to perform the sensitivity analysis.Overall estimation results were consistent with the results from the real-world data (Fig. 4b, c). In particular, the EM approach underestimated β and overestimated \({\mu }_{{\tau }_{I}}\) and \({{\mathscr{R}}}\), but the GM approach accurately estimated them especially when the initial number of exposed individuals used for the estimation matched the one used for the data generation, \(E(0)\) (Fig. 4e). Moreover, the \({{\mathscr{R}}}\) values estimated by the GM approach were robust to the initial number of exposed individuals (Srel = 0.04). In contrast, those estimated by the EM approach changed sensitively (Srel = 0.39). Also, the GM approach accurately estimated the distribution of \({\tau }_{I}\) when \(E(0)\) was used to perform the estimation (Fig. 4f).Taken together, the GM approach accurately estimates the \({\tau }_{I}\) distribution when the initial number of exposed is precisely known. Even when \(E\left(0\right)\) is not precisely known, the GM approach still accurately estimates \({{\mathscr{R}}}\).When the infectious period follows a nearly exponential distribution, the EM approach becomes comparably accurate to the GM approachWe also applied both approaches to the third wave data from Seoul, South Korea (1 November 2020 to 25 November 2020) (Fig. 2). The initial values of E, I, and R were determined as in Fig. 5a: \(E\left(0\right)=226,\;I\left(0\right)=173,\) and \(R\left(0\right)=6802\). We performed estimation and sensitivity analysis by perturbing the initial number of exposed individuals, \(E\left(0\right)\). Both approaches fit the observed data well (Fig. 5a) and yielded estimation results similar to each other (Fig. 5b), unlike the results from the second wave (Fig. 4b). Varying the initial number of exposed individuals caused sensitive changes to the estimated \(\beta\) and \({\mu }_{{\tau }_{I}}\) values. However, as the estimated \(\beta\) and \({\mu }_{{\tau }_{I}}\) values changed in opposite directions, the estimated \({{\mathscr{R}}}\) values were robust to the varying initial number of exposed individuals not only for the GM approach (Srel = 0.03) but also the EM approach (Srel = 0.12)\(.\)Fig. 5: When \({{{\boldsymbol{\tau }}}}_{{{\boldsymbol{I}}}}\) follows a nearly exponential distribution, not only the gamma model (GM) approach but also the exponential model (EM) approach resulted in estimates of \({{\mathscr{R}}}\) robust to the initial number of exposed individuals.a Both approaches fit well to the data from the third wave of COVID-19 in Seoul when the varying initial numbers of exposed individuals were used. b The estimates from both approaches are similar to each other. In particular, both approaches were robust to the varying initial numbers of exposed individuals in estimating \({{\mathscr{R}}}\). c The \({\tau }_{I}\) distributions estimated by the GM approaches resemble exponential distributions (i.e., shape parameters \(\approx 1\)). d The same fitting to Fig. 5a was performed with the simulation data that mimic the real-world data (a). The \({k}_{{\tau }_{I}}\) value was set to be 1.25 for the data generation. e Similarly to the estimation results from the real-world data (b), both approaches yielded similar estimates. f The GM approach accurately estimated the \({\tau }_{I}\) distribution when \(E(0)\) was used for the estimation.Since the estimated \({\mu }_{{\tau }_{I}}\) values were sensitive to the initial numbers (Fig. 5b), the distributions of \({\tau }_{I}\) estimated by the GM approach also changed; their right tails became thicker (Fig. 5c). Nevertheless, they had the shape parameters close to one, indicating that the distributions resembled exponential distributions. This explains why the parameter estimates from both approaches were similar to each other. Interestingly, the shape of the distributions was considerably different from the initial phase (Fig. 3d) and second wave (Fig. 4c), where \({\tau }_{I}\) follow non-exponential distributions. This difference stems from the immediate confirmation of infection and subsequent isolation during this period, unlike the initial phase and second wave, as the number of COVID-19 test centers in Seoul increased rapidly to respond to multiple outbreaks at correctional facilities, medical institutions, and religious facilities during this period54. These results are consistent with a previous finding an exponentially distributed infectious period means that the infection is more prone to early extinction55.To systematically analyze the results obtained from the real-world data (Fig. 5b), we performed estimations on simulation data generated from \({\tau }_{I}\) with the shape parameter of 1.25, which represents a near exponential distribution (Fig. 5d). For data generation, we used the DDEs with \({\tau }_{L} \sim \Gamma ({\mathrm{4.06,\;1.35}})\) and \({\tau }_{I} \sim \Gamma ({\mathrm{1.25,\;4.8}})\) (Fig. 5d, inset) and the initial values that represent the middle of the disease spread: \(E\left(0\right)=226\), \(I\left(0\right)=173\), \(R\left(0\right)=0\), \(S\left(0\right)=N-E\left(0\right)-I\left(0\right)-R(0)\) where \(N={10}^{7}\). Although \(E\left(0\right)=226\) was used for data generation, we set \(0.5E\left(0\right)\), \(E(0)\), and \(1.5E(0)\) for the estimations to perform the sensitivity analysis.Overall estimation results were consistent with the results from the real-world data (Fig. 5b, c). In particular, both approaches yielded estimates of \(\beta\) and \({\mu }_{{\tau }_{I}}\) that were sensitive to the initial number of exposed individuals but resulted in accurate and robust estimates of \({{\mathscr{R}}}\) due to the compensatory effect between \(\beta\) and \({\mu }_{{\tau }_{I}}\) (Fig. 5e). Additionally, when \(E(0)\) was used to perform the estimation, both approaches resulted in accurate estimates of \({\mu }_{{\tau }_{I}}\) and the distribution of \({\tau }_{I}\) (Fig. 5f).Taken together, when \({\tau }_{I}\) follows a nearly exponential distribution, the EM approach becomes comparably accurate to the GM approach in estimating \({\tau }_{I}\) and \({{\mathscr{R}}}\). Specifically, both approaches yielded accurate estimates of \({{\mathscr{R}}}\) regardless of the choice of the initial number of exposed individuals. Also, they resulted in accurate estimates of \({\tau }_{I}\) when \(E(0)\) was precisely known.Comparison of simulated generation time with intrinsic generation timeFor analyzing the real-world data, we used \(\Gamma ({\mathrm{4.06,\;1.35}})\) for the latent period, and the estimated mean infectious periods span from 3 to 7 days (Figs. 3–5). With these estimated infectious periods, we calculate the generation time, which is the time interval between infections in a primary case and a secondary case, using a stochastic simulation of the SEIR model with time delays56. Specifically, we tracked the exact time of infection, the infector, and the infectee for every single infection (Supplementary Fig. 1a). From this, we can directly calculate the generation in an almost fully susceptible, homogeneously mixed population. The mean generation times estimated by our simulation are 5.9–7.8 days (Supplementary Fig. 1b), aligning with previously reported intrinsic generation times of 5–7 days for COVID-19 computed under the assumption of a fully susceptible, homogeneously mixed population57,58. On the other hand, our simulated generation time is shorter than the realized generation time of 3–5 days, estimated in the realistic network of contacts.The GM approach is scalable to other distributions and an extended model with complex dynamicsSo far, we have demonstrated that the GM approach gives more accurate estimates than the EM approach. Here, we present the scalability of our framework by adopting general distributions, such as lognormal and Weibull distributions, because these other distributions are often used to describe the latent and infectious periods16,59,60. We generated data and performed inference when the latent and infectious periods in the SEIR model follow the exponential, gamma, lognormal, and Weibull distributions (Fig. 6a). In all cases, our framework provides accurate estimates of \(\beta\), \({\mu}_{{\tau }_{I}}\), \({{\mathscr{R}}}\), and the distributions of \({\tau }_{I}\) (Fig. 6b, c). These results indicate that our framework has an advantage compared to the widely used linear chain trick as the trick can describe only Erlang distributions (i.e., gamma distributions with integer shape parameters). Note that the inference for the general distributions can be easily done by simply changing the type of distribution in our user-friendly package IONISE (see Supplementary Note 2 for the detailed manual)31.Fig. 6: The SEIR model with general infectious and latent period distributions.a After assuming that the distributions of latent and infectious periods follow the exponential, gamma, lognormal, and Weibull distributions, we simulated cumulative confirmed cases (circles) with the SEIR models. Then, these data were fitted with the SEIR models by using IONISE (dashed lines). Here, for the latent period, we used \({Exp}({5.5}^{-1})\), \(\Gamma ({\mathrm{4.06,\;1.35}})\), \({Lognormal}(1.60,\;{0.47}^{2})\), and \({Weibull}({\mathrm{2.12,\;6.21}})\) based on the reported mean of 5.5 and variance of 7.416. For the infectious period, we used \({Exp}({6}^{-1})\), \(\Gamma ({\mathrm{1.67,\;3.60}})\), \({Lognormal}(1.67,\;{0.50}^{2})\), and \({Weibull}({\mathrm{1.98,\;6.77}})\), which have the mean of 6 and variance of 10. Note that the exponential distributions have the variances \({5.5}^{2}\) and \({6}^{2}\) instead of 7.4 and 10, respectively, as they have only one adjustable parameter. b, c IONISE accurately estimated the transmission rate (\(\beta\)), mean infectious period (\({\mu}_{{\tau }_{I}}\)), reproduction number (\({{\mathscr{R}}}\)), and infectious period distributions. The gray region represents the 95% confidence interval of estimated distributions (c). Here, all box plots were depicted using 1000 posterior samples. The whiskers, boxes, and center lines represent the 0th, 25th, 50th, 75th, and 100th percentiles.In the SEIR model with delays (Figs. 3–6), we assumed that confirmation is the primary factor for losing infectiousness. This allowed us to fit the confirmed case data to the compartment R (i.e., individuals who have lost infectiousness). However, this assumption may not always be valid. For instance, some infectious individuals might remain unreported or may not be isolated even after confirmation. In such cases, extended models incorporating the additional factors need to be used. Here, we demonstrate the scalability of our inference framework with an extended model with 11 compartments and 17 reactions (Fig. 7a) (see Supplementary Note 5 and Table 1 for a detailed model description). This model is capable of describing complex dynamics, such as those involved in variants, vaccinations, under-reporting of cases, and infection fatality rates. In this extended model, both the EM and GM approaches fit the data well (Fig. 7b), but only the GM approach accurately estimated \(\beta\), \({\mu}_{{\tau }_{I}}\), \({{\mathscr{R}}}\) (Fig. 7c), and the distribution of \({\tau }_{I}\) (Fig. 7d). Using the model, we investigated detailed epidemiological characteristics. Specifically, we calculated the fraction of unreported cases (Fig. 7e), the proportion of the vaccinated population (Fig. 7f), and the weekly infection fatality rate (Fig. 7g). Notably, the weekly infection fatality rates computed by both approaches were significantly different. Furthermore, other characteristics, such as the number of populations infected by the variant or the case fatality rate, could be readily computed by the model. Also, stochastic simulations of extended models can be performed to track the status of each individual (e.g., susceptible, exposed, or infectious), enabling the estimation of various epidemiological time intervals (e.g., generation time), as demonstrated in Supplementary Fig. 1. As indicated by this extended model, our framework is scalable to other epidemiological models. This extension can be done by adjusting the model specifications step in our user-friendly computational package, IONISE (see Supplementary Note 2 for details)31.Fig. 7: Extended model with vaccination, variants, asymptomatic populations, and fatality.a The model consists of 11 compartments and 17 reactions (see Supplementary Note 5 and Table 1 for a detailed model description). Among these 17 reactions, 8 reactions, which are denoted by \({\tau }_{*}\), involve delay distributions. b–d While both exponential model (EM) and gamma model (GM) approaches fit the cumulative confirmed cases data well, only the GM approach accurately estimated the parameters and infectious period distribution. The colored lines represent the mean of distributions with posterior samples, and the colored regions represent the 95% confidence interval of infectious period distribution (d). Here, all box plots were depicted using 1000 posterior samples. The whiskers, boxes, and center line represent the 0th, 25th, 50th, 75th, and 100th percentiles. e–g Using this model, we computed the fraction of unreported cases, the fraction of the vaccinated population, and the weekly infection fatality rate. Notably, the weekly infection fatality rates computed using both approaches were significantly different.

Hot Topics

Related Articles