Rationalised experiment design for parameter estimation with sensitivity clustering

Implementation of ABC-FARThe ABC-FAR method iteratively selects parameter combinations and either rejects or accepts them based on the associated \(\chi ^2\) values, which measure the difference between the data (\(R_{D}\)) and the corresponding model prediction (\(R_{P}\)). We utilise the \(\chi ^2\) statistics to enhance the goodness-of-fit as proposed earlier34. The \(\chi ^2\) is determined as: \(\chi ^2 = \sum \frac{(R_{D} – R_{P})^2}{\sigma _{D}^2}\), where the sum is taken over all data points and \(\sigma _{D}\) represents the standard deviation in data \(R_{D}\).In each iteration, a noisy version of the current estimate of the distribution is sampled. The magnitude of the added noise is reduced with each iteration. Let \(p^{(k-1)}_{\theta _j}\) be the current estimate of the marginal of the parameter \(\theta _j\) at the \(k^{th}\) iteration. The noisy distribution (\(m^k_{\theta _j}\)) is then created using a uniform distribution (\(U_{\theta _j}\)) in the following way:$$\begin{aligned} m^k_{\theta _j} = \frac{0.25}{k+1} \times U_{\theta _j} + \left( 1 – \frac{0.25}{k+1}\right) \times p^{(k-1)}_{\theta _j}. \end{aligned}$$where for k > 1, \(p^{(k-1)}_{\theta _j}\) is the posterior obtained from the (k-1)\(^{th}\) iteration and \(p^{0}_{\theta _j}\) denotes the prior or initial guess. The support of the uniform distribution matches the bounds of the prior distribution, ensuring that parameter estimates remain within these bounds.We use Latin Hypercube Sampling (LHS35) for sampling each parameter independently from its marginal. Although this approach improves robustness, it might slow down convergence. Gradient noisy sampling, as discussed above, aids in exploring the parameter space during early iterations without significantly hindering the convergence in later iterations.In every iteration, we apply a history-dependent update strategy (HDUS) that recalls all the parameter combinations selected in the prior iteration while comparing the parameter combinations sampled in the current sampling step. This aids in preserving the joint distribution to a certain degree, ensuring that advantageous parameter combinations are not overlooked, and securing a monotonic reduction in the \(\chi ^2\) statistic. Alternatively, we can use a history-independent update strategy (HIUS), which does not retain previously chosen combinations (see Supplementary Figs. S6 and S7). Although HIUS conserves computational memory, it does not guarantee convergence under weak rejection conditions (high FAR values).To identify computational artifacts and evaluate statistical significance, we apply and compare the ABC-FAR results of a dummy parameter. In addition, we evaluate the Pearson’s correlation coefficient (PRCC, Supplementary SI S5) and linear regression between pairs of chosen parameter combinations. The PRCC estimates practical non-identifiability, while the regression indicates the expected parameter adjustment needed to counterbalance changes in the \(\chi ^2\) function due to fluctuation in the corresponding partner parameter value. These properties are linear and local, depending on the values of the other model parameters.Implementation of PARSECMeasurement candidates and parameter samplingLet \({\mathcal {M}} = \{M_1, M_2, \ldots , M_n\}\) represent the set of all feasible measurement candidates, selected based on experiment design specifications. Further, let \({\mathcal {V}}\) denote the parameter space defined by the known uncertainty. A combination of parameters is a p-component vector, \(\Theta = [\theta _1, \theta _2, \ldots , \theta _p]^\text {T} \ \in {\mathcal {V}}\), where \(\theta _i\) denote the value of the \(i^\text {th}\) parameter of the model. Here again, we use Latin Hypercube Sampling (LHS) to sample this space to generate a set of training samples \({\mathcal {S}} = \{\Theta _1, \Theta _2, \ldots , \Theta _k\}\), where \(\Theta _i\) is a sampled parameter vector.Parameter sensitivity analysisFor each candidate measurement \(M_i \in {\mathcal {M}}\) and each sample \(\Theta _k \in {\mathcal {S}}\), we calculate the Parameter Sensitivity Index (PSI) vector using the extended Fourier Amplitude Sensitivity Test (eFAST)38,39. Briefly, eFAST is a global sensitivity analysis method that uses variance decomposition to assess the influence of input parameters on model outputs. It evaluates sensitivity index as the ratio of expected variance in output variable in two conditions: (a) when only the parameter of interest changes, and (b) when all model parameters vary (more details in38,39).Let \(\psi (M_i, \Theta _j, \theta _i)\) denote the PSI of candidate \(M_i\) due to fluctuations in the values of the \(i^\text {th}\) parameter, \(\theta _i\), evaluated at training sample \(\Theta _j\). The corresponding PSI vector for \(M_i\) at \(\Theta _j\), \(\Psi (M_i, \Theta _j)\), is given by\(\Psi (M_i, \Theta _j) = [\psi (M_i, \Theta _j, \theta _1), \psi (M_i, \Theta _j, \theta _2), \ldots , \psi (M_i, \Theta _j, \theta _p)]^\text {T}\)All of these vectors are combined to form the PARSEC-PSI vector,\(\text {PARSEC-PSI}(M_i) = [\Psi (M_i, \Theta _j), \Psi (M_i, \Theta _j), \ldots , \Psi (M_i, \Theta _k)]^\text {T}\)Clustering and selection of the measurement candidatesWe next apply a clustering algorithm (either k-means or fuzzy c-means) to the set of PARSEC-PSI vectors. We consider the Euclidean distance between two PARSEC-PSI vectors for clustering the measurement candidates into a pre-defined number of clusters, represented by \(C_1, C_2, \ldots , C_k\). The clustering criterion aims to minimise intra-cluster distances, formally given by$$\begin{aligned} \min \sum _{i=1}^{k} \left\{ \sum _{\text {PARSEC-PSI}(M_i) \in C_i} \text {distance}(\text {PARSEC-PSI}(M_i), \mu _{C_i}) \right\} \end{aligned}$$where \(\mu _{C_i}\) is the centroid of cluster \(C_i\). For PARSEC(c) using fuzzy c-means, we select measurement candidates based on the highest membership probability from each cluster. For PARSEC(k) using k-means, a candidate is randomly selected from each cluster.Design evaluation and parameter estimationWe evaluate designs at different parameter values \(\Theta = [\theta _1, \theta _2, \ldots , \theta _p]^T \in {\mathcal {V}}\) by simulating the model. Let \(D(G, \Theta )\) denote the dataset generated emulating design G at \(\Theta\). We employ ABC-FAR to estimate parameters using \(D(G, \Theta )\). Let \({\hat{\theta }}^m_j\) denote value of the \(j^{th}\) parameter of the \(m^{th}\) parameter combination selected in the last iteration of ABC-FAR. We calculate the estimation error as:$$\begin{aligned} \text {Estimation error} = \sqrt{ \sum _{j = 1}^{r} \left\{ \sum _{m = 1}^{M} (\theta _j – {\hat{\theta }}^m_j)^2 \right\} }. \end{aligned}$$r and M indicate the number of free model parameters to be estimated and the number of parameter combinations selected by ABC-FAR, respectively.For PARSEC, we employ ABC-FAR to refine the posterior five times, beginning with a prior uniform distribution on a logarithmic scale that spans two orders of magnitude. We use the history-dependent update strategy and a FAR value of 0.01 (unless otherwise specified). In each iteration, we sample 60,000 parameter combinations using a noisy version of the current estimate of the marginal (as described in the preceding section). More details on PARSEC implementation and design evaluations are provided in the supplementary SI S1.DoE for a three-gene repressilator system using PARSECWe investigate the behavior of a three-gene repressilator system, a synthetic genetic circuit defined by the cyclic inhibition of gene expression37. By employing an ordinary differential equation (ODE) model as described in Supplementary SI S3, we track the concentrations of the proteins produced by these genes.To mirror the constraints of the experiment, only measurements of protein B and C are permitted. Furthermore, to emulate real-world conditions, the entire duration of the experiment is limited to 72 hours, and measurements of both proteins must be taken simultaneously. As an example, we design and compare sets of six concurrent measurements of protein B and C levels over the 72 hours to determine parameter values using PARSEC and its alternatives. The algorithm pertinent to the design issue is explained below (refer to Supplementary Fig. S1 b).

1.

To begin with, we need to create a set of all potential measurement candidates. Apart from the design criteria, we restrict the time points to multiples of three hours to discretise the experiment’s time axis and limit the computations. Thus, six elements must be chosen from the set \(T_F = \{3, 6, 9, \ldots , 69, 72 \}\) hours to enhance parameter estimation accuracy. These selected time points, model equations, and variables of interest will be inputs into the algorithm. The problem specifies the variables of interest as the levels of proteins B and C.

2.

We then compute the parameter sensitivity indices for each variable of interest at each viable time point. This sensitivity analysis uses a nominal guess of the combination of parameter values. Since we need to measure the levels of proteins B and C simultaneously, we combine the parameter sensitivity profiles of each variable to create the concatenated PARSEC-PSI vector for each feasible measurement time point.

3.

Next, the measurement candidates (time points for measurement), are grouped into six clusters (corresponding to the six measurement time points) based on similarities in their PARSEC-PSI vector. We apply the k-means clustering algorithm on the Euclidean distance between the PARSEC-PSI vectors. Since k-means clustering is stochastic in nature, it can produce varying cluster groupings in different iterations, we generate design and compare results from several clustering runs.

4.

For each iteration, one measurement candidate is picked randomly from each of the six clusters to define the final designs considered for evaluation by ABC-FAR for parameter estimation (\(\hbox {PD}_{{BC}}\)).

We generate one hundred PARSEC designs and evaluate them against random designs (\(\hbox {RD}_{{BC}}\) and \(\hbox {RD}_{{AB}}\)), non-specific designs (\(\hbox {PD}_{{AB}}\)), and anti-PARSEC designs (\(\hbox {WD}_{{BC}}\)) using six concurrent measurements of the two variables. For each mechanism, we produce a hundred designs, except for \(\hbox {WD}_{{BC}}\), where only 40 designs are generated.To assess the robustness of PARSEC configurations in the face of parameter uncertainty, we employ a uniform distribution (on a logarithmic scale) covering a four-fold range of parameter values for two out of the six parameters in the repressilator model (Figs. 5b and 6c). The PARSEC-PSI vector is constructed using PSI calculated at five training values (\(\Theta ^k\)), which are chosen based on Latin Hypercube Sampling (LHS) from the uncertainty range. The experimental design is then evaluated using these five training parameter sets and an additional set of four test parameter sets (\(\Omega ^k\)), also selected through LHS. Given the breadth of uncertainties considered, we extend the duration of the experiment measurements to 120 hours in this analysis.To determine the optimal sample size (Fig. 6a, b), we produce the PARSEC-PSI vectors with a fixed set of parameter values for 72 hours. The sample size defines the number of groups for clustering the PARSEC-PSI vectors. To score the performance of PARSEC and identify a good sample size, we calculate the accuracy ratio, as the ratio of averaged estimation error for random designs to that for PARSEC designs, analysed for a given experiment design specifications. Thus higher the accuracy ratio, the more advantageous the use of PARSEC. We also define the ‘fuzzy burden ratio’ to mark the (inverse of the) computational efficiency achieved through the use of fuzzy c-means clustering as opposed to k-means clustering. The burden ratio looks at the ratio of the number of distinct designs evaluated under the PARSEC(c) framework to that under the PARSEC(k) framework. A lower burden ratio suggests higher efficiency.DoE for a viral life cycle model using PARSECOur PARSEC experiment designs for the life cycle of a positive-sense RNA virus are based on a recently published model32. This model outlines the cellular growth dynamics of various viral molecules, including different viral RNA species, viral replication intermediates, viral proteins, and the virus particles that are released from the cell. It incorporates eleven parameters to describe the interactions between these virus growth variables.Seven of these parameters (mentioned under ‘Free parameters’ in Supplementary Table S2) are directly associated with the viral life cycle stages of translation, replication, and packaging and are expected to change with the virus strain. Therefore, for implementation of PARSEC to this model, we devise experiments aimed at estimating these seven parameters. We tailor the experimental designs to include only concurrent measurements of three commonly quantified viral molecules: positive and negative sense RNA levels and viral titre. In addition, we set a fixed sample size and restrict the experiment duration to 48 hours. We discretised the experiment time axis in intervals of two hours. As a result, PARSEC selects a fixed number of time points from the set: \(T_F = \{2, 4, 6, \ldots , 46, 48 \}\) Further, we consider predetermined initial conditions and prior knowledge of the parameters (details in Supplementary Tables S2 and S3).

Hot Topics

Related Articles