Modeling relaxation experiments with a mechanistic model of gene expression

Mathematical model

Case without proliferation

Throughout this work, we will use the classical two-state model (Fig. 1; see [9] and references therein), a refinement of the pioneering model introduced in [10].

Fig. 1 The 2-state model of gene expression. The gene opens with rate \(k_{\text {on}}\) and closes with rate \(k_{\text {off}}\). Similarly to [11], we only consider protein (x) production (with rate s) and degradation (with rate d)

This is the simplest model that accounts well for the specific nature of single-cell omics data (non-Poissonian [12], well fitted by Gamma distributions [13] and displaying a high proportion of zero counts [14]). More refined models with any number of possible gene configurations have been described [15], but their mathematical complexity makes them cumbersome to use for our purpose. It is important to stress here that this is a mechanistic model, which differs from the phenomenological two-state models described above: such models only consider a low and a high \(\chi\) state, without describing the protein production process. Importantly, stochasticity is here at the core of the modelling and does not need to be introduced as an additional term in the model. We recently proposed a piecewise deterministic Markov process (PDMP) version of this model which rigorously approximates the original molecular model [9]. Furthermore, a moment-based method has been proposed for estimating parameter values from a given experimental distribution assumed to arise from the functioning of a two-state model [11]. We recall here the mathematical description of the model in the PDMP formalism$$\begin{aligned} \frac{d}{dt} \chi (t)=s\,E(t) -d\,\chi (t), \end{aligned}$$where \(E(t)\in \{0,1\}\) switches from 0 to 1 (resp. 1 to 0) at rate \(k_{\text {on}}\) (resp. \(k_{\text {off}}\)). In this process, the protein quantity \(\chi (t)\) is structurally bounded by \(X_{\max }=s/d\). From this process, we can derive the Chapman–Kolmogorov (master) equation in the form (see [16,17,18] for similar derivations)$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tn_{\text {on}}(t,\chi )+\partial _\chi J_{\text {on}} (t,\chi )=-k_{\text {off}}n_{\text {on}}+k_{\text {on}}n_{\text {off}} \quad \chi \in ]0,s/d[,\\ \partial _tn_{\text{off}}(t,\chi )+\partial _\chi J_{\text {off}} (t,\chi )=+k_{\text {off}}n_{\text {on}}-k_{\text {on}}n_{\text {off}} \quad \chi \in ]0,s/d[,\\ J_{\text{on}}(t,\chi )=(s-d\chi )n_{\text{on}}(t,\chi ),\; J_{\text {off}}(t,\chi )=-d\chi \, n_{\text{off}}(t,\chi ),\\ J_{\text {on}}(t,0)=J_{\text {off}}(t,0)=J_{\text {on}}(t,s/d)=J_{\text {off}}(t,s/d)=0. \end{array}\right. } \end{aligned}$$
(1)
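To make the PDMP dynamics behind Eq. (1) concrete, here is a minimal simulation sketch (not the authors' code; parameter values are purely illustrative): the promoter state E(t) switches at rates \(k_{\text {on}}\) and \(k_{\text {off}}\), and between switches the protein level follows the flow \(d\chi /dt=sE-d\chi\), which can be integrated exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pdmp(k_on, k_off, s, d, t_end, x0=0.0, e0=0):
    """Protein level at time t_end for one cell, integrating the flow exactly between jumps."""
    t, x, e = 0.0, x0, e0
    while t < t_end:
        rate = k_on if e == 0 else k_off           # rate of leaving the current promoter state
        tau = min(rng.exponential(1.0 / rate), t_end - t)
        if e == 1:                                 # gene on: relax towards s/d
            x = s / d + (x - s / d) * np.exp(-d * tau)
        else:                                      # gene off: pure degradation
            x = x * np.exp(-d * tau)
        t += tau
        if t < t_end:
            e = 1 - e                              # promoter switch
    return x

# Hypothetical parameter values, for illustration only.
k_on, k_off, s, d = 0.5, 1.0, 10.0, 0.2
sample = np.array([simulate_pdmp(k_on, k_off, s, d, t_end=50.0) for _ in range(2000)])
print(sample.mean(), sample.max() <= s / d)        # the protein level stays below X_max = s/d
```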
We recall that the boundary conditions simply reflect the no-flux conditions \((s-d\chi ) n_{\text {on}}(t,\chi )=-d\chi \, n_{\text {off}}(t,\chi )=0\) for \(\chi =0,s/d.\) Moreover, because \(s-d\cdot s/d=-d\cdot 0=0\), we only specify the boundary conditions when they give constraints on the densities. We define \(X_{\max }=s/d\) as the maximum value for the quantity of \(\text {CD34}\) in a cell. Scaling the space by \(X_{\max }\) allows us to consider the following system$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t n_{\text {on}}+ d\partial _x((1-x)n_{\text {on}})=-k_{\text {off}}n_{\text {on}}+k_{\text {on}}n_{\text {off}}, \quad x\in ]0,1[,\\ \partial _t n_{\text {off}}+ d\partial _x((-x)n_{\text {off}})=k_{\text {off}}n_{\text {on}}-k_{\text {on}}n_{\text {off}}, \quad x\in ]0,1[,\\ n_{\text {on}}(t,0)=n_{\text {off}}(t,1)=0. \end{array}\right. } \end{aligned}$$
(2)
with \(n_{\text {on}}(t,x)\) being the number of cells with a promoter in the on state at time t and with a (scaled) CD34 level x, and \(n_{\text {off}}(t,x)\) being the number of cells with a promoter in the off state. The total number of cells, denoted n(t, x), is given by \(n_{\text {on}}(t,x)+n_{\text {off}}(t,x)\). This is the quantity we consider to be measured.

Steady state of the model. The system is mass preserving and it converges to a steady state \(N_{\text {on},\text {off}}\), which is characterized by$$\begin{aligned} {\left\{ \begin{array}{ll} d \partial _x((1-x)N_{\text {on}})=-k_{\text {off}}N_{\text {on}}+k_{\text {on}}N_{\text {off}}, \quad x\in ]0,1[,\\ d \partial _x((-x)N_{\text {off}})=k_{\text {off}}N_{\text {on}}-k_{\text {on}}N_{\text {off}}, \quad x\in ]0,1[,\\ N_{\text {on}}(0)=N_{\text {off}}(1)=0. \end{array}\right. } \end{aligned}$$
(3)
and the solution is nonnegative. An interesting feature of this system is that it admits an explicit solution. We recall here the computations, which can be found in [16], because they help in understanding the computations for the model with proliferation. Indeed, summing up the equations, we get$$\begin{aligned} \partial _x((1-x)N_{\text {on}}-xN_{\text {off}})=0. \end{aligned}$$Therefore, this quantity is constant on ]0, 1[. Using the boundary conditions, we see that 0 is the only admissible constant, so that in this case we necessarily have$$\begin{aligned} (1-x)N_{\text {on}}=xN_{\text {off}}. \end{aligned}$$Injecting this into the first equation, we get$$\begin{aligned} d \partial _x((1-x)N_{\text {on}})=-k_{\text {off}}N_{\text {on}}+k_{\text {on}}\frac{(1-x)N_{\text {on}}}{x}= (1-x)N_{\text {on}}\left( -\frac{k_{\text {off}}}{1-x}+\frac{k_{\text {on}}}{x}\right) . \end{aligned}$$From this, we get$$\begin{aligned} N_{\text {on}}=C (1-x)^{\frac{k_{\text {off}}}{d}-1}x^{\frac{k_{\text {on}}}{d}},\quad N_{\text {off}}=C (1-x)^{\frac{k_{\text {off}}}{d}}x^{\frac{k_{\text {on}}}{d}-1}, \end{aligned}$$and quite remarkably, we have$$\begin{aligned} N(x)=N_{\text {on}}(x)+N_{\text {off}}(x) = C(1-x)^{\frac{k_{\text {off}}}{d}-1} x^{\frac{k_{\text {on}}}{d}-1}. \end{aligned}$$
(4)
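As a quick numerical sanity check of Eq. (4) (with hypothetical parameter values), one can verify that the profile N, once normalized, coincides with the Beta density discussed next.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

k_on, k_off, d = 0.5, 1.0, 0.2          # illustrative values only
a, b = k_on / d, k_off / d              # Beta parameters k_on/d and k_off/d

profile = lambda x: (1 - x) ** (b - 1) * x ** (a - 1)   # Eq. (4) with C = 1
Z, _ = quad(profile, 0, 1)                              # numerical normalization constant

x = np.linspace(0.01, 0.99, 7)
print(np.allclose(profile(x) / Z, beta.pdf(x, a, b)))   # True: N is a Beta(a, b) density
```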
If we choose \(C= \frac{\Gamma ((k_{\text {on}}+k_{\text {off}})/d)}{\Gamma (k_{\text {on}}/d)\Gamma (k_{\text {off}}/d)}\), we normalize this to 1 and obtain a Beta distribution \(B(k_{\text {on}}/d,k_{\text {off}}/d)\), so that we end up with$$\begin{aligned} (n_{\text {on}}(t,x),n_{\text {off}}(t,x))\rightarrow \left( \int _0^1 \left( n_{\text {on}}^0(x)+n_{\text {off}}^0(x)\right) dx\right) (N_{\text {on}}(x),N_{\text {off}}(x)). \end{aligned}$$

Case with proliferation

HSCs mostly reside in a quiescent state, although they can occasionally divide during homeostasis [19]. We will therefore consider \(\text {CD34}^+\) TF1-BA cells as immature, slowly proliferating cells and \(\text {CD34}^-\) TF1-BA cells as mature, highly proliferating cells. The proliferation rate will thus depend on the variable x representing the \(\text {CD34}\) content, but not on the \(\text {on}/\text {off}\) status. Moreover, we consider that cells keep their \(\text {on}/\text {off}\) status during a division. This is in line with the demonstration of a memory of transcriptional activity in mammalian cells [20, 21]. The model then reads$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t n_{\text {on}}+ d \partial _x((1-x)n_{\text {on}})=-k_{\text {off}}n_{\text {on}}+k_{\text {on}}n_{\text {off}}+r(x)n_{\text {on}}(t,x), \quad x\in ]0,1[,\\ \partial _t n_{\text {off}}+ d \partial _x((-x)n_{\text {off}})=k_{\text {off}}n_{\text {on}}-k_{\text {on}}n_{\text {off}}+r(x)n_{\text {off}}(t,x), \quad x\in ]0,1[,\\ n_{\text {on}}(t,0)=n_{\text {off}}(t,1)=0. \end{array}\right. } \end{aligned}$$
(5)
Since the system is structurally non-conservative, it makes no sense to look for a steady state here. However, one can look for a stable exponential profile, that is, positive solutions of the form$$\begin{aligned} e^{\lambda t} (N_{\text {on}}(x),N_{\text {off}}(x)). \end{aligned}$$Such solutions satisfy the system$$\begin{aligned} {\left\{ \begin{array}{ll} \lambda N_{\text {on}}+ d \partial _x((1-x)N_{\text {on}})=-k_{\text {off}}N_{\text {on}}+k_{\text {on}}N_{\text {off}}+r(x)N_{\text {on}}(x), \quad x\in ]0,1[,\\ \lambda N_{\text {off}}+ d \partial _x((-x)N_{\text {off}})=k_{\text {off}}N_{\text {on}}-k_{\text {on}}N_{\text {off}}+r(x)N_{\text {off}}(x), \quad x\in ]0,1[,\\ N_{\text {on}}(0)=N_{\text {off}}(1)=0, \quad N_{\text {on},\text {off}} >0. \end{array}\right. } \end{aligned}$$
(6)
In the sequel, we will focus on the normalized representative, so that we will assume$$\begin{aligned} \int _0^1 \left( N_{\text {on}}(x)+N_{\text {off}}(x)\right) dx=1. \end{aligned}$$We also introduce the adjoint eigenprofile. It can be obtained as an exponentially growing solution of the adjoint differential operator; this is the continuous equivalent of the left and right eigenvectors associated with the same eigenvalue in matrix analysis (\(MU=\lambda U\), \(V^T M =\lambda V^T\) or equivalently \(M^T V=\lambda V\)).$$\begin{aligned} {\left\{ \begin{array}{ll} \lambda \phi _{\text {on}}- d(1-x)\partial _x \phi _{\text {on}}=-k_{\text {off}}\phi _{\text {on}}+k_{\text {off}}\phi _{\text {off}}+r(x)\phi _{\text {on}}, \quad x\in ]0,1[,\\ \lambda \phi _{\text {off}}- d(-x)\partial _x(\phi _{\text {off}})=-k_{\text {on}}\phi _{\text {off}}+k_{\text {on}}\phi _{\text {on}}+r(x)\phi _{\text {off}}, \quad x\in ]0,1[,\\ \phi _{\text {on},\text {off}}>0, \quad \int _0^1 N_{\text {on}}\phi _{\text {on}}+N_{\text {off}}\phi _{\text {off}} dx=1. \end{array}\right. } \end{aligned}$$
(7)
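As an illustration of how the triplet \((\lambda ,N,\phi )\) of Eqs. (6) and (7) could be approximated in practice, here is a sketch (not taken from the paper) that discretizes the transport-reaction operator with the upwind scheme introduced later in Eq. (9) and extracts the dominant right and left eigenvectors; all parameter values are hypothetical.

```python
import numpy as np

def build_operator(k_on, k_off, d, r0, r1, J=200):
    """Upwind discretization, on cell centers of ]0,1[, of the operator in Eq. (6)."""
    dx = 1.0 / J
    x = (np.arange(J) + 0.5) * dx
    r = r0 + r1 * x                                  # linear proliferation rate r(x)
    a_on = lambda y: d * (1 - y)                     # velocity of the on population (>= 0)
    a_off = lambda y: -d * y                         # velocity of the off population (<= 0)
    A = np.zeros((2 * J, 2 * J))
    for j in range(J):
        # on state: left upwinding; the boundary n_on(t,0)=0 drops the j=0 inflow term
        A[j, j] += -a_on(x[j] + dx / 2) / dx - k_off + r[j]
        if j > 0:
            A[j, j - 1] += a_on(x[j] - dx / 2) / dx
        A[j, J + j] += k_on
        # off state: right upwinding; the boundary n_off(t,1)=0 drops the j=J-1 inflow term
        A[J + j, J + j] += a_off(x[j] - dx / 2) / dx - k_on + r[j]
        if j < J - 1:
            A[J + j, J + j + 1] += -a_off(x[j] + dx / 2) / dx
        A[J + j, j] += k_off
    return A, dx

# Hypothetical parameter values.
A, dx = build_operator(k_on=0.5, k_off=1.0, d=0.2, r0=0.3, r1=-0.1)
vals, vecs = np.linalg.eig(A)
i = np.argmax(vals.real)                             # dominant (Perron) eigenvalue
lam = vals[i].real
N = np.abs(vecs[:, i].real)
N /= N.sum() * dx                                    # normalization of (N_on, N_off), stacked
vals_l, vecs_l = np.linalg.eig(A.T)
phi = np.abs(vecs_l[:, np.argmax(vals_l.real)].real)
phi /= (phi * N).sum() * dx                          # adjoint normalization of Eq. (7)
print("growth rate lambda ~", lam)
```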
We emphasize in particular the following property: for any initial condition of the system, one has$$\begin{aligned} \int _0^1 n_{\text {on}}(t,x)\phi _{\text {on}}(x) +n_{\text {off}}(t,x)\phi _{\text {off}}(x)dx =e^{\lambda t} \int _0^1 n_{\text {on}}(0,x)\phi _{\text {on}}(x) +n_{\text {off}}(0,x)\phi _{\text {off}}(x)dx =C^0 e^{\lambda t},\end{aligned}$$and moreover, in the weighted norm \(\Vert (f_\text {on},f_\text {off})\Vert _{\phi }=\int _0^1 |f_\text {on}|\phi _\text {on}+|f_\text {off}|\phi _\text {off}\),$$\begin{aligned} \int _0^1 |e^{-\lambda t}n_{\text {on}}(t,.)-C^0 N_{\text {on}}|\phi _{\text {on}}(x)dx +\int _0^1 |e^{-\lambda t}n_{\text {off}}(t,.)-C^0 N_{\text {off}}|\phi _{\text {off}}(x)dx\rightarrow 0, \end{aligned}$$that is,$$\begin{aligned} e^{-\lambda t}\left( n_{\text {on}}(t,.),n_{\text {off}}(t,.)\right) \rightarrow C^0\left( N_{\text {on}},N_{\text {off}}\right) . \end{aligned}$$In case \(\phi _{\text {on},\text {off}}\) is bounded from below, this implies classical \(L^1\) convergence; this lower bound is established below. For more details, we refer to [22] for an introduction to eigenvectors in this context. In particular, thanks to our normalization, the triplet \((\lambda ,N,\phi )\) is uniquely defined. Note that in the conservative case (\(r=0\)), \(\lambda =0\), N is given by the renormalized steady state and the adjoint eigenvector is simply the constant vector (1, 1). Note also that this guarantees that, for any initial data, provided \(\lambda \ge 0\) and \(\phi _{\text {on},\text {off}}\) are bounded from below (as will be established below),$$\begin{aligned} \frac{1}{\int _0^1 n_{\text {on}}(t,x) +n_{\text {off}}(t,x)dx}\left( n_{\text {on}}(t,.),n_{\text {off}}(t,.)\right)&= \frac{1}{e^{-\lambda t}\int _0^1 n_{\text {on}}(t,x) +n_{\text {off}}(t,x)dx}e^{-\lambda t}\left( n_{\text {on}}(t,.),n_{\text {off}}(t,.)\right) \\&\rightarrow \frac{1}{C^0}C^0\left( N_{\text {on}},N_{\text {off}}\right) =\left( N_{\text {on}},N_{\text {off}}\right) . \end{aligned}$$Regarding the observed steady profile in Fig. 2, our normalized asymptotic profile should therefore be \(N: x\longmapsto N_{\text {on}}(x)+N_{\text {off}}(x)\).

We assume that, for the initial dimensional system, the proliferation rate is linear, i.e. \(r:\chi \longmapsto \tilde{r}_0+\tilde{r}_1 \chi\). Scaling again the space by \(X_{\max }\), the proliferation term becomes \(r:x \longmapsto r_0 + r_1 x\) with \(r_0 = \tilde{r}_0\) and \(r_1 = \tilde{r}_1 X_{\max }\). We assume that the constant part of the proliferation rate is positive, \(r_0>0\). In contrast, to model the fact that \(\text {CD34}^-\) cells divide more frequently than \(\text {CD34}^+\) cells, we assume that the linear coefficient \(r_1\) is negative. However, to preserve the positivity of the proliferation rate, \(r_1\) must satisfy \(r_1 \in [-r_0,0]\). We show in the Results section that, under this proliferation assumption, it is theoretically possible to derive the normalized asymptotic profiles \((N_{\text {on}},N_{\text {off}})\).

The biological setting

Relaxation experiments

Chronic Myeloid Leukemia (CML) is a myeloproliferative disorder arising at the hematopoietic stem cell (HSC) level.
It is associated with the recurrent chromosomal (Philadelphia) translocation t(9;22)(q34;q11), which leads to an oncogenic chimeric gene fusing the Bcr and Abl genes and results in the expression of a constitutively active tyrosine kinase named BCR-ABL [23]. Véronique Maguer-Satta's group has developed the TF1-BA cell line, a unique model of immature human hematopoietic cells (TF1) transformed with BCR-ABL by lentiviral infection. This model was shown to recapitulate the early alterations following the appearance of the BCR-ABL oncogene, as identified using primary samples of CML patients at diagnosis and in chronic phase [24]. We decided to analyze the relaxation dynamics of the CD34 antigen at the surface of these TF1-BA cells (Fig. 2). \(\text {CD34}\) is a transmembrane phosphoglycoprotein which is predominantly regarded as a marker of haematopoietic stem cells [25]. We reasoned that \(\text {CD34}\) surface expression could therefore be seen as a proxy for the stemness of our TF1-BA cells. Interestingly, one observes a relaxation in both directions: \(\text {CD34}^-\) cells are regenerated from \(\text {CD34}^+\) cells, as biologists would expect, but one also sees that \(\text {CD34}^+\) cells are regenerated from \(\text {CD34}^-\) cells, establishing that stemness is not a fixed quality but the result of an underlying dynamical system, as previously shown in other cellular systems ([5, 6]; see above).

Fig. 2 The relaxation experiment. TF1-BA cells were labelled with an anti-CD34 antibody and FACS-sorted. The 10 percent most CD34-positive and the 10 percent most CD34-negative cells were sorted and grown in culture for the indicated period of time, after which the distribution of cell-surface CD34 expression was assessed. KT: the modified Kantorovich–Rubinstein distance, defined by Eq. (14), between the two distributions [26]

Data processing

Two types of data were collected on days 2, 5, 9, 13, 19, 23, 26 and 30: cell counts and fluorescence distributions. The cell counts allow us to quantify proliferation, whereas the fluorescence measurements give the distribution of CD34 expression.

Gating. As usual for cytometric data, we initiate the analysis with a gating step. We use SSC-H and FSC-H data to distinguish viable cells from debris. FSC (Forward Scatter) data are generally associated with the size of the analyzed cells and correspond to the light scattered along the laser path. SSC (Side Scatter) data, on the other hand, are usually linked to granularity and correspond to the light scattered at a 90-degree angle. The "H" stands for Height and is one component of this type of data. Cell debris is characterized by a relatively low size and a high granularity relative to that size. To select viable cells, we plot the values of SSC-H against those of FSC-H. An example of such a graph is provided in Fig. 3, using data from day 2 of the \(\text {CD34}^+\) cell experiment. Visually, viable cells can be identified as the cluster of points with high FSC-H and SSC-H values. Using the "FlowCal" python package [27], we draw an ellipse as a filter to select only these viable cells. At the bottom of Fig. 3, we represent the fluorescence data with and without the gating phase (Ungated and Gated, respectively). Note that the fluorescence distribution is only slightly affected by the removal of debris.

Fig. 3 Example of flow cytometry gating. Top: SSC-H versus FSC-H plot for raw data from the plus subpopulation on day 2.
As the data contain a high proportion of debris, we select only the viable cells lying within the black ellipse. Bottom: fluorescence data before gating (Ungated) and after gating (Gated). For the figures and the ellipse, we used the python package "FlowCal" [27]

Shifting. Even after gating, some cells exhibit a negative fluorescence level, which is inconsistent since these values are intended to represent the amount of protein in each cell. To avoid this problem, we added a shifting step. It occurs immediately after the gating process and consists in subtracting the minimum value of each distribution (for each sub-population and for each day) from all the values, bringing the minimum to 0. This transformation, once again, does not distort the fluorescence distribution. It corresponds to interpreting negative values as compensation for a baseline value.

Numerical simulations

Linking data to the mathematical model. The cell counts are interpreted as snapshots of the total population \(\int _0^{ X_{\max }} (n_{\text {on}}(t,x)+n_{\text {off}}(t,x) )dx\). The fluorescence distribution is considered as a sample from the distribution \(\frac{n_{\text {on}}(t,x)+n_{\text {off}}(t,x)}{\int _0^{ X_{\max }} (n_{\text {on}}(t,x)+n_{\text {off}}(t,x) )dx}\). As we have no information on the \(\text {on}/\text {off}\) repartition for the initial data, we apply the following rule: at \(t_0\), the start of our simulation (day 2), we choose the repartition to be the same as in the steady distribution N. More precisely, we fix the proportions with the equation$$\begin{aligned} n_{\text {on}/\text {off}}(t_0,x)= \underbrace{\frac{N_{\text {on}/\text {off}}(x)}{N_{\text {on}}(x)+N_{\text {off}}(x)}}_{\text {parameter dependent}}\underbrace{\left( n_{\text {on}}(t_0,x)+n_{\text {off}}(t_0,x)\right) }_{\text {data}}. \end{aligned}$$
(8)
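A short sketch (with hypothetical names and toy data, not the experimental measurements) of the shifting step described above and of the initial on/off repartition rule of Eq. (8):

```python
import numpy as np

def shift(fluorescence):
    """Shifting step: bring the minimum of a gated fluorescence sample to zero."""
    fluorescence = np.asarray(fluorescence, dtype=float)
    return fluorescence - fluorescence.min()

def initial_split(n_total, N_on, N_off):
    """Eq. (8): split the measured density between the on and off compartments."""
    weight_on = N_on / (N_on + N_off)                # parameter-dependent proportion
    return weight_on * n_total, (1.0 - weight_on) * n_total

# Toy example on a grid of J cells (illustration only).
J = 200
x = (np.arange(J) + 0.5) / J
data = shift(np.random.default_rng(1).gamma(2.0, 0.1, size=5000))        # fake day-2 sample
n_total, _ = np.histogram(np.clip(data, 0.0, 1.0 - 1e-9), bins=J, range=(0.0, 1.0), density=True)
N_on = x ** 2.5 * (1 - x) ** 4.0      # steady profiles of the conservative case, k_on/d=2.5, k_off/d=5
N_off = x ** 1.5 * (1 - x) ** 5.0
n_on0, n_off0 = initial_split(n_total, N_on, N_off)
print(np.allclose(n_on0 + n_off0, n_total))          # the split preserves the measured profile
```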
Numerical scheme. For Eq. (5), we use an explicit upwind scheme. Setting \(a_{\text {on}}: x \longmapsto d(1-x)\) and \(a_{\text {off}}: x \longmapsto -dx\), and denoting \({n_{\text {on}/\text {off}}}_j^{n} = n_{\text {on}/\text {off}}(n \Delta t, j \Delta x)\), \(r_j = r(j \Delta x)\), \({a_{\text {on}/\text {off}}}_{j+1/2} = \left( a_{\text {on}/\text {off}}((j+1) \Delta x)+a_{\text {on}/\text {off}}(j \Delta x)\right) /2\) and \({a_{\text {on}/\text {off}}}_{j-1/2} = \left( a_{\text {on}/\text {off}}(j \Delta x)+a_{\text {on}/\text {off}}((j-1) \Delta x)\right) /2\), the scheme is given by$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{{n_{\text {on}}}_j^{n+1}-{n_{\text {on}}}_j^{n}}{\Delta t} + \frac{{a_{\text {on}}}_{j+1/2} {n_{\text {on}}}_j^{n} - {a_{\text {on}}}_{j-1/2} {n_{\text {on}}}_{j-1}^{n}}{\Delta x} = - k_{\text {off}} {n_{\text {on}}}_j^{n} + k_{\text {on}} {n_{\text {off}}}_j^{n} + r_j {n_{\text {on}}}_j^{n}, \\ \frac{{n_{\text {off}}}_j^{n+1}-{n_{\text {off}}}_j^{n}}{\Delta t} + \frac{{a_{\text {off}}}_{j+1/2} {n_{\text {off}}}_{j+1}^{n} - {a_{\text {off}}}_{j-1/2} {n_{\text {off}}}_{j}^{n}}{\Delta x} = - k_{\text {on}} {n_{\text {off}}}_j^{n} + k_{\text {off}} {n_{\text {on}}}_j^{n} + r_j {n_{\text {off}}}_j^{n}, \end{array}\right. } \end{aligned}$$
(9)
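Below is a minimal implementation sketch of the scheme (9), written here on a cell-centred grid so that the half-index coefficients are simply the velocities evaluated at the cell interfaces; parameters and the final toy run are purely illustrative.

```python
import numpy as np

def upwind_step(n_on, n_off, dt, dx, edges, k_on, k_off, d, r):
    """One explicit upwind step of the scheme (9) on a cell-centred grid."""
    a_on_e = d * (1 - edges)                  # velocity of the on state at interfaces (>= 0)
    a_off_e = -d * edges                      # velocity of the off state at interfaces (<= 0)
    n_on_l = np.concatenate(([0.0], n_on))    # left neighbours, with n_on(t,0) = 0
    n_off_r = np.concatenate((n_off, [0.0]))  # right neighbours, with n_off(t,1) = 0
    div_on = (a_on_e[1:] * n_on - a_on_e[:-1] * n_on_l[:-1]) / dx
    div_off = (a_off_e[1:] * n_off_r[1:] - a_off_e[:-1] * n_off) / dx
    new_on = n_on + dt * (-div_on - k_off * n_on + k_on * n_off + r * n_on)
    new_off = n_off + dt * (-div_off + k_off * n_on - k_on * n_off + r * n_off)
    return new_on, new_off

# Toy run with hypothetical parameters; dt respects the CFL condition d*dt <= dx.
J = 200
dx = 1.0 / J
edges = np.arange(J + 1) * dx
x = 0.5 * (edges[:-1] + edges[1:])            # cell centers
k_on, k_off, d = 0.5, 1.0, 0.2
r = 0.3 - 0.1 * x                             # r(x) = r0 + r1*x with r0 = 0.3, r1 = -0.1
dt = 0.5 * dx / d
n_on = np.ones(J)
n_off = np.ones(J)
for _ in range(4000):
    n_on, n_off = upwind_step(n_on, n_off, dt, dx, edges, k_on, k_off, d, r)
print("population size:", (n_on + n_off).sum() * dx)
```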
In the Results section, Fig. 7 illustrates a comparison between the result of the numerical scheme and the theoretical asymptotic profile of Eq. (5).

Estimation of the distance to the data. To calibrate the parameter values of our system, we use our experimental data. First, in order to estimate the exponential growth rate of the cells, we perform a linear regression on the temporal evolution of the cell counts. To determine the values of the other system parameters, we seek values that make our numerical results as close as possible to the experimental data. To characterize this notion of closeness between our numerical results and the data, we introduce the Kantorovich–Rubinstein distance. Given two probability distributions \(p_1,p_2\) on \(\mathbb {R}_+\), we define their cumulative distribution functions \(P_i(x)=Pr(X<x, \text { under distribution }p_i)=\int _0^x p_i(dx).\) Using these functions, we define the Kantorovich–Rubinstein (also known as Wasserstein) distance by$$\begin{aligned} dist_{\textrm{KT}}(p_1,p_2) = \int _{0}^{\infty } \left| P_1(x)-P_2(x) \right| \textrm{d}x. \end{aligned}$$
(10)
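A hedged sketch of how the distance (10) can be computed between two samples from their empirical cumulative distribution functions, cross-checked against scipy's one-dimensional Wasserstein distance; the samples below are synthetic.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def kr_distance(sample1, sample2, grid):
    """Riemann-sum approximation of the integral of |P1 - P2| over a uniform grid."""
    P1 = np.searchsorted(np.sort(sample1), grid, side="right") / len(sample1)
    P2 = np.searchsorted(np.sort(sample2), grid, side="right") / len(sample2)
    return np.sum(np.abs(P1 - P2)) * (grid[1] - grid[0])

rng = np.random.default_rng(2)
s1 = rng.beta(2.0, 5.0, size=10000)                  # two synthetic samples on [0, 1]
s2 = rng.beta(3.0, 4.0, size=10000)
grid = np.linspace(0.0, 1.0, 2001)
print(kr_distance(s1, s2, grid), wasserstein_distance(s1, s2))   # the two values agree closely
```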
In our specific case, we want to compare, at each time point, the (normalized) distribution generated by the model at time \(t_i\) (hereafter denoted \(model(t_i,dx)\), with cumulative distribution \(M(t_i,.)\)) and the corresponding distribution of the data at time \(t_i\), denoted \(data(t_i,dx)\), with cumulative distribution \(D(t_i,.)\). We therefore compute$$\begin{aligned} dist_{KT}(t_i)=dist_{\textrm{KT}}(model(t_i),data(t_i))=\int _0^\infty |M(t_i,x)-D(t_i,x)|dx. \end{aligned}$$Note that in our case the integral is in fact taken over the finite interval [0, 1] for the scaled variables. Considering the distribution profile of the data, we prefer to study this distance on a logarithmic scale. We therefore make the change of variables \(y = \log (xX_{\max } +1)\) and define the modified Kantorovich–Rubinstein distance as follows$$\begin{aligned} dist_{\textrm{KT}}^{\log }(t) = \int _{0}^{\log (X_{\max }+1)} \left| \tilde{D}(y,t) - \tilde{M}(y,t) \right| \textrm{d}y, \end{aligned}$$
(11)
with \(\tilde{D}\) and \(\tilde{M}\) the two cumulative distribution functions, defined below, in the new logarithmic scale. Note that, following this change of variables, this "distance" can be greater than 1. We are looking for a function \(\tilde{M}\) that satisfies the following relation, for all \(b\in [0,\log (1+X_{\max })]\)$$\begin{aligned} {\tilde{M}}(b,t) = M\left( \frac{e^b -1}{X_{\max }},t\right) . \end{aligned}$$
(12)
In particular, differentiating (12) with respect to b, the link between the corresponding densities is immediately given by$$\begin{aligned} {\tilde{m}}(b,t)=m\left( \frac{e^b -1}{X_{\max }},t\right) \frac{e^b}{X_{\max }}. \end{aligned}$$
(13)
The space [0, 1] is discretized uniformly with \(J+1\) points, and this sequence is denoted \((x_j)_j\):$$\begin{aligned} (x_j)_{j \in \{ 0,1,\ldots ,J\}}: \; x_j = j \Delta x, \qquad \text {with} \; \Delta x = 1/J. \end{aligned}$$We also define the sequences \((y_j)_{j \in \{ 0,1,\ldots ,J\}}\) and \((\ell _j)_{j \in \{ 0,1,\ldots ,J-1 \}}\) as follows$$\begin{aligned} (y_j)_{j \in \{ 0,1,\ldots ,J\}}: \; y_j = \log (X_{\max }x_j +1 ) = \log \left( 1 + \frac{X_{\max }j}{J} \right) , \end{aligned}$$and$$\begin{aligned} (\ell _j)_{j\in \{ 0,1,\ldots ,J-1 \}}: \, \ell _j = y_{j+1} - y_j. \end{aligned}$$Consequently, the estimator of the cumulative distribution function M is given by$$\begin{aligned} \hat{M}(y_j,t) = \frac{1}{\sum _i \tilde{m}(y_i,t) \ell _i} \sum _{i< j} \tilde{m}(y_i,t) \ell _i =\frac{ \sum _{i < j} m(x_i,t) \left( \frac{1}{X_{\max }} + x_i \right) \ell _i}{\sum _{i } m(x_i,t) \left( \frac{1}{X_{\max }} + x_i \right) \ell _i}, \quad \forall j \in \{ 0,1,\ldots ,J\}, \end{aligned}$$where we have used (13) to estimate \(\tilde{m}\). We need to renormalize to ensure that we are comparing probability distributions. The estimator of the cumulative distribution function D is$$\begin{aligned} \hat{D}(y_j,t) = \frac{1}{\# \{d_k(t), \; \forall k \}} \sum _{i < j} h_i(t), \quad \forall j \in \{ 0,1,\ldots ,J\}, \end{aligned}$$with \(h_j(t) = \#\{d_i: \; \log (d_i(t) +1) \in [y_{j-1},y_j] \}\), where the operator \(\#\) denotes the cardinality of a set and the data \(d_i\) correspond to the fluorescence data obtained after data processing. These data, \(d_i\), are real numbers between 0 and \(X_{\max }\). The term \(\# \{d_k(t), \; \forall k \}\) corresponds to the number of cells present in the data on day t after the gating operation. Therefore, the distance between the experimental data and the mathematical model is as follows$$\begin{aligned} \widehat{dist}_{\textrm{KT}}^{\log }(t;\text {parameters}) = \sum _{j=0}^{J-1} \left| \hat{D}(y_j,t) - \hat{M}(y_j,t;\text {parameters}) \right| \ell _j. \end{aligned}$$
(14)
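The following sketch assembles the discrete estimators \(\hat{M}\) and \(\hat{D}\) and the modified distance of Eq. (14); the model density and the data used here are placeholders, only meant to illustrate the computation.

```python
import numpy as np

def modified_kr_distance(m_model, data, X_max, J):
    """Eq. (14): discrete modified distance between a model density and a data sample."""
    x = np.arange(J + 1) / J                          # grid x_j = j / J
    y = np.log(X_max * x + 1.0)                       # y_j = log(X_max * x_j + 1)
    ell = np.diff(y)                                  # ell_j = y_{j+1} - y_j
    # Estimator of M: renormalized cumulative sums of m(x_i)(1/X_max + x_i) ell_i.
    w = m_model[:-1] * (1.0 / X_max + x[:-1]) * ell
    M_hat = np.concatenate(([0.0], np.cumsum(w))) / w.sum()
    # Estimator of D: empirical cumulative histogram of log(d_i + 1) on the y-grid.
    h = np.histogram(np.log(np.asarray(data) + 1.0), bins=y)[0]
    D_hat = np.concatenate(([0.0], np.cumsum(h))) / len(data)
    return np.sum(np.abs(D_hat[:-1] - M_hat[:-1]) * ell)

# Toy example (illustration only): a Beta-shaped model profile against synthetic data.
J, X_max = 400, 1000.0
x = np.arange(J + 1) / J
m_model = x ** 1.5 * (1 - x) ** 4.0
data = np.random.default_rng(3).beta(2.5, 5.0, size=5000) * X_max
print(modified_kr_distance(m_model, data, X_max, J))
```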
To calibrate the parameters of our model, we minimize the sum of the modified Kantorovich–Rubinstein distances over the different days at our disposal and over the two experiments. The distance associated with the \(\text {CD34}^+\) data is denoted \(\widehat{dist}_{\textrm{KT}}^{\log ,+}\), while that associated with the \(\text {CD34}^-\) data is denoted \(\widehat{dist}_{\textrm{KT}}^{\log ,-}\). We also introduce the distance \(\widehat{dist}_{\textrm{KT}}^{\log ,\pm }\), corresponding to the sum of these two distances. We thereby look for a single set of parameters fitting the two datasets jointly. Mathematically, the optimization problem is given by the following formula$$\begin{aligned} \text {parameters}^{\text {opt}} = \mathop {\mathrm {arg\,min}}\limits _{\text {parameters}} \left( \sum _{t \in \text {Days}} \widehat{dist}_{\textrm{KT}}^{\log ,\pm }(t;\text {parameters}) \right) . \end{aligned}$$
(15)
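Schematically, the calibration of problem (15) could be organized as below; `simulate_and_distance` is a placeholder standing for a model run followed by Eq. (14), the parameter vector shown is illustrative, and the derivative-free Nelder-Mead method is just one possible choice (the paper does not specify the optimizer).

```python
import numpy as np
from scipy.optimize import minimize

DAYS = [2, 5, 9, 13, 19, 23, 26, 30]                 # measurement days

def simulate_and_distance(params, day, experiment):
    """Placeholder: run the scheme (9) with `params` up to `day` and return the
    modified Kantorovich-Rubinstein distance (14) to the data of `experiment`."""
    k_on, k_off, d, r1 = params
    # Dummy smooth objective, only so that this sketch runs end to end.
    return (k_on - 0.5) ** 2 + (k_off - 1.0) ** 2 + (d - 0.2) ** 2 + (r1 + 0.1) ** 2

def objective(params):
    # Sum over all days and over the CD34+ and CD34- experiments, as in Eq. (15).
    return sum(simulate_and_distance(params, t, sign) for t in DAYS for sign in ("+", "-"))

x0 = np.array([1.0, 1.0, 0.1, 0.0])                  # illustrative initial guess
res = minimize(objective, x0, method="Nelder-Mead")  # derivative-free local search
print(res.x)
```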
The results of this optimization are detailed in the Results section.
