Data-driven discovery of chemotactic migration of bacteria via coordinate-invariant machine learning

We constructed data-driven models for two different data sources (which, however, correspond to the same general chemotaxis scenario):

(i) PDE simulation data: the chemotactic PDE described and used in [1] was simulated, and data were collected for both the bacterial density and the nutrient concentration fields. Details about this PDE and its simulation can be found in Materials and Methods.

(ii) Experimental data: real-world data from the chemotaxis experiments performed by Bhattacharjee et al. and described in [1].

The results are presented separately for each of these two categories.

Models for simulation data

PDE simulations included the integration of two coupled PDE fields, one describing the bacterial density b and one describing the nutrient density c (System 10). The PDE solution can be seen in Fig. 1.

Fig. 1 PDE simulations representing the ground truth of the simulations: (left) b(r, t) field and (right) c(r, t) field. For clarity, arrows are added to denote the direction of time

Table 1 Listing of all data-driven models explored in the manuscript

Table 2 Summary of all data-driven models for learning the nutrient field (denoted as c(x, t)). Only the second model will be presented in detail in this section; the other model is included in the Supplementary Information

To learn from simulation data, we considered a variety of machine-learning-enabled data-driven models; they are listed in Tables 1 and 2, and the relevant notation is summarized in the table captions. In the text that follows, a representative selection (see the captions of Tables 1 and 2) is described in more detail; the remaining models are relegated to the Supplementary Information.

Black box data-driven models

Consider a system described by d macroscopic scalar variable fields (\(u^{(1)},\ldots, u^{(d)}\)). Assuming a one-dimensional domain along the vector \(\hat{x}\) (for an example in cylindrical coordinates see Fig. 12), discretized through m points in space (x) and n points in time (t), we are given \(m\cdot n\) data points in \(\mathbb {R}^{d}\). Using interpolation/numerical differentiation, we can estimate the temporal derivatives \(u^{(1)}_t,\ldots, u^{(d)}_t\), as well as spatial derivatives of various orders (first, \(u^{(1)}_x,\ldots, u^{(d)}_x\); second, \(u^{(1)}_{xx},\ldots, u^{(d)}_{xx}\); etc.). We assume that we know a priori the largest order of relevant spatial derivatives (here, two) [32], the coordinate system, and the boundary conditions (here, zero flux). We also assume that the spatiotemporal discretization satisfies the necessary criteria for a numerically converged PDE solution. Given these derivatives, we can compute all relevant local operators, such as \(\nabla u^{(i)}, \Delta u^{(i)}, i \in \{1,\ldots, d\}\).

Our objective is to learn functions \(f_i: \mathbb {R}^{3d \cdot m} \rightarrow \mathbb {R}^m, i \in \{1,\ldots, d\}\) such that:

$$u_t^{(i)} = f_i\left(u^{(1)},\ldots,u^{(d)},\;\nabla u^{(1)} \cdot \hat x,\ldots,\nabla u^{(d)} \cdot \hat x,\;\Delta u^{(1)},\ldots,\Delta u^{(d)}\right)$$
(2)
This is a black box expression for the time derivative of a macroscopic field, written as a function of the relevant lower-order, coordinate-independent local spatial operators acting on the fields. The coordinate-independent nature of the input features allows the use of such data-driven laws across different coordinate systems, as well as the fusion of data from different coordinate systems during training. This property, we believe, is of great importance, as it results in versatile, generalizable models. After training (i.e., after successfully learning this function from data), integrating the model numerically can reproduce the spatiotemporal profiles in the training set and, hopefully, even predict beyond it. The arguments of \(f_i\) are the features (input vectors) and \(u^{(i)}_t\) is the target (output vector) of our data-driven model. Note that, usually, not all features are informative in learning \(f_i\) (in other words, only some orders of the spatial derivatives appear in the PDE right-hand side). Also note that not all macroscopic variables \(u^{(i)}\) are always necessary for learning \(f_j, j \ne i\). In the spirit of the Whitney and Takens embedding theorems [33, 34], short histories of some of the relevant variable profiles may “substitute” for missing observations/measurements of other relevant variables.

Coordinate invariant learning

In Cartesian coordinates the operators used as inputs to learn right-hand sides (or, here, chemotactic terms) are simply related to the spatial derivatives; but in curvilinear coordinates, or when the evolution occurs on curved manifolds, the relation between spatial derivatives and local operators needs more care. We consider physical Euclidean space \(\mathbb {R}^3\) (regarded as a Riemannian manifold with the Euclidean metric, expressed as \(g = (\textrm{d}x)^2 + (\textrm{d}y)^2 + (\textrm{d}z)^2\) in Cartesian coordinates (x, y, z)). The gradient, \({{\,\textrm{grad}\,}}f\), of a smooth function \(f :\mathbb {R}^3 \rightarrow \mathbb {R}\) is the vector pointing in the direction in which f grows at its maximal rate, and whose length is that maximal rate; note that this definition is independent of the system of coordinates on \(\mathbb {R}^3\). The phase flow of a smooth vector field \(v :\mathbb {R}^3 \rightarrow \mathbb {R}^3\) can be regarded as the motion of a fluid in space. The divergence of v, denoted \(\mathrm{div}\, v\), is the outflow of fluid per unit volume at a given point. Again, the definition is independent of the coordinate frame; the expression \(\mathrm{div}\, v = \frac{\partial v_1}{\partial x} + \frac{\partial v_2}{\partial y} + \frac{\partial v_3}{\partial z}\) is valid in Cartesian coordinates. The Laplacian is the composition of the gradient followed by the divergence (in other words, \(\Delta = \mathrm{div}\, {{\,\textrm{grad}\,}}\)), again independently of the choice of coordinates.

Note that in our specific case of cylindrical coordinates, with only the component along the radial dimension (with unit vector denoted \(\hat r\)) being important due to domain-specific symmetries, the right-hand side of a PDE will explicitly depend on the local radius as well. In the above formulation, this is incorporated in the Laplacian term \(\Delta u^{(i)} = \frac{1}{r} \frac{\partial }{\partial r} \left( r \frac{\partial u^{(i)}}{\partial r} \right)\).
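As a concrete illustration, the sketch below shows how the coordinate-invariant inputs \(\nabla b \cdot \hat r\) and \(\Delta b\) could be assembled numerically for a radially symmetric field on a one-dimensional grid. The grid, the synthetic test profile and the use of np.gradient are illustrative assumptions, not the exact numerical pipeline used in this work.

```python
import numpy as np

def invariant_features(b, r):
    """Coordinate-invariant local operators for a radially symmetric field b(r).

    Returns the radial component of the gradient, grad b . r_hat = db/dr,
    and the cylindrical Laplacian, (1/r) d/dr (r db/dr), on the grid r.
    """
    db_dr = np.gradient(b, r)               # grad b . r_hat
    lap_b = np.gradient(r * db_dr, r) / r   # (1/r) d/dr (r db/dr)
    return db_dr, lap_b

# Usage on a synthetic profile (illustrative only); r = 0 is avoided to keep 1/r finite.
r = np.linspace(0.1, 10.0, 200)
b = np.exp(-(r - 5.0) ** 2)
grad_b, lap_b = invariant_features(b, r)
```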
For the construction of all relevant differential operators in any coordinate system, it is possible to use ideas and tools from Exterior Calculus (see Supplementary Information). In the Supplementary Information we also include some case studies that demonstrate the coordinate invariance of our proposed algorithmic framework.

Black box learning of both PDEs with an ANN (from known fields b(r, t), c(r, t))

$$\begin{bmatrix} b_t \\ c_t \end{bmatrix} = \begin{bmatrix} f \\ h \end{bmatrix} = F_{NN}(b,\nabla b \cdot \hat r,\Delta b,c,\nabla c \cdot \hat r,\Delta c)$$
(3)
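A minimal training sketch for a network of the form of Eq. 3 is given below. The feature/target arrays, the network width and depth, the tanh activations and the optimizer settings are placeholder assumptions, not the architecture actually used in this work.

```python
import torch
import torch.nn as nn

# Placeholder data standing in for the assembled arrays:
# X holds [b, grad b . r_hat, lap b, c, grad c . r_hat, lap c] at each space-time sample,
# Y holds the corresponding targets [b_t, c_t].
X = torch.randn(1024, 6)
Y = torch.randn(1024, 2)

# A small fully connected network F_NN mapping the six local features to [b_t, c_t].
F_NN = nn.Sequential(
    nn.Linear(6, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 2),
)

optimizer = torch.optim.Adam(F_NN.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(F_NN(X), Y)   # fit both time derivatives simultaneously
    loss.backward()
    optimizer.step()
```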
Fig. 2 Black box learning of both PDEs with an Artificial Neural Network: (left) integration results for the first data-driven PDE (for b(r, t)) and (right) relative error (%)

Fig. 3 Black box learning of both PDEs with an Artificial Neural Network: (left) integration results for the second data-driven PDE (for c(r, t)) and (right) relative error (%)

With training data from both the b(r, t) and c(r, t) fields, we learned a neural network of the form described in Eq. 3. Figures 2 and 3 show how the data-driven PDEs were able to learn the laws of time evolution of both b(r, t) and c(r, t). The data-driven models were used to reproduce the training data and could successfully extrapolate as far as we attempted (here, up to \(t=290\,\textrm{s}\)).

Black box learning of \(b_t\) with an ANN, with \(c_t\) known (fields b(r, t), c(r, t) known)

$$b_t = f_{NN}(b,\nabla b \cdot \hat r,\Delta b,c,\nabla c \cdot \hat r,\Delta c)$$
(4)
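Once such a network is trained, the learned right-hand side can be integrated by the method of lines. The sketch below indicates how this could be set up in principle; the grid, the stand-in functions f_nn and c_interp, and the solver settings are illustrative assumptions, not the procedure used to produce the results reported here.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, b, r, f_nn, c_interp):
    """Method-of-lines right-hand side for the learned b-equation (setting of Eq. 4).

    c_interp(t) is assumed to return the known chemonutrient profile on the grid r;
    f_nn maps the six local features to b_t at every grid point. Boundary conditions
    are handled only crudely here (one-sided differences at the ends of the grid).
    """
    c = c_interp(t)
    db = np.gradient(b, r)
    lap_b = np.gradient(r * db, r) / r
    dc = np.gradient(c, r)
    lap_c = np.gradient(r * dc, r) / r
    features = np.column_stack([b, db, lap_b, c, dc, lap_c])
    return f_nn(features)                   # predicted b_t on the whole grid

# Illustrative placeholders: a diffusion-like stand-in for the trained network
# and a constant chemonutrient profile.
r = np.linspace(0.1, 10.0, 200)
b0 = np.exp(-(r - 2.0) ** 2)
f_nn = lambda F: 0.01 * F[:, 2]
c_interp = lambda t: np.ones_like(r)

sol = solve_ivp(rhs, (0.0, 50.0), b0, args=(r, f_nn, c_interp),
                t_eval=np.linspace(0.0, 50.0, 11))
b_profiles = sol.y.T                        # snapshots of b(r, t) in time
```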
Fig. 4 Black box learning with a Neural Network: (left) integration results for the data-driven PDE and (right) % relative error

Figure 4 showcases learning of one of the two data-driven PDEs when the second PDE is known. Here, the chemonutrient (c(r, t)) PDE is assumed known, and the bacterial density evolution PDE is learned with a neural network, similar to Eq. 3. The trained ANN is able to approximate the ground truth PDE right-hand side in the proximity of the training dataset. To assess the ANN’s approximation ability, one can look at the norm of the difference between the ground truth (“functional”) and ANN-approximated outputs for the same inputs: \(||b_t^{functional} -b_t^{ANN}||_2^2\). For a data cloud close to the training data (\(\pm 10\%\) in each direction), this norm corresponds to an average relative error on the order of 2%. For comparison purposes, we also trained a Gaussian Process Regression algorithm for the same task (see Fig. 5).

Fig. 5 Black box learning with Gaussian Process Regression: (left) integration results for the data-driven PDE and (right) % relative error

Black box ANN learning of a single field evolution equation, with only partial information (only the field b(r, t) is observed)

After the initial success of the previous section, we decided to attempt a computational experiment in the spirit of the Takens embedding for finite-dimensional dynamical systems [33,34,35,36,37,38]. The idea here is that, if only observations of a few (or even only one) of the variables involved are available, one can use the history of these observations (“time-delay” measurements) to create a useful latent space in which to embed the dynamics, and in which, therefore, to learn a surrogate model with fewer variables, but involving also the history of these variables [39, 40]. There are important assumptions here: finite (even low) dimensionality of the long-term dynamics, something easier to contemplate for ODEs, but possible for PDEs with attracting, low-dimensional (possibly inertial) manifolds for their long-term dynamics. There is also the assumption that the variable whose values and history we measure is a generic observer for the dynamics on this manifold.

One can always argue that, if a 100-point finite difference discretization of our problem is deemed accurate (so, for two fields, 200 degrees of freedom), then the current discretized observation of one of the two fields (100 measurements), plus three delayed observations of it (\(3 \times 100\)), plus possibly one more measurement, give us enough variables for a useful latent space in which to learn a surrogate model. Here we attempted to do it with much less: at each discretization point we attempted to keep the current b(r, t) field measurement and its spatial derivatives, and added only some history (the values and spatial derivatives at the previous moment in time). The equation below is written in discrete-time form, with \(\Delta t = t_{k+1} - t_k\), for any time point \(t_k, k \geqslant 1\) (notice the dependence of the field at the next time step on the two previous time steps); it can be thought of as a discretization of a second-order-in-time partial differential equation for the b(r, t) field, based on its current value and its recent history.

$$b(t_{k + 1}) = b(t_k) + \Delta t\, f_{NN}^{partial}\bigl(b(t_k),(\nabla b \cdot \hat r)(t_k),(\Delta b)(t_k);\; b(t_{k - 1}),(\nabla b \cdot \hat r)(t_{k - 1}),(\Delta b)(t_{k - 1})\bigr)$$
(5)
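A minimal sketch of how such a history-based update could be rolled out is shown below. The feature assembly mirrors Eq. 5, while the function name f_nn_partial and the grid handling are placeholder assumptions.

```python
import numpy as np

def step(b_now, b_prev, r, dt, f_nn_partial):
    """One explicit step of the history-based update of Eq. 5.

    The local features are the field, its radial gradient and its cylindrical
    Laplacian at the current and the previous time level; f_nn_partial stands in
    for the trained network mapping these six features to the update rate.
    """
    feats = []
    for b in (b_now, b_prev):
        db = np.gradient(b, r)
        feats += [b, db, np.gradient(r * db, r) / r]
    X = np.column_stack(feats)          # shape (n_points, 6)
    return b_now + dt * f_nn_partial(X)

# Given two initial snapshots b0 and b1, the model is rolled out as:
# b_prev, b_now = b0, b1
# for _ in range(n_steps):
#     b_prev, b_now = b_now, step(b_now, b_prev, r, dt, f_nn_partial)
```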
Fig. 6 Black box partial-information learning with a Neural Network: (left) integration results for the data-driven PDE and (right) % relative error

Figure 6 demonstrates learning a data-driven (here, discrete-in-time) evolution equation for the bacterial density b(r, t) when only data for b(r, t) are at hand (partial information). Even though we knew of the existence of another, physically coupled field, we assumed we could not sample from it, so we replaced its effect on the b(r, t) field through a functional dependence on the history of b(r, t). Simulation of the resulting model was successful in reproducing the training data and extrapolating beyond them in time.

Gray box data-driven models

A similar approach can be implemented when we have knowledge of some, but not all, of the terms of the right-hand side. In the specific context of chemotaxis, we are interested in learning just the chemotactic term, i.e. functions \(g_i: \mathbb {R}^{3d\cdot m} \rightarrow \mathbb {R}^m, i \in \{1,\ldots, d\}\) such that:

$$u_t^{(i)} - D^{(i)}\Delta u^{(i)} = g_i\left(u^{(1)},\ldots,u^{(d)},\;\nabla u^{(1)} \cdot \hat x,\ldots,\nabla u^{(d)} \cdot \hat x,\;\Delta u^{(1)},\ldots,\Delta u^{(d)}\right),$$
(6)
where \(D^{(i)}\) is an a priori known diffusivity. This is now a gray box model for the macroscopic PDE, and it is particularly useful in cases where an (effective) diffusion coefficient is easy to determine, possibly from a separate set of experiments or simulations [9]. Again, as for black box models, gray box models can also be formulated in the case of partial information, i.e. when not all fields \(u^{(i)}\) are known, by leveraging history information of the known variables. Note that this framework (as well as the black box one) can be adjusted to account for parametric dependence, which enables further downstream objectives, such as bifurcation studies or parameter estimation. See Eq. 8 for a demonstration of a gray box model used to identify the upper and lower characteristic bounds of logarithmic sensing [1].

Gray box learning with an ANN, with \(c_t\) known (fields b(r, t), c(r, t) known)

$$b_t - D_b\Delta b = g_{NN}(b,\nabla b \cdot \hat r,\Delta b,c,\nabla c \cdot \hat r,\Delta c)$$
(7)
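The only change relative to the black box setting is the target of the regression; a brief sketch is shown below, with placeholder arrays and a placeholder diffusivity value standing in for the quantities estimated from the data.

```python
import numpy as np

# Placeholders standing in for the numerically estimated arrays on the space-time grid.
b_t   = np.zeros(1000)        # time derivative of b at each sample
lap_b = np.zeros(1000)        # cylindrical Laplacian of b at each sample
D_b   = 50.0                  # a priori known effective diffusivity (placeholder value)

# Gray box target (Eq. 7): only the unknown chemotactic part is left for the network.
residual = b_t - D_b * lap_b
# g_NN is then trained exactly like the black box network, with `residual` as the output;
# at integration time the right-hand side is reassembled as D_b * lap_b + g_NN(features).
```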
Fig. 7 Gray box learning with a Neural Network: (left) integration results for the data-driven PDE and (right) % relative error

Figure 7 shows the performance of gray box models, where only the chemotactic term of the bacterial density PDE was considered unknown. For this gray box model, the effective diffusion coefficient for the bacterial density was considered known. In principle, one could also hardwire knowledge of the functional form of this term in the loss of the neural network, and thus obtain an estimate of this diffusivity in addition to learning the chemotactic term of the PDE.

As mentioned before, the gray box framework can be adjusted to account for parametric dependence:

$$b_t - D_b\Delta b = g_{NN*}(b,\nabla b \cdot \hat r,\Delta b,c,\nabla c \cdot \hat r,\Delta c,p),$$
(8)
where \({\textbf{p}}=(c_+, c_-)\) here describes the upper and lower characteristic bounds of logarithmic sensing. After generating an appropriate dataset and training an ANN (see Methods for details), it was possible to recover the parameter vector \({\textbf{p}}\) used in our simulation by minimizing:

$$\mathcal{L}(p) = ||b_t - D_b\Delta b - g_{NN*}(b,\nabla b \cdot \hat r,\Delta b,c,\nabla c \cdot \hat r,\Delta c,p)||_2^2.$$
(9)
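A minimal sketch of this parameter-recovery step is given below, assuming a network g_nn has already been trained with p appended to its inputs; the optimizer choice and the initial guess are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def loss(p, features, b_t, lap_b, D_b, g_nn):
    """Eq. 9: squared misfit of the parametric gray box model for a candidate p."""
    X = np.column_stack([features, np.tile(p, (features.shape[0], 1))])
    return np.sum((b_t - D_b * lap_b - g_nn(X)) ** 2)

# p0 = np.array([20.0, 5.0])        # initial guess for (c_plus, c_minus)
# res = minimize(loss, p0, args=(features, b_t, lap_b, D_b, g_nn), method="Nelder-Mead")
# p_hat = res.x                     # recovered parameter vector
```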
Specifically, the ground truth parameter vector was \({\textbf{p}}=(30,1)\,\mu M\), while the recovered one was \({\textbf{p}}=(31.63, 1.48)\,\mu M\) (average value).

Estimating the chemonutrient field: computational data

Following up on the above success in using Takens embeddings (and, more generally, Whitney embeddings) [33, 34] for low-dimensional long-term dynamics, we attempted to estimate (i.e. create a nonlinear observer, a “soft sensor”, of) the chemonutrient field from local measurements of the bacterial field and its history [41, 42]. More specifically, we attempted to train a neural network to learn (in a data-driven manner) \(c(r_i,t_k)\) as a function of some local space-time information:

$$c(r_i,t_k) = C_{NN}\bigl(b(r_i,t_k),(\nabla b \cdot \hat r)(r_i,t_k),(\Delta b)(r_i,t_k),\; b(r_i,t_{k - 1}),(\nabla b \cdot \hat r)(r_i,t_{k - 1}),(\Delta b)(r_i,t_{k - 1})\bigr),$$
(8a)
for any discretization point in space \(r_i\) and time point \(t_k, k \geqslant 1\). Indeed, if the long-term simulation data live on a low-dimensional, say \(m\)-dimensional, manifold, then \(2m+1\) generic observables suffice to embed the manifold, and then to learn any function on the manifold in terms of these observables. Here we attempted an experiment with a neural network that uses a local parametrization of this manifold, testing whether such a local parametrization can be learned (in the spirit of the nonlinear discretization schemes of Brenner et al. [43]).

Fig. 8 Transformation of the inputs: (left) representative profiles of both fields b(r, t), c(r, t), (middle) visualization of the singularity, (right) transformed variable

There is, however, one particular technical difficulty: because the long-term dynamics of our problem appear in the form of traveling waves, both the front and the back of the wave appear practically flat; the spatial derivatives are all approximately zero, and a simple neural network cannot easily distinguish, from local data, whether we find ourselves in the flat part in front of the wave or behind it. We therefore constructed an additional local observable, capable of discriminating between flat profiles “before” and flat profiles “after” the wave. Indeed, when the data represent the spatiotemporal evolution of a traveling wave (as in our training/testing data set), we expect a singularity close to \(b(r,t)=0\). Clearly, however, the c field is dramatically different on the two “flat bacteria” sides (see the left panel of Fig. 8). When learning such a function locally, to circumvent this singularity, we proposed using a transformation of two of the inputs: \((b,\nabla b \cdot \hat r) \to \left( b,\arctan \left(\frac{\overline {(\nabla b \cdot \hat r)}}{\bar b}\right) \right)\), where the bar symbol denotes an affine transformation of the respective entire feature vector to the interval \([-1,1]\). This transformation brings points on different sides of the aforementioned singularity to different ends of a line, exploiting their difference in sign (see Fig. 8). The Neural Network was then trained to learn the estimator (nonlinear observer) of the chemonutrient field, for any discretization point in space \(r_i\) and time point \(t_k, k \geqslant 1\), as:

$$\begin{aligned} c(r_i,t_k) = & \; C_{NN}\Bigl(b(r_i,t_k),\arctan \Bigl( \frac{\overline {(\nabla b \cdot \hat r)(r_i,t_k)}}{\overline {b(r_i,t_k)}} \Bigr), \\ & (\Delta b)(r_i,t_k),\; b(r_i,t_{k - 1}),\;(\nabla b \cdot \hat r)(r_i,t_{k - 1}),\;(\Delta b)(r_i,t_{k - 1})\Bigr), \end{aligned}$$
(8b)
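A sketch of how the transformed inputs of Eq. 8b could be assembled is given below. The affine rescaling helper, the use of the quadrant-aware arctan2 in place of the plain ratio, and the overall function layout are illustrative assumptions rather than the exact preprocessing used here.

```python
import numpy as np

def to_unit_interval(v):
    """Affine map of an entire feature vector onto [-1, 1] (the 'bar' transformation)."""
    return 2.0 * (v - v.min()) / (v.max() - v.min()) - 1.0

def estimator_features(b_now, b_prev, r):
    """Assemble the inputs of the chemonutrient observer C_NN of Eq. 8b.

    The pair (b, grad b . r_hat) at the current time is replaced by the field and the
    arctan of the rescaled gradient over the rescaled field, which places the two flat
    sides of the traveling wave at different ends of a line; history features from the
    previous time level are appended unchanged.
    """
    db_now, db_prev = np.gradient(b_now, r), np.gradient(b_prev, r)
    lap_now = np.gradient(r * db_now, r) / r
    lap_prev = np.gradient(r * db_prev, r) / r
    # arctan2 is used as a numerically safer, quadrant-aware variant of arctan(x/y)
    angle = np.arctan2(to_unit_interval(db_now), to_unit_interval(b_now))
    return np.column_stack([b_now, angle, lap_now, b_prev, db_prev, lap_prev])
```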
Fig. 9 Learning the c-field with a Neural Network: (left) field prediction and (right) % relative error

As can be seen in Fig. 9, the model in Eq. 8b was able to provide reasonable predictions for the chemonutrient field.

Black box model: experimental data

The chemotactic motion of bacteria can also be studied through laboratory experiments. As shown in [1], chemotactic motion can be tracked using confocal fluorescence microscopy of E. coli populations; we therefore used the data from these prior experiments here. As detailed further in [1], we 3D-printed a long, cylindrical inoculum of densely-packed cells within a transparent, biocompatible, 3D porous medium comprised of a packing of hydrogel particles swollen in a defined rich liquid medium. Because the individual particles were highly swollen, their internal mesh size was large enough to permit free diffusion of oxygen and nutrient (primarily L-serine, which also acts as a chemonutrient), but small enough to prevent bacteria from penetrating into the individual particles; instead, the cells swam through the interstitial pores between hydrogel particles.

We hypothesized that the spatiotemporal behavior of the cell density observed in the experiments results from a PDE similar to the one used in the simulations (Eq. 10). However, as the authors of [1] discuss, several corrections are necessary in order to account for cell crowding, nutrient depletion and cell growth. In addition, spatiotemporal observations of the nutrient concentration were not experimentally feasible. Having no measurements of the spatiotemporal evolution of the chemonutrient, we turned to the methodology described earlier for data-driven models with partial information.

Interpolation in time allowed for well-approximated time derivatives, as we could choose data along t as dense as necessary. In fact, it was possible to estimate second-order time derivatives, which could be used to learn a second-order-in-time, continuous-time PDE in lieu of a delay model [39], such as the one used in Eq. 5:

$$b_{tt} = f_{NN}^{\exp}(b,\nabla b \cdot \hat r,\Delta b,b_t)$$
(9a)
This was treated as two coupled first-order PDEs, using the intermediate “velocity” field v(r, t):

$$v_t = f_{NN}^{\exp}(b,\nabla b \cdot \hat r,\Delta b,v), \qquad b_t = v$$
(9b)
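The sketch below indicates how this first-order system could be integrated once the network has been trained; the stacked state layout, the placeholder name f_nn_exp and the commented solver call are assumptions of the illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs_second_order(t, y, r, f_nn_exp):
    """Eq. 9b as a first-order system: y stacks [b, v], with b_t = v and v_t = f_NN^exp."""
    n = r.size
    b, v = y[:n], y[n:]
    db = np.gradient(b, r)
    lap_b = np.gradient(r * db, r) / r
    X = np.column_stack([b, db, lap_b, v])
    return np.concatenate([v, f_nn_exp(X)])

# y0 = np.concatenate([b0, v0])     # initial density and initial "velocity" b_t
# sol = solve_ivp(rhs_second_order, (t0, t1), y0, args=(r, f_nn_exp), t_eval=t_grid)
# b_trajectory = sol.y[:r.size]     # recovered b(r, t) snapshots
```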
Given the nutrient-starved/hypoxic conditions at \(r \approx 0\) (see Materials and methods), our training data were selected away from the origin. We prescribed bilateral boundary corridors to provide data-informed boundary conditions when integrating the learned PDE.

Fig. 10 Segmentation of the pre-processed data into: boundary corridors/discarded data, the training subset (the complement), and the subset chosen for reproduction (red rectangle)

Fig. 11 Reproduction of experimental observations using a data-driven Neural Network for the traveling wave regime: (left) ground truth, (middle) Neural-Network PDE integration results, (right) % relative error

The model was validated by integration in the spatiotemporal domain of the formation and propagation of the traveling wave (shown in red in Fig. 10). Integration results can be seen in Fig. 11.
