Generative machine learning produces kinetic models that accurately characterize intracellular metabolic states

Parameterizing biologically relevant kinetic modelsWe developed RENAISSANCE, a machine-learning framework for parameterizing biologically relevant kinetic models. These models are consistent with experimentally observed steady states and produce dynamic metabolic responses with timescales37 that match experimental observations in cellular organisms. The input to RENAISSANCE is a steady-state profile of metabolite concentrations and metabolic fluxes computed by integrating structural properties of the metabolic network (stoichiometry, regulatory structure and rate laws) and available data (metabolomics, fluxomics, thermodynamics, proteomics and transcriptomics) into the model (Fig. 1b,c and Methods).RENAISSANCE uses feed-forward neural networks (generators) to parameterize kinetic models, with the size of generator networks dictated by the complexity of the kinetic model. Using NES, it optimizes the weights of generators in four iterative steps until they produce biologically relevant models (Fig. 1d and Methods). The iterative process starts by initializing a population of generators with random weights (step I). The use of multiple generators facilitates a more thorough and more efficient exploration of parametric space. Each generator takes multivariate Gaussian noise as input and produces a batch of kinetic parameters consistent with the network structure and integrated data. These parameter sets are then used to parameterize the kinetic model (step II). Next, we evaluate the dynamics of each parameterized model by computing the eigenvalues of its Jacobian and the corresponding dominant time constants (Methods). These quantities allow us to assess if the generated kinetic models have dynamic responses corresponding to experimental observations (valid models) or not (invalid models). Based on this evaluation, we assign a reward to the generator (step III). NES repeats steps II and III for every generator in the population, followed by normalizing all rewards. The weights of the parent generator for the next generation are then obtained by using the weights of all the members of the previous generation, weighted by their normalized rewards. Although high-performing generators have a greater impact on the weight of the parent generator in the next generation, lower-performing individuals also contribute. NES subsequently mutates this parent generator by injecting a pre-defined noise level into its weights, thus recreating a population of generators (step I). We iterate steps I–IV until we obtain a generator that meets the user-defined design objective, such as maximizing the incidence of biologically relevant kinetic models (Methods).The generated kinetic models are versatile and applicable to a broad range of metabolism studies (Fig. 1e).Generating large-scale kinetic models of E. coli metabolismWe studied the anthranilate-producing E. coli strain W3110 trpD9923 to test and validate RENAISSANCE. The kinetic model structure for this strain, adopted from Narayanan et al.38, consisted of 113 nonlinear ordinary differential equations parameterized by 502 kinetic parameters, including 384 Michaelis constants, KMs (Methods and Supplementary Fig. 4). It encompasses 123 reactions and describes core metabolic pathways, including glycolysis, the pentose phosphate pathway (PPP), the tricarboxylic cycle (TCA), anaplerotic reactions, the shikimate pathway, glutamine synthesis and a lumped reaction for growth (Methods and Supplementary Fig. 5). The objective was to find kinetic parameters resulting in dynamic models consistent with an experimentally observed doubling time of 134 min for the studied E. coli strain39. A valid kinetic model satisfying this requirement should produce metabolic responses with the dominant time constant of 24 min, corresponding to having the largest eigenvalue λmax < −2.5 (Methods).We used thermodynamics-based flux balance analysis13,40 to integrate experimental data39 and compute 5,000 steady-state profiles of metabolite concentrations and fluxes (Methods). We selected one of these profiles as input for RENAISSANCE (Methods) and identified a set of hyperparameters yielding the best framework performance with a three-layer generator neural network (Methods and Supplementary Notes 2–4). RENAISSANCE was then executed for 50 evolution generations using the optimized settings. We repeated the optimization process ten times with a randomly initialized generator population to obtain statistical replicates. For each generation, we generated 100 kinetic parameter sets for every generator in the population and computed the maximum eigenvalue, λmax, for each parameter set. To evaluate and rank the generators, we used the incidence of valid models, defined as the proportion of the generated models that are valid (with λmax < −2.5; Methods). We observed that the incidence of valid models steadily increases with the number of generations, with the mean incidence converging around 92% after 50 generations (Fig. 2a, thick black line and Supplementary Figs. 14 and 16). For some repeats, we could achieve incidence up to 100% (Fig. 2a, green-shaded region).Fig. 2: Generation, validation and application of RENAISSANCE-parameterized kinetic models.a, The incidence of models exhibiting the desired dynamic properties increases with the number of generations, as indicated by the mean incidence (black line) and the maximum and minimum incidence (green-shaded region) observed over ten statistical repeats for every generation. The dashed line indicates the incidence of a repeat with fast convergence. The black diamonds indicate the generators selected for subsequent analysis from that repeat. b, The distribution of the maximum eigenvalues (λmax) for the generated models over generations. The vertical dashed lines indicate λmax = −2.5 (left) and λmax = 0 (right). c, Robustness analysis. The time evolution of the normalized perturbed biomass, \(v(t)/{v}_{{{\mathrm{ref}}}}\), and concentrations, \(\left(X\left(t\right)-{X}_{\mathrm{ref}}\right)/{X}_{\mathrm{ref}}\), of nicotinamide adenine dinucleotide reduced (NADH), adenosine triphosphate (ATP) and nicotinamide adenine dinucleotide phosphate reduced (NADPH). Shown are the mean response (dashed black line), the 25th–75th percentile (dark-orange region) and the 5th–95th percentile (light-orange region) of the ensemble of responses. The vertical dashed line corresponds to t = 24 min. d, Bioreactor simulations. The time evolution of biomass, glucose concentration and anthranilate concentration in the bioreactor runs of the 13 models closely fitting the experimental data39. The black dots and error bars represent the mean and standard deviation from triplicate experiments, respectively.For further analysis of the generated models, we selected a statistical repeat with fast convergence (Fig. 2a, dashed line) and chose ten generators from that repeat with monotonically increasing incidence over generations (Fig. 2a, black diamonds). For each of the ten chosen generators, we generated 500 kinetic parameter sets and examined the distribution of the resulting maximum eigenvalues (Fig. 2b). Remarkably, the generated models gradually shifted over the optimization process from having slow dynamics (λmax > −2.5) to having fast dynamics, with the metabolic processes settling before the subsequent cell division, indicating that RENAISSANCE-generated models could capture the experimentally observed dynamics.Since cellular organisms maintain phenotypic stability when faced with perturbations41, the generated models that describe cellular metabolism should possess the same property. To test the robustness of the models, we perturbed the steady-state metabolite concentrations up to ±50% and verified if the perturbed system returned to the steady state. For this purpose, we generated 1,000 relevant kinetic models using the final of 10 selected generators (Fig. 2a, generation 45), chosen for yielding the highest incidence of valid models. Inspection of the time evolution of the normalized biomass showed that the biomass returned to the reference steady state (v(t)/vref = 1) within 24 min for 100% of the perturbed models (Fig. 2c). Similarly, the perturbed time responses of a few critical metabolites, namely NADH, ATP and NADPH, returned to their steady-state values within 24 min for 99.9%, 99.9% and 100% of the 1,000 generated kinetic models, respectively (Fig. 2c). Examining every cytosolic metabolite collectively revealed that 75.4% of the models returned to the steady state within 24 min and 93.1% returned within 34 min, demonstrating that the generated kinetic models are robust and obey imposed context-specific observable biophysical timescale constraints.Next, we tested the generated models in nonlinear dynamic bioreactor simulations closely mimicking real-world experimental conditions38,39. The temporal evolution of biomass production showed similar trends as typical experimental observations with clear exponential and stationary phases of E. coli growth (Fig. 2d, Supplementary Note 6 and Supplementary Fig. 6). Similarly, glucose uptake and anthranilate production also reproduce trends observed in experiments with glucose consumption halted and anthranilate production saturating at around 20 h. This study indicates that the RENAISSANCE models can accurately reproduce the physiologically observable and emergent properties of cellular metabolism, even without implicit training to reproduce fermentation experiments.Characterizing the intracellular metabolic states of E. coli
Accurately determining the intracellular levels of metabolite profiles and metabolic reaction rates is crucial for associating metabolic signatures with phenotype. Yet, our capabilities to establish the intracellular metabolic state are limited. Even with the ever-increasing availability of physiological and omics data, a considerable amount of uncertainty in the intracellular states remains. We propose using kinetic models to reduce this uncertainty because of their explicit coupling of enzyme levels, metabolite concentrations and metabolic fluxes. Moreover, kinetic models allow us to consider dynamic constraints in addition to steady-state data, thus allowing us further uncertainty reduction.After integrating available physiology and omics data39,42,43,44 using the constraint-based thermodynamics-based flux balance analysis40, substantial uncertainty was present in the intracellular metabolic state as indicated by the wide ranges of metabolite concentrations and metabolic fluxes. We sampled 5,000 steady-state profiles of metabolite concentrations and metabolic fluxes from this uncertain space and deployed RENAISSANCE to find the fastest possible dynamics (maximum negative eigenvalues, λmax) for each steady state (Methods and Supplementary Fig. 7). We visualized the steady-state profiles by performing dimension reduction with principal component analysis (PCA)45 and t-distributed stochastic neighbour embedding (t-SNE)46 (Methods) and coloured each steady-state profile according to the obtained λmax (Fig. 3a). We observed a high variation in the dynamics (λmax) of the studied steady-state profiles (Fig. 3c, blue distribution). Of 5,000 steady-state profiles, 918 (18.4%) had λmax larger than −2.5, meaning these intracellular metabolic states could not correspond to the experimental observations. Indeed, the dynamic responses corresponding to these states have a time constant superior to 24 min, that is, slower than the experimental observations.Fig. 3: Dynamic characterization reduces uncertainty in intracellular metabolic states.a, The two-dimensional representation of the fastest linearized dynamic modes (corresponding to the maximum eigenvalue λmax) of 5,000 intracellular steady states (reaction fluxes and metabolite concentrations) obtained with PCA and t-SNE46 (Methods). Each point represents a steady state, with its colour indicating the corresponding λmax value computed by RENAISSANCE. b, Magnified view of 22 neighbouring steady states with fast dynamics (−3.8 ≤ λmax ≤ −8.5). The colour scheme is the same as in a. c, Distributions of the fastest linearized dynamic (λmax) for all 5,000 steady states (blue) and the 22 steady states shown in b (green). d, The linearized dynamics landscape of the 22 fast steady states in the reduced space (Methods). The triangles represent the location of the steady states in the landscape. e, The landscape in d is enhanced by sampling 90 additional steady states in the neighbourhood of the initial 22 steady states. The circles represent the location of the newly sampled steady states in the same landscape as in d. f, Distributions of the fastest linearized dynamic (λmax) for all steady states (blue) and 90 steady states sampled in e (pink). g, The concentration of 3-phosphoglyceric acid (3pg) (mM) versus the fastest linearized dynamic in every steady state. The colour scheme is the same as that in a. The horizontal black line indicates the cut-off for valid models (λmax = −2.5). The peach-shaded region shows the range of 3-phosphoglyceric acid concentration that does not allow fast dynamics. h, The dynamic landscape of 40 steady states sampled by constraining the metabolite concentrations of 30 metabolites to ranges that do not support fast dynamics. The diamonds represent the location of the steady states. i, Distributions of the fastest linearized dynamic (λmax) for the 40 steady states sampled in g (peach), compared with all steady states (blue) and those sampled in d and e (pink).As t-SNE optimizes the preservation of local distances between points when projecting them from a high-dimensional space to a lower-dimensional one46,47, we hypothesized that sampling from a region containing closely positioned steady-state profiles associated with fast dynamics (Fig. 3a, blue dots) would yield steady-state profiles that satisfy dynamic requirements. Conversely, sampling from a region around adjacent profiles corresponding to slow dynamics (Fig. 3a, yellow dots) would probably result in profiles not meeting dynamic requirements.To test this hypothesis, we selected one of these local regions (Fig. 3b), which contained 22 steady states with fast dynamics with −3.8 ≤ λmax ≤ −8.5 (Fig. 3c, green distribution), and analysed its neighbourhood (Fig. 3d). We sampled 90 additional steady states within this neighbourhood from the Gaussian distribution with a mean and standard deviation estimated on the initial 22 steady states. The sampled steady states allowed us to improve the resolution of the initial dynamic landscape (Fig. 3e, circles). Crucially, the sampled steady states had linearized dynamics in the same range as the initial 22 states (Fig. 3d–f), confirming our hypothesis. Therefore, RENAISSANCE allows us to select subsets of intracellular states consistent with experimentally observed dynamics and generate additional ones with the same characteristics. Moreover, it allows us to discard subregions with experimentally inconsistent states, thus reducing uncertainty. Indeed, sampling from a region containing closely positioned steady-state profiles associated with slow dynamics yielded steady states corresponding to similarly slow dynamics (Supplementary Fig. 13).We next examined individual metabolite concentrations of the 5,000 steady-state profiles to identify patterns corresponding to the experimentally observed phenotype. We observed a clear bias in the dynamics depending on the concentrations for some of the metabolites (Fig. 3g and Supplementary Fig. 7). For example, in the case of 3-phosphoglyceric acid, we obtain models with relevant dynamics only when the concentration of this metabolite is less than ∼0.002 mM. In contrast, steady-state profiles with 3-phosphoglyceric acid concentrations between 0.002 and 0.003 mM do not have relevant dynamics (Fig. 3g). To investigate this further, we identified 30 cytosolic metabolites that showed such concentration biases by visual inspection (Supplementary Fig. 8) and sampled 40 new steady states from the same Gaussian distribution as before (Fig. 3d) but constrained the selected 30 metabolites to concentration ranges that do not support relevant dynamics (for example, the peach-shaded region in Fig. 3g). As expected, almost all of these new intracellular states did not yield models with relevant dynamics (Fig. 3h,i). This result demonstrates that information stemming from the dynamic responses can be used to constrain values of intracellular metabolites to specific ranges.Overall, the dynamic characterization of a broad range of intracellular states allows us to reduce uncertainty at the level of steady-state profiles, as well as individual metabolite concentrations and metabolic fluxes.Integration and reconciliation of experimental informationExperimentally measured Michaelis constants, KMs, are curated in comprehensive databases containing functional and molecular information of enzymes such as BRENDA48. However, as we transition to large genome-scale kinetic models, a vast majority of the associated kinetic parameters remain unknown. Integrating experimental results from in vivo and in vitro studies, despite the disparities in their parameter values, can help further constrain uncertainty and lead to a more accurate description of intracellular metabolic states. To this end, we retrieved experimentally measured values for 108 out of 384 KMs in our model from BRENDA (Methods).To investigate how the integrated kinetic data constrain unknown kinetic parameters, we started by integrating four KM values of aconitase (ACONTa,b) from TCA (Fig. 4a and Methods), obtained generators with a high incidence of valid models (>99%) and generated 500 valid kinetic models (Supplementary Fig. 9). To quantify the effect of integrating one experimental KM value on the generated values of the other kinetic parameters, we compared the estimates of the other KMs and maximum velocities, vmax, with ones obtained when no kinetic parameters were integrated. Integration of KM values of aconitase at a reaction level restricted the estimates of \({v}_{\max }^{{{\mathrm{ACONTa}}}}\) (Fig. 4b). Due to the correlation in the vmax values throughout the network, restricting \({v}_{\max }^{{{\mathrm{ACONTa}}}}\) estimates through KM integration constrained the estimated ranges of other maximal velocities, such as \({v}_{\max }^{{{\mathrm{ICDHyr}}}}\) (Fig. 4b). This restriction further affected downstream KM values in the network, such as \({K}_{{\mathrm{M}},{{\mathrm{akg}}}}^{{\,{\mathrm{ICDHyr}}}}\) and \({K}_{{\mathrm{M}},{{\mathrm{succoa}}}}^{{{\mathrm{AKGDH}}}}\) (Fig. 4b). These results suggest that integrating only a small amount of experimental data, localized to one enzyme (ACONTa,b), propagates throughout the metabolic network and alters the rest of the kinetic parameters.Fig. 4: Integrated experimental KM values for aconitase affect parameters of neighbouring reactions.a, RENAISSANCE allows direct integration of KM values from literature and reconciling them with unknown parameters that collectively lead to valid kinetic models. b, Propagation of the integrated KM experimental data around ACONTa,b through the metabolic network. Comparison of RENAISSANCE-generated values for \({K}_{{\mathrm{M}}{{,}}{{\text{acon-C}}}}^{{{\mathrm{ACONTa}}}}\) versus \({v}_{\max }^{{{\mathrm{ACONTa}}}}\), \({v}_{\max }^{{{\mathrm{ICDHyr}}}}\) versus \({v}_{\max }^{{{\mathrm{ACONTa}}}}\), \({v}_{\max }^{{{\mathrm{ICDHyr}}}}\) versus \({K}_{{\mathrm{M}},{{\mathrm{akg}}}}^{{\,{\mathrm{ICDHyr}}}}\), and \({K}_{{\mathrm{M}},{{\mathrm{succoa}}}}^{{{\mathrm{AKGDH}}}}\) versus \({K}_{{\mathrm{M}},{{\mathrm{akg}}}}^{{\,{\mathrm{ICDHyr}}}}\) when (i) no kinetic parameters are integrated (grey circles) and when (ii) four parameters are integrated (orange circles; Methods). ICDHyr, isocitrate dehydrogenase; AKGDH, 2-oxoglutarate dehydrogenase; SUCDi, succinate dehydrogenase (irreversible); SUCOAS, succinyl-CoA synthetase (ADP-forming); CS, citrate synthase; MDH, malate dehydrogenase; MALS, malate synthase; FUM, fumarase; q8h2_c, ubiquinol-8; q8_c, ubiquinone-8; succ_c, succinate; coa_c, coenzyme A; pi_c, phosphate; succoa, succinyl-CoA; nadh_c, nicotinamide adenine dinucleotide-reduced; nad_c, nicotinamide adenine dinucleotide; nadph_c, nicotinamide adenine dinucleotide phosphate-reduced; nadph, nicotinamide adenine dinucleotide phosphate; icit_c, isocitrate; cit_c, citrate; accoa_c, acetyl-CoA; oaa_c, oxaloacetate; mal-L_c, L-Malate; fum_c, fumarate; glx_c, glyoxylate; acon-C, cis-aconitate; akg, 2-oxoglutarate; succoa, succinyl coenzyme-A.We next enquired if RENAISSANCE improves its KM estimates as the number of integrated experimental KM values increases. We also examined how the localization of integrated KM values, such as the integration of KM values from TCA, affects the estimation of KM values in other subsystems of the metabolic network. Specifically, we integrated 10 random combinations of half (9) of the 17 available experimentally measured KM values associated with the TCA of E. coli, one combination at a time. For each of the 10 combinations, we obtained generators with a high incidence of valid models (>90%) and generated 2,000 of these models. In total, we generated 20,000 models containing 10 distinct combinations of the remaining 8 Michaelis constants to be estimated. This process ensured that each of the 17 Michaelis constants was integrated at least once and estimated at least once within the 10 combinations.The comparison between the experimentally observed and the RENAISSANCE estimated range of TCA KM values, quantified through the overlap score (OS) between these two ranges (Fig. 5g), showed that integrating KMs improves the estimates of the non-integrated individual KMs within the same subsystem (Fig. 5a, red bars and Supplementary Fig. 9), compared with when no KM values are integrated (Supplementary Fig. 15, black diamonds). Indeed, noteworthy improvement was observed in the predictions of 16 out of the 17 KM values in TCA when experimental values of KM were integrated. The average prediction accuracy for the entire subsystem also increased (Fig. 5b, red bars) compared with the case with no integration of experimental KMs (blue bars). A similar analysis was conducted for other subsystems, PPP, glycolysis, anaplerotic reactions, shikimate pathway and pyruvate metabolism, and consistently, estimates of KM values within the same subsystem improved upon the integration of experimental information for all the cases (Fig. 5c,d and Supplementary Fig. 10). These findings indicate that integrating experimental information may improve prediction accuracy beyond the subsystem level.Fig. 5: Integrating experimental kinetic information improves other parameter estimates.a, The OS (defined in d) of RENAISSANCE estimates for 17 experimentally available Michaelis constants, KMs, from TCA (red). The estimation process involved integrating experimental values for 9 out of the 17 parameters and estimating other model parameters including the remaining 8 parameters from TCA. The procedure was repeated ten times with a different random combination of nine parameters. The black diamonds represent the OS of the estimates without integrating experimental data. The error bars indicate the standard error in the OS over ten combinations of integrated parameters. b, The mean OS for the estimated KMs within metabolic subsystems before (blue) and after (red) integrating experimentally measured KMs. The estimation process for each subsystem was conducted in a similar manner as in a. c, The OS of RENAISSANCE estimates for 91 experimentally available KMs not belonging to TCA when 10 different combinations of 9 (out of 17) KMs from TCA were integrated. The black diamonds represent the OS without integration. The error bars represent the standard error in OS. d, The mean OS of the estimated KMs from all metabolic subsystems except for the subsystem labelled on the x axis before (blue) and after (red) integrating KMs from the labelled subsystem. e, The metabolic subsystems of the top 15 KMs with the highest increase in OS when Michaelis constants from the labelled subsystem were integrated. The donut plots indicate the count of integrated experimental KM values from each labelled subsystem. f, The two-component PCA48 representation of the 276 (out of 384) experimentally unverified kinetic parameters with and without integrating experimentally measured Michaelis constants. g, The OS of an estimated KM is calculated as the ratio between (x) the overlap of the RENAISSANCE predicted (teal) and the experimentally observed range (violet), and (y) the RENAISSANCE predicted range. anpl, anaplerotic reactions; glu, glutamate metabolism; gly, glycolysis/gluconeogenesis; glse, glycine/serine metabolism; hist, histidine metabolism; nsp, nucleotide salvage pathway; oxp, oxidative phosphorylation; pyr, pyruvate metabolism; shkk, shikimate pathway; oth, other reactions.Inspecting the distributions of the generated KMs that were not part of the TCA subsystem revealed that the predictions for a vast majority of these KMs (85 out 91) improved upon the integration of TCA KMs (Fig. 5c, coloured bars) compared with the case where no KMs were integrated (black diamonds). Similarly, the mean OS of the entire set increased (Fig. 5d). We then examined the top 15 KMs that exhibited the most marked improvement in their estimates and determined the metabolic subsystem in which they are located. The integration of experimental KM values from TCA yielded the most notable improvement in the estimates of the shikimate pathway (6 in the top 15), followed by glycolysis (3 out of 15) and anaplerotic reactions (2 out of 15) (Fig. 5e, leftmost donut plot). Interestingly, a similar analysis conducted by integrating KMs from other subsystems showed that the estimates from these three subsystems (shikimate pathway, glycolysis and anaplerotic reactions) consistently yielded the most notable improvement (Fig. 5e and Supplementary Fig. 11). These results provide evidence that RENAISSANCE effectively incorporates experimental kinetic data from a specific subsystem of the metabolic network, resulting in improved parameter estimates across the entire network.We further examined the impact of integrating experimental kinetic data on parameters that lack verifiable experimental measurements, which accounted for 276 out of 384 KMs. To obtain a qualitative assessment of the effects of integration, we employed PCA45 to visualize the RENAISSANCE predictions for these unknown KMs (Fig. 5f). The analysis revealed notable shifts in the estimates of these KMs when experimental data were integrated compared with the case where no data were integrated (Fig. 5f, blue cluster). Additionally, the estimates for the cases with integrated experimental data exhibited greater similarity than those without integration. We provide the generator, trained to incorporate all 108 KMs available from BRENDA (Supplementary Fig. 12), in Supplementary Information.These results suggest that integrating experimental kinetic information reduces quantitative uncertainties in the intracellular metabolic state of the cell, allowing RENAISSANCE to make more informed predictions on the dynamic properties of the entire metabolic network. We anticipate that the inclusion of new experimental data and their subsequent integration will enhance the predictive capabilities of RENAISSANCE even further.

Hot Topics

Related Articles