Constraint-based modelling predicts metabolic signatures of low and high-grade serous ovarian cancer

Determining the appropriate range of media constraintsecGEMs were constructed for three low-grade (59M, HEYA8 and OV56) and three high-grade serous ovarian cell lines (CAOV3, COV318 and OAW28). To direct the selection of cell lines for sample groups, the entire subset of CCLE LGSOC and HGSOC cell lines was visualised using PCA and labelled according to their subtype (Fig. 1), media conditions and site of origin (Supplementary Fig. 1). The purpose of this visualisation was to deduce whether or not the contrasting media conditions and site of origin directly affected gene expression, and if so, the acceptable range of these variables for the cell lines to be compared in this study. The constrained models had their media completely standardised to DMEM, then one single cell line (CAOV3) had its media systematically changed to assess how this affected the spread of points on PCA (Supplementary Fig. 2). These comparisons showed that whilst DMEM remained a main component of the media, the spread of metabolic flux results did not change.Fig. 1: Visualisation of selected cell lines.2D Principal components analysis visualising the spread of the gene expression (CCLE) of LGSOC and HGSOC cell lines, with chosen cell lines highlighted. ‘True’ labels were assigned according to Barnes et al. 32. Teal are high-grade, and pink are LGSOC.Defining media constraints within the Human1 modelPrior to the integration of transcriptomics data, the media conditions were defined for each cell line model (Supplementary Table 1). Media conditions were integrated using the MEWpy ‘get_simulator’ function, setting the final media dictionary as the ‘envcond’ argument37. A limitation of defining media for human GEMs is the inclusion of foetal bovine serum (FBS), which is a complex, difficult-to-define solution. Therefore, FBS components were estimated by reopening specific reactions according to predicted essentiality (relaxing bounds back to original −1000 mmol/gDW/h or 0 mmol/gDW/h and +1000 mmol/gDW/h); for example, the exchange reaction ‘MAR09107’, which imports haem into the cytosol, was reopened in the media dictionary for the HEYA8 low-grade cell line. Haem is suggested to be present in FBS preparations at a concentration between 1.1 μM and 2.0 μM40, which validates the assumption that this metabolite is imported into the cellular models, despite it not being specified in DMEM ingredients. Final copies of media-constrained models were saved and utilised in the transcriptomics integration stages described below (details available on https://github.com/katemeeson/repository_to_accompany_paper_2023).Integrating transcriptomics data with Human1 gene-protein-reaction rulesThe novel transcriptomics integration method presented here is based on the incorporation of expression data based on cross-reference with Human1 gene-protein-reaction rules, which define the enzymes and their genes required for reaction catalysis. Every gene reaction rule contained within Human1 has a lower and upper bound, with default excess reaction rate limits of −1000 mmol/gDW/h and +1000 mmol/gDW/h, or 0 mmol/gDW/h and 1000 mmol/gDW/h, based on reaction reversibility. Whether or not the expression data is applied directly, as a sum or as a minimum value, depends on the category of the reaction rule (Table 1). Reaction rules using the ‘andor’ term have not been included in this integration method, due to the fact they only represent 1.6% of the total annotated portion of the model.Table 1 Method of rule-dependent expression data integrationTranscriptomics data was integrated by reopening and constraining reactions, based on the individual gene’s inclusion in the CCLE expression dataset, media conditions, essentiality predictions and experimentally defined growth thresholds. The workflow for transcriptomics integration has been summarised as a decision tree (Fig. 2). To begin with, the algorithm asked whether the reaction had already been defined in media conditions, and whether it was essential. If the answer to these queries was negative, then the algorithm searched the input transcriptomics data for the presence of a gene(s) described in the gene-protein-reaction rule. If the gene(s) for this reaction were present in the input dataset, then the reaction was provisionally constrained according to gene expression (Table 1). It has been reported that gene and protein expression within the CCLE dataset have low correlation31, and also that many omics integration algorithms, such as iMAT, do not accurately predict growth rates as they do not account for metabolic deviation or adaptation from a transcriptomics signature41. Therefore, here, experimental growth rates have been used as a limit for model-predicted growth, and we reopened any reactions which forced growth rates beyond this limit. This means experimental growth thresholds are serving to estimate where gene expression does not accurately represent enzyme abundance and predict experimentally derived growth rates. This growth-directed ‘omics integration aims to compensate for poor correlation between the expression of some genes and proteins, by comparing growth predictions to experimental measurements with each subsequent expression value which is integrated, therefore this should be an indicator of gene and protein expression correlation. However, in general, we advise future algorithm users that a Pearson correlation value of 0.5 (and a significant associated p-value) should indicate a good correlation.Fig. 2: The transcriptomics integration workflow.First, the algorithm checks whether the target reaction has already been defined in the media conditions, and if so, it leaves the reaction bounds as they are. Next, if the target reaction has been defined as essential according to a MEWpy simulation, the algorithm leaves the reaction bounds as they were originally (−1000 mmol/gDW/h or 0 mmol/gDW/h and 1000 mmol/gDW/h, depending on reversibility). Then, if the gene(s) for this reaction are contained within the input transcriptomics dataset, then the reaction is provisionally constrained according to the corresponding reaction rule described in Table 1. Finally, FBA is run and if the solution is not below the experimentally defined minimum growth rate, the constraints may remain, however, if a growth rate is predicted which is below this experimental threshold, then the reaction’s bounds are relaxed.Using experimental maxima to tailor model-predicted growth ratesPrior to data integration, the flux through biomass production was approximately 187 g/gDW/h. Following the application of media constraints described above, the doubling time dropped to between 4.5 h and 9 h for every model (the equivalent of between 0.11 g/gDW/h and 0.22 g/gDW/h) (Fig. 3). Although a lot more realistic than the original, unconstrained solution, these doubling times were still considered too dissimilar to experimental values and needed supplementing with gene expression constraints to reach a more realistic value and construct cell-line-specific models.Fig. 3: Visualisation of experimental growth thresholds.Experimental doubling times and their origin have been shared in Supplementary Table 2. A Media constraints were applied to unconstrained Human1 using MEWpy (black bars), and then gene expression was integrated (grey bars) using our novel integration algorithm. Here, no growth thresholds were applied. B Experimental growth measurements have been translated to thresholds for model growth predictions.Once gene expression values had further constrained models, FBA-predicted growths were compared, both before and after the application of experimental growths as model thresholds. Before the application of thresholds, models predicted a maximum doubling time of between approximately 35 h and 41 h, across all cell lines (Fig. 3A). Experimental thresholds were most effective at tailoring model growth predictions towards experimental values in the quicker proliferating cell lines, HEYA8 and OV56 (Fig. 3B). On the other hand, no reactions had to be reopened for 59M, CAOV3, COV318 and OAW28, because the FBA-predicted doubling time never reached the experimental limit. The fact that no reactions were reopened in these four models means that constrained reaction bounds directly reflect gene expression data and media conditions. It has been concluded that using the experimental growth values as model thresholds helped generate experimentally representative metabolic predictions regarding biomass production. Thus, this investigation was allowed to proceed to flux analysis.An analysis visualising the FBA-predicted growth before any model constraints, after media constraints, and finally after gene expression and experimental growth threshold constraints have been included in Supplementary Data 1 to illustrate the impact of ‘omics integration (Supplementary Fig. 3).Reopening reactions to satisfy experimental growth thresholds does not skew downstream metabolic predictionsOf the 13,096 reactions contained within the Human1 model, only six unique reactions had to have their constraints relaxed (bounds forced to −1000 mmol/gDW/h or 0 mmol/gDW/h, and +1000 mmol/gDW/h, depending on reversibility) across the cell lines to satisfy experimental growth thresholds (Table 2). Therefore, the majority of fluxes analysed were representative of gene expression data. Several reactions were reopened in HEYA8 and OV56, to bring the model-predicted doubling time back within the experimental thresholds (Table 2). For HEYA8, there were four reactions reopened: three of these were involved in oxidative phosphorylation, whilst reaction ID MAR07663 occurred within the biotin metabolism subsystem. Concerning OV56, there were three reactions reopened: two of which represented a stage of oxidative phosphorylation, and the remaining MAR00786 which was involved in sphingolipid metabolism. There are only ten reactions in the Human1 model which constitute oxidative phosphorylation, meaning that a large proportion of reaction bounds within this subsystem have been reopened to excess (−1000 mmol/gDW/h or 0 mmol/gDW/h, and 1000 mmol/gDW/h), and in turn are not directly representative of cell line-specific gene expression. Flux predictions for these reopened reactions have been listed in Supplementary Data 2, and show that as expected, there were increased flux predictions for the cell lines for which these reactions were reopened. Therefore, although these relaxed reaction rate boundaries do not necessarily mean the FBA-predicted reaction flux is not realistic, conclusions made during this investigation will not involve oxidative phosphorylation.Table 2 Reactions reopened to keep model-predicted doubling times below experimental thresholdsThe fluxes through reactions immediately downstream of the reopened reactions (Table 2) have been described in Supplementary Data 2, to check if the reopening of model reactions to satisfy experimentally predicted growth rates skewed overall flux results. Not all downstream fluxes were analysed, as some reopened reactions produced metabolites involved in hundreds of reactions, for example, MAR06916, which produces mitochondrial ATP, is connected to 1114 reactions via its products. Results show that in general, there is no overarching trend of downstream reactions to match the flux pattern of those reopened immediately upstream. For example, MAR06918 was reopened in HEYA8, and is connected to twenty-eight reactions immediately downstream via the metabolites ferricytochrome C and ubiquinone. Of these downstream reactions, only eight showed an increased flux in HEYA8, whilst the remaining twenty reactions show either zero flux across all cell lines or are increased in cell lines other than HEYA8. Furthermore, another reaction reopened in HEYA8 was MAR13081, and of the five reactions immediately downstream of MAR13081, four either showed zero flux across all cell lines or had a higher predicted flux in another cell line than HEYA8. This indicates that the relaxing of a few reactions’ bounds to tailor growth rates towards experimental measurements does not skew the overall metabolic flux through cell line-specific models, and fluxes could be more dependent on the transcriptomics measurement—or lack of—which constrains them specifically, or the flux through other upstream reactions.Identifying subtype-specific differences in central metabolismFollowing analysis of growth predictions, individual reaction fluxes were compared. In order to compare fluxes between low- and high-grade samples, we specified an initial criterion for difference between models, comprising multiple factors: there must be a 10% increase or decrease between the mean flux value for the given reaction; at least one flux value above 0.5 mmol/gDW/h for unidirectional reactions, or −0.5 mmol/gDW/h for reversible reactions, or reaction flux has changed direction between low- and high-grade models. Of the 13,096 total reactions, 25.5% of the non-exchange/demand reactions had a predicted non-zero flux through at least one of the cell lines. This meant 74.5% of metabolic activity was predicted to be switched off across all of the cell line-specific models. Of the non-zero subset, there were 1081 reactions which satisfied the criteria for difference—corresponding to 8.25% of active metabolism which initial criteria suggested may display a subtype-specific flux. This initial subset of reactions has been provided in Supplementary Data 2.The next step of this analysis was to gain an overview of the central metabolic subsystems which were enriched by these 1081 potentially differentially-regulated reactions. Central metabolism was broadly defined according to the ‘global and overview maps’ listed in KEGG (KEGG PATHWAY Database, 2022)42,43,44. Specifically, the following subsystems were included in this subset: nucleotide metabolism, lipid metabolism, fatty acid metabolism, carnitine shuttle, central carbon metabolism (including glycolysis/gluconeogenesis, tricarboxylic acid and glyoxylate/dicarboxylate metabolism and the pentose phosphate pathway), and amino acid metabolism. A database annotation search was performed in Metabolic Atlas, and of the initial 1081-reaction subset described above, 710 were annotated with gene-protein-reaction rules, specifying the enzyme(s) which were required for catalysis; of these, there were 313 reactions belonging to central metabolic processes. Nucleotide metabolism was the subsystem containing the greatest number of reactions satisfying our initial criteria for subtype-specific flux, with 115 reactions showing differences in their mean fluxes (Table 3).Table 3 Overview of the central metabolic subsystems containing top model-predicted differencesThe metabolic subsystems described above have been analysed at the individual reaction level, to further understand what subtype-specific differences models might predict (Fig. 4). Results have been organised to present model-predicted flux through individual reactions across central subsystems and smaller branches of these subsystems according to Metabolic Atlas19 (Fig. 4). For individual and grouped reactions, flux has been averaged across (n = 3) low- and (n = 3) high-grade models to help reveal metabolic patterns which are consistent across different models of the same subtype.Fig. 4: Reaction-level flux differences between low- and high-grade ovarian models.A Glycolysis subsystem. Reactions in glycolysis which models have predicted to be differentially regulated across between low- and high-grade serous ovarian cancer. B TCA cycle subsystem, low- versus high-grade metabolic predictions. C Pentose phosphate pathway subsystem, low- versus high-grade metabolic predictions. D Amino acid metabolism subsystem, low- versus high-grade metabolic predictions. E Nucleotide metabolism subsystem, low- versus high-grade metabolic predictions. F Fatty acid metabolism subsystem, low- versus high-grade metabolic predictions. G Lipid metabolism subsystem, low- versus high-grade metabolic predictions. Plots to show model-predicted fluxes (mmol/gDW/h) through central metabolism, averaged across n = 3 models per subtype. Some of these reactions are reversible, however, for clarity of visualisation, the direction shown is such that positive reaction flux corresponds to the simplified reaction equation on the x-axis. The colour of the bar corresponds to the subtype, as indicated in the key. The mean value is plotted, and the standard error margin is shown on each bar. Some subsystems, e.g. amino acid metabolism, have had multiple reaction fluxes grouped into subcategories and averaged; individual reaction fluxes are available in Supplementary Data 3. Compartments: all glycolytic reactions are cytosol unless specified (‘c’ = cytosol; ‘x’ = peroxisome); all TCA cycle occurs in mitochondria; all pentose phosphate pathways in cytosol unless specified (‘r’ = ribosome); amino acid metabolism, nucleotide metabolism, fatty acid metabolism and lipid metabolism occurs across a range of subsystems. (G glucose, G6P glucose-6-phosphate, F6P fructose-6-phosphate, F1,6BP fructose-1,6-bisphosphate, DHAP dihydroxyacetone phosphate, GAP glyceraldehyde 3-phosphate, E4P erythrose-4-phosphate, S1,7BP sepoheptulose-1,7-bisphosphate, 1,3BPG 1,3-bisphosphoglycerate, 2,3BPG 2,3-bisphosphoglycerate, 3PG 3-phosphoglycerate, 2PG 2-phosphoglycerate, PEP phosphoenolpyruvate, Pyr pyruvate, Lac l-lactate, OAA, oxaloacetate, R1P ribose-1-phosphate, PRPP phosphoribosyl pyrophosphate, X5P xylulose 5-phosphate, S7P sedoheptulose 7-phosphate, G1,5L6P glucono-1,5-lactone-6-phosphate).Initially, there were eleven glycolytic reactions included in the subset of 313 reactions, which satisfied our initial criteria for differential regulation. Models indicate that the oxidation of lactate to pyruvate in the cytosol (MAR04388) is likely to be increased in low- compared to high-grade models, as indicated by a mean flux value which is over 3× higher, and non-overlapping standard error (Fig. 4A). Similarly, low-grade models predict an increased rate of the reduction of pyruvate to lactate via the peroxisomal lactate-pyruvate redox shuttle (MAR04281), which regenerates NAD+ for β-oxidation45 (Fig. 4A).There are eleven reactions across the TCA cycle which have a predicted reaction flux at least 10% different between low- and high-grade models (Fig. 4B). A futile cycle is a set of two simultaneous reactions, which flow in opposite directions, and result in no net gain or loss of reactants or products46. Although potentially an artefact of FBA, futile cycles are a true phenomenon in biology. Models suggest the presence of a futile cycle between reactions MAR04588 and MAR04586, which involve the dehydrogenation of isocitrate to oxalosuccinate and the re-conversion of this back to isocitrate, respectively (Fig. 4B). On a separate note, models predict some upregulation of the TCA cycle in high-grade models, specifically MAR08743 and MAR04652, which catalyse the reversible oxidation of succinate to fumarate via reduction of FAD and ubiquinone, respectively (Fig. 4B). These TCA reactions do not occur in a futile cycle, because the production of fumarate via MAR04652 occurs at a slightly higher rate than MAR08743, which reduces fumarate to succinate.Initially, there were eleven reactions in the pentose phosphate pathway which were predicted to have at least 10% change in flux between low- and high-grade models (Fig. 4C). Results indicate that of these, there were four which showed the strongest subtype-specific regulation (Fig. 4C). Ribose-1-phosphate comes from nucleotide metabolism and is isomerised by phosphoglucomutase to generate ribose-5-phosphate (MAR04354), which feeds into the non-oxidative phase of the pentose phosphate pathway. Flux analysis predicts that this process is upregulated in the forward direction in the low-grade models (Fig. 4B). Furthermore, models predict the upregulation of reactions MAR04501 and MAR04404 in low-grade models, which would lead to the increased production of xylulose-5-phosphate and ribose-5-phosphate, and xylulose-5-phosphate and erythrose-4-phosphate, respectively (Fig. 4C). Finally, models predict that the conversion of ribose-5-phosphate to phosphoribosyl diphosphate (MAR04052) is predicted to be upregulated in low-grade models (Fig. 4B).Regarding amino acid metabolism, there is an increase in the predicted fluxes across both lysine and phenylalanine metabolism in low-grade models (Fig. 4D). Considering high-grade models, there is an increased average predicted flux across valine, leucine and isoleucine reactions (Fig. 4D). Amino acid metabolism connects to nucleotide metabolism via metabolites such as fumarate, which feed into nucleotide, purine and pyrimidine synthesis. However, despite results indicating amino acid metabolism as being differentially regulated between low- and high-grade models, results suggest no subtype-specific patterns of nucleotide metabolism (Fig. 4E).Fatty acid metabolism can be subdivided into its reaction categories, namely oxidation, activation and biosynthesis (Fig. 4F). Interestingly, results indicate potential low- and high-grade signatures within this subsystem. Across low-grade models, reactions involved in fatty acid oxidation and activation (in the endoplasmic reticular) are upregulated (Fig. 4F). In contrast, there is an increased mean flux in high-grade models through fatty acid biosynthesis, to produce even-chain fatty acids (Fig. 4F).Finally, lipid metabolism may be divided into multiple subcategories, depending on the class of lipid molecule being metabolised. Flux analysis predicts a higher flux across reactions of glycerolipid and glycerophospholipid metabolism for low-grade models, each of which contains two and six reactions upregulated in low-grade models, respectively (Fig. 4G).A proposed mechanism for low- vs HGSOC metabolismCBM has predicted multiple subtype-specific differences which might exist for serous OC, and these have been summarised as a proposed mechanism (Fig. 5). Overall, results indicate an upregulation of the biosynthesis of even-chain fatty acids in high-grade models, as well as increased flux through valine, leucine and isoleucine metabolism. In addition, results highlighted a futile cycle within the TCA cycle of high-grade models – comprising the interconversion of isocitrate and oxalosuccinate in the TCA cycle.Fig. 5: Proposed mechanism for model-predicted low- and HGSOC metabolic differences.Compartments have been labelled (and where they are not labelled, it is because there are multiple compartments per subsystem), and the colour of the arrow denotes upregulation in LGSOC (pink) and HGSOC (teal) models. Human1 reaction IDs annotated where an individual reaction has been described. A solid arrow corresponds to a single reaction or general one-step conversion, and a dotted arrow denotes multiple reactions simplified. (X5P xylulose-5-phosphate, GAP glyceraldehyde 3-phosphate, E4P erythrose-4-phosphate, F6P fructose-6-phosphate, S7P sedoheptulose 7-phosphate, R5P ribose-5-phosphate, R1P ribose-1-phosphate, PRPP phosphoribosyl pyrophosphate, FA fatty acid, CPT I/II carnitine palmitoyltransferase I/II, IMM inner mitochondrial membrane, OAA oxaloacetate).However, what was clearer, were the multiple reactions and subsystems suggested to be upregulated in low-grade models. Results indicate an increased reaction flux beginning with fatty acid activation in the cytosol, passing via the carnitine shuttle into the mitochondria, where fatty acid oxidation occurs. As a result, there could be an increased rate of release of acetyl-CoA and NADH which feed into the TCA cycle (Fig. 5). In low-grade models, there is also a predicted increased rate of metabolic flow through the non-oxidative phase of the pentose phosphate pathway, leading to an increase in production of ribose-5-phosphate, which is then converted to phosphoribosyl pyrophosphate—an intermediate in the production of pyrimidine and purine nucleotides. Results predict that there is an increased rate of oxidation of lactate to pyruvate in the cytosol of low-grade cell lines, as well as an increased reduction of pyruvate to lactate via the peroxisomal lactate-pyruvate redox shuttle, and this regenerates NAD+ which may feed into fatty acid oxidation (Fig. 5). Flux analysis also predicts an increase in flux through glycerolipid and glycerophospholipid metabolism, which contributes to the production of mono-, di- and triacylglycerols, as well as phosphatidic and lysophosphatidic acids (Fig. 5).Gene dependency experimental data supports model predictionsIn order to validate model predictions, an ORA was performed in WebGestalt, using experimental CCLE CRISPR gene dependency data as an input29. This analysis suggested the metabolic subsystems which low-grade cell lines rely on for growth. Of the mean gene dependency scores (low-grade average, and high-grade average), the 250 gene IDs which showed the highest dependency in low-grade cell lines were used as input for the ORA. Of these 250 IDs, 246 could be mapped to Entrez gene IDs, and 116 of these IDs could be annotated with functional categories in WebGestalt. The most enriched pathway was the pentose phosphate pathway, with an FDR value of 0.0099033, a p-value of 8.7 × 10−5 and an enrichment ratio of 10.73. There were five genes identified from the mapped input which were involved in the pentose phosphate pathway, including: glucose-6-phosphate dehydrogenase (G6PD), glucose-6-phosphate isomerase, 6-phosphogluconolactonase, ribose 5-phosphate isomerase A and transketolase (TKT).Assuming that genes with high dependency scores upregulate their corresponding model reactions to a greater extent, the CCLE gene dependency data agrees with model predictions. To begin with, G6PD catalyses the oxidation of glucose-6-phosphate to glucono-1,5-lactone-6-phosphate (MAR04304), a reaction with a higher average flux in low-grade models (Supplementary Data 2). Similarly, the two TKT-catalysed reactions of the Human1 model (MAR04501 and MAR04404) showed upregulation in low-grade models, which matched the relatively higher gene dependency scores listed on the CCLE.As well as the ORA, we performed an in silico KO simulation to determine whether models could replicate the growth patterns reported in the CCLE CRISPR growth dependency dataset. Here, single genes were knocked out, and growth was calculated before and after, to show which genes were essential for cell lines to reach optimal growth rates. This analysis was performed on the essential genes of the ovarian models, namely those which had a predicted effect on biomass production once knocked out. This left between 263 and 391 essential genes predicted per cell line, and there was a general pattern of negative correlation (Pearson correlation coefficients between −0.343 and −0.462; https://github.com/katemeeson/repository_to_accompany_paper_2023), showing that as experimental growth dependency scores increased, model simulations predicted that KO of the same gene would be increasingly detrimental for cell growth.Considered together, these validations show the models can predict features of metabolism which support the growth of OC, in a subtype-specific manner, and in silico simulations show agreement with experimental gene dependency data, for predicted essential genes.

Hot Topics

Related Articles