An automated network-based tool to search for metabolic vulnerabilities in cancer

Our research complies with all relevant ethical regulations and has been approved by the committee of ethics of the research of the University of Navarra and follows their prescribed ethical guidelines. Written informed consent was obtained from each participant. Men and women were included in the analyses.Calculation of genetic Minimal Cut Sets (gMCSs) in Human1The reference metabolic network Human1 (version 1.4.0)1 was obtained from https://github.com/SysBioChalmers/Human-GEM. Human1 encompasses 13,416 reactions, 8458 metabolites and 3628 genes. Importantly, Human1 defines 56 basic metabolic tasks for any human cell in addition to biomass production, typically included in other genome-scale metabolic reconstructions10,42,43. Each of these basic metabolic tasks is defined by a list of output metabolites that must be produced from a list of input metabolites, and if necessary, an artificial equation is included in the task in order to support this transformation, e.g., a reaction that simulates the energy expenditure (ATP) by reactions involved in the task. Lower and upper bounds are also fixed for a specific subset of reactions involved in each metabolic task.With this information, a different metabolic model for each basic task can be built and used to assess the effect of genetic perturbations via linear programming, specifically by assuming the mass balance condition and thermodynamic constraints44. Essential genes correspond to those single gene knockouts leading to an infeasible linear problem (at least one of the required lower/upper bounds in the metabolic task is violated). However, this strategy is not efficient in calculating higher-order essential gene knockout perturbations (synthetic lethals) due to the combinatorial nature of the problem. Fortunately, the gMCS approach is suitable for this task9.For the calculation of gMCSs, we used the calculateGeneMCS function available in the COBRA Toolbox45, available at https://github.com/opencobra/cobratoolbox/. This function requires a metabolic model and a single target reaction to be blocked as input data. We constructed this input data for each metabolic task following its associated information about inputs, outputs, artificial equations and lower bounds. A detailed illustration as to how the target reaction was derived for each metabolic task can be found in Supplementary Note 1.The adaptation of Human1 to the calculateGeneMCS function from COBRA leads to slightly different metabolic models to the ones originally built Human1. However, we checked for each metabolic task that the essential genes in our metabolic models, obtained with the singleGeneDeletion function available in COBRA, and the original models in Human1, derived from the checkTasksGenes function from RAVEN46, were the same.In order to reduce the computational cost of calculating gMCSs, the resulting metabolic models for each essential task were simplified using the fastFVA function from COBRA (without any requirement in the objective function). Finally, all calculated gMCSs were checked using the checkTasks function available in RAVEN using the original metabolic models in Human1. This task was performed to remove possible false positive gMCSs that could arise due to the time limit being fixed in the Mixed-Integer Linear Programming solver used in the calculateGeneMCS function9.The database of gMCSs was computed with the University Navarra cluster, using Intel(R) Xeon(R) Silver 4110 CPU @ 2.10 GHz processors, limiting to 8 cores and 8 GB of RAM computer. A time limit of 120–300 s was set for each solution derived from the function calculateGeneMCS.Gene categorization based on RNA-seq dataIn our computational approach, we denote a gene as essential in a given sample when it is the only gene expressed in at least one gMCS. Thus, as a means to identify essential genes for each sample, we need to systematically decide which genes are highly (ON) or lowly (OFF) expressed. The same logic was systematically applied to lethal DKOs. To that end, we developed our own methodology using RNA-seq expression data.Our approach exploits our database of gMCSs by assuming that all of them should involve at least one highly expressed gene to guarantee that their associated metabolic tasks are feasible. In particular, we assume that the gene with the highest expression in each gMCS is the one that should be highly expressed. Consequently, we extracted the maximum expression level in every gMCS and every essential task and generated an empirical distribution of highly expressed genes for each sample (XH). Note here that repetitions are not taken into account, and each gene contributes exactly one value to the distribution. We considered as highly expressed those genes expressed above the Xth percentile, referred to here as gmcsTHX, and lowly expressed otherwise. We fixed the threshold of expression at the 5th percentile (gmcsTH5) to alleviate possible inconsistencies and incomplete metabolic pathways in Human1.In addition, to compare our innovative thresholding method with a literature standard, we used the localT2 methodology16. Briefly, it considers a gene as MAYBE ON in a specific sample whenever its expression level is greater than its mean expression level across the samples of the cohort, MAYBE OFF otherwise. Additionally, two global expression thresholds are applied in localT2: genes whose expression is below the 25th percentile of the distribution of expression for all samples and genes are considered as OFF, whereas those above the 75th percentile of the same distribution are considered as ON. This global expression threshold dominates the categorization obtained from the relative expression threshold based on the mean expression value. Remarkably, our gmcsTH5 approach is independently applied to each sample, i.e., we identified gmcsTH5 for each sample and defined the subset of highly and lowly expressed genes as those having an expression value higher and lower than gmcsTH5, respectively. Instead, localT2 is dependent on the cohort to categorize genes, as they consider all samples to establish global and relative thresholds. Both thresholding methods are available in gmctool, and they can be applied to categorize the 1244 genes that participate in all the calculated gMCSs.A sensitivity analysis was performed in Supplementary Fig. 19, which shows the same analysis presented in Fig. 3 for DepMap data with several percentiles of gmcsTHX (gmcsTH0, gmcsTH1, gmcsTH5, gmcsTH10), finding robust results for lower expression thresholds. Moreover, we also tested the implication of normalization of TPM for localT2 and the effect of the gene-set selection for the 25th-75th threshold of localT2. Overall, we found more accurate results using gmcsTHX than with localT2.Efficient calculation of essential genes using gMCSs and gene expression dataOnce all gMCS have been calculated and gene expression values are classified as highly or lowly expressed, gene essentiality analysis can be carried out. In particular, we search for gMCSs that have exclusively one gene defined as highly expressed, and the rest of genes lowly expressed. This can be easily accomplished with the following procedure.Assume that L is an ${nxg}$ binary matrix that defines for each row i (i = 1,…,n) whether a gene j (j = 1,…,g) is involved in a particular gMCS, being n the number of gMCSs and g the number of genes involved in the complete list of gMCSs. In addition, the binary vector x of dimensions ${gx}1$ defines whether the gene is highly expressed in a particular sample s. Finally, the integer vector y of dimension ${nx}1$ defines the number of highly expressed genes for each gMCS i (i = 1,…,n), and it can be calculated as follows:$${{\bf{L}}}\cdot {{\bf{x}}}={{\bf{y}}}$$
(1)
Essential genes can be extracted from those rows in L where only one gene is highly expressed. The indices of these rows from L are denoted as L1, namely:$${{{\rm{L}}}}_{1}{{=}}\left\{{i}\, |{{{\rm{y}}}}_{{{\rm{i}}}}{{=}}{1}\right\}$$
(2)
Essential genes are obtained by conducting the intersection between x and each row of L whose indexes are in L1, leading to the binary vector of essential genes K1 of dimension ${gx}1$. gmctool can create a list of essential genes and associated gMCSs for each basic metabolic task in Human1. Note here that gMCSs that comprise only one gene are essential genes in all human cells.Efficient calculation of lethal DKOs using gMCSs and gene expression dataTo calculate lethal DKOs, we search for gMCSs that have two genes defined as highly expressed, and the rest of genes lowly expressed for a particular sample. However, we need to ensure that essential genes are not part of these gMCSs. Starting from L and K1 from the previous subsection, the binary vector u of dimension ${nx}1$ defines whether (or not) a gMCS involves an essential gene, namely:$${{\bf{L}}}{{\cdot }}{{{\rm{K}}}}_{1}{{=}}{{\bf{u}}}$$
(3)
Thus, lethal DKOs can be extracted from those rows in L where two genes are highly expressed and essential genes are not involved, denoted here L2, namely:$${{\rm{L}}}_{2}{=}\left\{{{\boldsymbol{\ {i}}}} | {{\rm{y}}}_{{\rm{i}}}=2 \, {{\rm{and}}} \, {{\rm{u}}}_{{\rm{i}}}={0}\right\}$$
(4)
Lethal DKOs are obtained by conducting the intersection between x and each row of L whose index is in L2, leading to the binary vector of lethal DKOs K2 of dimension (g*(g-1))x1. gmctool can create a list of lethal DKOs and associated gMCSs for each basic metabolic task in Human1. Note here that gMCSs that comprise only two genes are lethal DKOs in all human cells.Note here that for the study of essential genes and lethal DKOs in CCLE cell lines, detailed in the “Results” section and Fig. 3b, an Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80 GHz (8 CPUs), ~3.8 GHz processor was used.Multiple myeloma case studyRNA-seq data from our group consists of 37 MM patients (GSE151063)22,23 and 35 samples from different B-cell subpopulations (GSE114816)21. Sample 57802 was removed from the study for being detected as an outlier in a PCA analysis. RNA-seq from the MMRF-CoMMpass has 767 samples from MM patients at diagnosis time, available in IA18 release. RNA-seq from 7 MM cell lines was obtained from DepMap, release 21Q112. These cell lines are NCI-H929, JJN-3, KMS-11, KMS-12-BM, KMS-28-BM, MM.1S, RPMI-8226 and OPM-2. RNA-seq data was downloaded in TPM and binarized using the gmcsTH5 thresholding approach.Metabolite essentiality predictionIn order to identify essential metabolites whose production is disrupted through gMCSs, we only focused on the biomass production task, due to its underlying complexity. The rest of the basic metabolic tasks are well-defined and highly specific, depending only on a reduced number of metabolites.To that end, we first defined a list of metabolites directly related to the biomass production. As in the biomass reaction of Human1 there are several metabolite pools, we also included their precursors in such target list. Then, we created a sink exchange reaction for each target metabolite and assessed their production through an FBA analysis44 after the removal of genes involved in each gMCS. We extracted for each gMCS the essential metabolites whose production is disrupted (maximal production is zero).Q-PCRThe expression of CTPS1 and CTPS2 were analyzed by Q-PCR in MM KMS-11, KMS-12 and NCI-H929 cell lines. Similarly, the expression of UAP1 and UAP1L1 were analyzed by Q-PCR in OPM-2 and JJN-3 cell lines. RNA was extracted with TRIzol Reagent (Invitrogen) according to the manufacturer’s instructions. First, cDNA was synthesized from 1 µg of total RNA using the PrimeScript RT reagent kit (Perfect Real Time) (Cat No RR037A, TaKaRa) following the manufacturer’s instructions. The quality of cDNA was checked by a multiplex PCR that amplifies PBGD, ABL, BCR and β2-MG genes. Q-PCR was performed in a QuantStudio 5 Real-Time PCR System (Applied Biosystems), using 20 ng of cDNA in 2 µL, 1 µL of each specific primer at 10 µM (CTPS1 F: TTATTGAGGCCTTCCGTCAG; CTPS1 R: GGGAAAGCCCAAGTCCTCTA; CTPS2 F: GCTGTCCAGGAGTGGGTTAT; CTPS2 R: CGCCTTAAACTGGAATTGTCT; UAP1 F: GGGATCAAGATCAGCTCCAG; UAP1 R: GATATGCAACGCCGAGTCTT; UAP1L1 F: TGCTGTACAATGCAGGCAAC; UAP1L1 R: AGCGGCTTTACCAGATTCC), 5 µL of SYBR Green PCR Master Mix 2X (Cat No 4334973, Applied Biosystems) in 10 µL reaction volume. The following program conditions were applied for Q-PCR running: 50 °C for 2 min, 95 °C for 60 s following by 45 cycles at 95 °C for 15 s and 60 °C for 60 s; melting program, one cycle at 95 °C for 15 s, 40 °C for 60 s and 95 °C for 15 s. The relative expression of each gene was quantified by the Log 2(−ΔΔCt) method using the gene GUS as an endogenous control.Cell cultureKMS-11, KMS-12, NCI-H929 and OPM-2 cell lines were maintained in culture in RPMI1640 medium (Gibco, Grand Island, NY) supplemented with 10% fetal bovine serum (Gibco, Grand Island, NY) and penicillin/streptomycin (BioWhitaker, Walkersville, MD) at 37 °C in a humid atmosphere containing 5% CO2. JJN-3 cell line was cultured in 40% IMDM (Gibco, Grand Island, NY), 40% DMEM (Gibco, Grand Island, NY) and 20% fetal bovine serum (Gibco, Grand Island, NY) supplemented with penicillin/streptomycin (BioWhitaker, Walkersville, MD). All cells were obtained from the DSMZ or the American Type Culture Collection (ATCC) or from the Japanese Collection of Research Bioresources Cell Bank. All cell lines were authenticated by performing a short tandem repeat allele profile and were tested for mycoplasma (MycoAlert Sample Kit, Cambrex), obtaining no positive results.Small molecule synthesis of CTPS1 inhibitorSynthesis of compound 1 from Rao et al.27 was performed by Wuxi Apptec and consisted of three steps. First, to a solution of EDCI (172.07 mg, 897.57 µmol, 1.5 eq) and DMAP (73.10 mg, 598.38 µmol, 1 eq) in DCM (5 mL) were added 3-methoxyaniline (110.54 mg, 897.57 µmol, 100.49 µL, 1.5 eq). This solution was then added to 3-nitrobenzoic acid (0.1 g, 598.38 µmol, 1 eq), and the solution was stirred at 25 °C for 3 h. TLC (Petroleum ether: Ethyl acetate = 0:1) indicated the reaction was completed and one new spot formed. The reaction was clean according to TLC. The reaction mixture was quenched by the addition of water 5 mL. The organic layer was separated and washed with 1 M aqueous HCl 5 mL, dried over Na2SO4, filtered and concentrated under reduced pressure to give a residue. Compound N-(3-methoxyphenyl)-3-nitro-benzamide (150 mg, 550.95 µmol, 92.07% yield) was obtained as a yellow solid. Second, a solution of N-(3-methoxyphenyl)-3-nitro-benzamide (150 mg, 550.95 µmol, 1 eq) in THF (8 mL) was added Pd/C (50 mg, 5% purity) under N2. The suspension was degassed under vacuum and purged with H2 several times. The mixture was stirred under H2 (15 psi) at 25 °C for 2 h. TLC (Petroleum ether/Ethyl acetate = 3:1) showed the starting material was consumed completely. The reaction mixture was filtered, and the filtrate was concentrated. Compound 3-amino-N-(3-methoxyphenyl) benzamide (100 mg, 412.76 µmol, 74.92% yield) was obtained as a white solid. Finally, to a solution of 3-amino-N-(3-methoxyphenyl) benzamide (100 mg, 412.76 µmol, 1 eq) in DCM (5 mL) was added DIEA (106.69 mg, 825.52 µmol, 143.79 µL, 2 eq) and 2-chloroacetyl chloride (46.62 mg, 412.76 µmol, 32.83 µL, 1 eq). The mixture was stirred at 0 °C for 1 h. LC-MS showed the reaction was completed, and one main peak with the desired m/z was detected. The reaction mixture was quenched by the addition of H2O 10 mL, and extracted with DCM 10 mL (5 mL × 2). The combined organic layers were washed with brine 10 mL, dried over Na2SO4, filtered and concentrated under reduced pressure to give a residue, which was washed by MeCN (10 mL), then filtered to collect the solid. Compound 3-[(2-chloroacetyl)amino]-N-(3-methoxyphenyl) benzamide (50.49 mg, 155.04 µmol, 37.56% yield, 97.879% purity) was obtained as an off-white solid. ESI-MS m/z: calcd for C16H15ClN2O3 318.08, m/z found 319.1 [M + H]+. 1H-NMR (DMSO, 400 MHz): δ ppm 10.50 (br s, 1H), 10.25 (br s, 1H), 8.09 (br s, 1H), 7.82 (br d, J = 8.4 Hz, 1H), 7.66 (br d, J = 7.2 Hz, 1H), 7.54–7.42 (m, 2H), 7.36 (br d, J = 8.0 Hz, 1H), 7.30–7.21 (m, 1H), 6.69 (br d, J = 8.0 Hz, 1H), 4.28 (s, 2H), 3.75 (s, 3H).CTPS1 inhibitor treatment and cell proliferation assayKMS-11, KMS-12 and NCI-H929 cell lines were treated with 2 µM of the CTPS1 inhibitor for 24, 48, 72 and 96 h. After the indicated times of treatment, cell proliferation was analyzed using the CellTiter 96 Aqueous One Solution Cell Proliferation Assay (Promega, Madison, W) following the manufacturer’s instructions. This is a colorimetric method for determining the number of viable cells in proliferation. For the assay, 100,000 cells by triplicate were plated in 96 wells plates after CTPS1 inhibitor treatment. Plates were centrifuged at 800×g for 10 min, and the medium was removed. Then, cells were incubated with 100 μL per well of medium and 20 μL per well of CellTiter 96 Aqueous One Solution reagent. After 1–3 h of incubation at 37 °C, the plates were incubated for 1–4 h, depending on the cell line, at 37 °C in a humidified, 5% CO2 atmosphere. The absorbance was recorded at 490 nm and at 650 nm as a reference wavelength using 96-well plate readers until the absorbance of control cells without treatment was around 0.8. The background absorbance was measured in wells with only cell line medium and solution reagent. First, the average of the absorbance from the control wells was subtracted from all other absorbance values. Data were calculated as the percentage of total absorbance of treated cells/absorbance of non-treated cells.CRISPR-Cas9 CTPS1 knockoutTo perform CRISPR-Cas9 experiments, we first generated three stable Cas9 cell lines: KMS-11, RPMI-8226 and KMS-12, infecting them with lentiviruses carrying the Cas9-2A-Blasticidin expressing cassette (Lenti-Cas9-2A-Blast plasmid, Addgene #73310) following standard protocol, described before, and selecting them with 17.5 μg/mL blasticidin (Invitrogen) for 7 days. For CTPS1 knockout, we designed three different gRNAs (sgCTPS1-1 F: GCTGATGAAATGGAAAGAGA, sgCTPS1-1 R: TCTCTTTCCATTTCATCAGC; sgCTPS1-3 F: CCTCGAACACCAAATCCTCC, sgCTPS1-3 R: GGAGGATTTGGTGTTCGAGG; sgCTPS1-4 F: ACTTGGGCTTTCCCCAGATC, sgCTPS1-4 R: GATCTGGGGAAAGCCCAAGT), and we cloned them into CRISPseq-BFP-backbone (Addgene #85707) harboring a BFP reporter gene. Corresponding lentiviruses were produced for all the constructs and constitutively Cas9 expressing cell lines were infected. BFP+ cells in the case of single gRNAs were determined by flow cytometry (FACS Canto II, BD Biosciences) 72 h after infection. As a control, cells infected with empty CRISPseq-BFP-backbone were used (scramble, Scr). To assess efficiency in the CTPS1 knockout experiment, we checked CTPS1 protein level by western blot 7 days after lentiviruses infection. The detailed gating strategy for flow cytometry is shown in Supplementary Fig. 20a.Western blotCells protein extracts were obtained by lysis buffer containing 1% Triton X-100, 150 mM NaCl, 50 mM Tris [pH 8], supplemented with 1x protease inhibitor cocktail (Complete Mini, Roche), 10 mM NaF and 1 mM sodium orthovanadate for 30 min at 4 °C and quantified using BCA (Thermo Fisher Scientific) colorimetric assays. A 30-µg amount of protein was separated on 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred to a nitrocellulose membrane. The membrane, after being blocked with Tropix I-block blocking reagent (Cat No AI300, Tropix) in PBS with 0.1% Tween 20 and 0.02 NaN3 was incubated with the primary antibody against CTPS1 (rabbit polyclonal antibody, Cat No 98287, Cell Signaling) diluted 1:1000 o/n at 4 °C. Bound antibodies were revealed by a chemiluminescent reagent (Tropix) and visualized using the Chemidoc Imaging Systems (Bio-Rad Laboratories). β-actin was used as a loading control (diluted 1:4000 o/n at 4 °C or for 1 h at room temperature) (Anti-β-actin, A5441; Sigma).Cells transfectionCells were passaged 24 h before nucleofection. The transfection of siRNAs was done with the Nucleofector II device (Amaxa GmbH, Köln, Germany) following the Amaxa guidelines. Briefly, 1 × 106 of JJN-3 and OPM-2 were resuspended in 100 µL of supplemented culture medium with 100 nM of UAP1 siRNAs or Silencer Select Negative Control-1 siRNA (Ambion, Austin, TX) and nucleofected with the Amaxa nucleofector apparatus using programs A-033, A-020 and G-016, respectively. We used two different siRNAs against UAP1 target (siUPA1 A: GUACUUUGGUUUAAAAAAA; siUPA1 B: AGAGCAUUAUGAACAUUAA) to demonstrate that the results obtained with UAP1 siRNA nucleofection are not due to a combination of inconsistent silencing and sequence-specific off-target effects. Silencer Select Negative Control-1 siRNA was used to demonstrate that the nucleofection did not induce non-specific effects on gene expression. Nucleofection was performed twice with a 24 h interval. After 48 h of the second nucleofection, the UAP1 mRNA expression was analyzed by q-PCR (GUS was employed as the reference gene). Cell proliferation was analyzed 0, 2, 4 and 6 days after two repetitive transfections using the CellTiter 96 Aqueous One Solution Cell Proliferation Assay (Promega, Madison, W) as described above. Transfection efficiency was determined by flow cytometry using the BLOCK IT Fluorescent Oligo (Invitrogen Life Technologies, Paisley, UK). The detailed gating strategy for flow cytometry is shown in Supplementary Fig. 20b.Statistics and reproducibilityStatistical analyses were performed in R (v4.0.5). Each statistical method used is specified in the “Results” section, including correlation tests for DepMap Analysis, the hypergeometric test for assessing the association of gene paralogous pairs with synthetic lethality, unpaired one-sided Wilcoxon tests for comparing mean differences of variables without normal distribution and Fisher’s test for the validation of predicted lethal DKOs with paralogous genes. The number of human samples or cells analyzed in each figure is indicated in their associated legends. For in vitro experiments, no statistical method was used to predetermine sample size, and no data were excluded from the analyses performed. All experiments and measures were made in triplicate obtaining in all of the cases similar results.Implementation and availabilitygmctool has been developed using R47 and Shiny48. gmctool is hosted using the Amazon Web Services cloud environment service, and it can be publicly accessed at: https://biotecnun.unav.es/app/gmctool. A full tutorial and example of our own cohort of samples corresponding to B-cell differentiation and MM can be found on the ‘Help’ tab of gmctool.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

An automated network-based tool to search for metabolic vulnerabilities in cancer

A Jurassic Park-inspired method can safely store data in DNA

An open-access lumbosacral spine MRI dataset with enhanced spinal nerve root structure resolution

An automatic tracking method to measure the mandibula movement during real time MRI

AI has dreamt up a blizzard of new proteins. Do any of them actually work?

Catalyst uses electricity to convert carbon dioxide into methane | Research

Hot Topics

A Jurassic Park-inspired method can safely store data in DNA

An open-access lumbosacral spine MRI dataset with enhanced spinal nerve root structure resolution

An automatic tracking method to measure the mandibula movement during real time MRI

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

A Jurassic Park-inspired method can safely store data in DNA

An open-access lumbosacral spine MRI dataset with enhanced spinal nerve root structure resolution

An automatic tracking method to measure the mandibula movement during real time MRI

AI has dreamt up a blizzard of new proteins. Do any of them actually work?

Popular Articles

A Jurassic Park-inspired method can safely store data in DNA

An open-access lumbosacral spine MRI dataset with enhanced spinal nerve root structure resolution

An automatic tracking method to measure the mandibula movement during real time MRI