A network-based trans-omics approach for predicting synergistic drug combinations

Construction of the human molecular interaction networkHuman molecular interactions were constructed from seven databases (Supplementary Data S1): (i) Yeast-two-hybrid high-throughput datasets were retrieved from the yeast two-hybrid database (HuRI)17 (accessed on March 2, 2020), (ii) Protein complexes were retrieved from the CORUM database18 (accessed on September 3, 2018), (iii) Kinase–substrate pairs were retrieved from the PhosphositePlus database19 (accessed on September 7, 2018), (iv) Metabolic enzyme-coupled interactions were retrieved from the KEGG Rpair database20 (accessed on March 12, 2016), (v) Signaling interactions were retrieved from the Signalink v.2.0 database21 (accessed on December 3, 2018), (vi) Innate immune response interactions were retrieved from the InnateDB database22 (accessed on June 2, 2018), (vii) 3D structurally resolved protein-protein interactions were retrieved from the Instruct database23 (accessed on March 3, 2020). We used molecular interactions with biological annotations. We did not include interactions extracted from gene expression data or evolutionary considerations. All interactions from these databases were combined, and the union yielded a network of 13,524 proteins and 311,888 interactions (Supplementary Data S1). Duplicated interactions were excluded using the simplify function in the igraph library (1.2.6) of R. The giant component was used as a human molecular interaction network, and it consisted of 235,123 interactions involving 13,377 proteins (Supplementary Data S2).This newly established human molecule interaction network offers two advantages. First, it facilitates easy comparison with prior work on network-based drug combination prediction methods. Certain steps in our proposed method align with the findings made by Cheng et al.13. To enhance comparability in predictive performance, we adhered to the network creation procedure outlined in the previous work. Second, the network allows for the selection of experimentally validated interaction types. In this study, our focus was solely on molecular interactions with biological annotations (e.g., physical interactions, phosphorylation and substrate–enzyme associations). This emphasis has the potential to enhance the reliability of the network.Construction of disease-specific gene expression profilesThe CREEDS database provides gene expression signatures for 79 diseases with 14,804 genes24, derived from transcriptome data registered in Gene Expression Omnibus (GEO)25. We retrieved disease-specific gene expression signatures of AML, CML, colorectal cancer, asthma, and type 2 diabetes from CREEDS24 (accessed on April 22, 2020).CREEDS lacked the gene expression signature for hypertension. Thus, we constructed the gene expression signature for hypertension according to the procedure in CREEDS24. The detail is written in Supplementary Note 1 (Construction of disease-specific gene expression profiles). Gene expression data for hypertension were retrieved from GEO (GSE24752, and GSE75360 (accessed on November 9, 2022). The disease-specific gene expression levels were determined relative to a healthy cohort. Finally, we obtained the disease-specific gene expression profiles with the same gene set in CREEDS (14,804 genes) for hypertension.Identification of disease-specific genes from disease-specific gene expression profilesWe identified disease-specific genes for 79 diseases registered in CREEDS. Genes in CREEDS have nonzero scores indicating disease specificity. Therefore, disease-specific genes were defined as those with expression levels (not zeros) for each disease. For hypertension, we selected the top 5% of genes with positive fold change values and the top 5% of genes with negative fold change values compared to healthy control as disease-specific genes. Disease-specific genes were used to calculate network-based distance between diseases.Construction of disease modules using disease susceptibility genesA set of disease susceptibility genes on the human molecular interaction network were referred to as “disease modules”13. To investigate the susceptibility genes of our target diseases (AML, CML, colorectal cancer, asthma, type 2 diabetes, and hypertension), we sourced relevant genes from six databases as delineated in a prior study13, including (i) the Online Mendelian Inheritance in Man (OMIM) database26, (ii) ClinVar database27, (iii) the genome-wide association studies (GWAS) database28, (iv) the phenome-wide association study database (PheWAS)29, (v) the GWASdb database30, and (vi) the DisGeNET database31. The accessed date and specific search keys are detailed in Supplementary Data S1. The gene symbols (HGNC symbols) were converted into Entrez IDs using the biomaRt library32 in R (version 4.0.3). The number of the genes in the disease module of AML, CML, colorectal cancer, asthma, type 2 diabetes, and hypertension are 1075, 51, 408, 929, 1173, and 909, respectively. A comprehensive list of genes in the modules can be found in Supplementary Data S3.Construction of drug response gene expression profilesDrug-induced gene expression profiles were obtained from the LINCS Program L1000 mRNA profiling assay (http://www.lincsproject.org), where the gene expression levels for 978 landmark genes, termed L1000 genes. L1000 is highly reproducible, comparable to RNA sequencing, and suitable for computational inference of the expression levels of 81% of non-measured transcripts33. The gene expression profiles of L1000 were measured at various post-treatment intervals—3, 6, 24, 48, and 144 h—and across a range of concentrations using diverse human cell lines. Within the level 5 dataset, we extracted the gene expression profiles of 1488 drugs, and averaged the gene expression profiles of the same drug across experimental conditions, such as post-treatment intervals, the concentration of the drug, and cell lines. The details on drug names, efficacies, and procedures are provided in Supplementary Data S4 and Supplementary Note 1 (Construction of drug response gene expression profiles).Construction of drug modules using drug response genesThe top 5% of genes with positive fold change values and the top 5% of genes with negative fold change values in the drug-induced gene expression profiles were considered drug response genes and were used to construct the drug module for each drug (Supplementary Data S4).Evaluation of network-based distance between disease modules and drug modulesThe network-based proximity between a query disease module and a drug module was evaluated using a network-based distance measure. The path length between genes constituting a disease module and drug response genes constituting a drug module was calculated. Q represents the set of genes ${q}_{1},\cdots ,{q}_{\left|Q\right|}$ in the query disease module, A represents the set of genes ${a}_{1},\cdots ,{a}_{\left|A\right|}$ in the drug A module, and $d(q,a)$ represents the shortest path length between nodes q and a in the human molecular interaction network15 and was defined as:$$d\left(Q,\,A\right)=\frac{1}{{{{{\rm{||}}}}}A{{{{\rm{||}}}}}}{\sum}_{a\in A}{\min }_{q\in Q}d\left(q,a\right)$$
(1)
To determine the significance of the network-based proximity measure between query disease Q and drug A, a reference distance distribution was created. First, a set of genes with the same size and degree of the query disease module was randomly selected. Second, a set of genes with the same size and degree of the drug module was randomly selected. Then, the proximity between the two sets of genes in the human molecular interaction network was calculated15. After 100 repetitions, the mean ${\mu }_{d(Q,A)}$ and the standard deviation ${\sigma }_{d(Q,A)}$ were calculated. The normalized network-based proximity measure was defined as$$z\left(Q,A\right)=\,\frac{d\left(Q,A\right)-{\mu }_{d(Q,A)}}{{\sigma }_{d(Q,A)}}$$
(2)
For a more efficient calculation of the permutation process, parallel computing using the PAR function was performed34. Finally, the $z\left(Q,A\right)$ sign was inverted, scaled in the range of 0 to 1, and defined as PQA. The code for calculation of network-based distance between disease modules and drug modules is deposited in figshare35.Evaluation of the network-based distance between diseasesThe network-based proximity between query disease Q and disease α was evaluated using a network-based distance measure. The shortest path lengths between susceptibility genes and disease-specific genes were calculated according to formula (1). The normalized network-based proximity measure z(Q, α) was calculated according to formula (2). Although Sections 7 and 8 share similarities, they differ in focus. Section 7 elucidates the calculation of distance between a disease and a drug, while section 8 delves into the calculation of distance between different diseases. The code for calculation of the network-based distance between diseases is deposited in figshare35.Evaluation of network-based distance between drug modulesTo evaluate the network-based distance between drug A module and drug B module, the network-based separation measure of the drug response genes between drug A module and drug B module was calculated13. Briefly, the network-based separation measure 〈sAB〉 was defined as follows:$${s}_{{AB}}\equiv \left\langle {d}_{{AB}}\right\rangle -\frac{\left\langle {d}_{{AA}}\right\rangle +\left\langle {d}_{{BB}}\right\rangle }{2}$$
(3)
〈dAA〉 is the mean shortest distance between the response genes of drug A in a human molecular interaction network, 〈dBB〉 is the mean shortest distance between the response genes of drug B in a human molecular interaction network, and 〈dAB〉 is the mean shortest distance between the response genes of drug A and drug B.If drug A had only a single response gene, the average shortest distance between the response genes of drug A (denoted as 〈dAA〉) would be 0. The computation of the distance between two drug modules was based on the response genes of drugs A and drug B. When both drugs A and drug B possessed only one response gene, and if that gene was identical, the distance between the drug modules (denoted as 〈sAB〉) would be 0. In this study, drug response genes were identified as the top 5% of genes exhibiting the positive or negative expression changes in the drug-induced gene expression profiles. Since the number of response genes for drug A (or drug B) ranged from 198 to 200, the associated distance was not 0.The formulas for calculating the drug–disease distance (S-score using formula [1]) and the network-based separation measure of the drug response genes between drug A module and drug B share similarities, but they exhibit a distinction. Therefore, we used different notations. The determination of the distance between a drug module and a disease module relies on generating a random distribution of S-scores. In contrast, the distance between drugs is determined by the topological distance between one drug and another on the molecular network. This choice is motivated by the substantial number of drug pairs, where the computational expense of generating a random distribution is notably high. The code for calculation of network-based distance between drug modules is deposited in figshare35.Evaluation of the network-based disease similarity based on disease-specific gene expression profilesThe similarity between query disease module Q and the other disease a was evaluated using the network-based proximity measure R(Q, a)36. The normalized network-based proximity measure between query disease module Q and the other disease a was evaluated as z(Q, a). The network-based similarity was calculated by sign inversion and scaling of the proximity z(Q, a) in the range of 0 to 1. The similarity of 79 diseases in the CREEDS database and six diseases (AML, CML, colorectal cancer, asthma, type II diabetes, and hypertension) was calculated.Evaluation of drug-similarity based on the chemical structuresThe structure-based similarity between drug A and drug B was evaluated as R(A,B) based on the chemical structures. Chemical structures (MOLfiles) for 8,287 drugs were retrieved from the KEGG DRUG37. KCF-S fingerprints38 for each drug were computed using kcfconvoy (https://github.com/KCF-Convoy/kcfconvoy). The generalized Jaccard similarity coefficient between drugs was calculated based on the fingerprints39.Evaluation of transcriptional correlations between a disease module and drug modulesThe transcriptional correlation of the gene expression profiles between the query disease Q module and the drug A module was evaluated by the cosine correlation coefficient represented as CQA. Similarly, the transcriptional correlation between the query disease Q module and the drug B module was evaluated as CQB. The cosine coefficient was calculated using the cosine function in the lsa library (version 0.73.2) in R. The code for calculation of transcriptional correlations between a disease module and drug modules is deposited in figshare35.Amplification of component genes in the disease and drug modules by network propagationThe size of the query disease module and drug modules was increased by network propagation. A　network propagation method with prior knowledge, called PRINCE, was leveraged40. The algorithm is implemented in https://github.com/kztakemoto/network_propagation.The network propagation with prior knowledge calculates the probability of genes belonging to the query disease module or the drug module. Neighbor nodes of a query disease module were identified as candidates for new genes of the query disease module. The network-based similarity between diseases was used as prior knowledge in the network propagation procedure. Neighbor nodes for a drug module were identified as candidates for new genes of the drug module. The structure-based similarity between drugs was used as prior knowledge for the network propagation procedure.Network propagation without prior knowledge was also performed as follows: Neighbor nodes of a query disease module were identified as candidates for new genes of the query disease module. Similarly, neighbor nodes of a drug module were identified as candidates for new genes of the drug module. The neighbors function in the igraph library (v.1.2.6) was used to identify the neighbor nodes41. The parameters of network propagation are summarized in Supplementary Supplementary Data S5.Design of the prediction score for synergistic drug combinationsThe prediction score of the synergistic effect of drug A and drug B for a query disease was designed using three components (Fig. 1). The first component was the network-based localization relationship score between a query disease module, drug A module, and drug B module, which is referred to as (TQAB). The second component was the network-based proximity score between a query disease module, drug A module, and drug B module, which is referred to as (PQAB). The third component was the transcriptional correlation coefficient between a query disease module, drug A module, and drug B module (CQAB).Fig. 1: Overview of our network-based trans-omics approach to predict synergistic drug combinations.A comprehensive human molecular interaction network was constructed. Susceptibility genes and drug response genes were mapped onto the network. The network-based relationships between a query disease module and drug modules were calculated. The network-based proximities between a query disease module and drug modules were calculated. For the method without network propagation (Syndrum), the transcriptional correlations between a query disease module, and drug modules were calculated using the overlapping genes between a query disease module and the drug modules. The network-based similarities of diseases are calculated based on network-based proximity between diseases, referred to as disease similarity. The structural similarity of drugs was calculated based on the chemical structure, referred to as drug similarity. Network propagation using the similarities was performed to identify overlap genes between a query disease module and disease modules. For the method with network propagation (SyndrumNET), the transcriptional correlations between a query disease module, and drug modules were calculated using the newly identified overlapping genes. Finally, the network-based localization relationships score, the network-based proximities, and the transcriptional correlation coefficients between a query disease module and the drug modules were integrated. Q indicates ‘Query disease’, A indicates ‘Drug A’, and B indicates ‘Drug B’.Network-based localization relationship score (TQAB) was determined based on the topological classes of the query disease module, drug A module, and drug B module. Six types of topological classes (Class I ~ Class VI) were defined based on a previous study13. Within the six classes, class II is designated as Complementary Exposure and it tends to have synergistic effects. This class represents situations where two separated drug modules that overlap individually with a query disease module ($\normalsize z\left(Q,A\right) \, < \, 0, z \left(Q,B\right) \, < \, 0 \, {and} \; {s}_{{AB}} \, < \, 0$). Based on the topological classes, the score of the network-based localization relationship was assigned as follows:$${T}_{{QAB}} = \left\{\begin{array}{ll}0 \hfill & \left( {when}\, {other}\, {class}\right) \\ 2 & \left({when} \, {class}\, {II}\right)\hfill\end{array}\right.$$
(4)
The network-based proximity score (PQAB) was calculated by averaging the network-based proximity between the query disease Q module and drug A module (PQA) and the network-based proximity between the query disease Q module and drug B module (PQB), as follows:$${P}_{{QAB}}=\frac{\left({P}_{{QA}}\,+{P}_{{QB}}\right)}{2}$$
(5)
The transcriptional correlation score (CQAB) was calculated by averaging the absolute value of the transcriptional correlation coefficient between the query disease Q module and drug A module (CQA) and the transcriptional correlation coefficient between the query disease Q module and drug B module (CQB), as follows:$${C}_{{QAB}}=\frac{\left(\left|{C}_{{QA}}\,\right|+\left|{C}_{{QB}}\right|\right)}{2}$$
(6)
Finally, the prediction score was calculated by adding the network-based localization relationship score (TQAB), network-based proximity score (PQAB), and transcriptional correlation score (CQAB) as follows35:$${Predction\; score}=\,{{T}_{{QAB}}+{P}_{{QAB}}+C}_{{QAB}}$$
(7)
Collection of drug combinations with known synergistic effects for various diseasesThe known synergistic drug pairs for AML, CML, colorectal cancer, asthma type 2 diabetes, and hypertension were obtained from the PubMed database. The keywords “synergy,” “synergic,” “synergistic,” “synergism,” “interaction,” and “combination” along with disease names were used as keywords for the search procedure. For AML and CML, known synergistic drug pairs were retrieved from DrugCombDB (2019.05.31 release version)42. DrugCombDB contains drug combinations for human cancer cell lines. We linked the drugs and diseases according to the cell line. For hypertension, known synergistic drug pairs were retrieved from a previous paper13. The curated known synergistic drug pairs are summarized in Supplementary Data S6.Performance evaluation protocolThe area under the receiver operating characteristic curve (AUC) was calculated using the performance function in the ROCR library (v.1.0-11) in R. The code for the performance evaluation is deposited in figshare35.Chemicals used for the cell survival assayCapsaicin was purchased from FUJIFILM Wako Pure Chemical Industries, Ltd. (Osaka, Japan). Daunorubicin hydrochloride, idarubicin, and topotecan were purchased from Cayman Chemicals (Ann Arbor, Michigan, USA). Mitoxantrone, fasudil were purchased from Tokyo Chemical Industries Co., Ltd. (Tokyo, Japan). Cell Counting Kit-8 used for the WST-8 assay was purchased from DOJINDO (Tokyo, Japan).Cell culture and reagentsThe K562 human CML cell line was obtained from the RIKEN BioResource Center (Tokyo, Japan) and grown in RPMI 1640 (NACALAI TESQUE, INC., JAPAN) medium supplemented with 10% fetal bovine serum (Funakoshi Co., Ltd., JAPAN). All cells were incubated at 37 °C in a humidified atmosphere containing 5% (v/v) CO2.Cell survival assayIn vitro growth inhibition was evaluated according to the manufacturer’s standard protocol using the Cell Counting Kit-8. Cells were seeded in 96-well plates at a density of 5000 cells/well in a total volume of 100 µL and exposed to various drugs for 72 h at 37 °C in a 5% CO2 atmosphere. WST-8 was added, and after 3 h, the absorbance was measured at a wavelength of 450 nm (reference 630 nm) using a microplate reader (Bio-Rad Laboratories, Inc., Hercules, CA). The results are expressed as percentages [i.e., as the ratio of the absorbance of treated cells to that of the control (drug untreated group, 100%)]. Percent survival was calculated using the following formula: percentage survival = (absorbance of treated wells − absorbance of blank wells)/(absorbance of untreated wells − absorbance of blank wells) × 100. The number of biological replicates is three.Statistical evaluations of the significance of combinatory effects for drug synergyTwo essential models were used to evaluate the significance of combinatory effects for drug synergism: Bliss’s IA43 model and Loewer’s additivity (CA) model44.For Bliss’s IA model, it is assumed that the effects of drugs are stochastic events under the non-interaction assumption between drugs. The effects of a drug combination are calculated as the joint probability of each effect. Drug A causes v% effects and drug B causes w% effects at a given combination of concentrations. The total effect rate of the combination can be computed as ${{CI}}_{{mix}}=1-\left(1-v\right)\left(1-w\right)$ under the additive assumption. Thus, if ${{CI}}_{{mix}} \, > \, 1$, a given drug combination is considered to have synergistic effect based on Bliss’s IA model. ${{CI}}_{{mix}}$ is denoted as the IA score in this study.For Loewer’s additivity model, it is assumed that the toxic unit equals one under the non-interaction assumption between drugs. The concentrations weighted by the effect of each drug (toxic unit [TU]) are added together to yield the TU of a given drug combination.$${{{\mbox{Toxic}}}} {{{{\rm{Unit}}}}} {({{\mbox{TU}}})} = \frac{{C}_{a}}{{{EC}}_{u}^{A}}+\frac{{C}_{b}}{{{EC}}_{u}^{B}}$$For a given drug combination, Ca and Cb stand for the concentrations of drug A and drug B, respectively. ${{EC}}_{u}^{A}$ and ${{EC}}_{u}^{B}$ are the concentrations of the drugs causing u% effect by drug A and drug B, respectively. If TU = 1, the effect rate of the drug combination remains at u% under the additive assumption. Thus, if TU < 1, a given drug combination is assumed to have synergistic effect based on Loewer’s additivity (CA) model, when u% is selected as the results of the growth inhibition assay. ${{\mbox{TU}}}$ is denoted as the CA score in this study.Sample preparation for microarray analysisCells were seeded at a density of 50,000 cells/mL in a 10-cm diameter dish and exposed to various drugs (capsaicin 50 μM, mitoxantrone 30 nM) for 24 h at 37 °C in a 5% (v/v) CO2 atmosphere. Experiments were conducted in three independent wells for each group. Total RNA was extracted using the RNeasy Mini Kit (QIAGEN, Valencia, CA) according to the manufacturer’s protocol and used for microarray experiments by the well. The extracted total RNA from these wells was combined by exposure condition.Mode-of-action of drug combinations by microarray analysisCyanine-3 (Cy3) labeled cRNA was prepared from 150 ng RNA using the One-Color Low Input Quick Amp Labeling kit (Agilent) according to the manufacturer’s instructions, followed by RNAeasy column purification (QIAGEN, Valencia, CA). Dye incorporation and cRNA yield were assessed using a NanoDrop ND-1000 Spectrophotometer.Cy3-labeled cRNA (600 ng, specific activity >6 pmol Cy3/µg cRNA) was fragmented at 60 °C for 30 min in a reaction volume of 25 µl containing 25× Agilent fragmentation buffer and 10× Agilent blocking agent following the manufacturer’s instructions. Upon completion of the fragmentation reaction, 25 µl of 2× Agilent hybridization buffer was added to the fragmentation mixture and hybridized to Agilent SurePrint GE Unrestricted Microarrays (G2519F) for 17 h at 65 °C in a rotating Agilent hybridization oven. After hybridization, microarrays were washed for 1 min at room temperature with GE Wash Buffer 1 (Agilent) and 1 min at 37 °C with GE Wash Buffer 2 (Agilent), then dried immediately.The slides were scanned immediately after washing using an Agilent DNA Microarray Scanner (G2505C) with a one-color scan setting for 8 × 60 K array slides (Scan Area 61 × 21.6 mm, Scan resolution 3 µm, Dye channel was set to Green and Green PMT was set to 100%).The scanned images were analyzed with Feature Extraction Software 10.7.1.1 (Agilent) using default parameters (protocol GE1_107_Sep09 and Grid: 028282_D_F_20110531) to obtain background subtracted and spatially detrended processed signal intensities. Features flagged in Feature Extraction as Feature Non-uniform outliers were excluded. Our microarray data have been registered and are publicly available in the Gene Expression Omnibus (GEO) database (GSE254052). We calculated the log2 fold change (log2FC) for each probe when comparing the control and exposed groups. Then, we averaged the log2FC for genes. We defined differential expression genes (DEGs) as genes with absolute value of log2FC greater than one ($\left|{\log }_{2}{FC}\right|\ge \,1$) in the exposed group compared to the control group.Enrichment analysis of transcription factorsGene set enrichment analysis was performed using DAVID45 and “clusterProfiler” (v4.4.4) library in R46. The transcription factor enrichment analysis was performed using ChEA47. The results were considered statistically significant at p < 0.05.Statistics and reproducibilityWe performed two-sided t-test using “stats” (v4.3.1) library in R and Fisher’s exact test using DAVID45 and “clusterProfiler” (v4.4.4) library in R46. We examined whether the mean distance from a disease to a drug with a known effect differs from that to a drug with an unknown effect by two-sided t-test. For t-test, the sample size is 1,106,328 drug pairs for each query disease. For Fisher’s exact test using DAVID45 and “clusterProfiler”, the sample size is the number of genes of functional pathways in the KEGG database, totaling 8,156 and 8,772 genes, respectively. We conducted cell survivable assay following microarray analysis. For cell survivable assay, we preformed three biological replicates. The mRNA from these replicates were mixed and used for microarray analysis.Inclusion & ethics statementAll members involved in this study have met the authorship criteria mandated by Nature Portfolio journals and have been listed as authors. Their contributions were vital to the study’s design and execution. The roles and responsibilities of each collaborator were clearly defined and mutually agreed upon before initiating the research. This research faced no severe restrictions or prohibitions within our operational environment and was conducted in a manner that avoids causing stigmatization, incrimination, discrimination, or personal risk to any parties involved. In preparing our manuscript, we diligently referenced research that aligns with our study, ensuring that our citations reflect the pertinent scientific context and contributions.

A network-based trans-omics approach for predicting synergistic drug combinations

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery

Hot Topics

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Popular Articles

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis