SAFER: sub-hypergraph attention-based neural network for predicting effective responses to dose combinations | BMC Bioinformatics

We first introduce SAFER, which stands for sub-hypergraph attention-based neural network for predicting effective responses to dose combinations. Then, we describe each component of the framework. Figure 1 shows an overview of the analysis pipeline. SAFER utilizes attention-based hypergraph neural networks to learn representations of cell line-gene, drug-gene, and drug-structure relationships in the context of gene regulatory networks (Fig. 1A). It then uses a two-layer feed-forward neural network to learn the inter-correlations between these representations and dose combinations, predicting synergistic effects at different dose combinations (Fig. 1B). An analysis of attention weights is designed to study the correlation between molecular mechanisms and responses to drug combinations (Fig. 1C).

Fig. 1 A. Hypergraph representation learning: an attention-based graph neural network was used to generate context-specific subject embeddings for downstream prediction tasks. B. Synergy prediction: a two-layer neural network takes the concatenation of three vectors (the cell line embedding, the drug pair embedding, and one-hot encodings of the drug-drug-cell line triplet) and outputs a drug synergy label, 1 for synergistic or 0 for antagonistic. C. Attention weight analysis: comparison of attention weights between synergistic and antagonistic groups, highlighting differentially weighted functional gene sets

Sample collection

Table 1 provides an overview of the datasets used in this study. We provide detailed descriptions in the following.
Table 1 Summary of the datasets collected and used in this paper

Drug combination data

We downloaded drug combination datasets from the DrugComb database (v1.5) [16]. Dose-level drug synergy measures were used in this study and were retrieved via the API (https://api.drugcomb.org/response/). The DrugComb database is the most comprehensive resource for studying drug synergism. It harmonizes the degree to which a combination is additive, synergistic, or antagonistic based on the cell inhibition rate (i.e., percent inhibition or %inhibition) relative to the corresponding untreated control, and provides several synergy principles, including Bliss independence (BLISS) [17], Highest single agent (HSA) [18], Loewe additivity (LOEWE) [19], and Zero interaction potency (ZIP) [20] (detailed descriptions of the equations are available on the official website: https://drugcomb.org/help/). For each drug-drug-cell triplet, synergy scores were averaged across experimental replicates. We removed non-cancer samples from the dataset (i.e., malaria and SARS-CoV-2). The remaining samples with available drug data and cell line data were included in this study. We focused on predicting drug combination effects deviating from additivity, and therefore excluded additive samples whose synergy scores fell within -10 and 10 in all measures (i.e., BLISS, HSA, LOEWE, and ZIP). Additionally, samples that were synergistic by all measures but had less ability to kill cancer cells than the untreated control (i.e., percent inhibition < 0) were also excluded, as they are not within the scope of our study. As a result, the dataset we used contains 225,608 data points, consisting of 104,819 drug combinations and 9,078 drug pairs from 11 studies, and 90 cancer cell lines from 14 tissues.
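The filtering rules above can be sketched as follows. This is a minimal sketch with hypothetical field names (real DrugComb records differ), and it assumes "full agreement synergistic" means all four scores above 10:

```python
# Sketch of the sample-filtering rules: additive samples have every synergy
# score within [-10, 10]; samples synergistic by all measures but with no
# net cell killing (%inhibition < 0) are also dropped. Field names are
# hypothetical.
SCORE_KEYS = ("bliss", "hsa", "loewe", "zip")

def keep_sample(sample):
    scores = [sample[k] for k in SCORE_KEYS]
    if all(-10 <= s <= 10 for s in scores):           # additive in all measures
        return False
    if all(s > 10 for s in scores) and sample["inhibition"] < 0:
        return False                                  # synergistic but no kill
    return True

samples = [
    {"bliss": 2, "hsa": -3, "loewe": 5, "zip": 1, "inhibition": 30},
    {"bliss": 25, "hsa": 18, "loewe": 15, "zip": 12, "inhibition": -5},
    {"bliss": 25, "hsa": 18, "loewe": 15, "zip": 12, "inhibition": 40},
]
kept = [s for s in samples if keep_sample(s)]
print(len(kept))  # only the last sample survives
```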
Finally, we used the dataset from the O'Neil study [21] as the benchmark for comparison purposes and used the Almanac dataset [22] for external validation.

Cell line data

While baseline gene expression data remain widely used for their ability to predict drug response, gene essentiality scores and transcription factors have also shown promising potential [4, 9, 23]. Cancer cell lines used in this study were unified by DepMap ID, and their genomic profiles were retrieved from the DepMap portal (public 22Q4, https://doi.org/10.6084/m9.figshare.21637199.v2), including gene expression TPM values (i.e., log2(TPM + 1)) from bulk RNA sequencing [24] and gene knockout readouts derived from CRISPR screens (i.e., gene essentiality scores) [25,26,27,28]. In addition, we obtained cell line-specific transcription factor (TF)-gene regulatory networks from the GRAND database [23] and averaged across transcription factors to generate a cell line-by-gene matrix representing the probability that a gene is a TF target in a given cell line.

Drug data

We leverage molecular structures and drug-target genes to learn property relationships and mechanisms of action. Compound structures (i.e., SMILES strings) were retrieved directly from the DrugComb database. In the dataset, drug names were coded using various labels, such as a company's serial number, the generic name, or the chemical name. We unified drug names by their unique SMILES strings to avoid data leakage. Drug-associated genes were collected from the DrugBank [29], TTD [30], and STITCH [31] databases. Non-human genes were excluded, and gene symbols were assigned according to NCBI genes.

Functional gene sets

Our previous findings revealed that pathway-level signals improve model performance and explainability [32]. Studies have also shown tissue-specific drug responses mediated by transcription factors [4]. In this study, we used two different gene set collections to learn the biological context of drug synergy.
Functional gene sets were downloaded from the Molecular Signatures Database (MSigDB) [33], including the C2 canonical pathways and the C3 transcription factor-target genes (v2022). The C2 collection used in this study contains pathway gene sets from BioCarta [34], KEGG [35], PID [36], REACTOME [37], and WikiPathways [38]. The C3 collection is a subset of GTRD [39] transcription factor (TF) binding genes. We retained gene sets whose sizes fell within one standard deviation of the mean gene-set size; that is, gene sets with sizes below or above one standard deviation of the mean were excluded.

Sub-hypergraph representation learning

A hypergraph can be viewed as an incidence matrix, \(\mathcal{H}\in {\{0, 1\}}^{|\mathcal{V}|\times |\mathcal{E}|}\), with nodes \(\mathcal{V}\) and hyperedges \(\mathcal{E}\). It differs from a standard graph in that a hyperedge can connect more than two nodes, making it suitable for learning complex interactions in molecular networks, such as gene-pathway relationships. The SHINE [15] framework represents gene-pathway relationships as a hypergraph in which nodes are genes and hyperedges are pathways, and then employs message-passing neural networks to learn node representations. The main contribution of this learning process is that it allows for subgraph representation learning through both node and hyperedge representations.
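The incidence-matrix view of a hypergraph can be illustrated with a minimal sketch (toy gene sets; the real inputs are the MSigDB collections):

```python
import numpy as np

# Build the incidence matrix H in {0,1}^(|V| x |E|) from toy gene sets:
# genes are nodes, gene sets (pathways) are hyperedges.
gene_sets = {
    "PATHWAY_A": {"TP53", "MYC", "EGFR"},
    "PATHWAY_B": {"MYC", "KRAS"},
}
nodes = sorted(set().union(*gene_sets.values()))  # genes = nodes V
edges = sorted(gene_sets)                         # pathways = hyperedges E
H = np.zeros((len(nodes), len(edges)), dtype=int)
for j, edge in enumerate(edges):
    for gene in gene_sets[edge]:
        H[nodes.index(gene), j] = 1

print(H.shape)        # (4, 2): 4 unique genes, 2 pathways
print(H.sum(axis=0))  # hyperedge sizes: [3 2]
```

Note that a hyperedge here can connect three nodes at once, which a standard graph edge cannot.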
Specifically, the representation of a hyperedge \(p_j\) at layer \(k\), \(h_{\mathcal{E}}^{k}(p_j)\), is calculated from the node representations at layer \(k-1\), \(h_{\mathcal{V}}^{k-1}(g_i)\), as
$$h_{\mathcal{E}}^{k}(p_j) = \sigma\left(\sum_{g_i \in p_j} a_{\mathcal{E}}(p_j, g_i)\, h_{\mathcal{V}}^{k-1}(g_i)\right),$$
where \(\sigma\) is the ReLU non-linear function and \(a_{\mathcal{E}}(p_j, g_i)\) denotes the hyperedge attention, which aggregates information over incident nodes and is computed as
$$a_{\mathcal{E}}(p_j, g_i) = \frac{\exp\left(c^{T} s(p_j, g_i)\right)}{\sum_{g_{i'} \in p_j} \exp\left(c^{T} s(p_j, g_{i'})\right)},$$
where \(c\) is a learnable context vector and \(s\) is the attention-ready state. Likewise, the node representation at layer \(k\) is obtained from the hyperedge representations at layer \(k-1\) as
$$h_{\mathcal{V}}^{k}(g_i) = \sigma\left(\sum_{p_j \ni g_i} a_{\mathcal{V}}(g_i, p_j)\, h_{\mathcal{E}}^{k-1}(p_j)\right),$$
where \(a_{\mathcal{V}}(g_i, p_j)\) denotes the node attention, which aggregates information over associated hyperedges and is computed as
$$a_{\mathcal{V}}(g_i, p_j) = \frac{\exp\left(c^{T} s(p_j, g_i)\right)}{\sum_{p_{j'} \ni g_i} \exp\left(c^{T} s(p_{j'}, g_i)\right)}.$$
This dual attention allows the subsequent subgraph attention to learn node and hyperedge representations simultaneously.
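One hyperedge-attention update can be sketched numerically as follows. This is a toy illustration, not the SHINE implementation: the attention-ready state \(s(p_j, g_i)\) is simplified to the node representation itself, and all values are random:

```python
import numpy as np

# Toy sketch of one hyperedge-attention update: each hyperedge p_j aggregates
# its incident nodes' representations with a softmax attention scored by a
# context vector c, followed by a ReLU.
rng = np.random.default_rng(0)
H = np.array([[1, 0], [1, 1], [0, 1]])    # incidence: 3 genes x 2 pathways
h_nodes = rng.normal(size=(3, 4))         # node representations, dim 4
c = rng.normal(size=4)                    # learnable context vector

def relu(x):
    return np.maximum(x, 0.0)

h_edges = np.zeros((H.shape[1], 4))
for j in range(H.shape[1]):
    members = np.flatnonzero(H[:, j])             # nodes incident to p_j
    logits = h_nodes[members] @ c                 # c^T s(p_j, g_i)
    att = np.exp(logits - logits.max())
    att /= att.sum()                              # softmax over members
    h_edges[j] = relu(att @ h_nodes[members])     # weighted aggregation

print(h_edges.shape)  # (2, 4): one representation per hyperedge
```

The node update of the dual attention is symmetric: swap the roles of rows and columns of the incidence matrix.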
A sub-hypergraph \({\mathcal{G}}_{j}\) refers to a subset of nodes in a hypergraph and can be used to represent a subject. In other words, a subgraph representation, \(h(\mathcal{G}_j)\), is characterized by its node representations in the hypergraph structure and is computed as
$$h(\mathcal{G}_j) = \sigma\left(\sum_{g_i \in \mathcal{G}_j} a(\mathcal{G}_j, g_i)\, h_{\mathcal{V}}^{k}(g_i)\right),$$
where \(a(\mathcal{G}_j, g_i)\) is the subgraph attention computed as
$$a(\mathcal{G}_j, g_i) = \frac{\exp\left(F_{ji}\, b^{T} h_{\mathcal{V}}^{k}(g_i)\right)}{\sum_{g_{i'} \in \mathcal{G}_j} \exp\left(F_{ji'}\, b^{T} h_{\mathcal{V}}^{k}(g_{i'})\right)},$$
where \(F\) is the feature matrix in which each row represents a subject and each column corresponds to node features. Finally, the output subgraph embedding, \(\mathcal{S}\), is obtained by concatenating all subgraph representations:
$$\mathcal{S} = \left[h(\mathcal{G}_1)^{T} \mid h(\mathcal{G}_2)^{T} \mid \ldots \mid h(\mathcal{G}_v)^{T}\right].$$
In this study, we applied this subgraph attention mechanism to generate context-specific cell line-gene, drug-gene, and drug-structure embeddings.

Cancer-gene representation

We used the Python library GSEApy (version 1.0.4) [40] to perform single-sample gene set enrichment analysis (ssGSEA) on the cancer cell lines' gene expression data, which outputs a normalized enrichment score (NES) representing the level of enrichment of a gene set in a cell line.
We represented the C3 TF-target genes as a hypergraph by using TFs as nodes and their target genes as hyperedges, and then applied the subgraph attention to the enrichment matrix (for the SAFER-c2 model, the hypergraph is replaced by the C2 pathway gene sets). By doing this, we benefit from enrichment analysis to capture cell line-specific TF activities and leverage the built-in dual attention mechanism to learn relationships across gene sets. To model dosage effects on a cell line's transcriptional changes in the context of gene regulation, we injected dose combinations (e.g., the addition of drug dose a and drug dose b) into the node values by multiplying the dose combination values with the NES. Therefore, the subgraph embedding has size (number of subjects) x (embedding dimension), where subjects represent drug combinations (i.e., drugA-drugB-cell line) at different dose combinations (i.e., doseA-doseB).

Drug-gene and drug-structure representation

Drug embeddings were extracted from heterogeneous hypergraphs that consist of two types of nodes: molecular substructures and drug-target genes. First, we created the structure hypergraph. To represent chemical structures in a hypergraph, we used the k-mer approach [41], which splits a SMILES string into sub-strings of length k as nodes. The choice of k was determined by the ablation study. Second, we used a hypergraph to model functional gene sets, in which nodes are the functional gene sets that drug-associated genes participate in (C2 pathways for the SAFER-c2 model, C3 TFs for the SAFER-c3 model). A hypergraph can be represented as an incidence matrix where rows are nodes and columns are hyperedges.
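The k-mer splitting of SMILES strings into substructure nodes can be sketched as follows (a minimal sketch; the molecule and the choice k=3 are illustrative, not the tuned value):

```python
# Split a SMILES string into overlapping substrings of length k; these
# substrings become the substructure nodes of the drug-structure hypergraph.
def smiles_kmers(smiles: str, k: int = 3) -> set:
    return {smiles[i:i + k] for i in range(len(smiles) - k + 1)}

aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"  # aspirin, for illustration
kmers = smiles_kmers(aspirin, k=3)
print(sorted(kmers)[:5])  # a few of the substructure nodes
```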
We represent drugs as hyperedges, such that a cell value of one at node \(i\) for drug hyperedge \(j\) indicates that drug \(j\) contains the substructure or TF-associated gene \(i\). To reflect dose effects, we injected each drug's dose into the node values of its associated genes and substructures, respectively. We then used the same attention mechanism described above to generate subgraph embeddings for the drug-gene and drug-structure relationships.

Dose combination representation

We also used one-hot encoding to represent drug doses in the dataset, as we observed an increase in per-triplet AUPRC from 0.81 to 0.83. In addition, we encoded phenotypes of the cancer cell lines, including tissue of origin, patient gender, and age, in a one-hot vector, as this information helped improve the overall AUPRC from 0.87 to 0.89. We then concatenated all of these encodings, including the dose combination vector.

Synergy prediction

Model architecture

The resulting embeddings described above were concatenated and used as input to a two-layer feed-forward neural network (FNN) to generate synergy predictions. We applied batch normalization to the input layer and used the ELU activation function for the FNN layers. We binarized Loewe scores into synergistic samples (LOEWE > 10) and antagonistic samples (LOEWE < -10) and used the BCE loss combined with a sigmoid layer (i.e., the BCEWithLogitsLoss function in the PyTorch library, version 2.1.2 [42]). For model training, we used group k-fold splitting to divide the dataset into ten folds, ensuring that no drug-drug-cell line triplet overlaps between the training and test folds. To reduce the impact of over-smoothing, we applied regularization methods including L2 regularization and dropout, and limited the number of attention layers to two, following a previous approach [15].
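The grouped split can be illustrated with a minimal sketch (three folds for brevity and a round-robin group assignment; SAFER uses ten folds, and a library such as scikit-learn's GroupKFold would serve the same purpose):

```python
import numpy as np

# Grouped split sketch: all samples sharing a drug-drug-cell line triplet are
# assigned to the same fold, so no triplet appears in both the training and
# test folds. Triplet identifiers are toy values.
triplets = np.array(["A|B|c1", "A|B|c1", "A|C|c2", "B|C|c1", "A|C|c2", "B|C|c3"])
n_folds = 3
groups, inverse = np.unique(triplets, return_inverse=True)
fold_of_group = np.arange(len(groups)) % n_folds   # round-robin assignment
fold_of_sample = fold_of_group[inverse]

for f in range(n_folds):
    train = set(triplets[fold_of_sample != f])
    test = set(triplets[fold_of_sample == f])
    assert train.isdisjoint(test)  # no triplet leaks across the split
print(np.bincount(fold_of_sample))  # samples per fold
```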
We performed Bayesian optimization for hyperparameter tuning with the Optuna package (version 3.1.0) [43]. We tuned the hidden dimension of the sub-hypergraph representation \(\in \{100, 200, 300, 400, 500\}\) and the dropout rates for the subgraph attention layer and the feed-forward neural networks, with \(0.2 \le \text{dropout rate} \le 0.8\). All hyperparameters were tuned on the validation set in a ten-fold cross-validation manner to optimize the area under the precision-recall curve (AUPRC). The best hyperparameters were determined by averaging the AUPRC on the validation set across all folds. We used a learning rate scheduler, so we did not tune the learning rate and weight decay. The final hyperparameters used in this study are as follows: hidden dimension = 200; dropout rates = 0.2 and 0.3 for the subgraph layer and FNN layers, respectively.

Model evaluation

The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were chosen to assess the overall performance of our binary classifier. To assess the model's capability of predicting dose-level drug combination responses, we averaged the AUROC and AUPRC over drug-drug-cell line groups, which we call the per-triplet AUROC and per-triplet AUPRC, respectively. Model performance was evaluated internally and externally under the following scenarios:

Cross-validation on the benchmark dataset
In this setting, we used the O'Neil dataset, a screen of 490 drug pairs on 34 cell lines with a 5-by-5 dose-grid matrix. We split the data into ten folds with an 80:10:10 ratio for the training, validation, and test partitions. We then averaged performance on the test set across all folds for our models and the competing models, including the representative deep learning model DeepSynergy and three other state-of-the-art models: DeepDDs, TranSynergy, and CCSynergy. We report the results of CCSynergy V, as it showed the best performance among the five versions; hence, the CCSynergy mentioned in this study refers to CCSynergy V.
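The per-triplet metric used throughout these evaluations can be sketched as follows. This is a minimal sketch with toy labels and scores: AUROC is computed within each drug-drug-cell line group via the rank (pairwise-comparison) identity, then averaged across groups; per-triplet AUPRC is computed analogously:

```python
import numpy as np

# Per-triplet AUROC: compute AUROC within each drug-drug-cell line group,
# then average across groups. AUROC here is the fraction of
# (positive, negative) pairs ranked correctly (ties count 0.5).
def auroc(y_true, y_score):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def per_triplet_auroc(groups, y_true, y_score):
    scores = []
    for g in np.unique(groups):
        m = np.asarray(groups) == g
        yt = np.asarray(y_true)[m]
        if 0 < yt.sum() < yt.size:   # need both classes within the group
            scores.append(auroc(yt, np.asarray(y_score)[m]))
    return float(np.mean(scores))

g = ["t1"] * 4 + ["t2"] * 4
y = [1, 0, 1, 0, 1, 1, 0, 0]
p = [0.9, 0.1, 0.8, 0.3, 0.4, 0.7, 0.6, 0.2]
print(per_triplet_auroc(g, y, p))  # 0.875: mean of 1.0 (t1) and 0.75 (t2)
```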

Split the O’Neil dataset by pair-wise drug pair similarity
In this out-of-distribution (OOD) scenario, we created test sets distinct from the training data by dividing drug pairs based on their compound properties and their overlapping target genes, respectively. We used two measures: Tanimoto similarity for comparing the SMILES strings of compounds, and cosine similarity for target gene-based comparisons. This resulted in two data splits: one based on target similarity (i.e., OOD-target) and another on compound similarity (i.e., OOD-structure).

Test on the unseen data
We used the dataset from the Almanac study, which contains 4-by-4 and 4-by-6 dose-response matrices for 3,811 drug pairs and 46 cell lines. We excluded the drug-drug-cell line triplets already seen in the O'Neil dataset, generating an independent test set with 72,785 samples. We trained our model on the O'Neil data to obtain the best-performing model and then applied it to the Almanac data to generate predictions. We repeated this procedure ten times and report the mean and standard deviation.
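The two pairwise similarities used for the OOD splits above, and the triplet exclusion used to build the independent test set, can be sketched on toy inputs. The set-based fingerprints and identifiers are hypothetical; in practice Tanimoto similarity is typically computed on compound fingerprints (e.g., via RDKit):

```python
# Set-based versions of the two similarity measures, plus the triplet
# exclusion. All inputs are toy values.
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two set-valued fingerprints."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine_binary(a: set, b: set) -> float:
    """Cosine similarity between two binary target-gene sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** 0.5 * len(b) ** 0.5)

fp1, fp2 = {"CC(", "C(=", "(=O"}, {"CC(", "(=O", "OCC"}   # toy substructures
t1, t2 = {"EGFR", "ERBB2"}, {"EGFR", "KDR"}               # toy target genes
print(tanimoto(fp1, fp2))               # 0.5
print(round(cosine_binary(t1, t2), 6))  # 0.5

# Independent test set: drop triplets already seen in the benchmark data
oneil = {("drugA", "drugB", "cell1")}
almanac = {("drugA", "drugB", "cell1"), ("drugB", "drugC", "cell3")}
independent = almanac - oneil
print(len(independent))  # 1
```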

Evaluation on different dose areas
We obtained the half-maximal inhibitory concentration (IC50) of each drug from the DrugComb database and used it to dissect the dose-response matrix of each drug-drug-cell line triplet into four quadrants. A dose is defined as higher if the tested concentration is above the drug's IC50, and lower if the drug is tested at a concentration below its IC50. When graphically represented, these are the lower-left (low dose area), upper-left (low-high dose area), upper-right (high dose area), and lower-right (high-low dose area) quadrants. The overall AUPRC and AUROC for these areas are reported.
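The quadrant assignment can be sketched as follows (a minimal sketch; the behavior at exactly the IC50 is an assumption of this sketch, not stated in the text):

```python
# Assign a dose pair to one of four quadrants by comparing each tested
# concentration with its drug's IC50: "low" if below the IC50, else "high".
def quadrant(dose_a, dose_b, ic50_a, ic50_b):
    a = "low" if dose_a < ic50_a else "high"
    b = "low" if dose_b < ic50_b else "high"
    return {("low", "low"): "low dose area",
            ("low", "high"): "low-high dose area",
            ("high", "high"): "high dose area",
            ("high", "low"): "high-low dose area"}[(a, b)]

print(quadrant(0.1, 0.2, ic50_a=1.0, ic50_b=1.0))  # low dose area
print(quadrant(5.0, 0.2, ic50_a=1.0, ic50_b=1.0))  # high-low dose area
```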

Model ablation analysis

To find the best-performing model, we first compared different sizes of k for the compound structure hypergraph used to create drug pair embeddings, and then tested cell line-gene embeddings built from single transcriptomics data against multi-omics data. To construct the multi-omics model, we used DepMap's gene essentiality scores (GES) and GRAND's cell line-specific TF-gene regulatory networks (TF-GRNs) to obtain subgraph embeddings, respectively, and then concatenated them with the gene expression-based subgraph embeddings. The final comparison was conducted between different molecular contexts, namely the C2 and C3 collections, representing extracellular signaling pathways and intracellular regulatory networks, respectively.

Attention weight analysis

Statistical testing

We obtained weighted subgraphs by multiplying attention weights by subgraph values (e.g., dosage effects on enrichment scores for cell lines), and then applied the Mann-Whitney U test to compare differences in weights between synergistic and antagonistic samples. The P-values were adjusted using the Benjamini-Hochberg method to control the false discovery rate at a level of 0.05.

Sample pools

We performed the attention weight analysis for the top 20 drug-drug-cell line triplets on which our model performed well. In this experiment, we sought to identify functional gene sets that were significantly differentially weighted between synergistic and antagonistic interactions. For each triplet, we first searched for such gene sets in a sample pool consisting of different dose combinations, and then in another sample pool consisting of different cell lines that were also screened against the same pair of drugs. Gene sets with adjusted P-values smaller than 0.05 were reported.
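The testing procedure can be sketched as follows. The weights are simulated toy values, and the Benjamini-Hochberg adjustment is implemented inline for transparency (a sketch, not the authors' code):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Per gene set: Mann-Whitney U between synergistic and antagonistic weights,
# then Benjamini-Hochberg adjustment of the resulting p-values.
def bh_adjust(pvals):
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity from the largest p-value downward
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.clip(ranked, 0, 1)
    return adj

rng = np.random.default_rng(1)
# GENESET_A is simulated with a clear group difference; GENESET_B with none.
synergistic = {"GENESET_A": rng.normal(1.0, 0.2, 30),
               "GENESET_B": rng.normal(0.0, 0.2, 30)}
antagonistic = {"GENESET_A": rng.normal(0.0, 0.2, 30),
                "GENESET_B": rng.normal(0.0, 0.2, 30)}
pvals = [mannwhitneyu(synergistic[g], antagonistic[g],
                      alternative="two-sided").pvalue
         for g in ("GENESET_A", "GENESET_B")]
adj = bh_adjust(pvals)
print(bool(adj[0] < 0.05))  # True: GENESET_A is differentially weighted
```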
