MSH-DTI: multi-graph convolution with self-supervised embedding and heterogeneous aggregation for drug-target interaction prediction

The overall framework of MSH-DTI is shown in Fig. 1. First, the initial features of drugs and targets are obtained using two pre-trained self-supervised learning models, InfoGraph [28] and CPCProt [29], respectively. Next, three graphs are constructed from these initial features: a protein–protein interaction (PPI) graph, a drug–drug interaction (DDI) graph, and a drug–drug similarity (DDS) graph. In addition, a heterogeneous protein–drug interaction (PDI) graph is constructed from multi-source data by a Heterogeneous Interaction-enhanced Feature Fusion Module (HIFFM), which extracts more comprehensive features for DTI prediction. Multilayer graph convolutions are then applied to the four graphs to obtain the target features \(p^e\) and \(p^c\) and the drug features \(d^e\), \(d^c\) and \(d^s\). Finally, the target and drug features are each fused by an attention mechanism, and the fused representations are multiplied together for DTI prediction.

Fig. 1 Overall framework. a Feature extraction of drugs and targets using self-supervised learning. b Construction of multiple graphs with the HIFFM. c Multilayer graph convolution module. d Feature aggregation and result prediction with the attention mechanism

Datasets

The DTINet dataset [30] is used for model training and testing. It consists of 708 drugs and 1512 targets, with 1923 drug–target associations, 10,036 drug–drug associations, and 7363 target–target associations. The drug nodes, DTIs and drug–drug interactions were extracted from DrugBank 3.0 [31], and the target nodes and protein–protein interactions were extracted from HPRD [32]. The SMILES representation of each drug is retrieved by its DrugBank ID, and the amino acid sequence of each target is acquired by its UniProt ID.

The DTINet dataset is severely imbalanced: positive samples account for only 0.18% of all drug–target pairs, and negative samples for the remaining 99.82%.

Self-supervised feature extraction module

To fully exploit the structural information of drugs and targets, two self-supervised learning methods, InfoGraph and CPCProt, are introduced to obtain more comprehensive representations of drugs and targets, respectively. It should be noted that self-supervised learning methods typically have higher computational complexity than traditional feature extraction approaches. Traditional methods often rely on hand-crafted feature extractors, which are computationally cheap but may not capture the intricate relationships within the data. Self-supervised methods, in contrast, require more computational resources, longer training time, and large-scale training data, involving more iterations and more complex optimization. In our setting, drug feature pre-training on 10K molecules takes about 0.07 h per epoch and 14 h in total, and protein feature pre-training on 5K protein sequences takes about 1.5 h per epoch. Once pre-trained, however, the models take only about 5 s to extract the features of each drug and protein in the downstream task.
Feature extraction of drug

InfoGraph is a self-supervised learning model that uses graph neural networks to learn node-level and graph-level representations from graph data. By maximizing the mutual information between the graph representation and the patch representations, InfoGraph obtains an effective graph representation. The SMILES string of each drug in the dataset is converted into a molecular graph using RDKit [33], where each atom serves as a node and each bond serves as an edge. The resulting molecular graph is fed into an InfoGraph model pre-trained on the QM9 dataset [34] to extract the structural features of the drug.

Feature extraction of target

To extract the structural features of a target, the CPCProt model is used. CPCProt is a self-supervised learning method that maximizes the mutual information between local and global information of a protein sequence to obtain a protein representation. It first divides the target sequence into fixed-size fragments and, using autoregressive modeling, distinguishes subsequent fragments of the same protein from fragments of random proteins. Each fragment is processed by an encoder to generate a feature, and all fragment features are concatenated to form the protein feature. In our model, all target sequences in the dataset are fed into a CPCProt model pre-trained on the Pfam dataset [35] to extract the structural feature of each target.

To unify the dimensions of the target and drug features, both initial features are transformed to 128 dimensions before being fed into the proposed model. The target feature is denoted as p, and the drug feature as d.

Multiple graphs construction with heterogeneous interaction-enhanced feature fusion module

Multiple graphs construction

Once the self-supervised features have been extracted, they are used to construct multiple graphs. In each graph, the self-supervised feature of a protein or drug serves as the node feature, and the edges are determined by the type of graph. To capture the correlations between drugs and targets, multiple graphs are introduced, each capturing different interaction information. The first is the drug–drug interaction (DDI) graph, whose edges indicate interaction relationships between drugs. The second is the drug–drug similarity (DDS) graph, whose edges indicate similarity relationships between drugs; the similarity scores are Tanimoto coefficients computed on the Morgan fingerprint [36] of each drug molecule, as sketched below. The remaining two graphs, the protein–protein interaction (PPI) graph and the protein–drug interaction (PDI) graph, are generated directly from the dataset.
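The paper does not provide code for these preprocessing steps; the following minimal Python sketch illustrates how the two RDKit operations described above could look. The `smiles_to_graph` helper and the 0.5 similarity cutoff are illustrative assumptions, not values reported in the paper.

```python
# Sketch (not the authors' code) of the two RDKit steps described above:
# converting a SMILES string into an atom/bond graph, and building DDS
# edges from Morgan-fingerprint Tanimoto similarity.
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def smiles_to_graph(smiles: str):
    """Return (nodes, edges): atoms as nodes, bonds as edges."""
    mol = Chem.MolFromSmiles(smiles)
    nodes = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
    return nodes, edges

def dds_edges(smiles_list, threshold=0.5):
    """Connect drug pairs whose Tanimoto similarity exceeds the (assumed) cutoff."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
           for s in smiles_list]
    edges = []
    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if DataStructs.TanimotoSimilarity(fps[i], fps[j]) > threshold:
                edges.append((i, j))
    return edges

nodes, edges = smiles_to_graph("CCO")  # ethanol: prints [6, 6, 8] [(0, 1), (1, 2)]
print(nodes, edges)
print(dds_edges(["CCO", "CCN", "c1ccccc1"]))  # edges among the examples, if any clear the cutoff
```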
Heterogeneous interaction-enhanced feature fusion module

The feature matrices of the DDI, DDS and PPI graphs are constructed from the initial features of drugs or targets. In the PDI graph, however, drug nodes and target nodes are of different types. When aggregating node features from a neighborhood, it is helpful to incorporate information from the other relationships into the aggregation process. Therefore, a Heterogeneous Interaction-enhanced Feature Fusion Module (HIFFM) is proposed to update the node features in the PDI graph using the other three graphs. The specific process is shown in Fig. 2.

Fig. 2 First, the homogeneous features of each drug and target are aggregated from the initial features of their 1-hop neighbors in the PPI, DDI and DDS graphs; then the HA features of drugs and targets are aggregated from their 1-hop heterogeneous neighbors; finally, the fusion feature is obtained from the initial features and HA features of the drug and target

The node encodings in the three homogeneous networks, PPI, DDI and DDS, are first obtained from the features of their 1-hop neighborhood nodes through Eqs. 1 and 2.

$$\begin{aligned} p_m^*= & \frac{1}{\vert {N_{PPI}(m)}\vert }\sum \limits _{n\in {N_{PPI}(m)}}p_n \end{aligned}$$
(1)
$$\begin{aligned} d_i^*= & \frac{1}{\vert {N_{DDI}(i)}\vert }\sum \limits _{j\in {N_{DDI}(i)}}d_j+\frac{1}{\vert {N_{DDS}(i)}\vert }\sum \limits _{j\in {N_{DDS}(i)}}d_j \end{aligned}$$
(2)

where \(N_{PPI}(m)\), \(N_{DDI}(i)\) and \(N_{DDS}(i)\) are the sets of neighbors of a node in the corresponding graph, and \(p_m^*\) and \(d_i^*\) are the homogeneous features of the target and drug, respectively.
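The paper gives only the equations; as a concrete illustration, the following NumPy sketch (our own; `mean_aggregate` and the adjacency-dict convention are assumptions, not names from the paper) computes the homogeneous features of Eqs. 1 and 2 by mean-aggregating 1-hop neighbors:

```python
# Illustrative sketch of Eqs. 1-2 (not the authors' implementation).
# Graphs are adjacency lists {node_id: [neighbor ids]}; features are
# float arrays of shape (num_nodes, 128), matching the unified dimension.
import numpy as np

def mean_aggregate(features: np.ndarray, neighbors: dict) -> np.ndarray:
    """Average each node's 1-hop neighbor features."""
    out = np.zeros_like(features)
    for node, nbrs in neighbors.items():
        if nbrs:  # nodes without neighbors keep a zero vector
            out[node] = features[list(nbrs)].mean(axis=0)
    return out

# Eq. 1: homogeneous target feature from the PPI graph.
# p_star = mean_aggregate(p, ppi_adj)
# Eq. 2: homogeneous drug feature, summing the DDI and DDS aggregations.
# d_star = mean_aggregate(d, ddi_adj) + mean_aggregate(d, dds_adj)
```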
Next, each node in the PDI graph aggregates its 1-hop heterogeneous neighbors into a Heterogeneous Aggregation (HA) feature to enrich the target and drug representations, as in Eqs. 3 and 4.

$$\begin{aligned} p_m^{**}= & \frac{1}{\vert {N_{PDI-H}(m)}\vert }\sum \limits _{n\in {N_{PDI-H}(m)}}d_n^* \end{aligned}$$
(3)
$$\begin{aligned} d_i^{**}= & \frac{1}{\vert {N_{PDI-H}(i)}\vert }\sum \limits _{j\in {N_{PDI-H}(i)}}p_j^* \end{aligned}$$
(4)
where \(N_{PDI-H}(m)\) and \(N_{PDI-H}(i)\) are the sets of 1-hop heterogeneous neighbor nodes of m and i, and \(p_m^{**}\) and \(d_i^{**}\) are the HA features of the target and the drug. Considering the different contributions of the initial features and the HA features, the final fusion features of the target and the drug are calculated by combining the two with different weights through Eq. 5. The HA feature \(p_m^{**}\) produced by the HIFFM is added to the initial feature \(p_m\): the initial feature retains most of the original information, while the HA feature carries additional interaction information, so combining them gives the model a more comprehensive and enriched representation.

$$\begin{aligned} p^f_m= \alpha *p_m+ \beta *p^{**}_m \;\;\;\;\; d^f_i= \alpha *d_i+ \beta *d^{**}_i \end{aligned}$$
(5)

where \(\alpha\) and \(\beta\) are initialized weights with \(\alpha + \beta = 1\).
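Continuing the sketch above under the same conventions (the adjacency dicts and the default `alpha=0.5` are our own assumptions; the paper only states that the weights are initialized with \(\alpha + \beta = 1\)), the HA and fusion steps of Eqs. 3-5 could be written as:

```python
# Illustrative sketch of Eqs. 3-5 (not the authors' implementation).
# pdi_p_adj maps each target to its 1-hop drug neighbors in PDI;
# pdi_d_adj maps each drug to its 1-hop target neighbors.
import numpy as np

def hiffm_fuse(p, d, p_star, d_star, pdi_p_adj, pdi_d_adj, alpha=0.5):
    """Heterogeneous aggregation (Eqs. 3-4) followed by weighted fusion (Eq. 5)."""
    beta = 1.0 - alpha
    p_ha = np.zeros_like(p)            # Eq. 3: targets average their drug neighbors' d*
    for m, drugs in pdi_p_adj.items():
        if drugs:
            p_ha[m] = d_star[list(drugs)].mean(axis=0)
    d_ha = np.zeros_like(d)            # Eq. 4: drugs average their target neighbors' p*
    for i, targets in pdi_d_adj.items():
        if targets:
            d_ha[i] = p_star[list(targets)].mean(axis=0)
    p_fused = alpha * p + beta * p_ha  # Eq. 5, target side
    d_fused = alpha * d + beta * d_ha  # Eq. 5, drug side
    return p_fused, d_fused
```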
Multilayer graph convolution module

After obtaining the features of each node in the four graphs, graph convolutional networks (GCNs) are used to capture the relationships between nodes. Because the interaction network has a complex and highly correlated structure, traditional machine learning methods often struggle to capture its intricate patterns and correlations. A GCN, in contrast, uses the connectivity between nodes to propagate information through the network and aggregate features from neighboring nodes, preserving the global structure while capturing local features and relationships, which makes it well suited to updating node features. The core idea of a GCN is to aggregate and update node features by exploiting the connectivity between nodes; through iterative convolution and feature aggregation, each node accumulates more comprehensive information, improving the model's ability to learn from graph data.

Taking the target-centered PDI graph as an example, the initial features p of the targets and the fusion features \(d^f\) of the drugs from the previous section are fed into the GCN. Through multilayer graph convolution, higher-order node information is gradually propagated and integrated. To make effective use of the representations from every layer, the features obtained from the different layers are aggregated by mean pooling to produce the final feature of each node.

For the PDI graph, two GCNs are implemented to obtain the features of targets and drugs, respectively. The first GCN takes the initial drug features and the fused target features as input and extracts the target-centered PDI feature \(p^e\); the second takes the initial target features and the fused drug features as input and extracts the drug-centered PDI feature \(d^e\). For the PPI, DDI and DDS graphs, only the initial features of drugs and targets are used as input, yielding the target PPI feature \(p^c\) and the drug DDI and DDS features \(d^c\) and \(d^s\). A minimal sketch of one such multilayer convolution is given below.
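The paper does not specify the GCN variant; the sketch below assumes the standard propagation rule \(H^{(l+1)} = \mathrm{ReLU}(\hat{A} H^{(l)} W^{(l)})\) with symmetrically normalized adjacency, treating the graph's node features as a single matrix (for the bipartite PDI graph the drug and target features would be stacked into one matrix). The mean pooling over layers follows the description above.

```python
# Minimal multilayer GCN sketch in PyTorch (an assumption; the paper gives
# no implementation details). adj is a dense 0/1 adjacency matrix.
import torch
import torch.nn as nn

class MultiLayerGCN(nn.Module):
    def __init__(self, dim=128, num_layers=3):
        super().__init__()
        self.weights = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x, adj):
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2
        a = adj + torch.eye(adj.size(0), device=adj.device)
        deg = a.sum(dim=1).pow(-0.5)
        a_hat = deg.unsqueeze(1) * a * deg.unsqueeze(0)
        layer_outputs = []
        h = x
        for w in self.weights:
            h = torch.relu(a_hat @ w(h))   # propagate and transform one layer
            layer_outputs.append(h)
        # Mean-pool the per-layer features to form the final node feature.
        return torch.stack(layer_outputs).mean(dim=0)
```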
Feature aggregation and result prediction

After obtaining multiple features, simply concatenating them into a final feature for prediction is not sufficient to express the various relationships between drugs and targets. To better capture the association information and improve the accuracy of the model, an attention-based weighted summation mechanism, inspired by the Neural Attentive Item Similarity model (NAIS) [27], is introduced for the feature representations of drugs and targets. The core principle of the attention mechanism is to dynamically adjust the weight of each feature according to its importance, allowing the model to handle the feature representations of drugs and targets more flexibly and accurately. The attention weights are adjusted automatically according to each feature's relevance to the task, so that during the weighted summation the model is better equipped to capture the intricate associative information within the network, as described in Eq. 6.

$$\begin{aligned} \alpha _{p_m} = z_{\alpha }Relu(w_{\alpha }p_m^e+b_{\alpha }) \;\;\;\;\; \beta _{p_m} = z_{\beta }Relu(w_{\beta }p_m^c+b_{\beta }) \end{aligned}$$

(6)
where \(z_{\alpha }\), \(z_{\beta }\), \(w_{\alpha }\), \(w_{\beta }\), \(b_{\alpha }\) and \(b_{\beta }\) are trainable parameters and ReLU is the activation function. After obtaining \(\alpha _{p_m}\) and \(\beta _{p_m}\), the softmax function is used for normalization to calculate the final target feature \(p'\):

$$\begin{aligned} {\tilde{\alpha }}_{p_m}= & \frac{exp(\alpha _{p_m})}{exp(\alpha _{p_m})+exp(\beta _{p_m})} \end{aligned}$$
(7)
$$\begin{aligned} p_m'= & {\tilde{\alpha }}_{p_m}p_m^e+(1-{\tilde{\alpha }}_{p_m})p_m^c \end{aligned}$$
(8)
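As an illustration, Eqs. 6-8 for the target side amount to the following PyTorch sketch (our own naming throughout; the drug side with its three features works analogously with a three-way softmax):

```python
# Sketch of the attention fusion in Eqs. 6-8 for targets (not the authors' code).
import torch
import torch.nn as nn

class TargetAttentionFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.w_a, self.w_b = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.z_a, self.z_b = nn.Linear(dim, 1, bias=False), nn.Linear(dim, 1, bias=False)

    def forward(self, p_e, p_c):
        score_a = self.z_a(torch.relu(self.w_a(p_e)))   # Eq. 6, alpha branch
        score_b = self.z_b(torch.relu(self.w_b(p_c)))   # Eq. 6, beta branch
        weights = torch.softmax(torch.cat([score_a, score_b], dim=-1), dim=-1)
        alpha = weights[..., :1]                        # Eq. 7: normalized weight
        return alpha * p_e + (1.0 - alpha) * p_c        # Eq. 8: weighted sum
```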
Similarly, the final drug feature \(d'\) is calculated in the same way from the three drug features \(d^e\), \(d^c\) and \(d^s\). After obtaining the final feature representations of the target and the drug, the inner product is used to predict the drug–target interaction through Eq. 9. The inner product of the drug feature \(d_i'\) and the target feature \(p_m'\) represents the relationship between the drug and the target, since the inner product of two vectors measures how far one vector points in the direction of the other: a larger inner product indicates that they are more similar or correlated.

$$\begin{aligned} y_{im}'=sigmoid(d_i' p_m'^T) \end{aligned}$$
(9)
where \(y_{im}'\) denotes the interaction score predicted by the model. Finally, the following weighted loss function is used to optimize the model:

$$\begin{aligned} \begin{aligned} L=(1-\mu )\sum \limits _{i=1}^{N_d}\sum \limits _{j=1}^{N_p}\parallel {y_{ij}}\odot (y_{ij}-y_{ij}')\parallel ^2\\ + \mu \sum \limits _{i=1}^{N_d}\sum \limits _{j=1}^{N_p}\parallel {(1-y_{ij})\odot (y_{ij}-y_{ij}')}\parallel ^2 \end{aligned} \end{aligned}$$
(10)

where \(\mu\) is the weight parameter, \(N_d\) is the number of drugs, \(N_p\) is the number of targets, \(y_{ij}\) is the true label of drug i and target j, \(y_{ij}'\) is the corresponding predicted value, \(\odot\) denotes element-wise multiplication, and \(\parallel \cdot \parallel ^2\) is the squared Frobenius norm.
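Putting Eqs. 9 and 10 together, the scoring and loss computation could look as follows (a PyTorch sketch under our own naming; `mu = 0.5` is an assumed default, not a value given here by the paper):

```python
# Sketch of Eqs. 9-10 (not the authors' code). d_final: (N_d, 128) drug
# features, p_final: (N_p, 128) target features, y: (N_d, N_p) binary labels.
import torch

def predict(d_final: torch.Tensor, p_final: torch.Tensor) -> torch.Tensor:
    """Eq. 9: sigmoid of all drug-target inner products."""
    return torch.sigmoid(d_final @ p_final.T)

def weighted_loss(y: torch.Tensor, y_pred: torch.Tensor, mu: float = 0.5) -> torch.Tensor:
    """Eq. 10: squared error with positive and negative pairs weighted differently."""
    err = y - y_pred
    pos = torch.sum((y * err) ** 2)          # error on observed interactions
    neg = torch.sum(((1 - y) * err) ** 2)    # error on non-interactions
    return (1 - mu) * pos + mu * neg
```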
Figure 3 shows the entire feature variation process.

Fig. 3 The workflow of feature processing
