Multi-layer graph attention neural networks for accurate drug-target interaction mapping

Multi-layer DTI network

Given a drug set \(D=\{d_i\}^{n_d}_{i=1}\) and a target set \(T=\{t_j\}^{n_t}_{j=1}\), the similarity between drugs (targets) can be assessed from various perspectives, represented by a set of matrices \(\{A^{D,k}\}^{m_d}_{k=1}\) (\(\{A^{T,l}\}^{m_t}_{l=1}\)), where \(A^{D,k}\in \mathbb {R}^{n_d\times n_d}\) (\(A^{T,l}\in \mathbb {R}^{n_t\times n_t}\)) and \(m_d\) (\(m_t\)) is the number of similarity types for drugs (targets). Let the binary matrix \(A^Y\in \{0, 1\}^{n_d\times n_t}\) indicate the interactions between drugs in D and targets in T, where \(A^Y_{ij} = 1\) denotes that \(d_i\) and \(t_j\) interact with each other, and \(A^Y_{ij} = 0\) otherwise. A multi-layer DTI network \(G^M=(V^M,E^M)\) for D and T, as shown in Fig. 1, consists of \(\{A^{D,k}\}^{m_d}_{k=1}\), \(\{A^{T,l}\}^{m_t}_{l=1}\) and \(A^Y\), with its adjacency matrix represented as follows:$$\begin{aligned} A^M=\left( \begin{array}{cccccc} A^{D,1}& I& I& \cdots & A^Y& A^Y \\ I& A^{D,2}& I& \cdots & A^Y& A^Y \\ I& I& A^{D,3}& \cdots & A^Y& A^Y \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ (A^Y)^\top & (A^Y)^\top & (A^Y)^\top & \cdots & A^{T,m_t-1}& I\\ (A^Y)^\top & (A^Y)^\top & (A^Y)^\top & \cdots & I& A^{T,m_t} \end{array} \right) , \end{aligned}$$
(1)
where \(|V^M| = N = n_d\times m_d+n_t\times m_t\).

Fig. 1 The figure illustrates a multi-layer drug-target interaction (DTI) network, which integrates multi-level information from drugs (D) and targets (T) across several layers. On the left, different layers of drug associations are shown (labeled \(A^{D,1},A^{D,2},A^{D,3}\)), representing various relationships among drugs \(D_1, D_2, D_3\) and \(D_4\). On the right, target associations are depicted in similar layers (\(A^{T,1}\) and \(A^{T,2}\)) with targets \(T_1,T_2\) and \(T_3\). The central part of the diagram displays the interactions between drugs and targets, where a multi-layer network structure captures the complex interplay between the different layers of information. This multi-layered approach enables more comprehensive DTI prediction by considering both intra-layer and inter-layer interactions.

Multi-layer attention graph neural network

We propose a model called the Multi-Layer Graph Attention Neural Network (MLGANN) for DTI prediction. In the multi-layer DTI network, apart from the interactions between drugs and targets, there is also interaction information among the various properties of the drugs and targets themselves. MLGANN is therefore designed to capture both the interaction information between drugs and targets and the multi-source information within drugs and targets.

Multi-layer neighbor aggregation

Let \(X\in \mathbb {R}^{N\times f}\) represent the initial features of the nodes in the multi-layer DTI network, where f denotes the dimension of the embedding space. We apply graph neural networks to learn embeddings for drugs and targets on the multi-layer DTI network. Specifically, our model employs Graph Convolutional Networks (GCNs), as they are both simple and effective.
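Before embeddings can be propagated over \(G^M\), the supra-adjacency matrix \(A^M\) of Eq. (1) must be assembled from its blocks. The following minimal NumPy sketch illustrates that construction; the sizes (4 drugs, 3 targets, 2 layers each) and the random similarity matrices are purely illustrative:

```python
import numpy as np

# Illustrative sizes: n_d drugs with m_d similarity layers,
# n_t targets with m_t similarity layers.
n_d, m_d, n_t, m_t = 4, 2, 3, 2
rng = np.random.default_rng(0)

def sym(S):
    """Symmetrize a random matrix so it can stand in for a similarity matrix."""
    return (S + S.T) / 2

A_D = [sym(rng.random((n_d, n_d))) for _ in range(m_d)]   # drug layers A^{D,k}
A_T = [sym(rng.random((n_t, n_t))) for _ in range(m_t)]   # target layers A^{T,l}
A_Y = (rng.random((n_d, n_t)) > 0.5).astype(float)        # binary interactions

N = n_d * m_d + n_t * m_t
A_M = np.zeros((N, N))
off = n_d * m_d  # column/row offset where the target part begins

# Drug part: similarity layers on the diagonal, identity couplings
# (copies of the same drug across layers) off the diagonal.
for k in range(m_d):
    for k2 in range(m_d):
        blk = A_D[k] if k == k2 else np.eye(n_d)
        A_M[k*n_d:(k+1)*n_d, k2*n_d:(k2+1)*n_d] = blk

# Target part: same pattern.
for l in range(m_t):
    for l2 in range(m_t):
        blk = A_T[l] if l == l2 else np.eye(n_t)
        A_M[off+l*n_t:off+(l+1)*n_t, off+l2*n_t:off+(l2+1)*n_t] = blk

# Drug-target part: A^Y between every drug layer and every target layer,
# transposed in the lower-left blocks so that A^M stays symmetric.
for k in range(m_d):
    for l in range(m_t):
        A_M[k*n_d:(k+1)*n_d, off+l*n_t:off+(l+1)*n_t] = A_Y
        A_M[off+l*n_t:off+(l+1)*n_t, k*n_d:(k+1)*n_d] = A_Y.T
```

The symmetry of \(A^M\) is what allows the symmetric normalization \(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}\) in the GCN layers below to be applied directly.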
These embeddings can be refined by applying P layers of GCN across the entire multi-layer DTI network:$$\begin{aligned} X^{(p)} = \sigma \left( \hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}X^{(p-1)}W^{(p)}\right), \quad p=1,2,\ldots ,P, \end{aligned}$$
(2)
where \(X^{(0)} =X\), \(\hat{A}=A^M+I^M\), \(A^M\) is the adjacency matrix of the multi-layer DTI network, \(I^M\) is an identity matrix of the same size as \(A^M\), \(\hat{D}\) is a diagonal matrix with \(\hat{D}_{ii} = \sum \nolimits _{j}\hat{A}_{ij}\), \(W^{(p)}\in \mathbb {R}^{f\times f}\) is a trainable weight matrix, and \(\sigma\) is the ReLU nonlinear activation function. For a node \(v\in G^M\) (representing either a drug or a target), Eq. (2) updates the embedding of that node as follows:$$\begin{aligned} x_v^{(p)} = \sigma \left( W^{(p)}\sum \limits _{u}\dfrac{1}{\alpha _{vu}}x_u^{(p-1)}\right) ,\quad u\in \{v\}\cup N_v\cup C_v, \end{aligned}$$
(3)
where \(\alpha _{vu}\) is the normalization weight, \(N_v\) is the set of neighbors of node v within its layer of \(G^M\), and \(C_v\) is the set of nodes that correspond to the same drug/target as node v. Therefore, MLGANN not only aggregates the neighbors of node v within its layer of \(G^M\) (as GCN does) but also aggregates the nodes corresponding to the same drug/target in the other layers of \(G^M\). This allows information to be transmitted across the layers of \(G^M\). By leveraging information from different layers, MLGANN can learn better representations for each node, especially for nodes with limited interactions in a particular layer. This is the main distinction between MLGANN and other existing network embedding methods.

Multi-layer attention pooling

We concatenate all representations learned by the P-layer GCN to obtain the final node embedding:$$\begin{aligned} \begin{aligned} z_i^{D,k} = \left[ z_i^{D,k(0)},z_i^{D,k(1)},\cdots ,z_i^{D,k(P)}\right] \\ z_j^{T,l} = \left[ z_j^{T,l(0)},z_j^{T,l(1)},\cdots ,z_j^{T,l(P)}\right] , \end{aligned} \end{aligned}$$
(4)
where \(z_i^{D,k}\) denotes the final embedding of the ith drug in the kth layer of \(G^M\), and \(z_i^{D,k(p)}\) denotes the kth-layer embedding of the ith drug after the pth GCN layer; likewise, \(z_j^{T,l}\) represents the final embedding of the jth target in the lth layer of \(G^M\), and \(z_j^{T,l(p)}\) denotes the lth-layer embedding of the jth target after the pth GCN layer. To obtain the final representations of drugs and targets, we design a self-attention mechanism that aggregates the representation vectors of drugs and targets across the different layers of \(G^M\) for DTI prediction. The computation proceeds as follows:$$\begin{aligned} \begin{aligned} e_i^{D,k} = q^D \cdot LeakyReLU\left( W^Dz^{D,k}_i\right) ,\quad e_j^{T,l} = q^T \cdot LeakyReLU\left( W^Tz^{T,l}_j\right) \\ \alpha ^k_i = \frac{e_i^{D,k}}{\sum \nolimits ^{m_d}_{k'=1}e_i^{D,k'}},\quad z^D_i = \sum \limits ^{m_d}_{k=1}\alpha ^k_i z^{D,k}_i,\quad \beta ^l_j = \frac{e_j^{T,l}}{\sum \nolimits ^{m_t}_{l'=1}e_j^{T,l'}},\quad z^T_j = \sum \limits ^{m_t}_{l=1}\beta ^l_j z^{T,l}_j, \end{aligned} \end{aligned}$$
(5)
where \(z^D_i\in \mathbb {R}^{f'}\) and \(z^T_j\in \mathbb {R}^{f'}\) are the final representations of drugs and targets, \(W^D\in \mathbb {R}^{f'\times f'}\) and \(W^T\in \mathbb {R}^{f'\times f'}\) are trainable parameter matrices, and \(q^D\in \mathbb {R}^{f'}\) and \(q^T\in \mathbb {R}^{f'}\) are trainable vectors.

DTI prediction

Let \(G^Y\) be the DTI network derived from the adjacency matrix \(A^Y\). For an edge \(d_it_j\) in \(G^Y\), let \(z^D_i\) and \(z^T_j\) be the final representation vectors of drug \(d_i\) and target \(t_j\), respectively. We sample a non-existing edge \(d_ut_v\) in \(G^Y\), where \(z^D_u\) and \(z^T_v\) are the final representation vectors of drug \(d_u\) and target \(t_v\), respectively. We treat the drug-target pair \(d_it_j\) as a positive sample and \(d_ut_v\) as a negative sample, and design a cross-entropy loss as follows:$$\begin{aligned} \mathcal {L}=-\log \left( \sigma \left( \langle z^D_i,z^T_j\rangle \right) \right) -\log \left( \sigma \left( -\langle z^D_u , z^T_v\rangle \right) \right) , \end{aligned}$$
(6)
where \(\sigma\) is the sigmoid activation function and \(\langle \cdot ,\cdot \rangle\) denotes the inner product in Euclidean space.
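The attention pooling of Eq. (5) and the pairwise loss of Eq. (6) can be sketched together. The NumPy example below uses randomly initialized stand-ins for the learned parameters \(W^D, q^D, W^T, q^T\) and for the per-layer embeddings (all sizes illustrative); it follows Eq. (5) as written, normalizing the raw scores rather than applying a softmax, and evaluates \(-\log \sigma (x)\) through the numerically stable identity \(\log (1+e^{-x})\):

```python
import numpy as np

rng = np.random.default_rng(1)
m_d, m_t, f = 3, 2, 8  # layer counts and embedding width (illustrative)

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def attention_pool(Z, W, q):
    """Eq. (5): score each per-layer embedding, normalize, take a weighted sum.
    Z is (num_layers, f); rows are one node's embeddings across layers.
    Z @ W.T is the row-vector form of W z in the equation."""
    e = leaky_relu(Z @ W.T) @ q   # one attention score per layer
    weights = e / e.sum()         # normalized as in Eq. (5) (no exp)
    return weights @ Z            # (f,) pooled representation

# Hypothetical learned parameters and per-layer embeddings for one drug/target.
W_D, q_D = rng.standard_normal((f, f)), rng.standard_normal(f)
W_T, q_T = rng.standard_normal((f, f)), rng.standard_normal(f)
Z_drug   = rng.standard_normal((m_d, f))  # stands in for [z_i^{D,1..m_d}]
Z_target = rng.standard_normal((m_t, f))  # stands in for [z_j^{T,1..m_t}]

z_d = attention_pool(Z_drug, W_D, q_D)    # z_i^D
z_t = attention_pool(Z_target, W_T, q_T)  # z_j^T

# Eq. (6) on one positive pair (z_d, z_t) and one sampled negative pair:
# -log(sigmoid(x)) == log(1 + exp(-x)), computed stably via logaddexp.
z_u = rng.standard_normal(f)
z_v = rng.standard_normal(f)
loss = np.logaddexp(0.0, -(z_d @ z_t)) + np.logaddexp(0.0, (z_u @ z_v))
```

In training, the loss would be summed over all observed interactions in \(G^Y\), each paired with a freshly sampled negative edge.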
