GLNNMDA: a multimodal prediction model for microbe-drug associations based on global and local features

Datasets
In this article, two well-known public databases, the MDAD and the aBiofilm, are utilized to verify the prediction performance of GLNNMDA. First, we downloaded 2740 known associations between 1373 drugs and 173 microbes from the MDAD. After eliminating those microbe-drug connections that had not been verified in MDAD by Wang et al.25, we finally obtained 1121 known associations between 233 drugs and 109 diseases, and 402 known associations between 73 microbes and 109 diseases from the MDAD. Next, we downloaded 2884 known associations between 1720 drugs and 140 microbes from the aBiofilm. In a similar way, after excluding those microbe-drug relationships that had not been validated in aBiofilm by Wang et al.25, we eventually obtained 435 known connections between 103 drugs and 72 diseases, and 254 known associations between 59 microbes and 72 diseases from the aBiofilm. Table 1 illustrates the details of all these newly-downloaded data.
Table 1 Details of the newly-downloaded data.
Construction of GLNNMDA
As shown in Fig. 1, GLNNMDA consists of the following three major steps:
Fig. 1 The flowchart of GLNNMDA.
Step 1: Construct the heterogeneous network C by combining the Gaussian kernel similarity and the functional similarity of microbes and drugs based on the newly-downloaded datasets.
Step 2: Extract the low-dimensional feature matrix \(N\) for microbes and drugs by feeding C into the graph attention encoder.
Step 3: Concatenate the low-dimensional feature matrix N, the adjacency matrix, and the cosine similarities of drugs and microbes to form a novel comprehensive microbe-drug feature matrix B, and feed B into the GLF module to extract the local and global features of microbes and drugs separately, based on which the predicted scores of potential microbe-drug associations are finally obtained.
Construction of the microbe-drug heterogeneous network C
Construction of the adjacency matrix \({\varvec{A}}\)
Based on the above newly-downloaded dataset S illustrated in Table 1, let \({n}_{r}\) and \({n}_{m}\) denote the number of drugs and microbes in S respectively, then we can create an adjacency matrix \(A\in {R}^{{n}_{r}*{n}_{m}}\) as follows: for any given drug \({r}_{i}\), microbe \({m}_{j}\) and disease \({d}_{k}\) in S, \(A\left(i,j\right)=1\) if and only if there is a known association between \({r}_{i}\) and \({m}_{j}\) in S, or there are both a known association between \({r}_{i}\) and \({d}_{k}\) and a known association between \({d}_{k}\) and \({m}_{j}\) in S; otherwise, \(A\left(i,j\right)=0\).
Calculation of the Gaussian kernel similarities between microbes or drugs
Let \(D\left({r}_{i}\right)\) and \(D\left({r}_{j}\right)\) represent the i-th and the j-th rows in A respectively, then the Gaussian kernel similarity \(GR\left({r}_{i},{r}_{j}\right)\) between any two given drugs \({r}_{i}\) and \({r}_{j}\), which forms the matrix \(GR\in {R}^{{n}_{r}*{n}_{r}}\), can be calculated in the following way:$$\begin{array}{c}GR\left({r}_{i},{r}_{j}\right)= exp\left(-\gamma \| D\left({r}_{i}\right)- D\left({r}_{j}\right){\| }^{2}\right)\#\end{array}$$
(1)
$$\begin{array}{c}\gamma =\frac{1}{\left(\frac{1}{{n}_{r}}\sum_{i=1}^{{n}_{r}}\| D\left({r}_{i}\right){\| }^{2}\right)}\#\end{array}$$
(2)
Similarly, let \(S\left({m}_{i}\right)\) and \(S\left({m}_{j}\right)\) denote the i-th and the j-th columns in A separately, then the Gaussian kernel similarity \(GM\left({m}_{i},{m}_{j}\right)\) between any two given microbes \({m}_{i}\) and \({m}_{j}\), which forms the matrix \(GM\in {R}^{{n}_{m}*{n}_{m}}\), can be calculated in the following way:$$\begin{array}{c}GM\left({m}_{i},{m}_{j}\right)=\text{exp}\left(-\gamma {\Vert S\left({m}_{i}\right)-S\left({m}_{j}\right)\Vert }^{2}\right)\#\end{array}$$
(3)
$$\gamma =1/(\frac{1}{{n}_{m}}\sum_{i=1}^{{n}_{m}}{\Vert S\left({m}_{i}\right)\Vert }^{2})$$
(4)
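For illustration, a minimal NumPy sketch of Eqs. (1)-(4) is given below. It is not the authors' implementation; the function and variable names (gaussian_kernel_similarity, A, GR, GM) are ours, and A is assumed to be the adjacency matrix constructed above.

```python
import numpy as np

def gaussian_kernel_similarity(profiles: np.ndarray) -> np.ndarray:
    """Gaussian kernel similarity between the rows of an interaction-profile matrix
    (Eqs. (1)-(2) for drugs; Eqs. (3)-(4) use the columns of A instead)."""
    # gamma is the reciprocal of the mean squared norm of the interaction profiles
    gamma = 1.0 / np.mean(np.sum(profiles ** 2, axis=1))
    # squared Euclidean distance between every pair of profiles
    sq_dists = np.sum((profiles[:, None, :] - profiles[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dists)

# GR = gaussian_kernel_similarity(A)      # drug similarity from the rows of A (Eqs. (1)-(2))
# GM = gaussian_kernel_similarity(A.T)    # microbe similarity from the columns of A (Eqs. (3)-(4))
```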
Calculation of the functional similarities between microbes or drugs
In order to determine the functional similarity between drugs or microbes, we first need to compute the semantic similarity between diseases. As suggested by Wang et al.25, for any given disease \({d}_{i}\), let \({D}_{i}\) denote the set that consists of \({d}_{i}\) and all diseases related to \({d}_{i}\), then the semantic contribution value of each sub-item d in \({D}_{i}\) can be computed as follows:$$\begin{array}{c}{K}_{{d}_{i}}\left(d\right)=\left\{\begin{array}{cc}1,& d={d}_{i}\\ \underset{{d}^{\prime}\in \text{children of }d}{\text{max}}\left(\frac{{K}_{{d}_{i}}\left({d}^{\prime}\right)}{2}\right),& d\in {D}_{i}\text{ and }d\ne {d}_{i}\end{array}\right.\#\end{array}$$
(5)
Besides, for any given disease \({d}_{j}\) other than \({d}_{i}\), the semantic similarity between \({d}_{i}\) and \({d}_{j}\) can be calculated as follows:$$\begin{array}{c}DE\left({d}_{i},{d}_{j}\right)=\frac{{\sum }_{d\in {D}_{i}\cap {D}_{j}}\left({K}_{{d}_{i}}\left(d\right)+{K}_{{d}_{j}}\left(d\right)\right)}{DK\left({d}_{i}\right)+DK\left({d}_{j}\right)}\#\end{array}$$
(6)
where \(DK\left({d}_{i}\right)={\sum }_{d\in {D}_{i}}{K}_{{d}_{i}}(d)\) indicates the sum of semantic contribution values of all sub-items in \({D}_{i}\). Thereafter, for any two given drugs \({r}_{i}\) and \({r}_{j}\), let \({d}_{i1}\) and \({d}_{j2}\) denote the number of diseases related to \({r}_{i}\) and \({r}_{j}\) respectively, then the drug functional similarity \(FR\left({r}_{i},{r}_{j}\right)\), which forms the matrix \(FR\in {R}^{{n}_{r}*{n}_{r}}\), can be calculated as follows:$$\begin{array}{c}FR\left({r}_{i},{r}_{j}\right)=\frac{1}{{d}_{i1}+{d}_{j2}}\left[\sum_{t=1}^{{d}_{i1}}\underset{1\le s\le {d}_{j2}}{\text{max}}\left(DE\left({d}_{it},{d}_{js}\right)\right)+\sum_{s=1}^{{d}_{j2}}\underset{1\le t\le {d}_{i1}}{\text{max}}\left(DE\left({d}_{js},{d}_{it}\right)\right)\right]\#\end{array}$$
(7)
Similarly, for any two given microbes \({m}_{i}\) and \({m}_{j}\), let \({d}_{i3}\) and \({d}_{j4}\) represent the number of diseases associated with \({m}_{i}\) and \({m}_{j}\) respectively, then the microbial functional similarity \(FM\left({m}_{i},{m}_{j}\right)\), which forms the matrix \(FM\in {R}^{{n}_{m}*{n}_{m}}\), can be calculated as follows:$$\begin{array}{c}FM\left({m}_{i},{m}_{j}\right)=\frac{1}{{d}_{i3}+{d}_{j4}}\left[\sum_{p=1}^{{d}_{i3}}\underset{1\le k\le {d}_{j4}}{\text{max}}\left(DE\left({d}_{ip},{d}_{jk}\right)\right)+\sum_{k=1}^{{d}_{j4}}\underset{1\le p\le {d}_{i3}}{\text{max}}\left(DE\left({d}_{jk},{d}_{ip}\right)\right)\right]\#\end{array}$$
(8)
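As an illustration of Eqs. (7)-(8), the sketch below computes the functional similarity between two drugs (or two microbes) from their associated disease index lists, assuming the disease semantic similarity matrix DE of Eq. (6) has already been computed; all names are illustrative, not the authors'.

```python
import numpy as np

def functional_similarity(diseases_a: list, diseases_b: list, DE: np.ndarray) -> float:
    """Functional similarity between two entities given their associated disease
    indices and the disease semantic similarity matrix DE (Eqs. (7)-(8))."""
    if not diseases_a or not diseases_b:
        return 0.0
    sub = DE[np.ix_(diseases_a, diseases_b)]   # pairwise semantic similarities
    best_a = sub.max(axis=1).sum()             # best-matching disease for each disease of entity a
    best_b = sub.max(axis=0).sum()             # best-matching disease for each disease of entity b
    return (best_a + best_b) / (len(diseases_a) + len(diseases_b))
```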
Calculation of the integrated similarities between microbes or drugs
Based on the above Eqs. (1), (3), (7) and (8), we can obtain the integrated similarities of drugs and microbes as follows:$$\begin{array}{c}IR\left({r}_{i},{r}_{j}\right)=\left\{\begin{array}{cc}\frac{GR\left({r}_{i},{r}_{j}\right)+FR\left({r}_{i},{r}_{j}\right)}{2},& if FR\left({r}_{i},{r}_{j}\right)\ne 0\\ GR\left({r}_{i},{r}_{j}\right),& otherwise\end{array}\right.\#\end{array}$$
(9)
$$\begin{array}{c}IM\left({m}_{i},{m}_{j}\right)=\left\{\begin{array}{cc}\frac{GM\left({m}_{i},{m}_{j}\right)+FM\left({m}_{i},{m}_{j}\right)}{2},& if FM\left({m}_{i},{m}_{j}\right)\ne 0\\ GM\left({m}_{i},{m}_{j}\right),& otherwise\end{array}\right.\#\end{array}$$
(10)
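Eqs. (9)-(10) amount to an element-wise average with a fallback, as in the short sketch below (names are ours):

```python
import numpy as np

def integrate_similarity(gauss: np.ndarray, func: np.ndarray) -> np.ndarray:
    """Integrated similarity of Eqs. (9)-(10): average the Gaussian kernel and
    functional similarities where the latter is available, otherwise keep the former."""
    return np.where(func != 0, (gauss + func) / 2.0, gauss)

# IR = integrate_similarity(GR, FR)   # drugs,    Eq. (9)
# IM = integrate_similarity(GM, FM)   # microbes, Eq. (10)
```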
In order to improve the reliability of the feature expressions of microbes and drugs, we further implement the random walk with restart (RWR) algorithm on the above integrated similarities of drugs and microbes to obtain a unique drug similarity matrix \({S}^{r}\) and a unique microbial similarity matrix \({S}^{m}\) respectively, where the RWR iteration is defined as follows:$$\begin{array}{c}{q}_{i}^{l+1}=\lambda M{q}_{i}^{l}+\left(1-\lambda \right){\beta }_{i}\#\end{array}$$
(11)
$$\begin{array}{c}{\beta }_{i,j}=\left\{\begin{array}{cc}1& if i=j\\ 0& otherwise\end{array}\right.\#\end{array}$$
(12)
where \(\lambda \) is the restart probability, \(M\) is the transition probability matrix, \({\beta }_{i}\in {R}^{\left(1*m\right)}\) is the starting probability vector of node i, and \({q}_{i}^{l}\) is the probability distribution vector of node i at the l-th iteration.
Based on these two newly-obtained matrices \({S}^{r}\) and \({S}^{m}\), we can then construct a heterogeneous network \(C\in {R}^{\left({n}_{r}+{n}_{m}\right)*\left({n}_{r}+{n}_{m}\right)}\) as follows:$$\begin{array}{c}C= \left[\begin{array}{cc}{S}^{r}& A\\ {A}^{T}& {S}^{m}\end{array}\right]\#\end{array}$$
(13)
where \({A}^{T}\) represents the transpose of A.
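A minimal sketch of the RWR smoothing and the assembly of C (Eqs. (11)-(13)) is given below; the column normalization used to obtain M, the value of λ, and the convergence tolerance are our assumptions, since they are not specified above.

```python
import numpy as np

def rwr(sim: np.ndarray, lam: float = 0.9, tol: float = 1e-6, max_iter: int = 1000) -> np.ndarray:
    """Random walk with restart following Eq. (11), run for all seed nodes at once;
    column j of the result is the converged distribution started from node j."""
    # transition probability matrix M: column-normalized similarity matrix (assumed)
    M = sim / np.maximum(sim.sum(axis=0, keepdims=True), 1e-12)
    beta = np.eye(sim.shape[0])          # one-hot starting vectors of Eq. (12)
    q = beta.copy()
    for _ in range(max_iter):
        q_next = lam * M @ q + (1.0 - lam) * beta
        if np.abs(q_next - q).max() < tol:
            return q_next
        q = q_next
    return q

# S_r, S_m = rwr(IR), rwr(IM)                   # smoothed drug / microbe similarities
# C = np.block([[S_r, A], [A.T, S_m]])          # heterogeneous network of Eq. (13)
```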
Extracting lower-dimensional features for nodes in C
GAT is a model that reconstructs the node features of a structured network by means of an equal number of encoder and decoder layers, in which each encoder layer generates new node representations by exploiting the correlations between a node and its neighbors26, according to the following steps:
Step 1 (Encoder): for any given node i, the correlation coefficient \({e}_{ij}\) between i and one of its neighboring nodes j in the heterogeneous network C can be computed as follows:$$\begin{array}{c}{e}_{ij}= LeakyRelu\left(\alpha \left[W*C\left(i\right)||W*C\left(j\right)\right]\right),j\in {\varphi }_{i}^{v}\#\end{array}$$
(14)
$$\begin{array}{c}LeakyRelu\left(x\right)= \left\{\begin{array}{c}x, x>0 \\ \mu x, otherwise \end{array}\right.\#\end{array}$$
(15)
where || denotes the concatenation operation, W is the trainable weight matrix, \(\alpha \) is a feature mapping operation, \(C\left(i\right)\) is the i-th row of C, \({\varphi }_{i}^{v}\) is the set of nodes adjacent to node i, and \(\mu \) is a hyper-parameter. Hence, based on the above Eq. (14), the attention score \({\rho }_{ij}\) between i and j can be computed as follows:$$\begin{array}{c}{\rho }_{ij}= \frac{exp\left({e}_{ij}\right)}{\sum_{k\in {\varphi }_{i}^{v}}exp\left({e}_{ik}\right)}\#\end{array}$$
(16)
Thereafter, based on the above Eq. (16), we can obtain the new feature representation for node i according to the following Eq. (17):$$\begin{array}{c}{C\left(i\right)}{\prime}=Relu\left(\sum_{j\in {\varphi }_{i}^{v}}{\rho }_{ij}WC\left(j\right)\right)\#\end{array}$$
(17)
$$\begin{array}{c}Relu\left(x\right)=\left\{\begin{array}{c}x, x>0 \\ 0, otherwise\end{array}\right.\#\end{array}$$
(18)
Hence, based on the above Eq. (17), we can stack the updated representations of all drug nodes and all microbe nodes, denoted by \({M}_{r}\in {R}^{{n}_{r}*{k}_{1}}\) and \({M}_{m}\in {R}^{{n}_{m}*{k}_{1}}\) respectively, where \({k}_{1}\) is the embedding dimension, to construct a new matrix N as follows:$$\begin{array}{c}N=\left[\begin{array}{c}{M}_{r}\\ {M}_{m}\end{array}\right]\in {R}^{\left({n}_{r}+{n}_{m}\right)*{k}_{1}}\#\end{array}$$
(19)
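The encoder of Eqs. (14)-(19) can be sketched in PyTorch as below. This is a simplified single-layer illustration rather than the authors' code; the dense pairwise attention and the thresholding of C into a neighborhood mask are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATEncoderLayer(nn.Module):
    """Single graph-attention encoder layer following Eqs. (14)-(18)."""

    def __init__(self, in_dim: int, out_dim: int, neg_slope: float = 0.2):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)     # trainable matrix W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)      # attention mapping alpha
        self.neg_slope = neg_slope                          # mu in Eq. (15)

    def forward(self, C: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = self.W(C)                                       # W * C(i) for every node
        n = h.size(0)
        # e_ij = LeakyReLU(alpha [W C(i) || W C(j)]) for every node pair (Eq. (14))
        pairs = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                           h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), self.neg_slope)
        # restrict attention to neighbors and normalize with softmax (Eq. (16))
        e = e.masked_fill(adj == 0, float('-inf'))
        rho = torch.softmax(e, dim=1)
        # aggregate neighbor features with ReLU (Eqs. (17)-(18))
        return torch.relu(rho @ h)

# Illustrative usage: C_tensor is the heterogeneous network of Eq. (13), used both as
# the node features and (thresholded) as the neighborhood structure.
# encoder = GATEncoderLayer(in_dim=C_tensor.size(1), out_dim=k1)
# N = encoder(C_tensor, (C_tensor > 0).float())             # feature matrix N of Eq. (19)
```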
Step 2 (Decoder): in GLNNMDA, we employ the inner product as the decoder in the following way:$$\begin{array}{c}NN=sigmoid\left(N\bullet {\left(N\right)}^{T}\right)\#\end{array}$$
(20)
$$\begin{array}{c}sigmoid\left(x\right)= \frac{1}{1+{e}^{-x}}\#\end{array}$$
(21)
Step 3 (Optimization): in GLNNMDA, we adopt the MSE function as the loss function of the GAT and use the Adam optimizer to optimize it, where \(NN\left(i\right)\) and \(C\left(i\right)\) denote the i-th rows of NN and C respectively, and the MSE function is defined as follows:$$\begin{array}{c}Loss= \frac{1}{{n}_{r}+{n}_{m}}\sum_{i=1}^{{n}_{r}+{n}_{m}}{\Vert NN\left(i\right)-C\left(i\right)\Vert }^{2}\#\end{array}$$
(22)
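Continuing the previous sketch, the decoder and the optimization step of Eqs. (20)-(22) could look as follows; the number of epochs and the learning rate are illustrative values, not taken from the paper.

```python
import torch

def train_gat_autoencoder(encoder: torch.nn.Module, C: torch.Tensor,
                          epochs: int = 200, lr: float = 1e-3) -> torch.Tensor:
    """Optimize the GAT encoder with the inner-product decoder of Eqs. (20)-(22)."""
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    adj = (C > 0).float()                                 # neighborhood structure of C
    for _ in range(epochs):
        N = encoder(C, adj)                               # low-dimensional node features
        NN = torch.sigmoid(N @ N.T)                       # inner-product decoder, Eqs. (20)-(21)
        loss = torch.mean(torch.sum((NN - C) ** 2, dim=1))   # row-wise MSE, Eq. (22)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return encoder(C, adj).detach()                       # final feature matrix N of Eq. (19)
```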
GLF module
In recent years, neural networks have become increasingly popular in the field of feature extraction due to their simplicity of operation and good results. However, neural networks cannot generalize global and local features well, especially on large-scale data. Therefore, in GLNNMDA, we designed a module called GLF to extract the global and local characteristics of microbes and drugs, respectively.
Extraction of the initial features for microbes and drugs
Based on the newly-downloaded dataset of known drug-disease connections, let \({n}_{d}\) be the number of different diseases in the dataset; then, similarly to the generation of the microbe-drug adjacency matrix A, we can obtain a drug-disease adjacency matrix \(V\in {R}^{{n}_{r}*{n}_{d}}\). Thereafter, for any two given drug nodes i and j in V, we can compute the cosine similarity \({S}_{r}^{dis}\left(i,j\right)\) between them as follows:$$\begin{array}{*{20}{c}}{S_r^{dis}\left( {i,j} \right) = cos\left( {V\left( i \right),V\left( j \right)} \right) = \frac{{V\left( i \right) \cdot V\left( j \right)}}{{\left\| {V\left( i \right)} \right\| \times \left\| {V\left( j \right)} \right\|}}}\end{array}$$
(23)
where \(V\left(i\right)\) and \(V\left(j\right)\) represent the i-th and the j-th rows in V respectively. Obviously, in a similar way, based on the newly-downloaded dataset of known microbe-disease connections, for any two given microbial nodes i and j, we can further compute the cosine similarity \({S}_{m}^{dis}\left(i,j\right)\) between them.
Thereafter, based on the above descriptions, we can obtain two feature matrices \({B}_{r}\) and \({B}_{m}\) for drugs and microbes respectively in the following way:$$\begin{array}{c}{B}_{r}=\left[{M}_{r};{S}_{r}^{dis};A\right]\#\end{array}$$
(24)
$$\begin{array}{c}{B}_{m}=\left[{M}_{m};{S}_{m}^{dis};{A}^{T}\right]\#\end{array}$$
(25)
Based on the above matrices \({B}_{r}\) and \({B}_{m}\), we can fuse them together to form a combined microbe-drug characterization matrix \(B\in {R}^{\left({n}_{r}\times {n}_{m}\right)*2*{k}_{2}}\) as follows:$$\begin{array}{c}B=\left[\begin{array}{c}{B}_{r}\\ {B}_{m}\end{array}\right]\#\end{array}$$
(26)
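The cosine similarities of Eq. (23) and the per-entity feature matrices of Eqs. (24)-(25) can be sketched as follows; M_r and M_m are the GAT embeddings, V and U denote the drug-disease and microbe-disease adjacency matrices (U is our name), and how the rows of B_r and B_m are subsequently paired into B is left to the original implementation.

```python
import numpy as np

def cosine_similarity_matrix(V: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity of Eq. (23); V is an entity-disease adjacency matrix."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    norms = np.maximum(norms, 1e-12)            # guard against all-zero rows
    Vn = V / norms
    return Vn @ Vn.T

# Illustrative assembly of the feature matrices of Eqs. (24)-(25):
# S_r_dis = cosine_similarity_matrix(V)         # drug-drug cosine similarity
# S_m_dis = cosine_similarity_matrix(U)         # microbe-microbe cosine similarity
# B_r = np.hstack([M_r, S_r_dis, A])            # drug features,    shape (n_r, k2)
# B_m = np.hstack([M_m, S_m_dis, A.T])          # microbe features, shape (n_m, k2)
```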
Thereafter, we feed the above matrix B into a convolutional neural network to extract features for microbes and drugs. It is important to note that we stack two convolutional layers and apply zero padding to the borders of the feature maps. In this convolutional neural network, each convolutional layer contains a BatchNorm2d layer and a Relu activation function, where the purpose of the BatchNorm2d layer is to normalize the data and enhance the generalization capacity of the network. The convolutional operation in the i-th layer is defined as follows:$$\begin{array}{c}{F}_{i}=Relu\left({F}_{i-1}\odot {G}_{i}+{b}_{i}\right)\#\end{array}$$
(27)
where \(\odot \) represents the convolutional operation, \({G}_{i}\) is the trainable convolution kernel, and \({b}_{i}\) is the bias vector. Based on the above descriptions, we can obtain an initial feature representation matrix \({B}_{1}\in {R}^{\left({n}_{r}\times {n}_{m}\right)*l*2*{k}_{2}}\) of microbes and drugs after the two convolutional layers.
Extraction of the global features for microbes and drugs
In this section, we input \({B}_{1}\) into the Generalized Mean Pooling (GeM) layer and then pass the result through a fully connected layer to reduce the feature dimensions, which yields the global feature representation \({f}_{g}\) for microbes and drugs, where the GeM layer is defined as follows:$$\begin{array}{c}{f}_{g,c}={\left(\frac{1}{2{k}_{2}}\sum_{\left(u,v\right)\in 2\times {k}_{2}}{B}_{1,\left(c,u,v\right)}^{p}\right)}^{\frac{1}{p}},\quad c=1,2,\dots ,l\#\end{array}$$
(28)
where p > 0 is a hyperparameter.
Extraction of the local features for microbes and drugs
As illustrated in Fig. 2, the local feature extraction process comprises two parts, namely a multi-attribute part and a self-attention part.
Fig. 2 The local feature extraction process.
In the multi-attribute part, three dilated convolutional layers are included to generate feature maps with different spatial receptive fields. After concatenating these newly-acquired features, a 1 × 1 convolutional layer is designed to further process them. Next, the self-attention part receives these output feature maps to further model the significance of each local feature point. More precisely, the input \({B}_{1}\) is first processed through the BatchNorm2d and the 1 × 1 convolution layers; then the attention maps produced by the 1 × 1 convolution layer are modulated and normalized, and finally a Softplus operation is performed. To obtain the local features \({f}_{l}\), we multiply the newly-acquired attention map with the feature map generated previously. The Softplus operation is defined as follows:$$\begin{array}{c}Softplus\left(x\right)=log\left(1+{e}^{x}\right)\#\end{array}$$
(29)
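A possible PyTorch sketch of the GLF feature extractor (Eqs. (27)-(29) and Fig. 2) is shown below. It assumes each drug-microbe pair is presented as a single-channel 2 × k2 map; the channel width, the dilation rates of the three multi-receptive-field convolutions, and the initial GeM exponent p are our choices, and the fully-connected dimension-reduction layers are omitted for brevity.

```python
import torch
import torch.nn as nn

class GeM(nn.Module):
    """Generalized mean pooling over the spatial dimensions (Eq. (28))."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p, self.eps = nn.Parameter(torch.tensor(p)), eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:           # x: (batch, l, 2, k2)
        x = x.clamp(min=self.eps).pow(self.p)
        return x.mean(dim=(2, 3)).pow(1.0 / self.p)                # -> (batch, l)

class LocalBranch(nn.Module):
    """Multi-receptive-field convolutions plus a Softplus self-attention map (Fig. 2)."""
    def __init__(self, channels: int):
        super().__init__()
        # three dilated 3x3 convolutions with different receptive fields (assumed rates)
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in (1, 2, 3))
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.attn = nn.Sequential(nn.BatchNorm2d(channels),
                                  nn.Conv2d(channels, 1, kernel_size=1),
                                  nn.Softplus())                   # Eq. (29)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return feats * self.attn(x)                                # local features f_l

class GLFBackbone(nn.Module):
    """Two padded convolutional layers (Eq. (27)) followed by the global / local branches."""
    def __init__(self, in_channels: int = 1, channels: int = 16):
        super().__init__()
        conv = lambda ci, co: nn.Sequential(
            nn.Conv2d(ci, co, kernel_size=3, padding=1), nn.BatchNorm2d(co), nn.ReLU())
        self.backbone = nn.Sequential(conv(in_channels, channels), conv(channels, channels))
        self.gem, self.local = GeM(), LocalBranch(channels)

    def forward(self, B: torch.Tensor):
        B1 = self.backbone(B)                                      # initial features B_1
        return self.gem(B1), self.local(B1)                        # f_g, f_l
```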
Orthogonal fusion
Obviously, based on the newly-obtained global feature \({f}_{g}\) and local feature \({f}_{l}\), the projection \({f}_{l,proj}^{(i,j)}\) of each local feature point \({f}_{l}^{(i,j)}\) onto the global feature \({f}_{g}\) can be calculated as follows:$$\begin{array}{c}{f}_{l,proj}^{\left(i,j\right)}=\frac{{f}_{l}^{\left(i,j\right)}\cdot{f}_{g}}{{\left|{f}_{g}\right|}^{2}}{f}_{g}\#\end{array}$$
(30)
Here \({f}_{l}^{(i,j)}\cdot{f}_{g}\) is the dot product operation, and \({\left|{f}_{g}\right|}^{2}\) is the squared L2 norm of \({f}_{g}\), which are defined as follows:$$\begin{array}{c}{f}_{l}^{\left(i,j\right)}\cdot{f}_{g}= {\sum }_{c=1}^{C}{f}_{l,c}^{\left(i,j\right)}{f}_{g,c}\#\end{array}$$
(31)
$$\begin{array}{c}{\left|{f}_{g}\right|}^{2}= {\sum }_{c=1}^{C}{\left({f}_{g,c}\right)}^{2}\#\end{array}$$
(32)
Thereafter, as shown in Fig. 3, we can obtain the component of each local feature point that is orthogonal to \({f}_{g}\) in the following way:
Fig. 3 Schematic of orthogonal complementary vectors.
$$\begin{array}{c}{f}_{l,orth}^{\left(i,j\right)}= {f}_{l}^{\left(i,j\right)}- {f}_{l,proj}^{\left(i,j\right)}\#\end{array}$$
(33)
Next, these orthogonal components are aggregated into a completely new vector and passed through a fully-connected layer, based on which the global and local features are fused into a new feature matrix K. Then, by flattening K and passing it through a fully-connected layer and a softmax function, we finally obtain the predicted score matrix \(score\in {R}^{{n}_{r}*{n}_{m}}\), whose entry \(score\left(i,j\right)\) measures the probability of a potential association between drug \({r}_{i}\) and microbe \({m}_{j}\).
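Finally, a hedged sketch of the orthogonal fusion and prediction head (Eqs. (30)-(33)) is given below; the way the global feature is concatenated back onto the orthogonal components and the two-class softmax head are our assumptions, since the aggregation into K is only loosely specified above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def orthogonal_fusion(f_l: torch.Tensor, f_g: torch.Tensor) -> torch.Tensor:
    """Orthogonal fusion of Eqs. (30)-(33): remove from every local feature point the
    component parallel to the global feature, then append the global feature.
    f_l: (batch, C, H, W) local features, f_g: (batch, C) global features."""
    f_g_ = f_g.unsqueeze(-1).unsqueeze(-1)                            # (batch, C, 1, 1)
    dot = (f_l * f_g_).sum(dim=1, keepdim=True)                       # Eq. (31)
    sq_norm = (f_g ** 2).sum(dim=1).view(-1, 1, 1, 1).clamp_min(1e-12)   # Eq. (32)
    f_l_proj = dot / sq_norm * f_g_                                   # Eq. (30)
    f_l_orth = f_l - f_l_proj                                         # Eq. (33)
    # aggregate the orthogonal components with the global feature (assumed concatenation)
    return torch.cat([f_l_orth, f_g_.expand_as(f_l_orth)], dim=1)     # (batch, 2C, H, W)

class PredictionHead(nn.Module):
    """Fully-connected layer plus softmax producing the association scores."""
    def __init__(self, in_features: int):
        super().__init__()
        self.fc = nn.Linear(in_features, 2)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        flat = torch.flatten(fused, start_dim=1)                      # flatten K
        return F.softmax(self.fc(flat), dim=1)[:, 1]                  # score for each pair
```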
