LDAGM: prediction of lncRNA-disease associations by graph convolutional auto-encoder and multilayer perceptron based on multi-view heterogeneous networks

Deep topology feature extraction

To cope with the problem of sparse network structure, we propose a multi-similarity network fusion method for deep topological feature extraction. It builds on the previously computed functional similarity and Gaussian interaction profile kernel similarity of lncRNAs and miRNAs and the semantic similarity and Gaussian interaction profile kernel similarity of diseases, so that the network features complement one another. The three fused deep homogeneous networks (lncRNA, disease and miRNA) are then combined with the three interaction similarity networks to obtain a multi-view heterogeneous network, as shown in Fig. 5.

Fig. 5 Deep topological feature extraction and multi-view heterogeneous network construction. A Deep topological feature extraction based on fusing lncRNA and miRNA functional similarity and Gaussian interaction profile kernel similarity, and disease semantic similarity and Gaussian interaction profile kernel similarity. B Integration of the fused lncRNA, disease and miRNA homogeneous networks with the three interaction similarity networks to construct a multi-view heterogeneous network.

Let the lncRNA functional similarity adjacency matrix be \(LFM_1\), the Gaussian interaction profile kernel similarity adjacency matrix be \(LGM_1\), and the fused similarity adjacency matrix be \(LM_i\); let the miRNA functional similarity adjacency matrix be \(MFM_1\), the Gaussian interaction profile kernel similarity adjacency matrix be \(MGM_1\), and the fused similarity adjacency matrix be \(MM_i\); and let the disease semantic similarity adjacency matrix be \(DSM_1\), the Gaussian interaction profile kernel similarity adjacency matrix be \(DGM_1\), and the fused similarity adjacency matrix be \(DM_i\). The deep topological feature extraction formulas are as follows:$$\begin{aligned} LM_1= & \frac{\left( LFM_1 + LGM_1 \right) }{\max (LFM_1 + LGM_1)} \end{aligned}$$
(11)
$$\begin{aligned} MM_1= & \frac{\left( MFM_1 + MGM_1 \right) }{\max (MFM_1 + MGM_1)} \end{aligned}$$
(12)
$$\begin{aligned} DM_1= & \frac{\left( DSM_1 + DGM_1 \right) }{\max (DSM_1 + DGM_1)} \end{aligned}$$
(13)
where \(\max (\cdot )\) returns the maximum element of a matrix. After the first layer of topological features is extracted, the functional similarity and Gaussian interaction profile kernel similarity adjacency matrices of lncRNA and miRNA and the semantic similarity and Gaussian interaction profile kernel similarity adjacency matrices of disease are updated with the following equations:$$\begin{aligned} LFM_2= & LM_1 \otimes LFM_1 \end{aligned}$$
(14)
$$\begin{aligned} LGM_2= & LM_1 \otimes LGM_1 \end{aligned}$$
(15)
$$\begin{aligned} MFM_2= & MM_1 \otimes MFM_1 \end{aligned}$$
(16)
$$\begin{aligned} MGM_2= & MM_1 \otimes MGM_1 \end{aligned}$$
(17)
$$\begin{aligned} DSM_2= & DM_1 \otimes DSM_1 \end{aligned}$$
(18)
$$\begin{aligned} DGM_2= & DM_1 \otimes DGM_1 \end{aligned}$$
(19)
where \(\otimes\) is the matrix dot product operation. Eqs. 11, 12 and 13 are then repeated on the updated matrices to extract the second layer of topological features, after which the functional similarity and Gaussian interaction profile kernel similarity adjacency matrices of lncRNA and miRNA and the semantic similarity and Gaussian interaction profile kernel similarity adjacency matrices of disease are updated again. Repeating this alternation extracts the deep topological features.

After extracting the deep topological features, they are integrated with the lncRNA-disease, lncRNA-miRNA and disease-miRNA interaction networks to construct a multi-view heterogeneous network. The multi-view heterogeneous network at layer i is represented as an adjacency matrix, where \(LM_i\), \(DM_i\) and \(MM_i\) are the layer-i topological features of lncRNA, disease and miRNA, LD is the lncRNA-disease association matrix, LM is the lncRNA-miRNA association matrix, DM is the disease-miRNA association matrix, and DL, ML and MD are the transposes of LD, LM and DM:$$\begin{aligned} A_i = \begin{bmatrix} LM_i & LD & LM \\ DL & DM_i & DM \\ ML & MD & MM_i \end{bmatrix} \end{aligned}$$
(20)
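As a minimal illustration, the NumPy sketch below shows one way to implement the layer-wise fusion and update of Eqs. 11-19 and the assembly of \(A_i\) in Eq. 20. Variable names such as `LFM1` and `n_layers` are hypothetical, and the dot product \(\otimes\) is interpreted here as the element-wise product; this is an assumption, not the authors' reference implementation.

```python
import numpy as np

def fuse(sim_a, sim_b):
    """Eqs. 11-13: add two similarity matrices and rescale by the maximum entry."""
    s = sim_a + sim_b
    return s / s.max()

def deep_topological_features(LFM1, LGM1, MFM1, MGM1, DSM1, DGM1, n_layers):
    """Alternate fusion (Eqs. 11-13) and update (Eqs. 14-19) for n_layers layers.
    The update uses the element-wise product, one reading of the paper's dot product."""
    LFM, LGM, MFM, MGM, DSM, DGM = LFM1, LGM1, MFM1, MGM1, DSM1, DGM1
    layers = []
    for _ in range(n_layers):
        LM = fuse(LFM, LGM)            # Eq. 11
        MM = fuse(MFM, MGM)            # Eq. 12
        DM = fuse(DSM, DGM)            # Eq. 13
        layers.append((LM, DM, MM))
        LFM, LGM = LM * LFM, LM * LGM  # Eqs. 14-15
        MFM, MGM = MM * MFM, MM * MGM  # Eqs. 16-17
        DSM, DGM = DM * DSM, DM * DGM  # Eqs. 18-19
    return layers

def build_view(LM_i, DM_i, MM_i, LD, LM_assoc, DM_assoc):
    """Eq. 20: block adjacency matrix of one view of the heterogeneous network.
    LM_assoc and DM_assoc stand for the lncRNA-miRNA and disease-miRNA association matrices."""
    return np.block([
        [LM_i,       LD,          LM_assoc],
        [LD.T,       DM_i,        DM_assoc],
        [LM_assoc.T, DM_assoc.T,  MM_i],
    ])
```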
GCN-AE

After constructing the multi-view heterogeneous network, each view is fed in turn into the GCN-AE for nonlinear feature extraction, which ensures that the information at each layer is fully learned and represented. The GCN-AE consists of an encoder and a decoder: the encoder produces a low-dimensional embedding of the input data that captures its nonlinear relationships, and the decoder reconstructs the data from the encoder output. The process is shown in Fig. 6.

Fig. 6 Nonlinear feature extraction. The multi-view heterogeneous network is fed into the encoder, where a convolution operation produces an embedding of the input data; a bilinear decoding layer then decodes the encoder output for reconstruction.

In the encoder, the input data is first Laplacian normalized to reduce noise, using the following equation:$$\begin{aligned} L = D^{-\frac{1}{2}} A_i D^{-\frac{1}{2}} \end{aligned}$$
(21)
where D is the diagonal degree matrix of \(A_i\), i.e., the diagonal matrix of its row sums. After computing the Laplacian-normalized matrix of \(A_i\), a convolution operation is applied to it and the result is linearly transformed to obtain the encoder output:$$\begin{aligned} R_e = ReLU \left[ \left( A_i \times L \right) W + b \right] \end{aligned}$$
(22)
where W is a learnable weight matrix, b is a learnable bias term, and \(ReLU(\cdot )\) is a nonlinear activation function. The encoder output is then fed into the decoder and decoded through a bilinear layer:$$\begin{aligned} R_d = \left[ ReLU(R_eW + b) \right] W + b \end{aligned}$$
(23)
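A minimal PyTorch sketch of this encoder-decoder structure is given below. The class name `GCNAutoEncoder`, the dimension arguments, and the use of separate weight matrices for each linear step in the decoder are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GCNAutoEncoder(nn.Module):
    """Sketch of the GCN-AE: Laplacian normalization plus graph convolution (encoder),
    followed by a two-step bilinear-style decoder (Eqs. 21-23)."""

    def __init__(self, n_nodes: int, embed_dim: int):
        super().__init__()
        self.encode_lin = nn.Linear(n_nodes, embed_dim)     # W, b in Eq. 22
        self.decode_lin1 = nn.Linear(embed_dim, embed_dim)  # first W, b in Eq. 23 (assumed separate)
        self.decode_lin2 = nn.Linear(embed_dim, n_nodes)    # second W, b in Eq. 23 (assumed separate)

    @staticmethod
    def laplacian_normalize(a: torch.Tensor) -> torch.Tensor:
        """Eq. 21: L = D^{-1/2} A_i D^{-1/2}, with D the diagonal matrix of row degrees."""
        deg = a.sum(dim=1).clamp(min=1e-12)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        return d_inv_sqrt @ a @ d_inv_sqrt

    def forward(self, a: torch.Tensor):
        l = self.laplacian_normalize(a)
        r_e = torch.relu(self.encode_lin(a @ l))                    # Eq. 22
        r_d = self.decode_lin2(torch.relu(self.decode_lin1(r_e)))   # Eq. 23
        return r_e, r_d
```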
The loss between the decoder output and \(A_i\) is then measured with the mean squared error loss function; minimizing it through iterative training yields increasingly accurate encoder embeddings. The mean squared error loss is defined as:$$\begin{aligned} Loss = \frac{1}{n} \sum _{i=1}^n \left( y_i - y_i' \right) ^2 \end{aligned}$$
(24)
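Under the same assumptions, a brief training step for the `GCNAutoEncoder` sketched above might look as follows, using the MSE reconstruction loss of Eq. 24; the optimizer choice, epoch count and learning rate are placeholders.

```python
import torch
import torch.nn as nn

# Assumes `views` is a list of adjacency matrices A_i (torch.Tensor, shape n x n)
# and `model` is an instance of the GCNAutoEncoder sketched above.
def train_gcn_ae(model, views, epochs: int = 200, lr: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for a_i in views:
            optimizer.zero_grad()
            r_e, r_d = model(a_i)
            loss = mse(r_d, a_i)   # Eq. 24: reconstruction error against A_i
            loss.backward()
            optimizer.step()
    # After training, the encoder embeddings r_e serve as nonlinear features.
    return model
```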
MLP

The nonlinear features are integrated with the deep topological features of the multi-view heterogeneous network to obtain the final feature representation of lncRNA-disease pairs, which is input into the MLP model to predict lncRNA-disease associations. To improve the performance and stability of the MLP, this paper introduces an aggregation layer that controls the flow of information between hidden layers so that each hidden layer extracts optimal features. Each hidden layer has a corresponding aggregation layer, which receives input from both the previous aggregation layer and the current hidden layer; the flow is shown in Fig. 7.

Fig. 7 Deep topological and nonlinear feature integration and MLP training. A Deep topological and nonlinear features are integrated to obtain the final feature representation of lncRNA-disease pairs. B The integrated feature representations are input into the MLP model; the final scores are obtained after a series of hidden and aggregation layers and a Sigmoid layer.

The MLP consists of multiple hidden layers, each of which takes the output of the previous hidden layer as input and applies a linear transformation:$$\begin{aligned} H_i = WH_{i-1} +b \end{aligned}$$
(25)
where \(H_{i-1}\) is the output of the previous hidden layer, \(H_i\) is the output of the current hidden layer, W is a learnable weight matrix, and b is a learnable bias term.

The aggregation layer consists of three gates: an input gate, a forget gate, and an update gate; the flow is shown in Fig. 8.

Fig. 8 Aggregation layer. The output of the previous aggregation layer, \(AG_{i-1}\), and the output of the current hidden layer, \(H_i\), pass through the forget gate and the input gate, respectively, where features are filtered and the important ones retained. The update gate integrates the outputs of the forget gate and the input gate to produce the output of the current aggregation layer.

The forget gate controls the inflow of information from the previous aggregation layer, keeping important features and discarding unimportant ones:$$\begin{aligned} FW_i= & Sigmoid\left( AG_{i-1}W + b \right) \end{aligned}$$
(26)
$$\begin{aligned} FD_i= & FW_i \otimes AG_{i-1} \end{aligned}$$
(27)
where \(AG_{i-1}\) is the output of the previous aggregation layer, \(Sigmoid(\cdot )\) is a nonlinear activation function that maps its input into the interval [0, 1], \(FW_i\) is the weight of the forget gate, \(\otimes\) denotes the element-wise product, and \(FD_i\) is the output of the forget gate.

The input gate controls the inflow of information from the current hidden layer, retaining important features and discarding unimportant ones:$$\begin{aligned} IW_i= & Sigmoid\left( H_iW + b \right) \end{aligned}$$
(28)
$$\begin{aligned} ID_i= & IW_i \otimes H_i \end{aligned}$$
(29)
where \(H_i\) is the output of the current hidden layer, \(IW_i\) is the weight of the input gate, and \(ID_i\) is the output of the input gate.

The update gate integrates the data passing through the forget gate and the input gate to obtain the output of the current aggregation layer:$$\begin{aligned} UW_i= & Sigmoid\left( [ID_i: FD_i] W + b \right) \end{aligned}$$
(30)
$$\begin{aligned} AG_i= & UW_i \otimes ID_i + \left( 1 – UW_i \right) \otimes FD_i \end{aligned}$$
(31)
where \([ID_i: FD_i]\) denotes the concatenation of \(ID_i\) and \(FD_i\) along the last dimension, \(UW_i\) is the weight of the update gate, and \(AG_i\) is the output of the current aggregation layer. By stacking multiple aggregation layers and weighing the output of each hidden layer through the update gates, the model dynamically learns which features of each hidden layer should be retained.

The output of the last aggregation layer is passed through the Sigmoid activation function and mapped to the interval [0, 1], and the loss is measured with the binary cross-entropy loss function:$$\begin{aligned} Loss = -\sum \left[ y\log (p) + (1-y)\log (1-p) \right] \end{aligned}$$
(32)
where y denotes the label, which is 1 if the lncRNA-disease pair is associated and 0 otherwise, and p is the predicted probability that the sample is a positive case.
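To make the gating mechanism concrete, the PyTorch sketch below implements one possible form of the hidden layers (Eq. 25), the aggregation layer's forget, input and update gates (Eqs. 26-31), and the final Sigmoid scoring with binary cross-entropy loss (Eq. 32). Class names, layer sizes, the projection used to initialize \(AG_0\), and the use of separate parameters per gate are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class AggregationLayer(nn.Module):
    """Forget, input and update gates over the previous aggregation output AG_{i-1}
    and the current hidden output H_i (Eqs. 26-31)."""

    def __init__(self, dim: int):
        super().__init__()
        self.forget_lin = nn.Linear(dim, dim)       # Eq. 26
        self.input_lin = nn.Linear(dim, dim)        # Eq. 28
        self.update_lin = nn.Linear(2 * dim, dim)   # Eq. 30, applied to [ID_i : FD_i]

    def forward(self, ag_prev: torch.Tensor, h_cur: torch.Tensor) -> torch.Tensor:
        fw = torch.sigmoid(self.forget_lin(ag_prev))                        # forget-gate weight
        fd = fw * ag_prev                                                   # Eq. 27
        iw = torch.sigmoid(self.input_lin(h_cur))                           # input-gate weight
        idt = iw * h_cur                                                    # Eq. 29
        uw = torch.sigmoid(self.update_lin(torch.cat([idt, fd], dim=-1)))   # Eq. 30
        return uw * idt + (1.0 - uw) * fd                                   # Eq. 31

class GatedMLP(nn.Module):
    """MLP whose hidden layers (Eq. 25) are each paired with an aggregation layer."""

    def __init__(self, in_dim: int, hidden_dim: int, n_layers: int = 3):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * n_layers
        self.hidden = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(n_layers)])
        self.aggregate = nn.ModuleList([AggregationLayer(hidden_dim) for _ in range(n_layers)])
        self.project = nn.Linear(in_dim, hidden_dim)  # initializes AG_0 from the input (assumption)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ag = self.project(x)
        h = x
        for lin, agg in zip(self.hidden, self.aggregate):
            h = lin(h)          # Eq. 25
            ag = agg(ag, h)
        return torch.sigmoid(self.score(ag)).squeeze(-1)

# Hypothetical usage: features is (batch, in_dim), labels is a (batch,) tensor of 0/1 floats.
# model = GatedMLP(in_dim=128, hidden_dim=64)
# loss = nn.BCELoss()(model(features), labels)   # Eq. 32
```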
