A method for miRNA-disease association prediction using machine learning decoding of multi-level heterogeneous graph Transformer encoded representations

MHXGMDA framework

MiRNA-disease data are often heterogeneous, comprising different types of entities and complex relationships among them. To fully account for the associations between multiple biological entities while effectively preserving information through the encoding-decoding process, and given the excellent performance of HGT38 on heterogeneous data, we propose a computational method for miRNA-disease association prediction based on a multi-layer heterogeneous encoder and a machine learning decoder (MHXGMDA). As shown in Fig. 1, MHXGMDA comprises three stages:

Multi-view similarity feature extraction. We construct homogeneous similarity matrices for miRNAs and diseases separately as inputs.

Construction of the multi-layer heterogeneous graph Transformer. We treat miRNAs and diseases as nodes and traverse meta-paths in HGT to integrate multi-level encoded information.

Splice matrix classification. We directly splice (concatenate) all output features of the multi-layer heterogeneous encoder and decode them with an XGBoost classifier to obtain the final predictions.

Figure 1 The overall architecture of MHXGMDA for predicting miRNA-disease associations.

Multi-view similarity feature extraction

Following previous methods39, we apply a Gaussian kernel function to the topology of the miRNA-disease association network to obtain the Gaussian interaction profile kernel similarity of miRNAs; similarly, the disease Gaussian similarity matrix is obtained by applying the Gaussian kernel to the disease profiles of the association network. Together with the miRNA and disease semantic similarity matrices, the multi-view similarities are fused to extract miRNA-miRNA and disease-disease similarity features:
$$\begin{aligned} A_{m}=mean\left\{ A_{ms}\left[ S_{ms}\right] ,A_{mg}\left[ S_{mg}\right] \right\} \end{aligned}$$
(1)
$$\begin{aligned} A_{d}=mean\left\{ A_{ds}\left[ S_{ds}\right] ,A_{dg}\left[ S_{dg}\right] \right\} \end{aligned}$$
(2)
where \(S_{ms}\) denotes miRNA semantic similarity with matrix form \(A_{ms} \in R^{M\times M}\), and \(S_{mg}\) denotes miRNA Gaussian similarity with matrix form \(A_{mg} \in R^{M\times M}\); similarly, \(S_{ds}\) denotes disease semantic similarity with matrix form \(A_{ds} \in R^{D\times D}\), and \(S_{dg}\) denotes disease Gaussian similarity with matrix form \(A_{dg} \in R^{D\times D}\). The specific calculation methods are detailed in the Supplementary Information. mean denotes the element-wise average of the two matrices, which fuses the multi-view similarities into the final miRNA similarity matrix \(A_{m}\) and disease similarity matrix \(A_{d}\).
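To make the fusion concrete, here is a minimal numpy sketch, assuming the standard GIP kernel formulation (bandwidth normalised by the mean squared profile norm) and random stand-ins for the semantic similarities; the exact calculations are detailed in the Supplementary Information.

```python
import numpy as np

def gip_kernel(profiles):
    """Gaussian interaction profile (GIP) kernel similarity between the
    rows of a binary association profile matrix."""
    sq_norms = np.sum(profiles ** 2, axis=1)
    gamma = 1.0 / sq_norms.mean()            # bandwidth, assuming gamma' = 1
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2 * profiles @ profiles.T
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(0)
A_assoc = rng.integers(0, 2, size=(50, 30)).astype(float)  # toy miRNA-disease matrix
A_ms = rng.random((50, 50))      # stand-in for the miRNA semantic similarity
A_mg = gip_kernel(A_assoc)       # miRNA Gaussian similarity S_mg
A_m = (A_ms + A_mg) / 2.0        # Eq. (1): element-wise mean fusion
A_dg = gip_kernel(A_assoc.T)     # disease Gaussian similarity S_dg
```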
Construction of multi-layer heterogeneous graph Transformer

Most previous methods fail to capture the dynamic properties of heterogeneous graphs. The design of HGT for heterogeneous graph data makes it a powerful tool for handling complex relationships and structures, so we use HGT to learn node representations that capture latent features between miRNAs and diseases. This proceeds in three steps. First, Heterogeneous Mutual Attention computes the attention weights of the target miRNA node with respect to each neighbouring source disease node. Specifically:
$$\begin{aligned} Attention_{HGT}\left( n_{d},e_{d,m},n_{m} \right) =\mathop {Softmax}\limits _{{\forall d\in N\left( m \right) }}\left( {\mathop {\Vert }\limits _{i\in \left[ 1,h_{th} \right] }ATT\text {-}head^{i}\left( n_{d},e_{d,m},n_{m} \right) } \right) \end{aligned}$$
(3)
$$\begin{aligned} e^{\left( l \right) }\left[ n_{d} \right] \leftarrow \mathop {Aggregate}\limits _{{\forall m\in N\left( d \right) ,\forall e\in E\left( m,d \right) }}\left( Attention\left( n_{m},e_{m,d},n_{d}\right) \cdot Message\left( n_{m}\right) \right) \end{aligned}$$
(4)
$$\begin{aligned} e^{\left( l \right) }\left[ n_{m} \right] \leftarrow \mathop {Aggregate}\limits _{{\forall d\in N\left( m \right) ,\forall e\in E\left( d,m \right) }}\left( Attention\left( n_{d},e_{d,m},n_{m}\right) \cdot Message\left( n_{d}\right) \right) \end{aligned}$$
(5)
$$\begin{aligned} \sum _{\forall d\in N\left( m \right) } Attention_{HGT}\left( n_{d},e_{d,m},n_{m}\right) =\mathbf {1}_{h_{th}\times 1} \end{aligned}$$
(6)
where Attention evaluates the significance of each source node, Message extracts information from the source node, and Aggregate aggregates the neighbourhood information through the attention weights. \(n_{d}\) denotes the encoding of a disease, \(n_{m}\) denotes the encoding of a miRNA, and \(e_{d,m}\) denotes the edge from the source node (disease) to the target node (miRNA).
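As a toy illustration of Eqs. (3)-(6), the following sketch softmax-normalises the scores of the sources in N(m), so the weights sum to one, and aggregates their messages by a weighted sum; tensor shapes are illustrative.

```python
import torch

num_sources, msg_dim = 5, 8
scores = torch.randn(num_sources)             # raw attention scores, one head
att = torch.softmax(scores, dim=0)            # Softmax over N(m), Eq. (6)
messages = torch.randn(num_sources, msg_dim)  # Message(n_d) for each source
e_m = (att.unsqueeze(1) * messages).sum(0)    # weighted aggregation, Eq. (5)
assert torch.isclose(att.sum(), torch.tensor(1.0))
```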
For the i-th attention head \(ATT\text {-}head^{i}\left( n_{d},e_{d,m},n_{m} \right)\), we project the source node \(n_{d}\) of type \(\tau \left( n_{d} \right)\) to generate the i-th key vector \(K^{i}\left( n_{d} \right)\). This linear projection uses the function \(K\text {-}Linear_{\tau \left( n_{d} \right) }^{i} :R^{dim}\rightarrow R^{dim/h_{th}}\), where \(h_{th}\) is the number of attention heads and dim is the node embedding dimension, so each head operates on vectors of dimension \(dim/h_{th}\). More specifically, to cope with different meta relations, we prepare distinct mapping matrices: \(K\text {-}Linear_{\tau \left( n_{d} \right) }^{i}\) is indexed by the type \(\tau \left( n_{d} \right)\) of the source node, which maximally preserves the unique features of the various relationships and accurately reflects the relationships between different node types. Similarly, the target node \(n_{m}\) is linearly projected by \(Q\text {-}Linear_{\tau \left( n_{m} \right) }^{i}\) to generate the i-th query vector, which helps capture the associations between source and target nodes more precisely. The specific calculation formulas are as follows:
$$\begin{aligned} ATT\text {-}head^{i}\left( n_{d},e_{d,m},n_{m} \right) =\left( K^i\left( n_{d} \right) W_{\phi \left( e_{d,m} \right) }^{ATT}Q^{i}\left( n_{m} \right) ^{T} \right) \cdot \frac{\mu _{\left\langle \tau (n_d),\phi (e_{d,m}),\tau (n_m) \right\rangle }}{\sqrt{dim}} \end{aligned}$$
(7)
$$\begin{aligned} K^i\left( n_d\right) = K\text {-}Linear_{\tau (n_d)}^i\left( e^{(l-1)}\left[ n_d\right] \right) \end{aligned}$$
(8)
$$\begin{aligned} Q^i\left( n_m\right) = Q\text {-}Linear_{\tau (n_m)}^i\left( e^{(l-1)}\left[ n_m\right] \right) \end{aligned}$$
(9)
Since different meta-paths contribute to the target node to different degrees, for each meta-relation triple we set a prior importance weight \(\mu _{\left\langle \tau (n_d),\phi (e_{d,m}),\tau (n_m) \right\rangle }\), which acts as an adjustment factor on the attention. To integrate sufficient information from different source nodes, for each target node \(n_{m}\) we gather the attention vectors from all of its neighbours \(N\left( m \right)\).
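For concreteness, the following sketch implements one attention head per Eqs. (7)-(9); the edge-type matrix W^ATT and the meta-relation prior mu are illustrative identity-style placeholders, not learned values.

```python
import torch
import torch.nn as nn

dim, h_th = 64, 8
d_head = dim // h_th

k_linear = nn.Linear(dim, d_head)        # K-Linear indexed by tau(n_d)
q_linear = nn.Linear(dim, d_head)        # Q-Linear indexed by tau(n_m)
w_att = nn.Parameter(torch.eye(d_head))  # edge-type matrix W^ATT for phi(e_{d,m})
mu = torch.tensor(1.0)                   # prior weight of the meta relation

e_d = torch.randn(dim)                   # e^{(l-1)}[n_d], source (disease)
e_m = torch.randn(dim)                   # e^{(l-1)}[n_m], target (miRNA)
k = k_linear(e_d)                        # Eq. (8)
q = q_linear(e_m)                        # Eq. (9)
att_head = (k @ w_att @ q) * mu / dim ** 0.5   # Eq. (7): one scalar score
```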
The next part is Heterogeneous Message Passing, which computes the information contribution of each source node to the target node. Following the multi-head design, the information from the \(h_{th}\) heads is concatenated to form the final representation:
$$\begin{aligned} Message_{HGT}\left( n_d,e_{d,m},n_m\right) =\mathop {\Vert }\limits _{i\in \left[ 1,h_{th}\right] }MSG\text {-}head^i(n_d,e_{d,m},n_m) \end{aligned}$$
(10)
$$\begin{aligned} MSG\text {-}head^i(n_d,e_{d,m},n_m)={M\text {-}Linear}_{\tau \left( n_d\right) }^i\left( e^{\left( l-1\right) }\left[ n_d\right] \right) W_{\phi (e_{d,m}) }^{MSG} \end{aligned}$$
(11)
To account for the heterogeneity of different edge types during information propagation, each head projects the source node \(n_{d}\) with the node-type-specific mapping \({M\text {-}Linear}_{\tau \left( n_d\right) }^i\), indexed by the type \(\tau \left( n_{d} \right)\), and \(W_{\phi (e_{d,m}) }^{MSG}\) is the weight matrix associated with the edge type \(\phi (e_{d,m})\). The information contributions from source nodes to target nodes are thus obtained from multiple heads and combined into a final representation.
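A corresponding sketch of one message head and the multi-head concatenation of Eqs. (10)-(11), again with illustrative placeholder weights:

```python
import torch
import torch.nn as nn

dim, h_th = 64, 8
d_head = dim // h_th
m_linears = nn.ModuleList(nn.Linear(dim, d_head) for _ in range(h_th))
w_msg = [torch.eye(d_head) for _ in range(h_th)]   # W^MSG per head (placeholder)

e_d = torch.randn(dim)                             # e^{(l-1)}[n_d]
heads = [m_linears[i](e_d) @ w_msg[i] for i in range(h_th)]  # Eq. (11)
message = torch.cat(heads)                         # Eq. (10): back to dim
```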
The final part is Target-Specific Aggregation. Since each single-head attention result passes through a softmax, the attention weights over all source nodes sum to one, so we can directly use the attention as weights and take the weighted sum of the messages from all source nodes to obtain the update vector of the target node:
$$\begin{aligned} {\widetilde{e}}^{(l)}[n_m]=\mathop {\oplus } \limits _{\forall d\in N(m)}(Attention_{HGT}(n_d,e_{d,m},n_m)\cdot Message_{HGT}(n_d,e_{d,m},n_m)) \end{aligned}$$
(12)
Similarly, to preserve the heterogeneity of the propagated information, the model applies type-specific linear mappings within the residual connections that produce the updated embeddings of miRNAs and diseases:
$$\begin{aligned} e^{(l)}[n_m]=A\text {-}Linear_{ \tau \left( n_{m} \right) }\left( \sigma \left( {\widetilde{e}}^{\left( l\right) }\left[ n_m\right] \right) \right) +e^{(l-1)}[n_m] \end{aligned}$$
(13)
$$\begin{aligned} e^{(l)}[n_d]=A\text {-}Linear_{ \tau \left( n_{d} \right) }\left( \sigma \left( {\widetilde{e}}^{\left( l\right) }\left[ n_d\right] \right) \right) +e^{(l-1)}[n_d] \end{aligned}$$
(14)
Finally, the node embeddings output by each HGT layer are concatenated to fuse information across layers; a sketch of the resulting encoder is given below.
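The layer itself follows the Heterogeneous Graph Transformer of HGT38, so rather than re-deriving it we give a compact sketch using PyTorch Geometric's HGTConv; the node and edge type names, input dimensions, and the final per-layer concatenation are assumptions rather than the authors' exact code, while the hyperparameters (64 hidden channels, 8 heads, 6 layers on VG-data) follow the experimental setup reported below.

```python
import torch
from torch_geometric.nn import HGTConv

metadata = (
    ["miRNA", "disease"],
    [("miRNA", "associates", "disease"), ("disease", "rev_associates", "miRNA")],
)

class HGTEncoder(torch.nn.Module):
    def __init__(self, in_dims, hidden=64, heads=8, num_layers=6):
        super().__init__()
        # Linear layers preceding the heterogeneous encoder (cf. the
        # MHXGMDA-w/o Linear ablation later in the text).
        self.lin = torch.nn.ModuleDict(
            {t: torch.nn.Linear(in_dims[t], hidden) for t in metadata[0]}
        )
        self.convs = torch.nn.ModuleList(
            HGTConv(hidden, hidden, metadata, heads=heads)
            for _ in range(num_layers)
        )
        self.dropout = torch.nn.Dropout(0.5)

    def forward(self, x_dict, edge_index_dict):
        x_dict = {t: self.lin[t](x) for t, x in x_dict.items()}
        layer_outs = {t: [] for t in x_dict}
        for conv in self.convs:
            x_dict = conv(x_dict, edge_index_dict)
            x_dict = {t: self.dropout(x) for t, x in x_dict.items()}
            for t, x in x_dict.items():
                layer_outs[t].append(x)
        # Concatenate every layer's output to fuse cross-layer information.
        return {t: torch.cat(xs, dim=-1) for t, xs in layer_outs.items()}
```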
The feature extraction effect is shown in Fig. 2.

Figure 2 Visualisation of miRNA and disease feature heatmaps. The subplots show the learned representation vectors of miRNAs and diseases, with colours indicating the intensity of the individual feature components.

Splice matrix classification

The miRNA features obtained from the multi-layer heterogeneous encoder are spliced with the disease features to obtain a fusion descriptor for each miRNA-disease pair, as follows:
$$\begin{aligned} Z_{ij}=[F_m\left( i\right) ,F_d\left( j\right) ] \end{aligned}$$
(15)
where \(F_m\left( i\right)\) is the vector representation of the i-th miRNA in feature matrix \(F_{m}\) and \(F_d\left( j\right)\) is the vector representation of the j-th disease in feature matrix \(F_{d}\).

XGBoost, an efficient gradient boosting framework, fits the data by iteratively adding decision trees, uses regularisation to control model complexity, and supports feature subset selection and feature importance assessment for a better understanding of the data and the model. When growing each tree, XGBoost uses a gradient-based splitting criterion to select the best split point, improving accuracy and generalisation. XGBoost is also widely used in bioinformatics for classification and regression tasks such as gene expression profiling40, disease prediction41, and drug response prediction42. In this study, we use XGBoost as the classifier for the fusion descriptors of miRNA and disease embedding features. On the one hand, this minimises the loss of encoded information and improves the accuracy and reliability of the association prediction. On the other hand, XGBoost can effectively handle large-scale miRNA and disease data, optimising model performance and limiting overfitting through gradient boosting and regularisation. Consequently, as shown in Fig. 3, our model generalises well to novel data, enhancing its robustness.

Figure 3 Visualization of predicted score matrix and label matrix heatmaps. The subgraphs depict known and predicted relationships between miRNAs and diseases; rows represent miRNAs and columns represent diseases.
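As a sketch of this splice-and-decode stage (not the authors' exact pipeline), the following concatenates per-pair embeddings as in Eq. (15) and fits an XGBoost classifier; the embeddings, pair indices, labels, and hyperparameters are all illustrative.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
F_m = rng.random((100, 128))                 # stand-in miRNA embeddings
F_d = rng.random((40, 128))                  # stand-in disease embeddings
pairs = rng.integers(0, [100, 40], size=(500, 2))  # (miRNA idx, disease idx)
labels = rng.integers(0, 2, size=500)        # stand-in association labels

Z = np.hstack([F_m[pairs[:, 0]], F_d[pairs[:, 1]]])   # Z_ij = [F_m(i), F_d(j)]
clf = XGBClassifier(n_estimators=200, max_depth=6, eval_metric="logloss")
clf.fit(Z, labels)
scores = clf.predict_proba(Z)[:, 1]          # predicted association scores
```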
Experiments and results

To assess the performance of MHXGMDA for miRNA-disease association prediction, we conducted a comparative analysis against seven state-of-the-art baselines on two benchmark datasets: GATECDA43, MINIMDA44, AMHMDA39, VGAMF36, CGHCN45, HFHLMDA46, and MGADAE35.

GATECDA43 uses a graph attention autoencoder (GATE) to compress high-dimensional feature information into low-dimensional representations, whose combination is fed into a fully connected layer to predict associations between RNAs and drug sensitivity.

MINIMDA44 constructs an integrated network from multi-source information, obtains embedding representations of miRNAs and diseases by integrating higher-order neighbourhood information from the multimodal network, and finally uses a multilayer perceptron (MLP) to predict latent miRNA-disease associations.

AMHMDA39 combines information from multiple similarity networks via an attention mechanism, introduces supernodes to construct a heterogeneous hypergraph that enriches node information, and learns miRNA-disease features through graph convolutional networks.

VGAMF36 integrates multiple views of miRNAs and diseases through linear weighted fusion, combines matrix factorisation and a variational autoencoder to extract linear and nonlinear features, and then predicts potential miRNA-disease associations.

CGHCN45 uses a graph convolutional network to capture initial features of miRNAs and diseases, combined with a hypergraph convolutional network to further learn complex higher-order interaction information.

HFHLMDA46 constructs hyper-edges over miRNA-disease pairs and their k most relevant neighbours, obtained by the k-nearest-neighbour (KNN) method, to build a hypergraph, and trains a projection matrix to predict the association scores between them.

MGADAE35 predicts correlations between miRNAs and diseases by fusing their similarities with multi-kernel learning; it constructs a heterogeneous network, learns representations through graph convolution, and introduces an attention mechanism to integrate multi-layer representations.

To ensure a fair comparison, all methods use identical similarity data, encompassing miRNA semantic and Gaussian similarities as well as disease semantic and Gaussian similarities. AMHMDA39 incorporates three modalities, while MHXGMDA uses two; for consistency, we use the miRNA and disease semantic similarities as the third modality for AMHMDA39. The single-modal models CGHCN45 and HFHLMDA46 are trained solely on the miRNA-disease semantic similarity matrix.

Experimental setup

To verify the generalisation ability of the model, we divided each of the two benchmark datasets into training (80%) and testing (20%) samples. On the training set, we employed 5-fold cross-validation (5-CV) to fine-tune model parameters and structure. During training, we set the hidden channels to 64, the number of attention heads to 8, and the number of epochs to 2000. We employ the Adam optimiser with a learning rate of 0.01 and a weight decay rate of 0.002; additionally, a dropout rate of 0.5 randomly omits neurons to prevent overfitting. Evaluation metrics included AUC, PRC, F1-score, accuracy, recall, specificity, and precision. Table 1 summarizes the mean values across multiple experiments on the VG-data dataset, where MHXGMDA achieved AUC and PRC scores of 0.9594 and 0.9539, respectively.
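The optimisation settings translate to the following minimal sketch, in which the model and loss are simple stand-ins for the full MHXGMDA pipeline:

```python
import torch

# Reported settings: Adam, lr 0.01, weight decay 0.002, 2000 epochs, dropout 0.5.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),                     # dropout against overfitting
    torch.nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.002)
criterion = torch.nn.BCEWithLogitsLoss()

x = torch.randn(256, 128)                      # hypothetical pair features
y = torch.randint(0, 2, (256, 1)).float()      # hypothetical labels
for epoch in range(2000):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```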
Table 1 The 5-fold cross-validation test results of MHXGMDA on VG-DATA.

Furthermore, we also tested the model on DA-data; as shown in Table 2, the AUC and PRC reached 0.9601 and 0.9545, respectively, demonstrating its superior performance.
Table 2 The 5-fold cross-validation test results of MHXGMDA on DA-DATA.

Parameter discussion

In this study, we learn biological knowledge such as meta-paths in heterogeneous graphs through the Heterogeneous Graph Transformer (HGT), and the parameter num-layers controls the number of HGT layers. To investigate the impact of num-layers on model performance, we searched over the values 2, 4, 6, 8, and 10 and tested on the two benchmark datasets; the results are shown in Fig. 4. In general, adding HGT layers gradually abstracts higher-level feature representations and extracts more biological information, but as the depth increases, gradients may vanish or explode during backpropagation, making the model difficult or unstable to train. We found that performance peaks at num-layers = 6 on VG-data, so we set num-layers to 6 in all other experiments on VG-data. We also ran 5-CV on DA-data, where performance peaks at num-layers = 4, so all other experiments on DA-data set num-layers to 4.

Figure 4 Parameter analysis for num-layers. The plots show the AUC and PRC values for 2, 4, 6, 8, and 10 heterogeneous layers on the two benchmark datasets; larger values lie further from the centre of each axis.

Classifier selection

To select the classifier best suited to the decoding phase of the MHXGMDA framework, we tuned the relevant parameters of seven machine learning models, including XGBoost, SVM, Random Forest, KNN, Decision Tree, Logistic Regression, and Naive Bayes, and evaluated their performance under 5-CV. On VG-data, our model attains its highest AUC of 0.9594 with XGBoost as the classifier, while the remaining metrics also reach high levels compared with the other classifiers. We also performed 5-fold cross-validation on DA-data, where XGBoost achieves the highest AUC of 0.9601, which is 0.0066 higher than SVM, the second-best classifier. The average experimental results are shown in Tables 3 and 4, where SVM stands for Support Vector Machine, RF for Random Forest, KNN for K-Nearest Neighbours, LR for Logistic Regression, DT for Decision Tree, and NB for Naive Bayes.
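A sketch of this comparison, with random stand-in features and default (untuned) classifier hyperparameters:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
Z, y = rng.random((300, 64)), rng.integers(0, 2, 300)  # stand-in spliced features
classifiers = {
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "SVM": SVC(probability=True),
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "NB": GaussianNB(),
}
for name, clf in classifiers.items():
    auc = cross_val_score(clf, Z, y, cv=5, scoring="roc_auc").mean()  # 5-CV AUC
    print(f"{name}: mean AUC = {auc:.4f}")
```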
Table 3 The evaluation indicators of MHXGMDA with different classifiers on VG-DATA.

Table 4 The evaluation indicators of MHXGMDA with different classifiers on DA-DATA.

In conclusion, XGBoost extracts information from the miRNA-disease spliced features more effectively than the other classifiers within our model, so we chose XGBoost as the classifier in the MHXGMDA framework.

Comparative analysis of performance with other models

We compare MHXGMDA with seven other state-of-the-art models on the two benchmark datasets, training all methods under the same 5-CV setup. Figure 5 shows the AUC and PRC for each model.

Figure 5 ROC curves and PRC curves plotted from the cross-validation results of the different models. ROC denotes the Receiver Operating Characteristic curve and PRC the Precision-Recall curve.

Tables 5 and 6 report the average AUC, PRC, F1-score, accuracy, recall, specificity, precision, and running time per epoch for each model. Although MHXGMDA does not outperform MGADAE in specificity, precision, or running time, it performs better on all other metrics, with average AUCs 0.0138 and 0.0106 higher than MGADAE on VG-data and DA-data, respectively. Across multiple cross-validations against the other seven methods, MHXGMDA attained the highest AUC and AUPR values, validating its superiority in association discovery.
Table 5 Performance of MHXGMDA with other seven models on VG-DATA.

Table 6 Performance of MHXGMDA with other seven models on DA-DATA.

Ablation experiments with different network architectures

To verify the effectiveness of heterogeneous graph representation encoding in the MHXGMDA framework, we propose three model variants, MHXGMDA-w/o Last, MHXGMDA-used HAN, and MHXGMDA-w/o Linear, which respectively probe the roles of the one-dimensional splicing network layer, the heterogeneous graph Transformer, and the linear layer. MHXGMDA-w/o Last excludes the one-dimensional splicing network layer, MHXGMDA-used HAN replaces the Heterogeneous Graph Transformer (HGT) with the Heterogeneous Graph Attention Network (HAN), and MHXGMDA-w/o Linear removes the linear layer of the feed-forward neural network that precedes the multi-layer heterogeneous encoder. As shown in Fig. 6 and Tables 7 and 8, MHXGMDA outperforms all three variants, whose AUCs are 0.8886, 0.9212, and 0.9135 on VG-data, and 0.9578, 0.9595, and 0.9572 on DA-data, respectively. Moreover, MHXGMDA-used HAN significantly outperforms MHXGMDA-w/o Last on all evaluated metrics, indicating that the one-dimensional splicing network layer is able to fully learn the node representations, while the multi-layer heterogeneous graph Transformer further enhances model performance.

Figure 6 Ablation experiment results on different network architectures of MHXGMDA. Mean denotes the mean, STD denotes the standard deviation, and the vertical axis shows the values of the evaluation indicators under each variant.

Table 7 Ablation experiment results on different network architectures of MHXGMDA on VG-DATA.

Table 8 Ablation experiment results on different network architectures of MHXGMDA on DA-DATA.

Ablation experiments with different views

To assess the rationale for including multimodal training data in MHXGMDA, we implemented two variants that each ignore one modality, MHXGMDA-w/o SS and MHXGMDA-w/o GS. Specifically, MHXGMDA-w/o SS is trained without the miRNA and disease semantic similarity matrices, while the training data of MHXGMDA-w/o GS excludes only the miRNA and disease Gaussian similarity matrices. The experimental results are shown in Fig. 7 and Tables 9 and 10. On both datasets, almost all metrics of the full MHXGMDA model are significantly better than those of the single-modal variants MHXGMDA-w/o SS and MHXGMDA-w/o GS, which implies that combining multimodal data is important for predicting miRNA-disease relationships.

Figure 7 Ablation experiment results on different views of MHXGMDA. The figure shows four indicator values of the variant models under the two benchmark datasets, where the training data of w/o SS includes only the Gaussian similarity matrices, w/o GS includes only the semantic similarity matrices, and Ours includes both.

Table 9 Ablation experiment results on different views of MHXGMDA on VG-DATA.

Table 10 Ablation experiment results on different views of MHXGMDA on DA-DATA.

Case study

To evaluate the accuracy of MHXGMDA in predicting miRNA-disease associations in real cases, we chose three different diseases as case study subjects: Lung Neoplasms; Carcinoma, Hepatocellular; and Glioblastoma.
First, we deleted all miRNA associations with the above three diseases from the training data; the model's ability to recover the deleted associations was then evaluated during prediction. We ranked the association scores predicted by MHXGMDA for the miRNAs related to each of the three diseases and selected the top 20 miRNAs. For simplicity, HMDD v4.0 is abbreviated as 'H4' in Table 11.
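Schematically, the protocol looks like the following sketch, with stand-in score arrays:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((500, 3))            # predicted scores, 3 case diseases
train_mask = np.ones_like(scores, dtype=bool)
train_mask[:, 0] = False                 # mask one disease's known links in training
d = 0                                    # e.g. the Lung Neoplasms column
top20 = np.argsort(-scores[:, d])[:20]   # top-20 candidate miRNA indices
```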
Table 11 Top 20 miRNA-disease associations predicted by MHXGMDA.

Numerous studies have demonstrated a close association between alterations in miRNA expression levels and the progression of diverse diseases. The first disease, lung neoplasms, is among the common malignant tumours, and the co-expression of hsa-miR-182 and hsa-miR-126 helps to differentiate primary lung tumours from lung metastases47. The second disease in the case study is hepatocellular carcinoma, one of the deadliest cancers in the world; decreased serum levels of miR-16 and miR-199a have been found to be strongly linked with its progression48. In addition, glioblastoma is one of the most common fatal brain tumours, in which the tumour compresses, infiltrates, and destroys brain tissue, leading to focal symptoms and neurological impairment. The microRNA-302-367 cluster has been shown to effectively suppress glioma-initiating cells and their tumorigenic properties49. Nearly all of the associations forecast by the model could be verified, which sufficiently demonstrates the excellent performance and reliability of MHXGMDA in exploring real miRNA-disease associations.

Finally, we focused on the three diseases above and used them as central nodes to construct a network from the miRNAs ranked in the top ten of their respective scores. As shown in Fig. 8, it is striking that lung neoplasms and glioblastoma share the highest number of identical miRNAs among the top-ten scores. More notably, despite these significant commonalities at the miRNA level, lung neoplasms and glioblastoma show relatively little similarity in disease characteristics, so the same miRNAs may play different roles in different diseases.

Figure 8 miRNA-disease association subnetwork. The red nodes represent the three diseases and the blue nodes represent the top 10 miRNAs associated with each disease.
