DGCPPISP: a PPI site prediction model based on dynamic graph convolutional network and two-stage transfer learning

Evaluation metrics

To evaluate the performance of DGCPPISP, seven common evaluation metrics are used in this paper: accuracy, precision, recall, F1-measure, area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC) and the Matthews correlation coefficient (MCC) [11]. The formulas of these metrics are

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
(7)
$$Precision = \frac{TP}{TP + FP}$$
(8)
$$Recall = \frac{TP}{TP + FN}$$
(9)
$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
(10)
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
(11)

where \(TP\) denotes correctly predicted PPI sites, \(TN\) denotes correctly predicted non-PPI sites, \(FP\) denotes non-PPI sites incorrectly predicted as PPI sites, and \(FN\) denotes PPI sites incorrectly predicted as non-PPI sites. AUROC and AUPRC measure the overall performance of the prediction model. Since our model is evaluated on imbalanced datasets, we focus mainly on F1-measure, MCC, AUROC and AUPRC as the principal evaluation metrics, in addition to accuracy, precision and recall.
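For reference, these metrics can be computed with scikit-learn, where average_precision_score is a common surrogate for AUPRC; the toy labels and probabilities below are purely illustrative and have no relation to the reported results.

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, roc_auc_score,
                             average_precision_score)

# Toy labels and probabilities for illustration only (1 = PPI site, 0 = non-PPI site).
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
y_prob = np.array([0.8, 0.3, 0.6, 0.7, 0.2, 0.4, 0.1, 0.5])
y_pred = (y_prob >= 0.5).astype(int)            # threshold the probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("AUROC    :", roc_auc_score(y_true, y_prob))
print("AUPRC    :", average_precision_score(y_true, y_prob))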
Performance comparison and analysis with different methods

Comparison with competitive methods

To assess the efficacy of diverse prediction approaches, we carry out a series of experiments on the PPI site datasets Dset_186_72_PDB164 and Dset_331, contrasting DGCPPISP with an array of current state-of-the-art methods. The comparison covers six structure-based methods (StackingPPINet [4], SPPIDER [34], DeepPPISP [11], Attention-CNN [14], HN-PPISP [15] and EGRET [18]), which utilize protein structural information, and five sequence-based methods (PSIVER [24], ISIS [35], RF_PPI [7], SCRIBER [10] and DELPHI [12]), which rely solely on sequence features.

Table 1 shows the results of DGCPPISP compared with the other methods on Dset_186_72_PDB164. Our model performs best in five of the six evaluated metrics. Although its recall is 2.4% lower than that of HN-PPISP, it still ranks second on this metric. Given the imbalanced nature of the PPI site prediction dataset, DGCPPISP shows a clear advantage in the other performance metrics: F1-measure, AUPRC and MCC improve by 8.7%, 23.9% and 25.4%, respectively. Compared with DELPHI, the current leading sequence-based method, DGCPPISP is superior in all metrics, particularly AUPRC and MCC, with gains of 23.9% and 29.7%, respectively. Notably, DGCPPISP surpasses the structure-based methods without relying on protein structure information. Compared with EGRET, a model that employs GAT to incorporate protein structure, DGCPPISP is better by 5.9%, 10.1% and 13.3% in F1-measure, AUPRC and MCC, respectively. These results suggest that the superior performance of DGCPPISP stems not only from the inherent advantages of dynamic graph convolution over GAT, but also from the integration of the ESM-2 embedding, which helps DGCPPISP capture hidden information in the protein feature representation. To illustrate this further, we select two representative sequence-based methods (SCRIBER and DELPHI) and two representative structure-based methods (DeepPPISP and EGRET) and compare their AUROC values with the proposed DGCPPISP. As Fig. 3 shows, DGCPPISP achieves the best results on the AUROC index. In addition, Table 1 also reports the results of DGCPPISP on a version of Dset_186_72_PDB164 with less than 20% sequence homology, which again confirms the strong performance of our method.

Table 1 Comparison of PPI site prediction performance on Dset_186_72_PDB164

Fig. 3 AUROC comparison of some methods on Dset_186_72_PDB164
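As an aside, per-residue ESM-2 representations of the kind referred to above can be extracted with the fair-esm package; the sketch below is a generic usage example, and the checkpoint name (esm2_t33_650M_UR50D) and toy sequence are assumptions rather than the exact configuration used by DGCPPISP.

import torch
import esm

# Load a public ESM-2 checkpoint (an example choice; the paper's exact
# ESM-2 variant is not assumed here) together with its tokenizer.
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # toy sequence
_, _, tokens = batch_converter([("query", sequence)])

with torch.no_grad():
    out = model(tokens, repr_layers=[33], return_contacts=False)

# Drop the BOS/EOS positions to obtain one 1280-dimensional vector per residue.
per_residue = out["representations"][33][0, 1:len(sequence) + 1]
print(per_residue.shape)                        # (sequence length, 1280)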
A more comprehensive evaluation of DGCPPISP is conducted by comparing it with two leading methods, HN-PPISP and DeepPPISP, on Dset_331. As illustrated in Fig. 4, DGCPPISP surpasses both methods across all six metrics, achieving full area coverage on the radar map. Specifically, compared with DeepPPISP, our model improves ACC by 9.7%, precision by 43.1%, recall by 21.4%, F1-measure by 35.4%, AUPRC by 43.6% and MCC by 97.8%. Compared with HN-PPISP, all six metrics improve by 0.5%, 11.5%, 21.4%, 14.5%, 19.8% and 29.9%, respectively. Moreover, both of these methods rely on PSSM for protein feature extraction, which is a time-consuming process. Our model avoids this extraction step through transfer learning and uses only conveniently extracted sequence encoding features while maintaining prediction precision. This highlights the clear advantage of our method when both model performance and the time cost of feature extraction are taken into account.

Fig. 4 Prediction comparison on Dset_331

Comparison with traditional graph convolutional neural networks

In this subsection, we compare the dynamic graph convolutional neural network with traditional graph neural networks on PPI site prediction. We choose two representative traditional graph neural networks: the graph convolutional network (GCN) [36] and the graph attention network (GAT) [19]. To establish a uniform comparison with the dynamic k-neighborhood graph of DGCPPISP, the node neighborhood range of both GCN and GAT is set to the same k, with the nodes in the graph structure representing protein residues. We follow the adjacency construction method described in EGRET [18], selecting the k residues with the smallest average atomic distance to the central residue as its adjacent nodes. Regarding features, all five features in our model are used as node features, meaning that all three graph networks have undergone the initial stage of information transfer. The performance of the three graph networks on Dset_186_72_PDB164 is depicted in Fig. 5. DGCPPISP outperforms the other two methods in five of the seven metrics, including the four comprehensive performance metrics of primary concern. Although GCN has an edge in recall, its performance on the remaining six metrics is comparatively poor. With the help of the multi-head attention mechanism, GAT shows better overall performance than GCN, and its ACC is slightly higher than that of DGCPPISP. However, given the imbalance of the dataset, ACC alone is not a decisive metric for PPI site prediction, and on the other metrics GAT trails DGCPPISP. This experiment underscores the improved performance of the DGCPPISP model, based on a dynamic graph convolutional network, over the traditional graph network methods GCN and GAT, which rely on fixed neighborhoods for PPI site prediction. We also plot the ROC and PR curves of the three graph convolutional neural networks on Dset_186_72_PDB164, as illustrated in Fig. 6. The gaps between these curves again reflect the superiority of DGCPPISP over the other two methods. We attribute this to the ability of dynamic graph convolutional neural networks to adapt to non-stationary data by capturing time dependencies and to dynamically update the graph representation during feature training, thereby improving the prediction of PPI binding sites.
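To make the distinction from fixed-neighborhood GCN and GAT concrete, the following PyTorch sketch shows the core idea of dynamic graph convolution: the k-nearest-neighbor graph is rebuilt in feature space before each EdgeConv operation. It is a minimal illustration of the mechanism under assumed layer sizes, not the authors' implementation.

import torch
import torch.nn as nn

def knn_graph(x, k):
    # x: (N, F) per-residue features; returns the indices of the k nearest
    # neighbours of each residue in feature space, excluding the residue itself.
    dist = torch.cdist(x, x)                    # (N, N) pairwise feature distances
    dist.fill_diagonal_(float("inf"))           # do not pick the node itself
    return dist.topk(k, largest=False).indices  # (N, k)

class EdgeConv(nn.Module):
    # EdgeConv-style layer: embeds [x_i, x_j - x_i] for every neighbour j of
    # residue i with a shared MLP and max-pools over the neighbourhood.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, x, k=10):
        idx = knn_graph(x, k)                   # neighbourhood rebuilt per layer,
        xi = x.unsqueeze(1).expand(-1, k, -1)   # hence a "dynamic" graph
        xj = x[idx]                             # (N, k, F) neighbour features
        return self.mlp(torch.cat([xi, xj - xi], dim=-1)).max(dim=1).values

x = torch.randn(120, 64)                        # e.g. 120 residues, 64-d features
h = EdgeConv(64, 128)(x, k=10)                  # (120, 128) updated residue features

Because the neighborhood is recomputed from the current features rather than fixed by the input adjacency, deeper layers can connect residues that are distant in the original graph but close in the learned semantic space.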
Fig. 5 Performance comparison with various graph neural networks

Fig. 6 ROC and PR curves of three kinds of graph neural networks

Ablation study

Feature evaluation

Our model uses a total of five protein sequence-based encodings as features. We gauge the impact of each feature on the performance of DGCPPISP through an ablation experiment: we remove each feature in turn from the set of five and test the model with the remaining four features. We then plot the AUPRC and MCC histograms on the primary dataset, Dset_186_72_PDB164 (note that this experiment does not involve the second stage of transfer learning, i.e. no pre-training is performed on the protein-peptide binding residue dataset). As Fig. 7 shows, the model's performance on both metrics declines to some degree after the removal of each feature. This confirms that none of the features are superfluous and that all contribute substantially to the model's performance. Of particular note is ESM-2, which makes the most significant contribution owing to its abundant protein pre-training information. Without the ESM-2 features, AUPRC drops from 0.421 to 0.359, a decline of 14.7%, and MCC drops from 0.299 to 0.230, a reduction of 23.1%. Remarkably, even without ESM-2, DGCPPISP still outperforms most of the other methods listed in Table 1.

Fig. 7 Effect of different features on model performance on Dset_186_72_PDB164. Note that the x-axis shows the specific feature that is removed. Amino acid co-occurrence similarity encoding and electrostaticity and hydrophobicity similarity encoding are abbreviated as co-occurrence and ele&hyd, respectively. None indicates that no features are removed

Effectiveness analysis of transfer learning

To ascertain the performance of DGCPPISP at different stages of transfer learning, we perform an experiment on Dset_186_72_PDB164, as displayed in Table 2. Each stage of transfer learning contributes to the overall model performance to a varying degree. The first stage of transfer learning yields the largest improvements, of 10.9%, 7.4%, 17.3% and 28.3% in the final four metrics, respectively, which is consistent with the rich features encompassed in ESM-2. The second stage of transfer learning contributes further improvements of 1.1%, 1.9%, 5.9% and 2.3%, respectively, indicating that pre-trained model parameters benefit feature extraction and representation compared with random initialization. We also plot the ROC and PR curves corresponding to the three transfer learning states to show the impact of transfer learning on DGCPPISP intuitively, as illustrated in Fig. 8. The ROC and PR curves of the model with the complete two-stage transfer learning generally lie above those of the models corresponding to the other two states, signifying excellent performance.

Table 2 Performance comparison of different stages of transfer learning on Dset_186_72_PDB164

Fig. 8 Comparison of three states of transfer learning
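The second-stage gain discussed above comes from initializing the network with parameters pre-trained on the protein-peptide binding residue task rather than random values. A minimal PyTorch sketch of that hand-over is shown below; the placeholder architecture and checkpoint file name are assumptions for illustration only.

import os
import torch
import torch.nn as nn

# Placeholder network standing in for the DGCPPISP feature extractor and
# prediction head; the real architecture differs.
class SitePredictor(nn.Module):
    def __init__(self, in_dim=1280, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 1)        # per-residue interaction score

    def forward(self, x):
        return self.head(self.backbone(x))

model = SitePredictor()
ckpt = "peptide_binding_pretrained.pt"          # hypothetical stage-2 checkpoint
if os.path.exists(ckpt):
    # Initialise from weights learned on the protein-peptide binding residue
    # task instead of random values, then fine-tune on the PPI site data.
    model.load_state_dict(torch.load(ckpt, map_location="cpu"), strict=False)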
The same experiment is then conducted on Dset_331, and the results are presented in Table 3. After the first stage of transfer learning, the F1-measure, AUROC, AUPRC and MCC of DGCPPISP increase by 8.4%, 6.3%, 19.1% and 17.8%, respectively. Introducing the second stage of transfer learning further increases the four indicators by 6.0%, 0.8%, 1.3% and 11.3%, respectively, demonstrating the effectiveness of transfer learning in our model.

Table 3 Performance comparison of different stages of transfer learning on Dset_331

The effect of different k-neighborhoods

In our model, dynamic graph convolution forms a "dynamic graph" by constructing the neighborhood of the central node with a k-nearest-neighbors algorithm based on the "feature distance" before the EdgeConv operation. Consequently, the value of k determines the field of view of the EdgeConv operation and influences the model performance to a certain extent. To examine the impact of the neighborhood range, we set six different neighborhood sizes from small to large and record the performance of DGCPPISP on each metric. The minimum value of k is 1, meaning that each central node has only a single neighbor. The variation of the four key metrics is shown as a line graph in Additional file 1: Fig. S1. All four metrics achieve their best results at k = 10; although AUPRC shows a marginal increase at k = 40, the metric values generally trend downward as the neighborhood range becomes either too small or too large. We infer that when the k-neighborhood is too small, the field of view of EdgeConv is constrained, preventing full exploration of the relationships between the central node and other nodes (such as the long "feature distance" dependencies implied in the high-dimensional semantic space), which hampers effective feature extraction in DGCPPISP. When the k-neighborhood is too large, the neighborhoods of the nodes in the deep network may closely approximate each other, so the updated nodes exhibit similar features, which affects the robustness of the model.

Effect of different kernel sizes for Conv_encoder

Within the DGCPPISP model, we integrate a Conv_encoder module, built from one-dimensional convolutions, to raise and reduce the dimensionality of the features. To find a suitable convolution kernel size, we run a comparative experiment with different kernel sizes, as presented in Additional file 1: Table S4. When the convolution kernel size of Conv_encoder is 3, it leads in five metrics, so this value is adopted as the parameter setting in our model.
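As a rough illustration of such a module, the sketch below stacks two one-dimensional convolutions with kernel size 3 that first expand and then compress the channel dimension along the residue axis; the channel sizes are arbitrary assumptions and the block is not the paper's exact Conv_encoder.

import torch
import torch.nn as nn

# Rough Conv_encoder-style block: 1-D convolutions over the residue axis that
# first expand and then compress the channel dimension; kernel_size=3 follows
# the ablation result reported above.
class ConvEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2                  # keep the sequence length unchanged
        self.up = nn.Conv1d(in_dim, hidden_dim, kernel_size, padding=pad)
        self.down = nn.Conv1d(hidden_dim, out_dim, kernel_size, padding=pad)
        self.act = nn.ReLU()

    def forward(self, x):                       # x: (batch, length, in_dim)
        x = x.transpose(1, 2)                   # Conv1d expects (batch, channels, length)
        x = self.down(self.act(self.up(x)))
        return x.transpose(1, 2)                # back to (batch, length, out_dim)

features = torch.randn(2, 500, 128)             # e.g. two proteins, 500 residues each
encoded = ConvEncoder(128, 512, 64)(features)   # (2, 500, 64)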
Impact of different protein lengths

The dynamic neighborhood range k, a hyperparameter of our model, is typically related to the length of the protein. Hence, in this section we keep k at 10 (the best value selected in Sect. 3.3.3) and examine the influence of protein length on DGCPPISP. Taking Dset_186_72_PDB164 as a representative example, we partition the 70 proteins used for testing into seven non-overlapping intervals according to sequence length, with 10 proteins in each interval. The AUPRC of DGCPPISP is calculated on each of the seven intervals and compared with popular methods; the results are illustrated in Fig. 9. Among the compared methods, GAT-PPI is a derivative of EGRET, essentially an EGRET version without the aggregated edge feature. Although the AUPRC scores of DGCPPISP and the other three methods consistently decline as protein sequence length increases, the overall performance of DGCPPISP across the seven intervals surpasses that of the other methods.

Fig. 9 AUPRC on proteins of various lengths (the x-axis represents protein subsets grouped by length)

Visualization analysis

To further assess the capability of DGCPPISP in PPI site prediction, we also provide a visualization comparison of DGCPPISP and EGRET. We select four proteins (PDB ID: 1MAF, Chain F; PDB ID: 3D7V, Chain A; PDB ID: 3VDO, Chain B; PDB ID: 3W2W, Chain B) from the test set of Dset_186_72_PDB164 and visualize the true PPI sites (True), the PPI sites predicted by EGRET (EGRET_pred), and the PPI sites predicted by DGCPPISP (DGCPPISP_pred), as depicted in Additional file 1: Fig. S2. The visualization results of DGCPPISP markedly surpass those of EGRET and align more closely with the true cases. In particular, in Fig. 10, EGRET identifies all residues of the 3VDO protein as PPI sites, starkly contradicting the true label, whereas DGCPPISP is far more precise on this protein.

Fig. 10 Visualization comparison of DGCPPISP and EGRET on PDB ID: 3VDO, Chain B. Note that the red area indicates PPI sites and the blue area indicates non-PPI sites

Furthermore, we provide visual prediction results in the form of protein surfaces, selecting three proteins from Dset_331 and comparing the true PPI sites with the results of DGCPPISP and EGRET. These results, shown in Fig. 11 and Additional file 1: Fig. S3, further confirm that our model's results more closely resemble the true PPI sites.

Fig. 11 Surface visualization comparison of DGCPPISP and EGRET on PDB ID: 3L9F, Chain A, where purple indicates PPI sites and gray indicates non-PPI sites
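For readers who want to reproduce this kind of figure, the short script below sketches one way to map a per-residue prediction onto a structure with the open-source PyMOL Python API; the residue indices are placeholders rather than actual DGCPPISP output, and the coloring loosely mimics the surface-figure convention.

from pymol import cmd

# Fetch one of the Dset_331 test proteins and show its surface.
cmd.fetch("3l9f")
cmd.hide("everything")
cmd.show("surface", "chain A")
cmd.color("gray80", "chain A")                  # non-PPI sites

# Hypothetical predicted interface residues; replace with real model output.
predicted_sites = [12, 45, 46, 78]
selection = "chain A and resi " + "+".join(str(r) for r in predicted_sites)
cmd.color("purple", selection)                  # predicted PPI sites
cmd.png("3l9f_predicted_sites.png", dpi=300)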
