Hybrid framework for membrane protein type prediction based on the PSSM

Performance evaluation of deep learning

The parameters and structure of CapNet were optimized to obtain the OCNN (optimized capsule neural network) and the ICNN (improved capsule neural network). To compare the performance of CapNet, the OCNN, and the ICNN in predicting membrane protein types, we evaluated them on the test sets of Dataset1, Dataset2, and Dataset3. Pre, Se, F-m, and Mcc were used as the evaluation indices, and the results are shown in Tables 1, 2 and 3. Moreover, Fig. 2 displays the variation in prediction accuracy and loss on Dataset3 for CapNet, the OCNN, and the ICNN. Compared with CapNet, the OCNN improved the prediction accuracy by 4.58\(\%\) and 2.08\(\%\) on the validation and test sets, respectively, while the ICNN improved it by 6.67\(\%\) and 2.78\(\%\). The larger gains of the ICNN demonstrate that increasing the model depth can effectively enhance generalizability.

For Dataset1, a greater Se value indicates better predictive performance on positive samples. Across the membrane protein categories, the Se values of CapNet, the OCNN, and the ICNN fluctuated within 0–96\(\%\), 33–96\(\%\), and 50–97\(\%\), respectively. The ICNN exhibited the smallest fluctuations, indicating the best overall predictive performance on positive samples. The Mcc measures the overall performance of a classifier, and under all three models, the multispanning membrane protein class achieved the highest predicted values. Furthermore, the macroaverage and weighted average of the evaluation metrics for the ICNN were greater than those for CapNet and the OCNN.

Figure 2. Prediction performance of different deep learning models on Dataset3.

For Dataset2 and Dataset3, F-m is a comprehensive evaluation metric suited to class-imbalanced samples. On Dataset2, the F-m values fluctuated within 15–96\(\%\), 9–96\(\%\), and 14–97\(\%\) for CapNet, the OCNN, and the ICNN, respectively. Compared with CapNet, the ICNN improved the F-m by 2\(\%\), but there is still significant room for improvement regarding sample imbalance. On Dataset3, the corresponding ranges were 11–96\(\%\), 36–96\(\%\), and 37–96\(\%\). In all three datasets, single-spanning membrane protein Class III had the fewest samples, resulting in lower performance metrics, whereas the multispanning membrane protein class had the most samples, leading to better predictive accuracy and performance metrics. In conclusion, the ICNN demonstrates significant performance improvements across various categories, such as Class I single-spanning membrane proteins and multispanning membrane proteins, in Dataset1, Dataset2, and Dataset3. The OCNN achieved minor improvements over CapNet, while the ICNN improved on both the OCNN and CapNet. Finally, there is still considerable room for improvement in the performance on lipid chain-anchored membrane proteins and GPI-anchored membrane proteins in Dataset2.
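For concreteness, the per-class evaluation indices used here (Pre, Se, F-m, Mcc) and their macro/weighted averages can be computed directly from predicted and true labels. The following is a minimal sketch using scikit-learn; the label arrays are placeholders, not the actual evaluation code of this work.

```python
# Sketch: per-class Pre, Se (recall) and F-m, plus the macroaverage,
# weighted average, and multiclass Mcc reported in Tables 1-3.
# y_true / y_pred are placeholder class labels for illustration only.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, matthews_corrcoef

y_true = np.array([0, 1, 2, 3, 4, 0, 1, 2])
y_pred = np.array([0, 1, 2, 3, 4, 0, 2, 2])

# Per-class precision (Pre), recall (Se) and F-measure (F-m)
pre, se, f_m, support = precision_recall_fscore_support(y_true, y_pred)

# Macroaverage and weighted average across classes
macro = precision_recall_fscore_support(y_true, y_pred, average="macro")
weighted = precision_recall_fscore_support(y_true, y_pred, average="weighted")

# Mcc for the multiclass problem as a whole
mcc = matthews_corrcoef(y_true, y_pred)
print(pre, se, f_m, mcc)
```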
Table 1. Performance comparison of different deep learning models on Dataset1.

Table 2. Performance comparison of different deep learning models on Dataset2.

Table 3. Performance comparison of different deep learning models on Dataset3.

Figure 3. Prediction performance of different deep models based on different datasets.

Figure 3 shows the mean and variance of Se, Sp, Mcc, F-m, OA and G-m for each membrane protein type under the CapNet, OCNN and ICNN models. First, the OA of the ICNN was higher than that of the other models, and its per-type prediction performance exhibited smaller variance across the datasets. Second, the Mcc was lower than the other indices and its variance fluctuated greatly for each DL technique, which affected the overall prediction performance for each type. Third, compared with the traditional feature description method based on local PSSMs, the OCNN improved the prediction performance and alleviated the differences among the evaluation indices. In summary, the optimal results of the ICNN can be attributed to the following factors: (i) reducing the batch_size and enlarging the convolution kernel of the first convolutional layer increases the training time, but the subsequent convolution operations extract more effective features; (ii) increasing the margin loss weight effectively improves the validation accuracy while reducing the loss value; (iii) a smaller dropout value not only avoids overfitting during training but also discards fewer neural network parameters, retaining more valid feature information; and (iv) the number of channels in the third convolutional layer is decreased to better transmit information to the capsule layer.

Performance evaluation of traditional machine learning methods

To evaluate the impact of different classifiers on the feature description methods, RF, KNN, the light gradient boosting machine (LightGBM)17, SVM and XGBoost18 were compared under independent testing, and the parameters of each classifier were optimized via the grid search method (see the sketch below). Each classifier was applied in turn to several classical PSSM-based feature description methods: Dpc-PSSM, Rpm-PSSM19, Eedp-PSSM20, DF-PSSM21, Ab-PSSM, Smooth-PSSM, and TG-PSSM; the results are shown as the blue histograms in Fig. 4.
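The grid search referred to above is a standard procedure; the following minimal sketch uses scikit-learn's GridSearchCV with illustrative parameter grids and synthetic data, not the actual search spaces used in this work.

```python
# Sketch: grid search over classifier hyperparameters with 5-fold CV.
# Parameter grids and the synthetic data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder stand-in for PSSM-derived feature vectors and type labels
X_train, y_train = make_classification(n_samples=200, n_features=20,
                                       n_classes=3, n_informative=10,
                                       random_state=0)

svm_grid = GridSearchCV(
    SVC(probability=True),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.001]},
    cv=5, scoring="accuracy", n_jobs=-1)
rf_grid = GridSearchCV(
    RandomForestClassifier(),
    param_grid={"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20]},
    cv=5, scoring="accuracy", n_jobs=-1)

svm_grid.fit(X_train, y_train)
print(svm_grid.best_params_, svm_grid.best_score_)
```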
Figure 4. Prediction performance of different feature extraction methods under different classifiers.

Figure 5. Prediction performance of different classifiers based on different datasets.

The blue histograms in Fig. 4 demonstrate that, across the above feature extraction methods, LightGBM achieved the best average prediction accuracies (92.59\(\%\), 87.83\(\%\) and 88.61\(\%\) on the three datasets), whereas KNN achieved the worst (90.41\(\%\), 81.71\(\%\) and 84.51\(\%\)). In addition, the Eedp-PSSM, Rpm-PSSM and TG-PSSM correspond to the Eedp-PSSM, PRSSM and TriPSSM description methods of Ref.22, respectively, but SVM rather than RF was adopted for prediction in this work. The results showed that, given the same feature descriptions and optimized classifier parameters, the SVM outperformed the RF. The receiver operating characteristic (ROC) curve corresponding to each classifier is shown in Fig. 5. As shown in this figure, the same feature description method achieved different prediction performance under different classifiers, and the evaluation indices were positively correlated with the results in Fig. 4.

Second, the orange histograms in Fig. 4 represent the prediction accuracies of XGBoost, KNN, LightGBM, SVM and RF for the corresponding meta features. Although LightGBM was superior for each subfeature, it exhibited poor prediction accuracy on its meta features; its performance there was lower than that of the SVM, indicating weak generalizability. Moreover, among the meta classifiers, SVM achieved the best performance on all three datasets, while RF performed worst on Dataset1 and Dataset2 and XGBoost performed worst on Dataset3. Third, for each classifier, the prediction accuracy of the meta features was generally greater than the average accuracy of the corresponding subfeature predictions (the SVM obtained the largest increases of 4.03\(\%\), 5.76\(\%\) and 5.29\(\%\) on the three datasets), indicating that meta features have a positive effect on model prediction performance. Fourth, the same feature description method yielded different prediction accuracies under different classifiers with optimal parameters, indicating that different classifiers have complementary performance.

Comparison between DL and ML

Tables 4, 5 and 6 present the performance differences between the TML and DL methods. The prediction accuracy of the ICNN was 3.05\(\%\), 3.86\(\%\), 12.83\(\%\), 3.35\(\%\), 2.93\(\%\), 4.2\(\%\), and 1.92\(\%\) greater than that of the Dpc-PSSM, Eedp-PSSM, DF-PSSM, Kse-PSSM, Ab-PSSM, Rpm-PSSM, and TG-PSSM, respectively, on Dataset1; 3.61\(\%\), 3.91\(\%\), 12.18\(\%\), 2.78\(\%\), 4.21\(\%\), 5.14\(\%\) and 1.66\(\%\) greater on Dataset2; and 4.21\(\%\), 5.51\(\%\), 12.28\(\%\), 4.21\(\%\), 4.39\(\%\), 5.34\(\%\) and 2.69\(\%\) greater on Dataset3. In summary, these PSSM-based results indicate that DL outperforms TML and that the ICNN can better extract the important features in the PSSM to improve membrane protein type prediction.

Table 4. Comparison of the prediction results of different PSSM-based feature description methods on Dataset1 (\(\%\)).

Table 5. Comparison of the prediction results of different PSSM-based feature description methods on Dataset2 (\(\%\)).

Table 6. Comparison of the prediction results of different PSSM-based feature description methods on Dataset3 (\(\%\)).

Among the many PSSM-based feature description methods, the DF-PSSM and TG-PSSM achieved the worst and best prediction accuracies, respectively. However, the TG-PSSM increased the classifier complexity because it produced the highest dimensional eigenvectors (8000-D).
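A common construction for the meta features discussed above, and one plausible reading of the description, is to concatenate the out-of-fold class-probability outputs of the base classifiers. The sketch below follows that standard stacking construction with synthetic data; the exact construction in this work may differ in detail.

```python
# Sketch: stacking-style meta features from out-of-fold class probabilities.
# Out-of-fold prediction avoids leaking training labels into the meta
# features; one block of n_classes columns is produced per base classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, n_classes=4,
                           n_informative=20, random_state=0)

base_models = [SVC(probability=True), RandomForestClassifier(),
               KNeighborsClassifier()]

meta = np.hstack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")
    for m in base_models
])
print(meta.shape)  # (300, 3 classifiers x 4 classes = 12 columns)
```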
To further compare the models, the Se, Mcc, F-m, OA, area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPR) were used to compare and analyze the TG-PSSM and the ICNN; the detailed results are shown in Table 7.

Table 7. Comparison of the local PSSM and the ICNN based on different datasets.

Table 7 shows that the prediction accuracies of the ICNN were 1.91\(\%\), 1.55\(\%\) and 2.69\(\%\) higher than those of the TG-PSSM on Dataset1, Dataset2 and Dataset3, respectively. Second, on Dataset1, Dataset2 and Dataset3, the AUROC and AUPR of the ICNN were 1.5\(\%\) and 1.46\(\%\), 2.04\(\%\) and 1.61\(\%\), and 2.17\(\%\) and 2.32\(\%\) lower than those of the TG-PSSM, respectively. Third, the Se, Mcc and F-m of the ICNN were higher than those of the TG-PSSM, indicating that the ICNN better relieves sample imbalance and that the two methods are somewhat complementary. To intuitively validate the effectiveness of the model, t-distributed stochastic neighbor embedding (t-SNE) was employed to visualize the dimension-reduced meta features of the ICNN model. The ICNN takes the PSSM as input and is designed as an 8-layer deep neural network comprising 3 convolutional layers, 3 pooling layers, and 2 capsule layers for feature enrichment. Specifically, to preserve more effective features and leverage the feature learning capability of the capsule layers, the primary capsule layer is set with 8 capsules. The final prediction probabilities are generated through the output layer. t-SNE uses the Kullback-Leibler (KL) divergence to measure the difference between the conditional probability distribution in the high-dimensional space and the Student-t distribution in the low-dimensional space, and it employs gradient descent to minimize the sum of the KL divergences across all data points. After optimization, t-SNE outputs the position of each data point in three-dimensional space, as illustrated in Fig. 6.
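The three-dimensional t-SNE projection of the meta features can be reproduced along the following lines; the perplexity value and the random placeholder features are illustrative assumptions, not the settings used to produce Fig. 6.

```python
# Sketch: 3-D t-SNE of learned meta features. t-SNE minimizes the KL
# divergence between pairwise-similarity distributions in the original
# space and a Student-t distribution in the embedding space.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))   # placeholder for 64-D ICNN meta features

embedding = TSNE(n_components=3, perplexity=30.0,
                 init="pca", random_state=0).fit_transform(features)
print(embedding.shape)  # (500, 3): one 3-D coordinate per data point
```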
Figure 6. t-SNE based on the ICNN model.

Performance of the hybrid meta model

Given the diversity of the same feature extraction method under different classifiers in Sect. “Architecture of the proposed ICNN”, the meta feature adds the outputs of the different classifiers; the feature dimension therefore increased from 64-D to 328-D, and the results are shown in Table 8. First, compared with the other classifiers, the SVM achieved the best predictive performance except on Dataset1. Combined with the results in Sect. “Evaluation measurements”, this indicates that the SVM is better suited to processing meta features. Second, across the datasets, KNN attained the highest F-m; its values exceeded those of SVM, RF, LightGBM and XGBoost by 0.49\(\%\), 7.29\(\%\), 9.41\(\%\) and 9.45\(\%\) on Dataset1; by 9.55\(\%\), 12.06\(\%\), 15.89\(\%\) and 23.7\(\%\) on Dataset2; and by 6.26\(\%\), 4.39\(\%\), 16.69\(\%\) and 19.24\(\%\) on Dataset3, respectively, and a higher F-m indicates that class imbalance is handled better. Third, the Se values of the different models were lower than the Sp values, indicating that the hybrid model recognized negative samples more reliably than positive ones. Fourth, the meta features of the hybrid classifier improved the prediction accuracy compared with the meta features of a single classifier, with average increases of 1.01\(\%\), 1.22\(\%\) and 0.12\(\%\); however, the best prediction performance of the hybrid classifier was still lower than that of the meta features of a single classifier.

Table 8. Comparison of classifiers on hybrid meta features based on different datasets (\(\%\)).

Comparison of different ensemble strategies

To improve the prediction capability of the model, we constructed a new hybrid learning framework based on TML and DL. In this framework, we selected the top 1 to top 9 new feature vectors (8–72 D, where D denotes dimensions) in descending order of the Acc evaluation metric of each feature description method. While combination strategies can improve model stability and avoid local optima, different strategies may yield different performance improvements. We therefore compared the predictive performance of three combination strategies (majority voting, averaging, and stacking; a minimal sketch of these strategies is given below). The detailed comparison results are shown in Fig. 7, where the horizontal axis represents the nine selected new feature vectors. For Dataset1, positions 1 to 9 on the horizontal axis represent the meta features corresponding to the ICNN, TG-PSSM, Ab-PSSM, Dpc-PSSM, Kse-PSSM, Pse-PSSM, Eedp-PSSM, Rpm-PSSM, and DF-PSSM, respectively. For Dataset2, they represent the ICNN, TG-PSSM, Kse-PSSM, Pse-PSSM, Dpc-PSSM, Eedp-PSSM, Ab-PSSM, Rpm-PSSM, and DF-PSSM; for Dataset3, the ICNN, TG-PSSM, Dpc-PSSM, Kse-PSSM, Ab-PSSM, Pse-PSSM, Rpm-PSSM, Eedp-PSSM, and DF-PSSM.

Figure 7. Comparison and analysis of different combination strategies.

The results indicate that, averaged over the selected feature vectors, the stacking method surpassed majority voting and averaging. First, for Dataset1, when the numbers of selected new feature vectors were 4, 4, and 1, the stacking, majority voting, and averaging strategies achieved their best prediction results, with accuracies of 95.85\(\%\), 95.66\(\%\), and 95.92\(\%\), respectively. Second, for Dataset2, when the numbers of selected new feature vectors were 2, 1, and 1, the best results were 92.04\(\%\), 90.64\(\%\), and 90.51\(\%\), respectively. Finally, for Dataset3, when the numbers of selected new feature vectors were 5, 1, and 2, the best results were 93.15\(\%\), 92.76\(\%\), and 93.37\(\%\), respectively.
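The three combination strategies map onto standard ensemble constructions. The sketch below uses scikit-learn's VotingClassifier and StackingClassifier as stand-ins for the implementations used here, with illustrative base classifiers and synthetic data.

```python
# Sketch: majority voting, probability averaging, and stacking over the
# same base classifiers; base models and data are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=40, n_classes=4,
                           n_informative=20, random_state=0)
base = [("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier()),
        ("knn", KNeighborsClassifier())]

majority = VotingClassifier(base, voting="hard")            # majority voting
averaging = VotingClassifier(base, voting="soft")           # average probabilities
stacking = StackingClassifier(base, final_estimator=SVC())  # relearn the outputs

for name, model in [("voting", majority), ("averaging", averaging),
                    ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```

Stacking relearns the base outputs with a second-level classifier and averaging smooths the predicted probabilities, whereas hard voting discards the probability information, which is consistent with the observation below that it may overlook discriminative minority outputs.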
Table 9. Comparison of different combination strategies (\(\%\)).

Table 9 presents a performance comparison of the best new eigenvectors under the different combination strategies. First, Sp was greater than Se, indicating that the model's diagnostic performance on negative samples exceeded that on positive samples; higher values of both indicate better model performance. Second, under the optimal combination strategy, the hybrid model outperformed the best classical feature, TG-PSSM, by 2.18\(\%\), 3.10\(\%\), and 3.30\(\%\) across the three datasets and outperformed the best deep learning model, the ICNN, by 0.26\(\%\), 1.55\(\%\), and 0.61\(\%\), again demonstrating that the hybrid model enhances prediction performance. Third, across Dataset1, Dataset2, and Dataset3, majority voting exhibited the lowest prediction accuracy: its values were lower than those of the stacking method by 0.19\(\%\), 1.42\(\%\), and 0.39\(\%\) and lower than those of the averaging method by 0.26\(\%\), 0.13\(\%\), and 0.61\(\%\), respectively. This is attributed to the fact that stacking and averaging relearn the meta features with different strategies, whereas majority voting, although the simplest approach, may overlook discriminative minority outputs.
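One common way to obtain the per-class Se and Sp values compared above is from the multiclass confusion matrix, treating each membrane protein type one-vs-rest. A minimal sketch with placeholder labels follows; it may differ in detail from the computation used in this work.

```python
# Sketch: per-class sensitivity (Se) and specificity (Sp) from a
# multiclass confusion matrix, treating each class one-vs-rest.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])   # placeholder labels
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 0])

cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)                       # true positives per class
fn = cm.sum(axis=1) - tp               # false negatives per class
fp = cm.sum(axis=0) - tp               # false positives per class
tn = cm.sum() - tp - fn - fp           # true negatives per class

se = tp / (tp + fn)                    # sensitivity (recall)
sp = tn / (tn + fp)                    # specificity
print(se, sp)
```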
In Fig. 8, subplots A, B, C, and D compare the prediction accuracies of the different feature description methods and the evaluation metrics of each membrane protein type under the best hybrid models on Dataset1, Dataset2, and Dataset3. Under the optimal combination strategy, the hybrid model outperformed the best traditional machine learning feature description methods by 2.18\(\%\), 3.1\(\%\), and 3.3\(\%\) and surpassed the worst by 13.09\(\%\), 13.62\(\%\), and 12.89\(\%\), respectively. These data once again validate the effectiveness of our model. In the box plot, the F-m score fluctuated significantly more than the other evaluation metrics, which is attributed to the relatively low sample count of the single-spanning type III class. Compared with the F-m scores of the DL models, the optimal hybrid model improved the F-m by 4.84\(\%\), 6.5\(\%\), and 3.12\(\%\), respectively. Overall, these results demonstrate that the proposed hybrid model not only improves prediction accuracy but also mitigates the sample imbalance issue.

Figure 8. Performance comparison of different feature description methods.

Comparison and analysis with other models

To verify the effectiveness of the proposed model, we compared it with existing methods. To ensure fairness, all compared methods were based on the same data and validation methods. These methods include PsePSSM_Ensemble23, Physicochemical-Ensemble24, PsePSSM-LLDA25, PsePSSM-DC26, PsePSSM-PCA25, FEA-Fusion27, the sequence information model (SIM)22, CapNet22, a recurrent neural network (RNN)15, Ave-WT16, and MKSVM-HSIC16. Most of these methods extract features from the PSSM. The results are reported in Table 10.

Table 10. Comparison of different methods on three datasets.

On Dataset1, the performance of the OCNN was 0.2\(\%\) and 13.9\(\%\) greater than that of the best traditional algorithm (FEA-Fusion) and the worst (PsePSSM-PCA), respectively, and 0.5\(\%\) and 3\(\%\) greater than that of the RNN and the SIM. The ICNN was 1.3\(\%\) and 15\(\%\) better than the best (FEA-Fusion) and worst (PsePSSM-PCA) TML methods and 1.6\(\%\) and 4.1\(\%\) greater than the RNN and the SIM, respectively. Second, the gaps between the validation and test set accuracies of CapNet, the SIM, and the ICNN were 3.5\(\%\), 6.1\(\%\) and 0.9\(\%\), respectively; the gap of the ICNN was 2.6\(\%\) and 5.2\(\%\) smaller than those of CapNet and the SIM, indicating that the model has good generalizability. Third, on Dataset2, the performance of the OCNN was 0.7\(\%\) and 12.2\(\%\) greater than that of the best model (MKSVM-HSIC) and the worst model (PsePSSM_Ensemble), respectively, and 1.1\(\%\) and 1.4\(\%\) greater than that of the DL-based RNN and SIM. Fourth, the prediction performance of the hybrid model was the best among all compared algorithms. Although the hybrid model is computationally more involved, the final meta feature dimension is low, which reduces the computational complexity of the classifier.
Furthermore, based on Dataset2, the prediction accuracy of the hybrid model was 2.5\(\%\) higher than that of the ICNN.
