Prediction of protein secondary structure by the improved TCN-BiLSTM-MHA model with knowledge distillation

Evaluation metrics

In this article, three primary evaluation metrics were used to assess the performance of our protein secondary structure prediction models: the segment overlap measure 99 (SOV99), accuracy (ACC), and the microaveraged area under the curve (MiAUC). Each metric provides a different perspective on the model's performance, together offering a comprehensive evaluation.

(a)

SOV99

The SOV9936 is a widely used metric for evaluating the accuracy of protein secondary structure predictions. SOV99 is a specific variant that considers the overlap between predicted and actual segments of secondary structure elements. The main differences between SOV and SOV99 lie in the normalization procedure and in the definition of the allowed variation of segment boundaries. SOV99 improves upon the original SOV by normalizing over the segments assigned to the reference and lowering the prediction score to reflect the paired nature of the segment comparison, so that the score ranges from 0 to 100%, facilitating direct comparison with other evaluation metrics. In addition, SOV99 restricts the allowed variation of segment boundaries so that it cannot exceed half of the length of the shorter segment, thereby more strictly distinguishing similar from dissimilar segment distributions. These changes improve the accuracy and usefulness of SOV99 in protein secondary structure prediction.

The 8-state secondary structures are H, G, I, E, B, S, T, and C, and the 3-state secondary structures are H, E, and C. This article uses \(s_{1}\) and \(s_{2}\) to denote segments of secondary structure in conformational state \(i\). The segments \(s_{1}\) and \(s_{2}\) correspond to the two secondary structure assignments being compared. The first assignment is considered the reference and is typically based on experiments; the second assignment is the one being evaluated. The two assignments are referred to as "observed" and "predicted", respectively. \(\left( s_{1}, s_{2} \right)\) denotes a pair of overlapping segments, \(S\left( i \right)\) denotes the set of all overlapping pairs of segments \(\left( s_{1}, s_{2} \right)\) in state \(i\), and \(S^{\prime}\left( i \right)\) denotes the set of all segments \(s_{1}\) for which there is no overlapping segment \(s_{2}\) in state \(i\), as shown in Eqs. (23), (24):$$ S\left( i \right) = \left\{ \left( s_{1}, s_{2} \right) : s_{1} \cap s_{2} \ne \emptyset ,\; s_{1}\;{\text{and}}\;s_{2}\;{\text{are both in conformational state}}\;i \right\} $$
(23)
$$ S^{\prime}\left( i \right) = \left\{ s_{1} : \forall s_{2},\; s_{1} \cap s_{2} = \emptyset ,\; s_{1}\;{\text{and}}\;s_{2}\;{\text{are both in conformational state}}\;i \right\} $$
(24)
SOV is a metric based on the ratio of overlapping segments, which is defined as:$$ SOV = 100 \times \left[ {\frac{1}{N}\mathop \sum \limits_{i} \mathop \sum \limits_{S\left( i \right)} \frac{{{\text{minov}}\left( {s_{1} ,s_{2} } \right) + \delta \left( {s_{1} ,s_{2} } \right)}}{{{\text{maxov}}\left( {s_{1} ,s_{2} } \right)}} \times {\text{len}}\left( {s_{1} } \right)} \right] $$
(25)
As shown in Eq. (25), \({\text{len}}\left( {s_{1} } \right)\) is the number of residues in segment \(s_{1}\), \({\text{minov}}\left( {s_{1} ,s_{2} } \right)\) is the length of the actual overlap of \(s_{1}\) and \(s_{2}\), \({\text{maxov}}\left( {s_{1} ,s_{2} } \right)\) is the total extent for which either of the segments \(s_{1}\) and \(s_{2}\) has a residue in state \(i\), and \(\delta \left( {s_{1} ,s_{2} } \right)\) is defined as in Eq. (26):$$ \delta \left( {s_{1} ,s_{2} } \right) = \min \left\{ {\begin{array}{*{20}l} {{\text{maxov}}\left( {s_{1} ,s_{2} } \right) - {\text{minov}}\left( {s_{1} ,s_{2} } \right)} \\ {{\text{minov}}\left( {s_{1} ,s_{2} } \right)} \\ {{\text{int}}\left( {{\text{len}}\left( {s_{1} } \right)/2} \right)} \\ {{\text{int}}\left( {{\text{len}}\left( {s_{2} } \right)/2} \right)} \\ \end{array} } \right. $$
(26)
where \(\min \left[ {x_{1} ;x_{2} ;x_{3} ; \ldots ;x_{n} } \right]\) is the minimum of \(n\) integers. The normalization value \(N\) is defined as in Eq. (27):$$ N = \mathop \sum \limits_{i} \left( {\mathop \sum \limits_{S\left( i \right)} {\text{len}}\left( {s_{1} } \right) + \mathop \sum \limits_{{S^{\prime}\left( i \right)}} {\text{len}}\left( {s_{1} } \right)} \right) $$
(27)
SOV99 takes into account the continuity of the predicted segments, not just the individual residues. This is crucial for protein structure predictions because it emphasizes the correct prediction of entire secondary structure segments rather than just individual amino acids. A higher SOV99 indicates that the predicted segments align well with the true segments in terms of their boundaries and lengths.
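As a concrete illustration of Eqs. (23)–(27), the following Python sketch shows how SOV99 could be computed from per-residue state strings. It is a minimal reading of the definitions above, not the implementation used in this article; the function and variable names are illustrative.

```python
def segments(states, target):
    """Return (start, end) index pairs of maximal runs of `target` in a state string."""
    segs, start = [], None
    for i, s in enumerate(states):
        if s == target and start is None:
            start = i
        elif s != target and start is not None:
            segs.append((start, i - 1)); start = None
    if start is not None:
        segs.append((start, len(states) - 1))
    return segs

def sov99(observed, predicted, states="HEC"):
    """Minimal SOV99 sketch following Eqs. (23)-(27); `observed` is the reference assignment."""
    total, n_norm = 0.0, 0
    for state in states:
        obs_segs = segments(observed, state)
        pred_segs = segments(predicted, state)
        for s1 in obs_segs:
            overlaps = [s2 for s2 in pred_segs if s2[0] <= s1[1] and s1[0] <= s2[1]]
            len1 = s1[1] - s1[0] + 1
            if not overlaps:                       # s1 in S'(i): contributes only to N
                n_norm += len1
                continue
            for s2 in overlaps:                    # pairs (s1, s2) in S(i)
                len2 = s2[1] - s2[0] + 1
                minov = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1
                maxov = max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)   # Eq. (26)
                total += (minov + delta) / maxov * len1                   # Eq. (25) summand
                n_norm += len1                                            # Eq. (27)
    return 100.0 * total / n_norm if n_norm else 0.0
```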

(b)

ACC

Accuracy is a basic metric that measures the proportion of correctly predicted secondary structure elements (such as alpha-helices, beta-strands, and coils) out of the total number of elements. In this article, accuracy (Q8) and accuracy (Q3) are used to measure the goodness of fit of the model. ACC (Q3) and ACC (Q8) are the ratios of the number of correctly predicted residues to the total number of residues \(S\), defined as Eqs. (28), (29):$$ {\text{ACC}}\left( {{\text{Q}}_{3} } \right) = \frac{{S_{C} + S_{E} + S_{H} }}{S} \times 100 $$
(28)
$$ {\text{ACC}}\left( {{\text{Q}}_{8} } \right) = \frac{{S_{H} + S_{G} + S_{I} + S_{E} + S_{B} + S_{C} + S_{T} + S_{S} }}{S} \times 100 $$
(29)
where \(S_{i} \left( {i \in \left\{ {{\text{H}},{\text{E}},{\text{C}}} \right\}\;{\text{or}}\;\left\{ {{\text{H}},{\text{G}},{\text{I}},{\text{E}},{\text{B}},{\text{C}},{\text{T}},{\text{S}}} \right\}} \right)\) denotes the number of residues of type \(i\) that are correctly predicted. Accuracy provides an overall measure of how well the model predicts the correct secondary structure elements. It is a straightforward and intuitive metric, but it does not distinguish between different types of errors or account for the balance between classes.
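Once the state alphabet is fixed, Eqs. (28) and (29) reduce to the same per-residue computation; the short sketch below is an illustrative example, assuming the observed and predicted structures are given as equal-length state strings.

```python
def q_accuracy(observed, predicted):
    """Q3/Q8 accuracy (Eqs. 28-29): correctly predicted residues / all residues, in percent."""
    assert len(observed) == len(predicted)
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

# Example: Q3 accuracy on a toy 3-state sequence.
print(q_accuracy("HHHEEECCC", "HHHEECCCC"))  # 8 of 9 residues match -> 88.9
```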

(c)

MiAUC

The microaveraged area under the curve (MiAUC) is an important indicator for evaluating the performance of multiclass classification models. The area under the ROC curve (AUC) is calculated based on microaveraging, which is particularly useful for multiclass problems. The MiAUC provides a more comprehensive evaluation of model performance by integrating the prediction results of all categories into an overall AUC value. For each class \(k\), the true positive rate and false positive rate are calculated as shown in Eq. (30):$$ \begin{aligned} TPR_{k} & = \frac{{TP_{k} }}{{TP_{k} + FN_{k} }} \\ FPR_{k} & = \frac{{FP_{k} }}{{FP_{k} + TN_{k} }} \\ \end{aligned} $$
(30)
Next, microaveraging is used to calculate the overall TPR and FPR. The true positives, false positives, true negatives, and false negatives of all categories are summed, and then the microaveraged TPR and FPR are calculated, as shown in Eq. (31):$$ \begin{aligned} TPR_{{\text{micro }}} & = \frac{{\mathop \sum \nolimits_{k} TP_{k} }}{{\mathop \sum \nolimits_{k} \left( {TP_{k} + FN_{k} } \right)}} \\ FPR_{{\text{micro }}} & = \frac{{\mathop \sum \nolimits_{k} FP_{k} }}{{\mathop \sum \nolimits_{k} \left( {FP_{k} + TN_{k} } \right)}} \\ \end{aligned} $$
(31)
The ROC curve is plotted based on the microaveraged TPR and FPR values, and the area under the microaveraged ROC curve is calculated by numerical integration to obtain the MiAUC, as shown in Eq. (32):$$ {\text{MiAUC}} = \mathop \smallint \limits_{0}^{1} TPR_{{\text{micro}}} \left( {FPR_{{\text{micro}}} } \right)d\left( {FPR_{{\text{micro}}} } \right) $$
(32)
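As an illustrative sketch of Eqs. (30)–(32), micro-averaging can be implemented by pooling the one-vs-rest decisions of all classes before building a single ROC curve. The snippet below assumes per-residue class probabilities of shape (n_residues, n_classes) and uses scikit-learn only for the ROC bookkeeping; it is not the exact evaluation script of this article.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

def micro_auc(y_true, y_score, classes):
    """MiAUC (Eqs. 30-32): pool decisions over all classes, then integrate one ROC curve."""
    y_bin = label_binarize(y_true, classes=classes)        # one-vs-rest indicator matrix
    fpr_micro, tpr_micro, _ = roc_curve(np.asarray(y_bin).ravel(),
                                        np.asarray(y_score).ravel())
    return auc(fpr_micro, tpr_micro)                        # numerical integration of the curve

# Example with three states (Q3); each y_score row holds softmax probabilities per residue.
y_true = ["H", "E", "C", "H"]
y_score = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.5, 0.3, 0.2]]
print(micro_auc(y_true, y_score, classes=["H", "E", "C"]))
```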
The MiAUC is an important and commonly used indicator for comprehensively evaluating the performance of classification models, especially when dealing with imbalanced datasets.

Experimental results and discussion

To verify the validity of the model results, the experiments in this article were performed on six datasets: TS115, CB513, CASP13, CASP14, CASP15 and PDB (2018–2020). The final model proposed in this article is the distillation-improved TCN-BiLSTM-MHA model described in section "Methods". Moreover, to verify the validity of the structure and its combinations, this article uses multiple modeling approaches for comparison. The results of the models on the TS115, CB513 and PDB (2018–2020) datasets are shown in Table 2.

Table 2 Comparative three-state performance results of various models on the TS115, CB513, and PDB (2018–2020) datasets.

For the TS115 and CB513 data, the ACC (Q8 and Q3), MiAUC (Q8 and Q3) and SOV99 (Q3) metrics were used. Because of the large amount of PDB (2018–2020) data, it was used as a training set to verify and compare the results on CASP13, CASP14, and CASP15, and the ACC (Q8 and Q3), MiAUC (Q8 and Q3) and SOV99 (Q3) metrics were collected. Bold text indicates the best performance, and italic text indicates the second-best performance; all tables in this section follow this convention.

(a)

BiLSTM

Compared with the final model of this article, this variant uses only BiLSTM as the prediction model and serves as the baseline of this article. It captures dependencies in both the forward and backward directions to improve contextual understanding in sequence prediction.

(b)

BiLSTM-MHA

Compared with (a), a multi-head attention mechanism is added to the BiLSTM model, allowing the model to focus on different parts of the sequence at the same time, thereby performing more detailed feature extraction.

(c)

Improved TCN-BiLSTM-MHA

This variant uses the improved TCN model proposed in section "Improved TCN" combined with BiLSTM and a multi-head attention mechanism; compared with the final model of this article, it does not adopt knowledge distillation.

(d)

TCN-GRU

This model combines the TCN model mentioned above with the GRU. Unlike the final model, only the basic TCN is used as the feature extraction model, and the GRU is used as the prediction model.

(e)

BiTCN-BiLSTM-MHA

The bidirectional TCN and BiLSTM are combined with a multi-head attention mechanism. Compared with the final model in this article, its feature extraction model only uses a bidirectional TCN instead of an improved TCN and does not adopt a knowledge distillation method.

(f)

Distillation-TCN-GRU

Compared with (d), knowledge distillation technology is used in the TCN-GRU model.

(g)

Distillation-BiTCN-BiLSTM-MHA

Compared with (e), knowledge distillation technology is used on the BiTCN-BiLSTM-MHA model.

Table 2 shows that the distilled models generally perform better than the models without distillation; at the same time, as the structure of the model becomes more complex, the performance improves, on average by 1–2% over the previous layer of structure, and all of the models outperform the baseline.

For the three-state structure data, distillation produced good results on both TS115 and CB513, suggesting that distillation has a beneficial effect on three-state data for small datasets. By comparison, the improvement on the PDB (2018–2020) dataset is less obvious, although its absolute performance is consistently better. This article therefore offers two explanations: first, the PDB (2018–2020) dataset already achieves good results with the undistilled models, leaving little room for improvement; second, because the PDB (2018–2020) dataset is much larger than CB513 and TS115, distillation may not yield the same gains on it that it does on the smaller datasets. The eight-state prediction results are shown in Table 3.

Table 3 Comparative eight-state performance results of various models on the TS115, CB513, and PDB (2018–2020) datasets.

According to the eight-state structure data, distillation had a good effect on TS115 and a noticeably more pronounced effect on CB513 for the best model presented in this article. For eight-state prediction, the smaller the dataset is, the more effective distillation appears to be.

Compared with BiLSTM, the final model in this article achieves the following maximum improvements: an 8.1% improvement in ACC (Q3) on the CB513 dataset, a 1.0% improvement in MiAUC (Q3) on the TS115 dataset, and a 25.9% improvement in SOV99 (Q3) on the TS115 dataset for the three-state structure data. For the eight-state structure data, the final model achieves a 7.9% improvement in ACC (Q8) on the CB513 dataset and a 1.0% improvement in MiAUC (Q8) on the TS115 dataset. These comparisons highlight the significant enhancements achieved by the final model across different datasets and metrics.

The results of using the extracted PDB (2018–2020) data as the training set and the three-state structure data of CASP13, CASP14, and CASP15 as the test sets are shown in Table 4.

Table 4 Performance comparison of various models on the CASP13, CASP14, and CASP15 datasets.

Table 4 shows that the performance of the distillation-improved TCN-BiLSTM-MHA model on the different datasets is generally better than that of the BiLSTM model. On the CASP13 dataset, the distillation-improved TCN-BiLSTM-MHA model improved the ACC (Q3) by 1.1%, the MiAUC (Q3) by 0.8%, and the SOV99 (Q3) by 17.1%. On the CASP14 dataset, the SOV99 (Q3) improved by 18.8%. On the CASP15 dataset, the ACC (Q3) improved by 1.3%.

Ablation study

To further study the effectiveness of the model, ablation experiments were performed, as shown in Fig. 6a, b, which show the performance of BiLSTM, BiLSTM-MHA, the improved TCN-BiLSTM-MHA and the distillation-improved TCN-BiLSTM-MHA on the different eight-state and three-state structure datasets.
According to the ablation study, each added module improved the model performance, indicating the validity of the model structure.

Figure 6 Ablation study: (a) comparison of ablation study results on the three-state structure for different datasets; (b) comparison of ablation study results on the eight-state structure for different datasets; (c–f) t-SNE visualizations of the embedding features learned by BiLSTM, BiLSTM-MHA, the improved TCN-BiLSTM-MHA and the distillation-improved TCN-BiLSTM-MHA.

Moreover, to better understand the complementarity of each functional module, taking the TS115 dataset as an example, the embedded representations learned by these four models are visualized after dimensionality reduction by t-distributed stochastic neighbor embedding (t-SNE), as shown in Fig. 6c–f. Both eight-state and three-state predictions improved on the different datasets as the complexity of the model increased; in addition, the t-SNE visualizations show that the BiLSTM model alone could not extract features well, whereas with increasing model complexity the features of the protein amino acid sequences were extracted more effectively, yielding better modeling results.

Comparison with other methods

(a)

Comparison with advanced algorithms

To better analyze the effectiveness of the proposed model, it is compared with several state-of-the-art algorithms, which are briefly introduced below. CNN-LSTM and the Transformer are described in section "Advanced algorithms".

MLPRNN37: The model consists of two multilayer perceptrons (MLPs) and a two-layer stacked bidirectional gated recurrent unit (BiGRU). MLPRNN uses two types of input features: position-specific scoring matrix (PSSM) and hidden Markov model (HMM) features. These features are input into the first MLP block, which expands the input dimension from 41 to 512. The expanded features are then input into the BiGRU block to capture the long-range dependencies in the sequence. Finally, the output of the BiGRU passes through an MLP block again, which reduces the number of dimensions to 9, and the prediction is made through the Softmax layer.

GAN-BiRNN: The application of a GAN to protein secondary structure prediction was proposed by Jin et al.38. The generator captures the complex features of protein sequences by combining one-dimensional convolution and multiscale convolution. The input features are the one-dimensional encoding and PSSM of the protein sequence, and the discriminator is used to determine whether the input secondary structure data are real or generated; its input is a combination of the output features of the generator and the real secondary structure data. This article adopts the GAN model and uses three layers of BiLSTM and BiGRU to make predictions, obtaining GAN-BiRNN.

This article applies these advanced algorithms within the framework of the model proposed in this article, and the data preprocessing remains the one-hot encoding and the physicochemical properties of the features with word2vec segmentation. The results are shown in Table 5.

Table 5 Comparison between the model in this article and advanced algorithms.

Compared with the other models on the same datasets, the model in this article achieves significant performance improvements in terms of the ACC. For the TS115 three-state structure data, the model in this article has a maximum improvement of 4.7% (Transformer), and for the eight-state structure data, a maximum improvement of 3.6% (Transformer); for the CB513 three-state structure data, a maximum improvement of 7.0% (Transformer), and for the eight-state structure data, a maximum improvement of 5.9% (Transformer); for the PDB (2018–2020) three-state structure data, a maximum improvement of 2.9% (Transformer), and for the eight-state structure data, an improvement of up to 2.6% (Transformer).

On the other hand, this paper randomly selected 6 protein sequences with low homology from the first half of 2024 in the CAMEO dataset39 and used Stride to assign the secondary structure sequences from their PDB files, obtaining the CAMEO-H (2024) test set of this paper. The training set is still PDB (2018–2020). AlphaFold2 and the final model of this paper were used for prediction, and the results are shown in Fig. 7.

Figure 7 Results of our model and AlphaFold2 on the CAMEO-H (2024) test set.

According to Fig. 7, the ACC (Q3) and SOV99 (Q3) of our model on the CAMEO-H (2024) test set are 48.2% and 42.5%, respectively, whereas AlphaFold2 achieves 77.9% and 66.3%, respectively.

(b)

Case study

In addition, this paper uses PyMOL to visualize and compare the single-sequence secondary structure output by the model with the secondary structure of the original sequence, to better demonstrate the advantages of the compared models. The visualization data were selected from protein ID 1L58 and ID 1VCA in the RCSB PDB (https://www.rcsb.org/). The prediction models selected are the Transformer, GAN-BiRNN, AlphaFold2, and RGN2 models mentioned in section "Advanced algorithms" and the final model of this paper. In addition, the results of the online protein structure predictor PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred), which provides a variety of protein structure prediction methods, were also used.

The case visualization results are shown in Fig. 8, where cyan represents the α-helical structure (H), white represents the irregular coil structure (C), and fuchsia represents the β-strand structure (E).

Figure 8 Case study: (a) visualization of the amino acid sequence of 1L58 on different models; (b) visualization of the amino acid sequence of 1VCA on different models.

As shown in Fig. 8, the model proposed in this article yields good results in the case visualization; AlphaFold2 and RGN2 also produce excellent visual results. This indicates that the model proposed in this article has a certain degree of validity and that more advanced protein language models can also be applied effectively to this problem.

(c)

Comparison with other articles

To explore the effectiveness of the model and structure of this article, not only the PDB (2018–2020) data but also two classical datasets, TS115 and CB513, are used; therefore, this article reviews the literature reporting Q8 and Q3 accuracy on these two datasets to compare with the optimized model of this article. These models are briefly introduced below.

Multiple classifiers40: This model processes high-dimensional protein primary sequence data through a two-stage feature selection technique to improve the accuracy of protein secondary structure prediction. The first stage uses unsupervised autoencoders for feature extraction, while the second stage combines three feature selection methods, namely, universal univariate selection, recursive feature elimination, and the Pearson correlation coefficient. By combining these feature selection methods, the model can select the optimal feature subset. Finally, the data based on the selected feature subset are classified using a random forest, a decision tree, and a multilayer perceptron to achieve protein secondary structure prediction.

MCCM41: This model achieves prediction through multilevel feature extraction, a combined classifier module, and a sample difficulty discrimination module. Specifically, the model first introduces a feature extraction module to extract features of different difficulty levels from the data. Then, two classifiers are designed to process simple and difficult samples, where the loss values of difficult samples are weighted to improve their prediction performance. Finally, a sample difficulty discrimination module based on the Dirichlet distribution and an information entropy measure assigns samples to the above classifiers for learning.

AttSec42: This model uses the Transformer architecture to capture local features between amino acids through a self-attention mechanism. Specifically, the AttSec model first extracts self-attention maps through a multilayer Transformer encoder, which represent pairwise features between amino acid embeddings. These pairwise features are then processed through a two-dimensional convolutional block to detect local patterns. Finally, these local features are converted to one-dimensional features and classified through a fully connected layer to achieve protein secondary structure prediction.

NetSurfP-3.043: This model uses a pretrained protein language model (such as ESM-1b) to generate sequence embeddings and combines a one-dimensional convolutional neural network (1D CNN) and a bidirectional long short-term memory network (BiLSTM) for feature extraction and prediction.

The specific results are shown in Table 6.

Table 6 Comparison between the results of our study and those of other studies.

Compared with existing studies, the model in this article has better prediction accuracy for Q8, but its Q3 accuracy may not be as strong. For the TS115 three-state structure data, the model in this article has a maximum improvement of 5.7% (NetSurfP-3.0), and for the eight-state structure data, a maximum improvement of 12.3% (NetSurfP-3.0). For the CB513 three-state structure data, the model in this article has a maximum improvement of 11.3% (multiple classifiers), and for the eight-state structure data, a maximum improvement of 1.2% (MCCM). Additionally, the model in this article uses a shallower structure and a simpler construction method.
Compared with previous models that use multiple sets and methods, the model in this article is more lightweight and yields relatively excellent results.

(d)

Discussion of the results

Compared with all the models mentioned in (a) and (b) of this section, it is difficult for the model in this article to achieve a significantly better effect than the other models, owing to limitations in the size of the data. However, it is worth noting that the experimental results in section "Comparison with other methods" (a) show that when different prediction models are used with the same dataset and data preprocessing method, the model in this article obtains better results.

On the other hand, this article uses a knowledge distillation algorithm that aims to reduce the computational burden of complex models. Through knowledge distillation, our model learns rich feature representations from large pretrained models while maintaining low computational complexity. This technology not only improves the predictive performance of the model but also significantly reduces the number of parameters, making the model easier to deploy and apply. This article notes that most protein models suffer from large parameter counts and deployment difficulties, and our model makes important improvements in this regard.

Compared with other models, a shortcoming of this article is that it does not use more data, so the effect of the model is still limited. However, under the existing datasets and experimental conditions, our model demonstrates excellent performance and high prediction accuracy. Specifically, taking the largest improvement as an example, in three-state structure prediction on the TS115 dataset, our model improves by up to 4.7% compared with the other models, and in eight-state structure prediction, by up to 3.6%. In three-state structure prediction on the CB513 dataset, our model improves by up to 7.0%, and in eight-state structure prediction, by up to 5.9%. In three-state structure prediction on the PDB (2018–2020) dataset, the model in this article has a maximum improvement of 2.9%, and in eight-state structure prediction, a maximum improvement of 2.6%.

Although the limited data volume affects the model's effectiveness, the innovations in the design and the algorithmic improvements of the model in this article have produced significant progress in prediction performance. Our model has powerful capabilities, especially for processing high-dimensional features and capturing complex dependencies in sequences. In addition, the model's lightweight design and efficient computing architecture give it greater advantages in practical applications.

In short, despite the limitations in data volume, the model in this article performs well in protein secondary structure prediction through careful design and advanced algorithms and has high application value and broad application prospects.

Parameter sensitivity analysis

This article proposes the distillation-improved TCN-BiLSTM-MHA as the final model, which is composed of an improved TCN, BiLSTM-MHA and knowledge distillation, while the improved TCN is divided into three main modules: (1) the TCN model; (2) multimodal fusion; and (3) forward and backward propagation. To show the relationship between performance and the number of scales, this article sets five scales, [1], [1,9], [1,9,81], [1,9,81,729], and [1,9,81,729,6561], to select the optimal multiscale configuration.
At the same time, in the distillation model, the teacher and student losses are integrated for backpropagation and optimization through an alpha coefficient; this article sets the alpha coefficient to 0.1, 0.2, 0.3, 0.4, and 0.5 to determine the optimal value and to test the parameter sensitivity. The eight-state structure of the TS115 dataset is chosen as the prediction task because it has more prediction targets and a smaller dataset, making it more sensitive to parameter changes.

As shown in Fig. 9, the model results are obtained as the scale and alpha coefficient vary within these ranges. This article found that when the scale is [1,9], the effect of the alpha coefficient is the most obvious, and Q8 reaches a maximum when the coefficient is 0.5; notably, when the scale is [1], the Q8 accuracy is less affected by the alpha coefficient, and its prediction accuracy is the lowest. Based on the sensitivity analysis, the model fits best when the scale is [1,9,81,729,6561] and the alpha coefficient is 0.2. Therefore, the scale is fixed at [1,9,81,729,6561], and the alpha coefficient is fixed at 0.2.

Figure 9 Parameter sensitivity analysis at different scales and alpha coefficients.
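To make the role of the alpha coefficient concrete, the sketch below shows one common way of blending the student's supervised loss with a teacher-matching term; the exact loss composition, which term alpha weights, and the temperature value are illustrative assumptions rather than the formulation of section "Methods", and the dilation list is included only as the configuration selected above.

```python
import torch.nn.functional as F

# Settings selected by the sensitivity analysis; the dilation list is assumed to be
# passed to the improved-TCN blocks as their multiscale dilation rates.
DILATIONS = [1, 9, 81, 729, 6561]
ALPHA = 0.2          # weight between the two loss terms (value chosen in this section)
TEMPERATURE = 2.0    # softening temperature, illustrative only

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=ALPHA, T=TEMPERATURE):
    """Blend the student's cross-entropy with a KL term toward the teacher's soft targets.
    Assumption: alpha weights the supervised term; the paper may define the split differently."""
    hard = F.cross_entropy(student_logits, labels)                   # student vs. true labels
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)                 # student vs. teacher
    return alpha * hard + (1.0 - alpha) * soft
```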
