De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model

The framework of PALM-H3 and A2binder

The workflow and model framework of PALM-H3 and A2binder are shown in Fig. 1. The purpose of PALM-H3 is to generate de novo CDRH3 sequences for antibodies. As illustrated in Fig. 1a, the CDRH3 region plays the most vital role in determining an antibody’s binding specificity against a particular antigen sequence. As illustrated in Fig. 1b–e, PALM-H3 is a transformer-like model [35] that uses the ESM2-based antigen model as the encoder and the antibody Roformer as the decoder. As illustrated in Fig. 1f, we also built A2binder to predict the binding affinity of the artificially generated antibodies. Building PALM-H3 and A2binder consists of three steps. First, we pre-train two Roformer models on unpaired antibody heavy and light chain sequences, respectively. Second, we construct A2binder from the pre-trained ESM2 [18], the antibody heavy chain Roformer, and the antibody light chain Roformer, and train it on paired affinity data. Finally, we construct PALM-H3 from the pre-trained ESM2 [18] and the antibody heavy chain Roformer, and train it on paired antigen-CDRH3 data for the de novo generation of CDRH3. The statistics of the training data can be found in Supplementary Table S1, while the training details and model hyperparameter settings can be found in Supplementary Note 1 and Supplementary Table S2.

Fig. 1: Overview of the PALM-H3 and A2binder workflow. a Schematic of an antibody binding to the epitope region of an antigen. The CDRH3 loop, as the third CDR of the antibody heavy chain, plays an essential role in enabling specific antigen binding. b The framework of PALM-H3. It is a Transformer-like neural network containing an antigen encoder model and an antibody decoder model. It takes the antigen sequence as input and generates a CDRH3 antibody sequence aiming to bind to the input antigen.
The antigen encoder model is an ESM2-based model, which is pre-trained using UniRef50 protein sequences and fine-tuned using antigen sequences. The antibody decoder is a RoFormer-based model, which contains 12 antibody layers that were pre-trained and fine-tuned using antibody sequences. The key (K) and value (V) matrices from the last antigen layer are passed to every antibody layer as the input of the cross-attention sub-layer. c Internal architecture of the antigen layer and antibody layer. Both the antigen layer and antibody layer have two basic sub-layers, including a fully connected feed-forward sub-layer and a multi-head self-attention sub-layer. Additionally, the antibody layer uniquely includes cross-attention sub-layers. Input tokens of each layer are represented by the sum of token embeddings and rotary position embeddings, while the output is a high-dimensional vector representation for each input token. d The cross-attention sub-layer is the key to combining the high-dimensional representation of antigen sequence (K and V matrices) and in-context antibody sequence (Q matrix). e Schematic of the self-supervised pre-training of antibody RoFormer. Unpaired antibody sequences were used to pre-train the antibody RoFormer via masked language modeling. The model was trained to predict the identity of the masked tokens, learning generalizable representations of antibody sequences. f The framework of A2Binder. It takes the antigen sequence along with antibody heavy and light chain sequences as input. Each sequence is encoded by passing through a pre-trained encoder and a Multi-Fusion Convolutional Neural Network (MF-CNN) feature extractor. The MLP (a multilayer perceptron) model finally fuses the features from all three sequences to predict antibody-antigen binding affinity. 
The architecture of the MF-CNN is shown below.

To pre-train the antibody Roformer models, over 1 billion unpaired antibody light and heavy chain sequences were collected from the Observed Antibody Space (OAS) database [36,37] (Supplementary Table S1). As illustrated in Fig. 1e, the antibody model’s architecture was based on the Roformer [26], which encodes the absolute position of amino acids with a rotation matrix. In the first round of pre-training, the model was trained with a self-supervised strategy to learn ‘bio-language’ representation patterns from the 1 billion antibody sequences. Thereafter, we extracted 81,750,886 antibody heavy chain and 17,754,502 antibody light chain sequences from COVID-19 patients for the second round of self-supervised pre-training of the antibody heavy chain Roformer and the antibody light chain Roformer, respectively. Specifically, the Mask Amino Acid (MAA) task was applied to obtain neural network models that characterize patterns of antibody light and heavy chain sequences.

With the pre-trained antibody light and heavy chain Roformer models, we constructed A2binder and fine-tuned it on the antigen-antibody affinity task so that it learns the rules of antigen-antibody binding. The architecture of A2binder is shown in Fig. 1f. It comprises pre-trained language models for feature extraction, namely the antibody Roformers and the previously pre-trained ESM2 model [38], which extract information from the light chain, heavy chain, and antigen sequences. Each language model is followed by a multi-layered CNN architecture named MF-CNN. The light-chain and heavy-chain Roformers from pre-training extract information about the light and heavy chains, and the large-scale pre-trained ESM2 model [18] extracts features of the antigen sequences. The MF-CNN was designed to combine the sequence features output by the pre-trained models. The features from the MF-CNNs are concatenated, and the result is used to predict the affinity.
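
The rotation-matrix position encoding attributed to the Roformer above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy vectors, and the frequency base of 10000 are assumptions chosen for illustration. Each pair of embedding dimensions is rotated by an angle proportional to the absolute position, and the sketch checks that the resulting dot product between a query and a key depends only on their relative offset.

```python
import math

def rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs of an even-length vector by position-dependent angles.

    `base` is an assumed frequency base; the source does not state its value.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # angle grows linearly with the absolute position
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy query/key vectors (not taken from the paper).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# The same relative offset (3 residues apart) at two different absolute positions.
score_near = dot(rotate(q, 5), rotate(k, 2))
score_far = dot(rotate(q, 105), rotate(k, 102))
print(abs(score_near - score_far) < 1e-9)
```

Because a rotation preserves vector length, the encoding changes only the relative orientation of queries and keys, which is why the attention score is a function of the offset between two residues.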
Further introduction to the model can be found in the ‘Methods’ section.

To construct PALM-H3, we adopted an encoder-decoder architecture in which the encoder was initialized with the pre-trained weights from ESM2. For the decoder, we initialized the self-attention layers with the pre-trained weights from the antibody heavy chain Roformer model. We then trained the decoder’s cross-attention layers from scratch using sequence-to-sequence fine-tuning on paired antigen-CDRH3 data. This leverages the large unlabeled antibody data used to pre-train the Roformer and circumvents the lack of sufficient paired data for full encoder-decoder training. As illustrated in Fig. 1b, d, the antigen and antibody models are built by stacking 12 antigen layers and 12 antibody layers, respectively. In each layer, PALM-H3 incorporates encoder and decoder self-attention sub-layers, with their initial weights inherited from the pre-trained ESM2 model and the antibody heavy chain Roformer, respectively. The decoder also includes an antibody cross-attention sub-layer, which is randomly initialized and fine-tuned on paired CDRH3-antigen sequence data for the sequence-to-sequence task. The last antigen layer passes the K and V matrices to all antibody cross-attention sub-layers, while the Q matrix comes from the antibody self-attention sub-layer. Through the attention mechanism [35], PALM-H3 realizes the transformation from antigen to CDRH3.

Pre-training allows the model to learn a better representation of antibodies

During the pre-training stage, the model learns potential representation patterns of antibody sequences through exposure to a diverse range of antibody sequences, facilitating effective feature extraction from the input antibody sequences.
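
The masked-token objective behind this pre-training stage can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the 15% masking rate, the `<mask>` token string, the helper names, and the toy sequence are all assumptions. The sketch hides a random subset of residues and scores a model by the fraction of hidden residues it recovers.

```python
import random

MASK = "<mask>"  # assumed mask token; the real tokenizer may use a different symbol

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Hide a random subset of residues; return masked tokens and the hidden targets."""
    rng = rng or random.Random(0)
    tokens = list(seq)
    n_mask = max(1, round(len(tokens) * mask_rate))  # mask_rate is an assumed value
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    targets = {p: tokens[p] for p in positions}
    for p in positions:
        tokens[p] = MASK
    return tokens, targets

def masked_accuracy(predictions, targets):
    """Fraction of masked positions whose residue was predicted correctly."""
    correct = sum(predictions.get(p) == aa for p, aa in targets.items())
    return correct / len(targets)

# Hypothetical toy fragment, not a sequence from the training data.
toy_fragment = "ACDEFGHIKLMNPQRSTVWY"
tokens, targets = mask_sequence(toy_fragment, rng=random.Random(1))
print("".join(t if t != MASK else "_" for t in tokens))
print(masked_accuracy(targets, targets))  # a perfect predictor recovers every masked residue
```

Under this reading, a reported prediction accuracy is the value of `masked_accuracy` averaged over held-out masked positions.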
The prediction accuracy was 92.74% and 94.14% for the heavy chain Roformer and the light chain Roformer, respectively, indicating excellent pattern characterization capability of the pre-trained model.

We further investigated whether the pre-trained model could differentiate the antigenic region, type, and binding affinity targeted by the antibody. First, we used the CoV-AbDab [39] database, which contains variant and epitope information of the antigens. We input the antibody sequences into both the untrained Roformer with randomly initialized weights and the pre-trained Roformer to obtain embeddings of the antibody sequences, and ran t-SNE to visualize the feature distribution, as shown in Fig. 2a. The feature representation of the untrained Roformer was scattered. Then, we evaluated the pre-trained model’s ability to represent epitopes. We selected antibody data from the CoV-AbDab [39] database that target a specific epitope. Similarly, we used both the untrained and pre-trained models to obtain embeddings and applied t-SNE for dimensionality reduction and visualization. As shown in Fig. 2b, the pre-trained model produced aggregated embeddings for each epitope, in contrast to the scattered embeddings from the untrained model. Notably, the degree of clustering for different variants was less pronounced than that for different epitopes. This may suggest that the model captures the features of different epitopes more strongly, as the variations among epitopes are more substantial than those among variants. Furthermore, the clustering for the Receptor-binding Domain (RBD) epitope was not as strong as that for the Spike Protein Subunit 2 (S2) and N-terminal Domain (NTD) epitopes. This could be attributed to the significantly larger number of antibodies capable of targeting the RBD compared to other epitopes.
Among these RBD-targeting antibodies, some may be able to bind multiple epitopes, leading to a more diverse representation and weaker clustering for the RBD epitope.

Fig. 2: Comparison of latent capabilities between pre-trained and untrained models and performance comparison of A2Binder versus baseline methods for antibody-antigen binding specificity prediction. a t-SNE projection of sequence embeddings for antibodies selectively targeting distinct SARS-CoV-2 variants. Antibodies that bound to multiple variants were eliminated. b t-SNE projection of model embeddings for antibodies specifically targeting unique epitopes of SARS-CoV-2. Antibodies that bound to multiple epitopes were eliminated. Antibody sequences used in panels a and b are from the CoV-AbDab dataset. c t-SNE projection of model embeddings of antibody sequences with different binding affinity. Each point represents a single antibody sequence from the BioMap dataset, with colors indicating the binding affinity, expressed as Delta G. d, e Receiver operating characteristic (ROC) curve and precision-recall (PR) curve evaluating the overall predictive performance of antibody binding specificity. Models compared include A2Binder, AbMAP, AntiBERTa2, ESM-F, Ens-Grad, and Vanilla BERT. Statistical significance was determined using one-sided t-tests. For the ROC metric, A2binder significantly outperformed the next best method AbMAP (p = 0.0308). For the PR metric, the difference was also significant (p = 0.0082). *P < 0.05, **P < 0.01, and ***P < 0.001, in the comparison with A2binder. f, g Performance breakdown of A2Binder in predicting antibody binding specificity by antigen epitope region (f) and variant (g). The x-axis labels indicate the different epitope categories (f) and variants (g). Experiments were repeated 5 times. Dots represent metric values from individual experiments. Data are presented as mean values +/− SD.
The CoV-AbDab dataset was split into training (80%), validation (10%) and test (10%) sets. The results shown in this comparison are based on the test set. Source data are provided as a Source Data file.

To assess binding affinity characterization, antibodies from the BioMap [40] dataset, which includes binding free energy (Delta G) as the antigen-antibody affinity data, were used for dimensionality reduction of embeddings (Fig. 2c). The pre-trained model effectively aggregated high- and low-affinity embeddings.

Collectively, these three comparisons demonstrate that pre-training enhances the model’s ability to extract critical information, such as the type and region of the antigen an antibody binds and its affinity.

A2binder can accurately predict the antigen-antibody binding probability

The performance of A2binder was evaluated on multiple affinity datasets by comparing its ability to predict affinity with that of several baseline methods: AbMAP [41], a protein language model for antibody hypervariable regions; AntiBERTa2 [42], a pre-trained antibody-specific sequence encoder model; ESM-F, an antigen-antibody affinity predictor based on ESM2 [18]; Ens-Grad, a CNN architecture proposed by Liu et al. [43] for antibody CDR design; and Vanilla BERT, a raw BERT model with randomly initialized weights that was trained for antigen-antibody affinity prediction. Section “Baseline” provides detailed information about the baseline methods.

Initially, the CoV-AbDab dataset [39] was used, resulting in 27,324 antigen-antibody pairs with 22 SARS-CoV-2 variants as antigens. Since this dataset does not contain specific affinity values, we used neutralization or non-neutralization as the label to evaluate the performance of A2binder in a binary classification task.
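
For a binary neutralization label of this kind, a common summary of a classifier's ranking quality is the area under the ROC curve. The sketch below is illustrative only and is not the evaluation code used in the study: the function name and the toy labels and scores are hypothetical. It computes ROC-AUC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, counting ties as one half.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = neutralizing, 0 = non-neutralizing.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75: three of the four positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; library implementations use a sort-based equivalent.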
The details of data processing are described in the dataset subsection of the Methods section.

Figure 2d, e and Supplementary Table S3 show the performance of A2binder and the baseline methods on the CoV-AbDab dataset. A2binder outperformed all the baseline models in terms of the area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC). A2binder achieved a PR-AUC of 0.922, a 2% improvement over the second-best method, AbMAP. The BERT model without pre-training performed the worst, which highlights the importance of pre-training in obtaining the characterization of antibody and antigen sequences for model performance. We also compared model performance across different epitopes and variants (Fig. 2f, g). The model achieves good performance across different epitopes and variants.

A2binder can accurately predict antigen-antibody affinity

Predicting binding affinity values through regression is more challenging than the binary task of predicting neutralization or non-neutralization. To assess the model’s performance in predicting affinity values, we used two datasets, 14H and 14L [44], that contain affinity labels. Both datasets contain a measure of the affinity of the antibody to a stable peptide in the HR2 region of SARS-CoV-2 [45]. In the 14H dataset the heavy chains vary while the light chain is constant, whereas the 14L dataset is the opposite. Therefore, for the 14H dataset, we used the pre-trained heavy chain Roformer to extract features from the CDRH1, 2, and 3 regions, while for the 14L dataset we used the pre-trained light chain Roformer. The details of data processing are given in the dataset subsection of the Methods section. Table 1 and Supplementary Figs. S1, S2 show the performance comparison of models on 14H and 14L.
A2binder outperformed all baseline models in Pearson’s correlation and Spearman’s correlation metrics. A2binder achieved a Pearson’s correlation of 0.642 on the 14H dataset (3% improvement) and 0.683 on the 14L dataset (1% improvement). AbMAP and AntiBERTa2 outperformed the other baseline methods in all metrics, further verifying the significance of pre-training. Additionally, the sequence pre-training of SARS-CoV-2 antigens may help the model learn the characterization of SARS-CoV-2-related antibody sequences more effectively.

Table 1 The performance comparison between A2binder and other baseline models on the antigen-antibody affinity datasets

To verify the model’s ability to predict antibody affinities for antigens other than SARS-CoV-2, the BioMap dataset was used to evaluate the model’s prediction performance. Table 1 and Supplementary Fig. S1 show the performance comparison of the proposed model on the BioMap dataset. A2binder achieves a 7% performance improvement, reaching a Spearman’s correlation of 0.746 and a Pearson’s correlation of 0.701. Consistent with the previous results, A2binder outperforms the baseline methods on all metrics. This further supports the model’s ability to accurately predict antigen-antibody affinity, regardless of whether the antigen is related to SARS-CoV-2. This may be attributed to the MF-CNN architecture in A2binder, which enables the extraction of global features from the output of a large-scale pre-trained model.

PALM-H3 outperforms baselines in generating antibodies with a high binding probability

To benchmark the quality of antibody sequences generated by PALM-H3, we compared it with SeqDesign [16], an autoregressive generative model for protein sequence design, and IgLM [23], a language model specifically designed for synthetic antibody library generation. Specifically, we selected the CDRH3 sequences of natural antibodies targeting the wild-type SARS-CoV-2 RBD region from the CoV-AbDab database.
Subsequently, we employed PALM-H3 and the baseline methods to generate 1000 CDRH3 sequences targeting the same epitope. PALM-H3 achieved a perplexity of 4.96 for the generated sequences, lower than the baseline methods IgLM and SeqDesign (t-test, p < 0.01). A lower perplexity indicates better quality of the generated sequences, so this result suggests that PALM-H3 generates higher-quality sequences than the baseline methods. We then evaluated the generated sequences further. Following the previous benchmark method [46], we introduced Sequence Recovery Rate (SRR) as a metric to assess the diversity of the generated sequences and their similarity to natural sequences. Additionally, we employed the state-of-the-art antibody-antigen complex structure prediction method, tFold [47], to generate complexes between the modeled antibodies and the target antigen, following the benchmark protocol. We then used tFold to evaluate the predicted template-modeling score (pTM), the interface pTM (ipTM), and the predicted local distance difference test (pLDDT) for the different methods. The pTM and ipTM provide an estimate of the likelihood that the modeled antibody will bind to the correct epitope and form a stable complex, while pLDDT is a confidence measure for the predicted antibody structure. As shown in Fig. 3a and Supplementary Fig. S4, PALM-H3 outperforms the baseline methods in terms of SRR. Furthermore, in the tFold evaluation, PALM-H3 achieved higher pTM, ipTM, and pLDDT scores than the baseline methods. Standard deviations (std) are reported alongside the metrics in the tables, and t-tests on these metrics showed that PALM-H3 significantly outperforms the other methods (p < 0.01). This suggests that the sequences generated by PALM-H3 are more likely to target the correct epitope and form stable binding complexes.

Fig. 3: Performance comparison with baseline methods and similarity analysis of artificial and natural antibodies. a Comparison of PALM-H3 with the baseline methods, SeqDesign and IgLM, in generating CDRH3 sequences targeting the SARS-CoV-2 RBD. Bold indicates the best results. Values in parentheses represent standard deviation. b Sequence logo of the CDRH3 region in artificial and natural antibodies. The CDRH3 sequences of natural antibodies are sourced from antibodies in the CoV-AbDab dataset that bind to the RBD region of wild-type SARS-CoV-2, while the CDRH3 sequences of artificial antibodies are obtained by inputting the RBD sequence of wild-type SARS-CoV-2 to PALM-H3. c Comparison of A2binder-predicted binding probabilities to the wild-type SARS-CoV-2 RBD region between artificial antibodies and randomly mutated antibodies (n = 800). Artificial antibodies and randomly mutated antibodies with the same Levenshtein distance as natural antibodies are compared. Boxplot showing the distribution of A2binder-predicted binding probabilities across different Levenshtein distances. The x-axis denotes Levenshtein distance and the y-axis shows predicted binding probability. Blue boxes represent artificial antibodies while purple boxes denote randomly mutated antibodies. d A2binder-predicted binding probabilities of artificial antibodies at different BitScore ranges (n = 662). The BitScore measures the sequence similarity between artificial antibodies and natural antibodies binding to the same epitope. The x-axis denotes BitScore ranges and the y-axis shows predicted binding probability. The depth of the color indicates an increase in BitScore. The diamond represents outliers. e A2binder-predicted binding probabilities of artificial antibodies at different Root Mean Square Deviation (RMSD) ranges (n = 662). The RMSD measures the structure similarity between artificial antibodies and natural antibodies binding to the same epitope.
The x-axis denotes RMSD ranges and the y-axis shows predicted binding probability. The depth of the color indicates an increase in RMSD value. The diamond represents outliers. In c-e, the top whisker, top of the box, middle line, bottom of the box, and bottom whisker indicate the maximum, 75th percentile, median, 25th percentile, and minimum values, respectively. Source data are provided as a Source Data file.

In addition, we created a sequence logo plot for both artificial and natural antibodies. Figure 3b shows that the first three amino acids of the generated antibodies are similar to those of the natural antibodies, since ‘ARD’ has the highest probability. The artificial antibodies exhibit greater diversity in their tail sequences, with the most probable tail being ‘DY’. Additionally, the middle regions of the generated antibodies display considerable diversity.

To investigate whether dissimilar sequences result in reduced binding probability, we computed the edit distance between generated antibody sequences and natural antibodies. We divided the dataset based on edit distance and employed A2binder to predict the binding probability, as shown in Fig. 3c. For comparison, we also generated randomly mutated sequences and randomly generated sequences matched in edit distance. The results indicated that the generated antibodies exhibited a higher binding probability and showed no declining trend in probability as the edit distance increased. In contrast, the random mutation results showed a decrease in binding probability as the edit distance increased.

Furthermore, we obtained the BitScore of the artificial antibodies using the Basic Local Alignment Search Tool (BLAST) [48]. A larger BitScore indicates a higher similarity with the natural antibody. As shown in Fig. 3d, the artificial antibodies did not exhibit a decrease in binding probability due to low similarity, which is consistent with the previous analysis.

To investigate the influence of structure on binding probability, we used AlphaFold2 (AF2) [49] to generate the structure of the artificial antibody and computed the Root Mean Square Deviation (RMSD) between the artificial and natural antibodies.

As depicted in Fig. 3e, an increase in RMSD, ranging from 0.625 to 0.829 Å, is accompanied by a decrease in the average probability of antibody binding. This may suggest that a decrease in structural similarity could reduce the likelihood of antigen-antibody binding. However, even in the interval with the highest RMSD, the binding probability remains higher than 0.5. In conclusion, PALM-H3 is capable of generating a diverse set of antibody sequences with low sequence similarity to natural antibodies that still exhibit high binding probabilities.

PALM-H3 can generate antibodies with high binding affinity to diverse SARS-CoV-2 variants

To comprehensively assess the binding characteristics of antibody sequences generated by PALM-H3, we employed three structure prediction methods, AF2 [49], tFold [47], and AbBuilder [50], to simulate and compare the binding of PALM-H3-generated antibodies and natural antibodies against four SARS-CoV-2 variants: wild-type, Alpha, Delta, and the emerging XBB variant, which was not included in the training dataset.

We first used PALM-H3 to generate potentially high-affinity CDRH3 sequences targeting the four variants. Subsequently, we used our A2binder model as an efficient screening tool to predict and rank the binding affinities of these sequences against the target antigens. The top-ranked sequences with the highest predicted binding scores were then prioritized for comprehensive experimental validation. It is worth noting that A2binder is a fast-screening method that helps us quickly screen for potential high-affinity antibodies.
One CDRH3 region generated by PALM-H3 for the HR2 region of wild-type SARS-CoV-2 was GRREAAWALA, whose predicted binding free energy of 1.70 is smaller than that of the other generated CDRH3 sequences and of the natural CDRH3, GKAAGTFDS (Supplementary Fig. S5). In addition, one CDRH3 region generated by PALM-H3 for the XBB variant, AKDSRTSPLRLDYS, exhibited a predicted neutralization degree of 3.01 from A2binder, higher than that of the natural antibody (Supplementary Fig. S6). We selected the highest-affinity antibodies from A2binder’s predictions and conducted structural modeling using AF2, AbBuilder, and tFold. We employed ClusPro to perform antigen-antibody docking. For comparison, we performed the same docking process on the natural antibodies. To investigate the ability of A2binder, we employed SnugDock to adjust the pose of the antigen-antibody complexes. Docking-based validation may not be entirely accurate, but it is a widely used computational method for antibody assessment and has aided the development of numerous antibody design approaches. We employ docking as an external validation tool to further discern the affinity of antibodies selected through A2binder screening.

As illustrated in Fig. 4, across all four SARS-CoV-2 variants, the selected high-affinity artificial antibodies generated by PALM-H3 consistently exhibited lower interface binding energies than natural antibodies. These energies were calculated by SnugDock after structural optimization. The AbBuilder and AF2 results revealed significantly lower energies for the artificial antibodies, although the tFold-based results on the XBB variant showed no significant difference in interface binding energy between artificial and natural antibodies. Furthermore, Supplementary Fig. S7 displays the differences in interface RMSD (IRMSD) between artificial and natural antibodies.
For at least one of the structure prediction methods, the IRMSD values for the artificial antibodies were significantly lower than those of the natural antibodies across all four variants. Notably, there was no combination of variant and prediction method for which the natural antibody’s IRMSD was significantly lower than that of the artificial antibody. Collectively, this trend held regardless of the structure prediction method used, suggesting the robustness of PALM-H3’s ability to design high-affinity binders against diverse viral targets.

Fig. 4: Comparison of interface energies between the selected high-affinity artificial antibodies predicted by A2binder and natural antibodies targeting the SARS-CoV-2 spike protein across different variants and computational structure generation methods. Density distribution plots of interface energies for artificial (blue) and natural (red) antibodies binding to the wild-type (a), Alpha (b), Delta (c), and XBB (d) variants of SARS-CoV-2. Results are shown for three different antibody structure generation methods: tFold (left), AbBuilder (middle), and AF2 (right). Interface energies were calculated from 1000 optimized antibody-antigen binding poses using SnugDock. Lower interface energy values indicate more favorable binding. The distributions highlight the ability of computational methods to generate artificial antibodies with binding properties comparable to natural antibodies across multiple spike variants. Source data are provided as a Source Data file.

Notably, for the emerging XBB variant, the PALM-H3-generated CDRH3 AKDSRTSPLRLDYS exhibited exceptionally low interface energies, outperforming the natural antibody in structure prediction methods (Supplementary Fig. S6). Visual inspection of the docked complexes revealed that this artificial antibody formed concentrated interactions with key light chain residues (A25, Q27, S28, Y32, and Y92) on the XBB spike protein.
The shortest hydrogen bond measures 1.8 Å, while the longest extends to 2.6 Å.

These comprehensive results, supported by multiple structure prediction approaches and docking simulations, suggest that PALM-H3 can reliably generate antibodies with high binding affinities against not only wild-type SARS-CoV-2 but also its rapidly evolving variants of concern, such as Alpha, Delta, and XBB. This capability is particularly valuable for developing therapeutic antibodies targeting the relatively conserved epitopes on continually mutating viral antigens.

Comparison of PALM-H3 with previous methods in antibody design

De novo generation of antibody CDRH3 has many advantages over traditional methods in antibody design. The first advantage is efficiency. For example, widely used antibody-antigen binding structure modeling tools, such as Rosetta [12] and Absolute! [51], have been used to design antibodies. The general idea of using Rosetta [12] and Absolute! [51] in antibody design is to replace amino acids of a natural antibody CDRH3 and subsequently assess the efficacy of the modified antibody using these tools. Exploring all possible combinations of amino acid changes is infeasible given the computational resources these tools require. To illustrate this, we used 1000 PALM-H3-generated antibody CDRH3 sequences for the HR2 region of the SARS-CoV-2 spike protein, which had Levenshtein distances of 4 to 10 from natural antibodies. As shown in Fig. 5a, b, to obtain artificial antibodies with the same Levenshtein distances to natural ones, both Rosetta and Absolute! require over 200 times more time than PALM-H3. PALM-H3 thus substantially reduces the computational resources needed for antibody design. Notably, because PALM-H3 directly generates sequences at different edit distances, its computational cost is not affected by an increase in distance.

Fig. 5: Comparison between PALM-H3 and traditional computational antibody design methods. a Distribution plot of the Levenshtein distances among antibodies generated using PALM-H3. b Comparison of the time required for antibody design at varying Levenshtein distances from natural antibodies among Rosetta, Absolute!, and PALM-H3. The top row shows the Levenshtein distances, while the subsequent three rows show the time required by each method to design antibodies at these distances from natural antibodies, measured in CPU hours. c Comparison of the binding affinity, indicated by interface energy, between antibodies produced by PALM-H3 and those generated by E-EVO and EvoEF2. The interface energy values were determined independently through SnugDock.

Exploring all possible combinations of amino acid changes is infeasible given the computational resources required by traditional methods. Thus, a popular strategy is to change amino acids sequentially [28]. Such a strategy saves computational resources but has limitations, such as the potential to become trapped in local optima, which hinders exploration of the global fitness landscape of the sequence space. As a result, the traditional strategy may lead to suboptimal or ineffective antibody designs. To illustrate this, we employed E-EVO [21], a language-model-guided affinity maturation approach, and EvoEF2 [52], an efficient protein design tool based on the EvoEF energy function, to design antibodies using the traditional strategy and compared their antigen-binding affinity with that of antibodies generated by PALM-H3. We selected the antibody generated by PALM-H3 that was deemed optimal by A2binder and had an edit distance of 7 to the natural antibody. Accordingly, we used EvoEF2 to perform seven rounds of single-point mutation on natural antibodies and selected the generated antibody with the highest affinity.
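
The edit (Levenshtein) distance used throughout this comparison counts the minimum number of single-residue insertions, deletions, and substitutions needed to turn one sequence into another. The following is a minimal sketch of the standard dynamic-programming computation, not the authors' code; the function name is an assumption, and the two CDRH3 strings are the generated and natural HR2-targeting sequences quoted earlier in the text.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if the residues match
            ))
        prev = curr
    return prev[-1]

generated = "GRREAAWALA"  # PALM-H3-generated CDRH3 quoted in the text
natural = "GKAAGTFDS"     # natural CDRH3 quoted in the text
print(levenshtein(generated, natural))
```

Seven rounds of single-point substitution, as in the EvoEF2 baseline, can reach at most an edit distance of 7 from the starting sequence, which is why the round count was matched to the distance of the selected PALM-H3 design.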
Furthermore, we employed E-EVO for the mutational optimization of natural antibodies, which produced an artificial antibody. We then used SnugDock [13], an antibody-antigen docking tool developed by the Rosetta group, to evaluate the antigen-binding affinity of the antibodies generated by PALM-H3, E-EVO, and EvoEF2. Figure 5c compares the interface binding energies and shows that the antibodies generated by PALM-H3 exhibited significantly lower interface binding energies than those generated by EvoEF2 and E-EVO. This comparison further emphasizes the advantages of PALM-H3 in antibody design.

PALM-H3 is highly interpretable

To validate the interpretability of PALM-H3 and its ability to focus on crucial interaction sites during the learning process, we performed statistical analyses on structures from the BioMap database. Specifically, we used PyMOL to identify potential hydrogen bond locations between antigen and antibody chains in the structures. We then compared the mean attention weights from PALM-H3 at these hydrogen bond sites with those at other residue positions. This allowed us to assess whether the model attends to structurally interacting residue positions.

We divided the attention weights into two groups: those at hydrogen bond sites and those elsewhere. A t-test revealed that PALM-H3’s attention weights were significantly higher at the identified hydrogen bond locations than at other sites (p < 0.01). This statistically significant difference provides strong evidence that the model’s attention mechanism can effectively capture key interacting residue positions between antigens and antibodies. Supplementary Fig. S8 further illustrates this finding through a boxplot comparing the distribution of attention weights at hydrogen bond sites versus non-bonding sites.
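The two-group comparison described above can be sketched as follows. The text does not say which t-test variant was used, so this sketch assumes Welch's unequal-variance form; the attention weights are made-up placeholders, not values from the paper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb                        # squared standard error of the mean difference
    t = (mean(group_a) - mean(group_b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Placeholder attention weights, NOT data from the paper.
bond_site_weights = [0.31, 0.18, 0.25, 0.22, 0.40, 0.15]
other_site_weights = [0.02, 0.05, 0.01, 0.04, 0.03, 0.06, 0.02, 0.01]

t_stat, dof = welch_t(bond_site_weights, other_site_weights)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires the Student t distribution, which the Python standard library does not provide; with SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the p-value directly.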
The bond sites had a higher average attention weight (0.20109) compared to other sites (0.00007), and the median attention weight was also greater at bond sites (−0.09) versus other sites (−0.15).

To provide specific examples, we input artificial antibody sequences generated by PALM-H3 and their target antigen sequences into the model. Figure 6a illustrates the attention weights output by PALM-H3, with red indicating high attention weights and blue indicating low attention weights. The intensity of the color represents the strength of attention. Our analysis revealed that the attention weights of the correct docking sites in PALM-H3’s output were generally high, with the highest attention values observed at the R residues in the CDRH3 region, which form hydrogen bonds with D residues in the HR2 peptide segment. This suggests that PALM-H3 can correctly capture key contact sites, providing insight for further research and optimization of antigen-antibody binding.

Fig. 6: Interpretability analysis of PALM-H3 in generating antigen-specific antibody CDRH3 sequence. a Heat maps displaying cross-attention values of PALM-H3 when generating the CDRH3 sequence “GRREAAWALA” that targets the epitope “PDVDLGDISGINAS” of SARS-CoV-2. Notably, residue D of the epitope and residue R of the CDRH3 region of the antibody exhibit the highest interaction attention values. Consistent with the cross-attention values, in the binding complexes shown on the right, these two residues are linked by a hydrogen bond. b Heat maps displaying cross-attention values of PALM-H3 when generating the CDRH3 sequence “AKDSRTSPLRLDYS” that targets the SARS-CoV-2 variant XBB. c Consistent with the high cross-attention values of residues 167–177 in the SARS-CoV-2 variant XBB, these residues play important roles in binding to the generated CDRH3. Source data are provided as a Source Data file.

Moreover, we analyzed the ability of the model to generate high-affinity antibodies against the new variant XBB.
Figure 6b illustrates the attention weights generated by PALM-H3. We observed that the model placed a higher attention weight on region 167–177 of the antigen, which corresponds to the binding pocket between XBB and the antibody. Figure 6c shows a zoomed-in view of this region, in which the attention weights are generally higher than the average. Additionally, the key positions for hydrogen bond formation between the antigen and the antibody, S168–C170 and Q175–S176, were found to have high attention values. Among these key positions, only C170 had an attention weight lower than the average, while all other key positions had attention weights higher than the average. We observed that region 167–177 of the antigen contains XBB-specific mutation sites: S168, N169, and Q175. A previous study has shown that S168 may confer resistance to RBD class 1 and 2 mAbs, while N169 contributes to resistance against RBD class 3 mAbs [53]. Additionally, previous studies have indicated that the Q175 mutation in XBB restores its receptor affinity, thereby restoring its fitness [54,55]. These findings further suggest that the model may be able to correctly identify and capture key positions of antigen-antibody interaction, and they point to directions for further investigation of the XBB variant.

While the interpretation of attention mechanisms remains an active area of research, our statistical and visual analyses provide compelling evidence that PALM-H3’s attention patterns have the potential to meaningfully highlight key structural contacts between antigens and antibodies.

In-vitro assays of artificial and natural antibodies

To further validate the effectiveness of antibodies generated by PALM-H3 against the wild-type spike protein of SARS-CoV-2, we selected the top-ranked Artificial 1 antibody along with the Artificial 2 antibody based on their binding probabilities predicted by A2binder, and two natural antibodies, Natural 1 and 2.
We then evaluated their binding ability using in-vitro assays. The Western blot analysis demonstrated that Artificial 1 and 2 were capable of binding to the spike protein at levels similar to, or even surpassing, those of natural antibodies (Fig. 7a). To further determine their binding affinity and neutralization capability, we conducted surface plasmon resonance analysis and pseudovirus neutralization assays. Artificial 1 demonstrated high binding affinity, with an equilibrium dissociation constant (KD) of 0.05 nM, and superior neutralization potency, with a half maximal inhibitory concentration (IC50) of 0.023 μg/mL, compared to all the tested natural antibodies (Fig. 7e).

Fig. 7: In-vitro assays of the binding affinity and neutralization of artificial and natural antibodies. Western blot analysis of artificial and natural antibodies binding to the spike protein of (a) wild-type, (b) Alpha variant, (c) Delta variant, and (d) XBB variant of SARS-CoV-2. HEK293T cells are used to produce pseudotyped vectors. The x-axis indicates the sample of each band, and the y-axis shows the position of antigen binding. Band intensity demonstrates the affinity between the corresponding antibody and antigen. β-Actin bands at the bottom monitor loading consistency across samples. e The results of surface plasmon resonance analysis and pseudovirus neutralization assays of artificial and natural antibodies, and A2binder predictions of the binding probability for the tested artificial and natural antibodies. The color legend on the right indicates value ranges for different colors. Lower KD and IC50 values signify stronger binding affinity and more potent neutralization capability, respectively. Experiments were repeated 3 times independently with similar results. Source data are provided as a Source Data file.

Next, we evaluated PALM-H3’s performance on two other variants, Alpha and Delta.
For the Alpha variant, we selected the top artificial antibody predicted by A2binder, Artificial 1, along with three other randomly selected artificial antibodies (Artificial 2–4) with moderate and lower predicted binding probabilities, and a natural Alpha antibody. Western blot analysis validated their binding to the Alpha spike protein (Fig. 7b). To further quantify their functional activity, surface plasmon resonance analysis and pseudovirus neutralization assays were performed. As shown in Fig. 7e, Artificial 1 had a high binding affinity with a KD of 0.29 nM, outperforming the natural antibody (0.32 nM). Pseudovirus neutralization assays further demonstrated Artificial 1’s potent neutralization capability against Alpha, with an IC50 of 0.006 μg/mL, superior to the natural antibody (0.02 μg/mL) (Fig. 7e). The other artificial antibodies exhibited much lower neutralization potencies, consistent with their predicted binding probabilities.

Similar experiments were conducted for the Delta variant. Western blot analysis validated the binding of the antibodies to the Delta spike protein (Fig. 7c). In addition, the top artificial antibody, Artificial 1, exhibited strong binding affinity (KD 0.89 nM) and neutralization potency (IC50 0.26 μg/mL) against Delta, which were comparable to those of the natural Delta antibody. Moreover, Artificial 3 also demonstrated moderate neutralization, with an IC50 of 0.57 μg/mL (Fig. 7e). These results validated PALM-H3’s ability to generate highly effective antibodies against known viral variants. The above assays demonstrated that PALM-H3 could generate antibodies surpassing natural antibodies for antigens seen during training.

We next evaluated PALM-H3’s ability to generate artificial antibodies against the novel SARS-CoV-2 Omicron variant XBB, which represents a more challenging test case because the model did not see this antigen during training. Western blot analysis validated the binding of these antibodies to the XBB spike protein (Fig. 7d). In addition, as shown in Fig. 7e, Artificial 1 demonstrated higher binding affinity than the natural antibody, with a KD of 0.13 nM, and superior neutralization potency against XBB, with an IC50 of 0.00301 μg/mL. The improved performance of Artificial 1 despite no prior exposure to XBB demonstrated PALM-H3’s capacity to generate highly potent antibodies even against novel antigen variants. Consistent with the lower binding probabilities predicted by A2binder, Artificial 2–4 showed much lower affinities and neutralization than Artificial 1 and the natural XBB antibodies. This demonstrated A2binder’s capability to effectively guide antibody selection for further wet-lab investigations.
