ImmuneApp for HLA-I epitope prediction and immunopeptidome analysis

Mass spectrometry-eluted HLA ligands

Mono-allelic data

Single-allelic EL datasets were collected from the training data of NetMHCpan-4.149 and MHCflurry-2.051, which had been carefully processed and filtered from the publications by Sarkizova et al.22 and Abelin et al.46, as well as from MS hits in the IEDB43, the SysteMHC Atlas60, and established datasets from earlier versions of these tools. In addition, multiple HLA-I peptidomics datasets from the studies by Jappe et al.61 and Faridi et al.62 were obtained. These datasets were integrated and duplicate entries were removed. All peptides in the new training dataset were filtered to include only those 8 to 15 amino acids long, resulting in 349,650 ligands restricted to 149 distinct HLA-I alleles. We refer to these as the MONOALLELIC training data (Supplementary Data 1). To benchmark the predictors in this study, we collected an external single-allelic dataset from a recently published HLA-I peptidomics study63. This dataset contains 43,866 HLA-I ligands and was not included in the training of any previous predictor. We refer to it as the MONOALLELIC-testing data (Supplementary Data 2).

Multi-allelic data

To increase the number of ligands and encompass a wider array of HLA-I alleles, we incorporated publicly available multi-allelic HLA peptidomics data, for which the precise HLA class I restrictions had not been experimentally established. Only samples with confirmed HLA-I typing were used. We categorized all curated samples into two groups. MULTIALLELIC-Recent included 47 samples from four recently published HLA-I peptidomics studies35,63,64,65 and contained 435,397 eluted ligands covering 86 different HLA alleles (Supplementary Data 3). This dataset was not used to train any previous predictor and was therefore employed to benchmark the predictors developed in this study and others in a multi-allelic setting.
This benchmarking involved assessing the predictors' performance in identifying true ligands within extensive random peptide libraries and comparing the consistency of HLA binding motifs derived through deconvolution with established motifs. MULTIALLELIC-All comprised 948,160 identified ligands from 216 samples representing 110 different HLA alleles (Supplementary Data 4; MULTIALLELIC-Recent was included). This dataset was transformed into pseudo-mono-allelic data using our deconvolution method and then combined with the actual mono-allelic data to train a comprehensive model. Notably, all data were obtained from the original publications without being filtered by any HLA-I ligand predictor, ensuring that our dataset remains free from biases introduced by such filtering.

Quantitative binding affinity measurements

The most widely used dataset of MHC-I binding affinities was originally acquired from the IEDB43. To develop a model capable of predicting peptide binding to various MHC molecules, especially in humans, we also incorporated a dataset from Pearson et al.66. The final dataset included over 200,000 quantitative BA measurements across peptides and 190 MHC-I alleles (Supplementary Data 7). The IPD-IMGT/HLA database was used to retrieve the MHC molecule sequences. The following equation was used to convert peptide–MHC binding affinities given as IC50 values in nM:

$${{{\rm{Transformed\; score}}}}=1-\frac{\log \left({{{\rm{Affinity\; measurement}}}}\right)}{\log \left(50000\right)}$$
(1)
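The transformation in Eq. (1) can be sketched in Python; this is a minimal illustration and the function name and clamping behavior are our own additions:

```python
import math

def transform_affinity(ic50_nm: float, cap: float = 50000.0) -> float:
    """Map an IC50 (nM) onto [0, 1]: higher scores mean stronger binding."""
    ic50_nm = min(max(ic50_nm, 1.0), cap)  # clamp so the score stays in [0, 1]
    return 1.0 - math.log(ic50_nm) / math.log(cap)

# A 50 nM binder (a commonly used strong-binder threshold) scores ~0.64.
print(round(transform_affinity(50.0), 2))
```

The logarithmic mapping compresses the wide dynamic range of IC50 measurements (1 nM to 50,000 nM) into a bounded regression target.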
The neoepitope immunogenicity dataset

We extracted immunogenicity data from the training datasets of PRIME-1.055 and PRIME-2.050, as well as data obtained from the IEDB on December 19, 2023. The training datasets of PRIME-1.0 and PRIME-2.0 included neoepitopes, viral antigens, cancer-testis antigens, and 9-mer peptides chosen at random from the human proteome to serve as negative examples. Our queries to the IEDB were restricted to human hosts, infectious diseases, T-cell assays, linear peptides, and MHC-I restriction. After eliminating overlaps with all previously curated datasets, the final immunogenicity data included a total of 5050 immunogenic neoepitopes and 7745 non-immunogenic ones. We refer to these as the IMMUNOGENIC training data. We employed deep transfer learning on this immunogenicity data to create a novel immunogenicity predictor. Furthermore, to benchmark the immunogenicity predictors developed here and elsewhere, an external immunogenicity dataset was compiled from NEPdb67, Neopepsee68, TESLA54, and data from 16 cancer patients assayed with MANAFEST69,70. After excluding overlaps with all other single- and multi-allelic peptidomics data, as well as the IMMUNOGENIC-training data, we compiled 349 immunogenic neoepitopes and 1838 non-immunogenic ones (see Supplementary Data 6, IMMUNOGENIC-testing data).

Decoy selection

The negative datasets were constructed by randomly picking peptides (decoys) from the UniProt human reference proteome (UP000005640_9606) that did not overlap with the identified ligands (hits).
We constructed a pool of random peptides (8–15 amino acids long) and sampled a large number of length-matched decoy peptides for each observed allele to avoid bias. We excluded all peptides, both hits and decoys, that contained non-canonical amino acids. Decoy generation for benchmarking was conducted separately from the decoy generation employed during model training.

Peptide representation

The ImmuneApp architecture takes MHC sequences and peptides as input. Both were encoded using the common BLOSUM50 substitution matrix, with each residue represented by its corresponding row in the matrix. Because our deep learning networks require fixed-length inputs, peptides 8–15 amino acids long were converted to a 30-mer sequence by combining left- and right-aligned representations. The central gaps of peptides shorter than 15 residues were filled with zero padding. For instance, "FLLVTLAIL" is represented by concatenating "FLLVTLAILXXXXXX" (left-aligned) and "XXXXXXFLLVTLAIL" (right-aligned), yielding the 30-mer sequence "FLLVTLAILXXXXXXXXXXXXFLLVTLAIL". This design was primarily motivated by structural research on peptide-MHC complexes: previous studies revealed that the termini of peptides often play a more critical role in binding than the central regions, as they are typically positioned in two specific binding pockets within the peptide-binding groove. Each ligand is therefore encoded into a 30 × 20 matrix using this pair-end approach.

HLA allele representation

The neural network represents each MHC-I molecule by 34 amino acids derived from a multiple sequence alignment. Following the NetMHCpan tool, this representation is referred to as a "pseudo-sequence". These residues lie in close proximity to the peptide residues, specifically within 4.0 angstroms.
The full set of chosen positions was 31, 33, 48, 69, 83, 86, 87, 90, 91, 93, 94, 97, 98, 100, 101, 104, 105, 108, 119, 121, 123, 138, 140, 142, 167, 171, 174, 176, 180, 182, 183, 187, 191, 195, based on HLA-A*01:01 protein residue numbering (IMGT accession HLA00001), starting from 1. Using the BLOSUM50 substitution matrix, each pseudo-sequence was converted to a 34 × 20 matrix, analogous to the ligand encoding.

Deep learning techniques

ImmuneApp implements a novel pan-allele MHC-I binding model that supports variable-length peptides of 8–15 amino acids. The model is built on a hybrid deep learning architecture that autonomously identifies critical residues and distinguishing features within the peptides. The architecture consists of five primary parts: a feature encoding module, a convolutional module, an LSTM module, an attention module, and an output layer. After the feature encoding module, the input first passes through the convolutional module (Conv) for initial feature extraction:

$${{\rm{Conv}}}{\left(L\right)}_{ij}=\sum_{r=0}^{R-1}\sum_{c=0}^{C-1}{K}_{rc}^{M}\,{L}_{i+r,c}$$
(2)
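The pair-end peptide encoding that feeds this module can be sketched as follows. `pair_end` is a hypothetical helper name, and we assume the right-aligned half keeps the peptide's original orientation, as the two aligned strings shown in the text indicate; the BLOSUM50 row lookup itself is only noted in a comment:

```python
def pair_end(peptide: str, max_len: int = 15, pad: str = "X") -> str:
    """Combine left- and right-aligned padded forms into one 2*max_len string."""
    if not 8 <= len(peptide) <= max_len:
        raise ValueError("expected an 8-15-mer")
    left = peptide + pad * (max_len - len(peptide))   # left-aligned, gap on the right
    right = pad * (max_len - len(peptide)) + peptide  # right-aligned, gap on the left
    return left + right

print(pair_end("FLLVTLAIL"))
# Each residue would then be replaced by its 20-entry BLOSUM50 row
# (pad positions by zeros), giving the 30 x 20 input matrix.
```

The double representation keeps both peptide termini at fixed positions regardless of length, matching their anchoring roles in the binding groove.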
where L stands for the input antigen or MHC pseudo-sequence, and i and j represent the indices of the output position and the kernel, respectively. KM is a convolutional kernel given by an R × C weight matrix, where R denotes the kernel's window size and C the input dimension. To reduce the dimensionality of the initial MHC feature map, a max-pooling operator was applied after the convolutional layer. To effectively capture the intricate long-range relationships within the sequence, the model passes the extracted feature maps into an LSTM layer. The LSTM unit consists of four components: an input gate, a forget gate, an output gate, and a cell that can retain information across arbitrary time intervals. Specifically, taking a peptide of length T as input \({\left\{{x}_{p}\right\}}_{p=1}^{T}\) to the LSTM, for each position t we define the input gate It, forget gate Ft, output gate Ot, hidden state Ht, and cell state Ct. The LSTM updates are as follows:

$${F}_{t}=\sigma \left({W}_{F}\times \left[{x}_{t},{H}_{t-1}\right]+{b}_{F}\right)$$
(3)
$${I}_{t}=\sigma \left({W}_{I}\times \left[{x}_{t},{H}_{t-1}\right]+{b}_{I}\right)$$
(4)
$${C}_{t}={F}_{t}\times {C}_{t-1}+{I}_{t}\times \tanh \left({W}_{C}\times \left[{x}_{t},{H}_{t-1}\right]+{b}_{C}\right)$$
(5)
$${O}_{t}=\sigma \left({W}_{O}\times \left[{x}_{t},{H}_{t-1}\right]+{b}_{O}\right)$$
(6)
$${H}_{t}=\,{O}_{t}\times \tanh \left({C}_{t}\right)$$
(7)
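A scalar, pure-Python sketch of one LSTM step following Eqs. (3)–(7); the weights and biases here are illustrative scalars, not the model's trained parameters:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update with scalar gates; W[g] = (weight on x_t, weight on h_prev)."""
    f = sigmoid(W["f"][0] * x_t + W["f"][1] * h_prev + b["f"])      # forget gate, Eq. (3)
    i = sigmoid(W["i"][0] * x_t + W["i"][1] * h_prev + b["i"])      # input gate, Eq. (4)
    o = sigmoid(W["o"][0] * x_t + W["o"][1] * h_prev + b["o"])      # output gate, Eq. (6)
    c_tilde = math.tanh(W["c"][0] * x_t + W["c"][1] * h_prev + b["c"])
    c_t = f * c_prev + i * c_tilde   # new cell state, Eq. (5)
    h_t = o * math.tanh(c_t)         # new hidden state, Eq. (7)
    return h_t, c_t
```

In the actual network each gate is a vector operation over the concatenated input and previous hidden state, applied at every peptide position in turn.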
To learn the hidden features within the LSTM layer and assign greater weight to critical positions, the recurrent outputs are densely connected to an attention module. Mathematically, the attention mechanism generates an output vector from the LSTM outputs \({\left\{{B}_{t}\right\}}_{t=1}^{T}\), as shown below:

$${\alpha }_{t}=\frac{\exp \left(w\left({B}_{t}\right)\right)}{{\sum }_{i=1}^{T}\exp \left(w\left({B}_{i}\right)\right)}$$
(8)
$${As}={\sum }_{t=1}^{T}{\alpha }_{t}{B}_{t}$$
(9)
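The softmax pooling of Eqs. (8)–(9) can be sketched in pure Python, with the scoring network w reduced to a caller-supplied function (an illustrative stand-in for the learned scorer):

```python
import math

def attention_pool(B, w):
    """Weighted sum of LSTM outputs B[t] using a softmax over scalar scores w(B[t])."""
    scores = [w(b) for b in B]
    m = max(scores)                                # subtract max for numerical stability
    exp_s = [math.exp(s - m) for s in scores]
    alphas = [e / sum(exp_s) for e in exp_s]       # Eq. (8): attention weights
    dim = len(B[0])
    # Eq. (9): componentwise weighted sum of the LSTM outputs
    return [sum(a * b[d] for a, b in zip(alphas, B)) for d in range(dim)]

# Example: three 2-dimensional LSTM outputs, scored by their first component.
out = attention_pool([[1.0, 0.0], [2.0, 1.0], [3.0, 0.5]], w=lambda b: b[0])
```

Because the weights sum to one, the pooled vector is a convex combination of the per-position outputs, emphasizing positions the scorer deems critical.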
where w denotes a neural network computing a scalar weight. The outputs from the LSTM and attention modules are concatenated and fed into a fully connected layer. The output layer applies a sigmoid nonlinearity to generate the probability of antigen presentation for a specific HLA class I allele.

Deconvolution of multi-allelic immunopeptidomics data

Using our curated MONOALLELIC training data, a new pan-allele binding prediction model (ImmuneApp-EL) was trained to estimate the likelihood that a query peptide is presented by an HLA-I allele. This training dataset encompassed 149 alleles and comprised 349,650 eluted ligands alongside 17,482,200 decoy peptides. To address sample imbalance and enhance robustness, ImmuneApp-EL was implemented with a balanced class-weight approach and an ensemble learning strategy. Specifically, the "compute_class_weight" function (Python package sklearn) was used to calculate class weights, which were applied during model fitting. Moreover, five batch sizes (2048, 4096, 8192, 16,384, and 32,768) were used. For each batch size, the MONOALLELIC-training dataset was divided in a 4:1 ratio into training and validation sets. The number of training epochs was determined by maximizing AUPRC on the validation set, with a cap of 100 epochs. The training process was repeated five times so that every subset was used for both training and validation (as in fivefold cross-validation), yielding 25 models in total. The final prediction score for each query was the average of the 25 models' outputs. Using ImmuneApp-EL, we developed a model-driven deconvolution method to transform immunopeptidomics data into pseudo-single-allelic ligands. For each sample, we first made predictions for all of the sample's HLA alleles. To make the predicted scores comparable across alleles within a sample, we calibrated raw scores using percent rank values.
To this end, the percentileofscore function (Python package scipy.stats) was used to compute the percentile rank of a score for each allele relative to the scores of a background set of 500,000 decoy peptides. For each sample, we eliminated all allele-peptide pairs with a predicted binding rank exceeding the 20% threshold, thereby discarding peptides unlikely to bind any of the specified alleles. When multiple alleles were predicted to bind a particular peptide, the allele-peptide pair with the lowest binding rank (indicating the strongest binder) was chosen.

Development of integrative antigen presentation model and immunogenicity predictor

Encouraged by previous evidence that integrating multi-allelic ligands can improve antigen presentation prediction, we further processed and incorporated the available immunopeptidomics data to enhance model training. The final curated multi-allelic dataset comprises 969,435 ligands restricted to 110 HLA-I molecules from 216 samples. Using our model-driven deconvolution method, we mapped 835,551 ligands to 104 alleles and obtained 328,227 unique HLA allele-ligand pairs. The mono-allelic and pseudo-mono-allelic datasets were then merged. After removing duplicates, we compiled a total of 573,453 unique allele-ligand pairs representing 162 alleles (Supplementary Data 5). This comprehensive dataset served as the final training set for a mixed prediction model, ImmuneApp-MA, following the training strategy described above. In addition, accurate prediction of immunogenic neoepitopes, especially within the top-ranked outputs, helps identify potential targets for immunotherapy and remains a challenge for most antigen-binding predictors.
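The model-driven deconvolution described above (percent-rank calibration against decoy backgrounds, the 20% filter, and best-allele assignment) can be sketched as follows. This is a pure-Python stand-in for scipy.stats.percentileofscore, and the sign convention (lower rank = stronger binder) is our assumption; function names are illustrative:

```python
def percent_rank(score, background):
    """Percent of background decoy scores at or above `score`;
    lower ranks indicate stronger predicted binders (assumed convention)."""
    return 100.0 * sum(b >= score for b in background) / len(background)

def deconvolve(peptide_scores, backgrounds, threshold=20.0):
    """peptide_scores: {peptide: {allele: raw score}};
    backgrounds: {allele: [decoy scores]}. Returns {peptide: best allele}."""
    assignment = {}
    for pep, per_allele in peptide_scores.items():
        ranks = {a: percent_rank(s, backgrounds[a]) for a, s in per_allele.items()}
        best = min(ranks, key=ranks.get)      # strongest predicted binder
        if ranks[best] <= threshold:          # drop peptides unlikely to bind any allele
            assignment[pep] = best
    return assignment
```

Calibrating each allele against its own decoy background makes ranks comparable across alleles, which raw model scores are not.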
Using the integrative antigen presentation model as a pre-trained model, we applied deep transfer learning to the curated immunogenicity dataset to develop a new immunogenicity predictor (ImmuneApp-Neo). The training strategy involved fine-tuning the final three dense layers of the mixed prediction model on the immunogenicity dataset. We chose the five best-performing models as the base models for transfer learning. In this study, neural networks were built with Keras 2.3 (https://keras.io/) and the TensorFlow backend in Python 3.7. To accelerate gradient descent during training, models were run on an NVIDIA Tesla T100 GPU server with CUDA 7.5 on our GPU clusters.

Comparison to existing methods

To further assess the performance of our models, we conducted several benchmarking analyses using external immunopeptidomics samples, encompassing both mono-allelic and multi-allelic datasets, as well as neoepitope immunogenicity data. We compared against seven methods: NetMHCpan-4.149, MHCflurry 2.051, MixMHCpred 2.1 and 2.250, HLAthena22, MHCnuggets-2.471, TransPHLA72, and PRIME 1.0 and 2.050. The PRIME tool was included only in the immunogenicity comparison. These methods are well established and widely used in the field. Both percentile rank outputs and prediction scores were used for comparative analysis. Three evaluation metrics, AUROC, AUPRC, and PPV, were calculated from the following quantities:

$${{{\rm{Sensitivity}}}}=\,\frac{N\left({{{\rm{correct}}}}\; {{{\rm{predicted}}}}\; {{{\rm{hits}}}}\right)}{N\left({{{\rm{all}}}}\; {{{\rm{hits}}}}\right)}$$
(10)
$${{{\rm{Specificity}}}}=\,\frac{N\left({{{\rm{correct}}}}\; {{{\rm{predicted}}}}\; {{{\rm{decoys}}}}\,\right)}{N\left({{{\rm{all}}}}\; {{{\rm{decoys}}}}\right)}$$
(11)
$${{{\rm{Recall}}}}=\,\frac{N\left({{{\rm{correct}}}}\; {{{\rm{predicted}}}}\; {{{\rm{hits}}}}\right)}{N\left({{{\rm{correct}}}}\; {{{\rm{predicted}}}}\; {{{\rm{hits}}}}\right)+\,N\left({{{\rm{incorrect}}}}\; {{{\rm{predicted}}}}\; {{{\rm{decoys}}}}\right)\,}$$
(12)
$${{{\rm{Precision}}}}=\,\frac{N\left({{{\rm{correct}}}}\; {{{\rm{predicted}}}}\; {{{\rm{hits}}}}\right)}{N\left({{{\rm{correct}}}}\; {{{\rm{predicted}}}}\; {{{\rm{hits}}}}\right)+N\left({{{\rm{incorrect}}}}\; {{{\rm{predicted}}}}\; {{{\rm{hits}}}}\right)}$$
(13)
$${{{\rm{PPV}}}}=\,\frac{N\left({{{\rm{correct}}}}\; {{{\rm{predicted}}}}\; {{{\rm{hits}}}}\,\right)}{N\left({{{\rm{all}}}}\; {{{\rm{hits}}}}\; {{{\rm{predicted}}}}\; {{{\rm{positive}}}}\right)}$$
(14)
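The counting-based quantities in Eqs. (10)–(14) can be computed from binary predictions as follows; this is a minimal sketch, and the threshold sweep needed for AUROC/AUPRC curves is omitted:

```python
def confusion_metrics(y_true, y_pred):
    """y_true / y_pred: 1 = hit, 0 = decoy; returns the ratios in Eqs. (10)-(13)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # correctly predicted hits
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # correctly predicted decoys
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # decoys predicted as hits
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # hits predicted as decoys
    return {
        "sensitivity": tp / (tp + fn),   # Eq. (10)
        "specificity": tn / (tn + fp),   # Eq. (11)
        "precision": tp / (tp + fp),     # Eq. (13)
    }
```

PPV in Eq. (14) is the same hit fraction evaluated over only the top-ranked predictions, while AUROC and AUPRC integrate these quantities across all score thresholds.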
N(·) denotes the number of predictions of the indicated type. AUROC scores were derived from the area under the curve of sensitivity versus 1 − specificity. AUPRC scores were determined from the area under the precision-recall curve. PPV highlights a predictor's ability to prioritize true hits.

Benchmarking of antigen presentation prediction

The MONOALLELIC-testing dataset comprises 43,866 HLA-I ligands. Because this dataset was excluded from the training sets of all previous predictors, it provides an unbiased evaluation. A set of randomly selected peptides from the human proteome, in 50-fold excess, was used as negatives to compute AUROC, AUPRC, and PPV for the predictors developed in this study (ImmuneApp-EL and ImmuneApp-MA) and for other tools, including MixMHCpred 2.1 and 2.2, HLAthena, NetMHCpan-4.1, MHCflurry 2.0, MHCnuggets-2.4, and TransPHLA.

Benchmarking of immunopeptidomics deconvolution

To assess model performance on multi-allelic samples, 435,397 eluted ligands obtained from 47 recently published samples (the MULTIALLELIC-Recent benchmark) were used. These ligands were treated as positives and combined with a large number of randomly selected peptides. We evaluated our approach and other tools in two ways. First, AUROC, AUPRC, and PPV were calculated to assess the effectiveness of predictors in identifying true ligands within large random peptide libraries. Second, we examined the congruence between HLA binding motifs obtained through deconvolution and motifs identified from single-allelic ligands, calculating average PCC values across alleles. We compared our methods with NetMHCpan-4.1 and MixMHCpred 2.2, which employ NNalign-MA and MixMHCp for immunopeptidomics deconvolution, respectively.

Benchmarking of neoepitope immunogenicity

To assess potential clinical significance, we performed a comparative analysis of all predictors developed in the present study against peer tools for screening immunogenic neoepitopes.
The evaluation dataset included 349 immunogenic and 1838 non-immunogenic neoepitopes collected from diverse databases and studies (IMMUNOGENIC-testing data). It is crucial for antigen prediction methods to place a significant proportion of immunogenic peptides among their top-ranking prediction scores, as only a select few top-ranked candidate neoantigens undergo clinical testing and practical application. Therefore, PPV was computed to evaluate immunogenicity prediction for ImmuneApp-Neo, ImmuneApp-MA, ImmuneApp-EL, PRIME 1.0 and 2.0, HLAthena, NetMHCpan-4.1, MixMHCpred 2.1 and 2.2, MHCflurry 2.0, TransPHLA, and MHCnuggets-2.4.

Implementation of ImmuneApp online platform

ImmuneApp implements four main modules: "Discovery", "Analysis", "Results", and "Controller". In the backend, three well-trained deep learning models (ImmuneApp_BA, ImmuneApp_MA, and ImmuneApp_Neo) are used for the prediction of binding affinities, ligand probabilities, and immunogenicity, respectively, as well as for immunopeptidomic analysis. The "Controller" module checks the input data format, sends data from the frontend interfaces to the backend, generates results using the models, and then presents them on the "Results" page. The "Discovery" module accepts two input types: "FASTA" and "Peptide". Users can paste input data directly into an online submission text box. MHC molecules and the peptide length (FASTA input only) must be specified to run a prediction. The "Analysis" module accepts clinical immunopeptidomic samples as input, together with MHC molecules. Input sample(s) can be pasted into an online submission text box or uploaded from the user's local disk, and a sample identity should be specified. This module provides an intuitive report for personalized analysis, statistical summaries, and visualization of results for immunopeptidomic data. Both pages are implemented responsively using HTML5, CSS, Bootstrap 3, and JavaScript.
Additionally, the "Controller" is called through Ajax to submit jobs, retrieve data, and display results. There is no limit to the number of tasks submitted by each user; ImmuneApp automatically handles jobs in a queue that allows up to five jobs to run concurrently.

Motif analysis and discovery for immunopeptidomics data

We implemented both an unsupervised GibbsCluster approach and a supervised allele-specific approach for motif analysis. The unsupervised approach is a standard GibbsCluster run on all available peptides, with parameters set to the recommended defaults for class I peptides from the GibbsCluster-2.0 server: "-g 1-6 -T -j 2 -C -D 4 -I 1". The grouping with the highest Kullback–Leibler distance (KLD) score is detailed in the report. The allele-specific approach relies on the outcomes of our model-driven deconvolution method, which transforms immunopeptidomics data into pseudo-mono-allelic data for each allele. For peptides not predicted to bind any allele, GibbsCluster was run with the parameters above and "-g" values ranging from 1 to 5, enabling it to identify multiple groups within these unannotated peptides; the grouping with the highest KLD score is highlighted.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
