MHC-I-presented non-canonical antigens expand the cancer immunotherapy targets in acute myeloid leukemia

ncMAPs are indispensable tumor neoantigensRecent studies have highlighted the importance of ncMAPs7,16, which represent a rich source of tumor neoantigens. A plethora of non-canonical translation products of transcripts have been discovered by Ribo-seq, including lncRNAs, pseudogenes and UTRs of coding genes, and out-of-frame alternative ORFs in annotated protein-coding genes. We collected and integrated the non-canonical ORFs supported by Ribo-seq from RPFdb17, nuORFdb11 and TransLnc14, which were used as a customized ncORF reference library to expand immunotherapy targets in AML. To comprehensively identify the MAPs in cancer, we integrated the human proteome obtained from UniProt with our customized ncORF reference library. We next performed MHC-I immunopeptidome MS/MS spectra searching to identify MAPs in 19 AML patients derived from all possible genomic regions, including non-coding regions (Fig. 1A). We used three methods (MixMHCpred, MHCflurry and NetMHCpan) to predict the binding between peptides and MHC-I. The peptides predicted to bind MHC-I by at least two methods were considered as MAPs (Fig. 1A). As a result, we identified 6,426 cMAPs and 1,838 ncMAPs in AML, respectively (Fig. S1A,B). A similar number distribution was discovered for identified cMAPs and ncMAPs from all samples. Moreover, we found that 49.1% of MAPs can be detected in published mono-allelic MHC-I MS data (Fig. S1C), indicating the high reliability of both cMAPs and ncMAPs11.Fig. 1Comparison of different characteristics between ncMAPs and cMAPs. (A) The workflow for identification of cMAPs and ncMAPs. (B) Difference of main characteristics between ncMAPs and cMAPs which was calculated as the median of cMAPs minus the median of ncMAPs. Red points indicate difference in favor of ncMAPs, blue points indicate in favor of cMAPs, and grey indicates no difference. (C,D) Boxplot showing the difference of PEP (posterior error probability) and T-cell contact hydrophobicity between ncMAPs and cMAPs. The PEP is a statistical measure that estimates the probability of a peptide being wrongly identified in proteomics analysis with MaxQuant. The Wilcoxon test was utilized for statistical analysis. (E) Pearson correlations between observed retention times and predicted retention time. (F) The length distribution of source ORFs. (G) The percentage distribution of the number of MAPs identified across different numbers of samples. The x-axis represents how many samples MAPs will be detected in, while the y-axis displays the percentage of MAPs appearing in n samples (n = 1, 2,…, 18).Next, the characteristics of ncMAPs were compared with those derived from protein-coding regions (Fig. 1B). We found that ncMAPs exhibit the comparable characteristics as cMAPs, a little better in the predicted MHC binding affinity, %rank in the top 2% for the corresponding HLA-A and C alleles, and hydrophobicity (Fig. 1B–F). Previous studies have discovered that high degree of T-cell contact hydrophobicity and hydrophobic fraction are associated with T-cell response18,19. We found that ncMAPs exhibited higher hydrophobicity scores compared to cMAPs in T-cell contact residues, suggesting stronger immunogenicity of ncMAPs (Fig. 1D and Fig. S1D). The immunogenicity has been predicted using DeepImmuno20, BigMHC-IM21 and TLImm22. We discovered that the distribution of these features was similar between cMAPs and ncMAPs (Fig. S1E–G).We also compared the distribution of MS/MS observed MAP retention times (RT) with the distribution of RT predicted based on the DeepLC algorithm23. Significantly positive correlations between experimental and predicted RTs were observed for both ncMAPs (R = 0.96, p < 2.2e-16) and cMAPs (R = 0.97, p < 2.2e-16, Fig. 1E). In addition, we investigated the length of MAPs and found that the 9mer peptides were prevalent (Fig. S1H), which was consistent with the observations in previous reports24. ncMAPs tended to be derived from shorter proteins than cMAPs (Fig. 1F). Notably, we found that approximate 69.24% of cMAPs and 66.27% of ncMAPs were shared among at least two AML patients (Fig. 1G). Together, these results suggested that ncMAPs are not only a better source of neoantigen-targets in personalized immunotherapy treatment efforts, but also a source of shared neoantigens across AML patients.Higher generation efficiency of source genes deriving ncMAPsIt has been observed the association between the expression levels of source genes and the specificity and sensitivity of neoantigen detection6,7,25,26, we hypothesized that the characteristics of source genes might be related with the detection of ncMAPs. We first annotated the ncMAP to genomic regions and identified the source genes. As a result, 1,299 customized ncORFs were mainly from lncRNAs (39.95%), 3′ overlap dORF (19.86%) and 5′ uORF (15.78%) regions (Fig. 2A and Fig. S2A). Compared to the distribution of references, we found that ncMAPs were likely to be derived from 3′ overlap dORF and lncRNAs (Fig. 2B and Fig. S2C). It is notable that 78.37% of source genes only produced one ncMAP, whereas more than half of source genes could produce multiple cMAPs (Fig. S2B).Fig. 2Comparison of source proteins encoding ncMAPs and cMAPs. (A) Diverse non-canonical ORFs encoding ncMAPs. (B) Compared with reference library, the tendency of peptides MS-identified. (C,D) The expression levels of source genes encoding MAPs. (E–G) The expressions in TCGA samples, the length of source ORFs and the ratio of the length covered with non-coding derived ncMAPs, with protein-coding derived ncMAPs, with protein-coding derived cMAPs, or with protein-coding derived MAPs.Previous studies have been reported that the expression levels of source genes encoding cMAPs and ncMAPs were significantly different. We thus divided source genes into four groups (Fig. 2C). Indeed, the higher expressions were observed for source genes, in particular the expressions of genes generating both cMAPs and ncMAPs were the highest, followed by protein-coding genes derived cMAPs, protein-coding genes derived ncMAPs and ncORFs derived ncMAPs (Fig. 2C–E and Fig. S2D). For instance, PRPF8, as an essential RNA binding protein in pre-mRNA splicing, generates the highest number of MAPs, including 25 cMAPs and one ncMAP (Fig. S2E). Moreover, the expression levels of source genes were positively correlated with the number of MAPs encoded (Pearson correlation coefficient R = 0.32, P < 2.2e-16).We next analyzed the length of source genes and found that the ncORFs deriving ncMAPs were the shortest, whereas protein-coding genes generated both kinds of MAPs were the longest (Fig. 2F). We also calculated the generating efficiency of source genes, which were defined as the ratio of the number of amino acids detected in the immunopeptidome to the number of amino acids in the source protein. Surprisingly, the efficiency of generating MAPs exhibited the opposite trend and the ratios were much higher for ncORFs than others (Fig. 2G). These results suggested that although both low expression levels and shorter lengths of source ncORFs that generate ncMAPs, the generation efficiency of ncMAPs was much higher than those of cMAPs.Prevalence of ncMAPs and cMAPs in AMLThe HLA genes are the most polymorphic across the human population, which can complicate the design and development of worldwide covering T-cell epitope-based vaccines. Because of antigens with specific HLA-I restrictions, such vaccines may be only effective across particular populations. For the vaccine to apply to a larger population, the development of MHC-mediated peptide vaccine needs to consider the population coverage of alleles and allele binding motifs27. To evaluate the coverages of HLA alleles in the AML population, we obtained the global frequencies of sample-specific HLA alleles from IEDB28. We observed that the cumulative coverages were high, reaching 90.3%, 76.1% and 94.0% for HLA-A (n = 11), HLA-B (n = 21) and HLA-C (n = 13) alleles (Fig. 3A), respectively. Moreover, MAPs presented by the same allele tend to be prevalent among patients. For example, more than 1,500 MAPs were presented by HLA-A*02:01, and >75% of these MAPs occurred in at least two samples (Fig. 3A). In addition, approximate 60% MAPs presented by HLA-A*01:01 were observed in at least two AML patients. These results indicated the prevalence of shared MAPs across AML patients.Fig. 3The distribution of MAPs across patients in AML. (A) The top barplot shows the proportion of MAPs presented by different alleles in the samples. The left barplot shows the proportion of MAPs shared by different samples. The middle heatmap shows the percentage of allele-specific presented MAPs to total MAPs in the patient. The right barplot shows the number of MAPs by HLA alleles. (B) The pairwise similarity between samples. (C) The boxplot shows the distribution of Simpson coefficient with shared 0, 1, 2 or > 2 alleles.Next, we calculated the Simpson index to further evaluate the shared degree of MAPs between patients. We found that the similarity scores were highly correlated, in particular for patients with high number of shared alleles (Fig. 3B). For instance, patients AML13 and AML17 shared 5 HLA-I alleles, which exhibited the highest similarity among all pairs of patients. Moreover, with the increasing number of shared alleles, the MAPs identified were likely to overlap with each other in AML patients (Fig. 3C). Together, these results suggested that the MAPs were prevalent across AML patients, making them potential therapy targets in clinical.Global identification of shared motifs across allelesIt is well known that the biochemical properties of the peptides are important determinants of peptide binding to HLA I molecules. We thus calculated the pairwise sequence similarity between both alleles using HLA ligands as a proxy to measure HLA similarity. The average pairwise correlation values for HLA-A, B, and C have reached 0.51, 0.48 and 0.80, respectively. We found that HLA alleles belonging to the same supertypes tend to cluster together as expected, such as HLA-B27, HLA-B44, HLA-C03 and HLA-C07 (Fig. 4A, left panel). That is, supertypes as the dominant groups exhibited higher similarity, which was consistent with previous studies29. Supertypes have been found to be associated with clinical characteristics. For example, B44 supertype was found to effectively predict the progression-free survival (PFS) and OS of immune checkpoint blockade therapy in non-small cell lung cancer and melanoma30. Meanwhile, the correlation has attained 0.98 between HLA-C*02:02 and HLA-C*12:03, indicating that they could share motifs. Moreover, there was similar situations among HLA-C*01:02, HLA-C*06:02 and HLA-A02:01.Fig. 4Alleles Similarity and identified motifs. (A) Pairwise correlations between alleles, based on a vector of frequencies of the 20 amino acids at every peptide position(left), and the Simpson coefficient (right). (B) 2D-visualization of sub-motifs identified across alleles (left), colored according to HLA locus and scaled in size according to the number of underlying peptides making up the sub-cluster. (C) Number of alleles sharing a sub-motif colored according to HLA locus.On the other hand, the enriched peptide repertoire against the specific alleles is helpful to understand the functions of MAPs in shaping the cancer immunopeptidome. We thus evaluated the similarities between alleles based on the degree of shared binding MAPs. We obtained the similar clustering results (Fig. 4A, right panel), indicating that the sharing degree of MAPs could reflect the similarity of HLA alleles to some extent. In particular, a cluster composed of HLA-A and HLA-C alleles was observed, suggesting the potential sharing mechanism of MAPs across HLA-I supertypes.To understand the preference for HLA presentation, we have compared the spectrum of distinct and shared sub-motifs across HLA-I alleles, revealing the diversity and complexity of endogenous HLA ligands in AML patients. Firstly, we explored the motifs of ncMAPs and cMAPs across different HLA-I alleles. The similar motif mode was identified between ncMAPs and cMAPs (Figure S3), but there were differences between the position probability matrix of amino acids (Figure S4D). For example, tryptophan and methionine had significant change at different HLA-binding position. Through the motif analysis, we also discovered some sub-motifs were shared among multiple HLA alleles. Next, the distance was calculated between pairwise peptides of a given length against the same allele. We exploited NMDS to reduce peptide distance matrices and DBScan to reveal the sub-motifs of peptides. In total, 254 sub-motifs were identified, which contained at least 20 peptides across alleles. To examine whether the sub-motifs detected were associated with multiple haplotypes, these sub-motifs were projected onto a 2-dimensional space to identify distinct motifs that were specific and shared among alleles (Fig. 4B,C). Of these, 188 sub-motifs were shared across alleles, and 66 were allele-specific. We identified the two largest allele-shared motifs (xLxxxxxxL and xLxxxxxxV), consisting of 23 and 18 alleles, respectively. Furthermore, we also discovered that the sub-motif (xExxxxxxY) with polar primary anchors was shared across HLA-A*26:01 and HLA-B44 supertypes (Fig. S4A). These results were consistent with previous reports that the second and C-terminal positions of epitopes are the most critical residues for HLA I binding that fit into the anchor pockets in the HLA I groove31,32. However, we also identified the some interesting sub-motifs, such as KxxxxxxxF and xxDxxxxxY, which have distinct dominant patterns different from regular p2 and p9 (Fig. S4B,C). These findings are helpful for us in identifying peptides with allele-shared motifs that enable the development of T-cell epitope vaccines with high-population coverage using a limited number of peptides. We found that although HLA-A/B alleles solely contributed half of the motifs, respectively, most of the HLA-C sub-motifs overlapped with sub-motifs from HLA-A and/or B alleles, which account for 71% of all HLA-C sub-motifs (Fig. 4C). Notably, both ncMAPs and cMAPs were merged and contributed to identifying the same anchor residue sub-motifs (Fig. 4B). Altogether, the close correlation between HLA alleles shared motifs provided new insight into why samples with only one shared allele had a high similarity (Fig. 3B).Neoantigens identified by integration of multi-omicsOwing to the characteristic of low mutational burden in AML, mutation-derived neoantigens are mostly unique to individuals. However, lots of evidence demonstrated tumor-specific antigens (TSAs) generation by cancer-specific translation may be a sizable additional source of potential neoantigens11. Given that neoantigens that aberrant translation in cancer has been more attractive targets for immunotherapy, we integrated multi-omics data to identify potential neoantigens accurately, which minimizes the risk of on-target/off-tumor adverse events by limiting the expression of neoantigens in normal tissues32. To more extensively identify the neoantigens originated from non-canonical translation, we focused on the TSAs, aberrant expressed TSAs (aeTSAs) and CTAs (Fig. 5A). As a result, we identified 21 potential neoantigens from 14 genes, there are 8 TSAs, 4 aeTSAs and 9 CTAs, 61.9% of which were derived from non-coding regions (Fig. 5B).Fig. 5Accurately identify tumor-specific antigens by integrative multi-level omics profiling. (A) Tumor specific antigen identification pipeline at the transcription level. (B) The number of identified antigens is shown according to ncMAP and cMAP. (C) Antigens are distributed in normal tissues and AML, and the multi-omics screened antigens were highlighted in red and bold. (D) Multi-omics screening antigen source gene expression levels are displayed by heatmap. Red is the example shown next. (E) ILPSYQLFL combined with HLA-A02:01 schema diagram. (F) The Scatter plot of neoantigens, x-axis is PEP, y-axis is Andromeda score. (G) ILPSYQLFL mass spectra.Furthermore, immunopeptidomes from healthy tissues were re-analyzed against the same reference libraries to further filter out tumor-associated antigens, targeting which could result in severe and sometimes fatal toxicities by T cells16,33. Finally, thirteen neoantigens from eight genes, including RP11-9L18.2, RP11-328J14.2, RP11-65E22.2, GLRX3P2, LA16c-306E5.1, HNRNPA1P38, OR10A5 and DNTT, were identified that were neither expressed in healthy tissues nor detected in normal tissue immunopeptidomes (Fig. 5C,D). For example, the protein-coding gene DNTT, encoding the highest number of TSAs, was normally expressed in primitive hematopoietic cells and B-cell progenitors, whereas it was also over-expressed in older AML patients with NPM1 wild-type reported by previous studies34.In particular, we identified a ncMAP, ILPSYQLFL, which was uniquely mapped to the pseudogene RP11-9L18.2. The robust binding to HLA-A02:01 was predicted by NetMHCpan (v.4.1)35, MHCflurry (v2.0)36 and MixMHCpred (v2.1)37 (Fig. 5E). Meanwhile, binding stability of the peptide binding to HLA-A02:01 was 3.13 h estimated by NetMHCStabPan and TCR recognition probability was 3.47*10−9 calculated using the multistate thermodynamic model. Thus, the peptide was considered immunogenic. Moreover, the ILPSYQLFL displayed the typical anchor motif xLxxxxxxL, which covered a set of 23 allele-shared sub-motifs, therefore likely to be binders and not contaminants. The PEP and Andromeda score for the best associated MS/MS spectrum of ILPSYQLFL were 0.029 and 91.069 (Fig. 5F). The peptide spectra were also detected in multiple samples/replicates and possessed high coverage across b and y ions (Fig. 5G). Other neoantigens predicted binding to HLA alleles also showed similar features (Figs. S5,6). Therefore, these results demonstrated that ncMAPs without expression and binding by MHC alleles in normal cells might be a sizable additional source of potential neoantigens and provide new antigenic targets for immunotherapy.Expressions of neoantigen source genes associated with immune evasionT cells recognize tumor cells by monitoring short peptides presented by the HLA-I molecules on the AML cell surface. In addition, macrophages and dendritic cells, as part of the innate immune cells, colud intake exogenous antigens and present them on the cell surface in the form of HLA-I-peptide complexes through the cross-antigen pathway, which could enhance the killing effect of CD8 + CTLs. It is vital to assess the association between neoantigens and the immune cell abundance8,38. Having determined that neoantigens could be presented by specific HLA alleles, we next evaluated the association of neoantigen-related genes with the abundance of immune cells across the samples using TCGA-LAML RNA-seq. The abundance of immune cells was assessed by xCell and CIBERSORT algorithm39,40. The xCell is based on ssGSEA of marker genes to estimate the abundance of immune cells, whereas the CIBERSORT measures immune cell abundance in samples based on a deconvolution algorithm. It is not difficult to find that there was a co-expression trend between genes encoding the neoantigens (Fig. 6A). In addition, these genes exhibited significant correlations with the immune score, stroma score, microenvironment score and cell stemness (Fig. 6A). The recognition of neoantigens by T-cell receptor (TCR) relies on the presentation of antigen-presenting cell (APC), thus the correlations between antigens and APCs were calculated. In fact, antigen genes were significantly related to antigen-presenting cells (e.g., macrophages, dendritic cells), T cells and other immune cells (Fig. 6B and Fig. S7A). Moreover, in order to examine whether the expressions of the neoantigen-related genes could recruit immune cells against the tumor, we also calculated the correlation between the expressions of genes and immune cell marker genes. We found that these genes were significantly related to the marker genes of CD4+ T cells, CD8+ T cells, memory T cells, helper T cells, dendritic cells, macrophages, and natural killer cells (Fig. 6C). These results indicated that the neoantigen-related genes were highly related to immune cell abundance.Fig. 6Neoantigen expressions correlated with immune evasion. (A) Correlation of expression values between genes, and Correlation between gene expression values and various indexes. ImmuneScore, StromaScore and MicroenvironmentScore were calculated by xCell. (B) The correlation between the infiltration proportion of immune cells predicted by xCell and gene expression values in AML. (C) Correlation between gene expression values and immune cell marker gene expression values. (D) ROC curve for predicting OS in AML patients by risk scores. (E) Gene expression levels are displayed according to low, middle and high groups, and the Count value represents the number of genes expressed in each sample. (F) Kaplan-Meier survival analysis between risk groups.To further explore the contributions of neoantigen-related genes to clinical prognosis, lasso regression analysis was performed on 8 genes that encode the neoantigens in AML. The risk score was obtained according to the product of the lasso regression coefficient and gene expressions, which performed well in predicting clinical outcomes, with areas under the ROC curve (AUC) of 0.631, 0.65, and 0.729 for 1-year, 3-year, and 5-year survival, respectively (Fig. 6D). We next ranked the patients by the risk scores and divided them into three groups according to the gap of scores shown by the ranking (Fig. S7B). The expression levels of neoantigen-related genes were shown in the low-, middle- and high-risk groups (Fig. 6E and Fig. S8). Clinical prognostic analysis indicated the high-risk group was associated with a poor prognosis compared with the low- and middle-risk group (Fig. 6F, P = 0.0015, log-rank test). Moreover, patients in the high-risk group had lower mRNAsi scores41 than the low-risk group (Fig. S7C, P = 0.032, Wilcoxon’s rank sum test). The abundance of granulocyte-monocyte progenitor cells in the low-risk group were significantly higher than those in the high-risk group (Fig. S7D, P = 0.038) with the consistency of previous reports42. However, the expression levels of HLA-A, B and C genes in the high-risk group were all significantly higher than those in the low-risk group (Fig. S9, P < 0.01). Meanwhile, we discovered that a higher number of neoantigens was associated with lower risk, and the expression levels of HLA-I genes were negatively correlated with the number of neoantigens. That is, although HLA-I genes were expressed in the high-risk group, the expression of neoantigens was lacking (Figs. S10,11).In considering the high-risk group exhibiting the immune escape phenomenon, we first evaluated the expression of genes encoding chemokines that recruit immune cells and immune inhibitory genes, high expression of which could change the bone marrow microenvironment and shorten the survival time of patients38. We found that the expressions of both chemokines and immune inhibitory genes in the high-risk group were higher than those in the low-risk group (Fig. S12A). There was a strong positive correlation between the risk score and the WNT-β-catenin signaling pathway (Fig. S12B), the activation of which is known to inhibit T cell activation and infiltration38. Consistent with the previous report that the IFN-γ response pathway can induce immune checkpoint protein PD-L1 in tumor cells by activating STAT1, thus leading to immune escape and tumor recurrence43, the pathway and the expressions of STAT1 were upregulated in the high-risk group (Fig. S12C,D).Taken together, we found that HLA-presented neoantigens can elicit the immune system to recognize the tumor cells. The phenomenon of immune escape occurred in the high-risk group of AML patients, therefore, WNT-β-catenin/IFN-γ inhibitors or immune checkpoint blockade treatment combined with neoantigen-targeted immunotherapies can be potentially used to treat high-risk patients.

Hot Topics

Related Articles