The integrated molecular and histological analysis defines subtypes of esophageal squamous cell carcinoma

Transcriptomics subtypes of ESCC with distinct prognosisTo fully understand the transcriptomic heterogeneity and molecular signatures of ESCC associated with differences in prognosis, we investigated the transcriptomic landscape of 120 treatment-naive ESCC tumours prospectively collected under strict protocols (Supplementary Data 1). Unsupervised clustering of RNA-seq profiling using non-negative matrix factorisation revealed four stable clusters (Fig. 1a and Supplementary Fig. 1a, Supplementary Data 2). Functional annotation of representative genes in each cluster annotated these subtypes as ‘differentiated’, ‘immunogenic’, ‘metabolic’ and ‘stemness’ (Fig. 1b, c, Supplementary Data 2 and 3). Gene signatures of these four subtypes were validated in three independent patient cohorts12,13,14 (Supplementary Fig. 1b). Keratinocyte differentiation and epidermis development genes such as LCE3D, CDSN and KLK5 defined the differentiated subtype. B-cell surface markers MS4A1, CD79A and MZB1 and T-cell chemokine ligands CXCL9, CXCL13 and CXCL10 characterised the immunogenic subtype. The metabolic subtype is associated with the upregulation of genes involved in drug metabolism by cytochrome P450 and retinol metabolism, such as GSTA1, ADH7, UGT1A10 and UGT1A3. High expression of WFDC2, PEG10, Wnt signalling modulator SFRP1 and squamous cell carcinoma (SCC) stem cell marker LGR615 defined the stemness subtype (Fig. 1a and Supplementary Data 2). All immune-associated pathways, such as the interferon-gamma pathway, TCR pathway and chemokine-signalling pathway, were significantly downregulated in the stemness subtype (Fig. 1b). The transcription factor (TF) profiling further highlighted subtype-specific TFs, including the upregulated activity of MYB, SOX10, SP5 and ARNT2 in the stemness group (Supplementary Fig. 2).Fig. 1: Transcriptomic subtypes of Chinese ESCC.a Four distinct transcriptomic subtypes were identified using non-negative matrix factorisation (NMF). The expression heatmap of all representative genes from the four clusters is displayed, and the top five representative genes are shown next to each cluster. Each row represents a representative gene, and each column represents a patient. b Heatmap of top enriched pathways for each subtype is shown. Each row represents a significant pathway curated from the mSigDB database (v.6.2). Four subtypes are shown in the column. The ‘−log10’ transformed p-values from the hypergeometric test were used to generate this heatmap. Red indicates that the pathway is highly enriched for the gene set. Blue indicates that no enrichment was observed for the gene set. For the stemness subtype, results of three mostly downregulated pathways, ‘interferon gamma signalling’, ‘TCR pathway’ and ‘chemokine signalling pathway’ from GSEA are shown. Normalised enrichment scores (NES) and FDR values are also displayed. c Representative histopathology images for the four subtypes are shown. A deep-learning model was developed to extract and compare subtype-specific histological features based on histology slides. The high-magnification pictures were shown with arrows indicating their locations in the slides in the right panel. These features clearly discriminate the molecular subtypes. d A Kaplan–Meier curve is shown comparing patients from the four subtypes with a log-rank p-value calculated. e A Kaplan–Meier curve is shown for patient samples with high and low stemness signatures in an independent cohort (n = 63). The stemness signature was measured as the average expression readout of four genes, LGR6, VWA2, WFDC1 and SFRP1, by RT-PCR. The patients were split into high and low groups based on an optimal cut-off with R survminer package (see Methods). For all survival curves, significance was determined using a two-sided log-rank test. f The effect of SFRP1 overexpression (in KYSE-70, n = 6) or knock-down (in KYSE-520, n = 6) on the tumour growth of ESCC was evaluated by the tumour growth of SFRP1-modified ESCC cells in immune-deficient mice. All mice in the overexpression group developed tumours, while two mice in the knockdown group had no tumour formation. The tumour size is presented at the end time point of the study (30 days after transplantation of the ESCC cells). The box bounds the interquartile range divided by the median, with the whiskers extending to the min and max values. Significance was determined using a two-sided Wilcoxon test. Source data are provided as a Source Data file.We next utilised a large single-cell RNA-seq data set of samples from 60 individuals with ESCC16 to validate our subtype specific gene signatures and identify associated cell types in the cancer tissue. Out of the 208,659 cells, 44,122 were epithelial cells that were dominantly cancer cells (Supplementary Fig. 3), and signature genes from differentiated, metabolic and stemness subtypes were mainly expressed by epithelial cells, while most signature genes of our immunogenic subtype were all expressed by non-tumour cells. For example, MS4A1, CD79A and MZB1 were expressed by B cells, CXCL9 was expressed mainly by myeloid cells, with some in fibroblasts, endothelial and pericytes (Fig. 1a and Supplementary Fig. 3b). To further dissect the heterogeneity of ESCC epithelial cells, the NMF clustering (with k = 10 factors) was performed on epithelial single cells to identify diverse transcriptional programmes (Supplementary Fig. 4). Based on shared signature genes and pathway activities, their corresponding programmes of Zhang et al.16, and transcriptomic subtypes were assigned. Reassuringly, all previous eight expression programmes of epithelial cells16 were identified in NMF programmes, and these NMF transcriptional programmes seemed to capture all our four transcriptomic subtypes derived from bulk RNA-seq. For example, the NMF cluster 5 and 10 corresponded to our differentiated subtype and the epithelial differentiation (Epi1/2) programme identified by Zhang et al.16, with the overlap of many signature genes, such as LGALS7, LGALS7B, KRT16, KRT6B/C, FABP5 and LY6D of the Epi1 programme, S100A7/8/9, SPRR1A/B and SPRR2D of the Epi2 programme. Our metabolic subtype corresponded to the NMF cluster 4 and the oxidative stress or detoxification (Oxd) programme, with shared genes as CES1, ALDH1A1, ALDH3A1, AKR1C1/2/3 and GPX2. The NMF cluster 6 had the most activated immune and cell adhesion pathways, and shared many mucosal immunity-like (e.g., S100P, CXCL17, AGR2 and MUC20) and antigen presentation programme genes (e.g., CD74, HLA-DPA1, HLA-DRA/B1/B5, HLA-A/B/C and B2M). Thus, this cluster mostly likely corresponded to our immunogenic ESCC cells. Interestingly, our stemness subtype was captured by the NMF cluster 1 with many shared genes, such as SFRP1, COL9A3, WFDC2 and LGR6, although this was not characterised by the eight epithelial programmes of Zhang et al., (Supplementary Fig. 3e and Supplementary Fig. 4). Cluster 1 also had significantly upregulated Wnt signalling and NCAM1 interactions, and the most downregulated keratinisation/cornified envelope and metabolism pathways, which were all signature pathway activities for the stemness subtype. Therefore, the single-cell results further validated our findings derived from bulk tissue RNA-seq and supported our four distinct transcriptomic subtypes.Importantly, the stemness subtype was associated with the worst overall survival of all subtypes (Fig. 1d, log-rank test P = 0.028). The significance as a prognostic biomarker of the top four genes (WFDC1, SFRP1, LGR6 and VWA2) related to the stemness signature was further proven in an independent cohort of 65 ESCC patients using RT-PCR (Fig. 1e, P = 0.031). Subsequently, we investigated the SFRP1 expression in ESCC by Immunohistochemistry (IHC) and Western blot assay. The frequency of SFRP1 protein expression was low in human primary ESCC tumour, showing that SFRP1 protein was positive in 4.3% (3/70) of ESCC tumour tissues and no positive in the matched normal samples, as demonstrated in Supplementary Fig. 5a, b, SFRP1 exhibited positive expression in part clinical specimens and cell lines. Further functional experiments demonstrated that the overexpression of SFPR1 in KYSE-70 and KYSE-140 cell lines could significantly increase cell proliferation in vitro or xenograft tumour growth in vivo, while SFRP1 knockdown in KYSE-450 and KYSE-520 cells exhibited the opposite effects (Fig. 1f and Supplementary Fig. 5c–j).Distinct histological features among transcriptomic subtypesAs we also matched Hematoxylin and Eosin (H&E) stained pathology slides for samples that were profiled by sequencing, we next explored if there were any unique histological features specific to each molecular subtype. To quantify and compare the histopathological data/features extracted from the scanned H&E histology images, we developed a deep-learning model using five state-of-the-art convolutional networks, namely Inception-V3, Inception-ResNet-V2, DenseNet-121, VGG16 and ResNet-50, and performed feature extraction on selected tiles from each whole slide image (WSI) (Supplementary Fig. 6a). This model diversity can enhance performance by capturing different predictive elements and building more enriched representations into the system. This ensemble approach was reported to retain more informative features for the final retrieval and achieve better accuracy than a separate feature extraction17,18,19. We then compared features among the four transcriptomic groups (Supplementary Data 4), and identified meta features strongly associated with each group (i.e., combined features that were significantly higher in one group compared to the rest) (see Methods, Supplementary Fig. 6, Supplementary Data 5). Imaging tiles with the highest scores of each feature were selected for review by a pathologist (Fig. 1c). Indeed, tiles with the highest subtype-specific features all contained the distinct histopathological features corresponding to each subgroup (Fig. 1c, Supplementary Fig. 6b). ‘DIFF-Feature’, characterised by keratin pearls within tumours, is significantly higher in the differentiated samples than the non-differentiated samples. The ‘MET-Feature’, marked by eosinophilic cytoplasm with less immune cell infiltration, is higher in metabolic than in the immunogenic, stemness and differentiated groups. By contrast, the ‘IMM-Feature’ with extensive immune cell infiltration within the stroma of tumours, is higher in the immunogenic than in the non-immunogenic groups. Finally, the STEM-Feature associated with poorly differentiated tumour cells and few immune cells infiltrating within the tumour is the highest in the stemness group compared to other groups (Fig. 1c and Supplementary Fig. 6b, c).Tumour immune cell infiltration and tumour cells expressing NK marker genesWe next examined the tumour immune environment of patient tissues based on RNA-seq and determined how this correlated with the four transcriptomic subtypes and prognosis. Several in silico immune deconvolution published methods20,21,22,23,24,25 were benchmarked against molecular protein staining of CD4+ and CD8+ cells, tumour cellularity and copy number profiles (Methods, Supplementary Fig. 7), and the best-performing method, the Danaher et al.22 signature was used to investigate the tumour-infiltrating immune cell populations within the 120 ESCC tumours. Consensus clustering based on the immune cell estimates revealed three distinct clusters: C1 is associated with a generally low level of immune cell infiltration, but with relatively high levels of neutrophils, dendritic and mast cells; C2 is marked by a high level of immune infiltration of B cells, T cells and macrophages; C3 is correlated with a low level of infiltration of all immune cell populations, except for NK cell markers (Fig. 2a). The C3 cluster is significantly associated with high expression of NK cell markers XCL1, XCL2 (Fig. 2b) and CD160 (Supplementary Fig. 8a).Fig. 2: Heterogeneity of immune cell infiltration in ESCC.a The immune cell infiltration profiling across our cohort is shown, clustered by the level of estimated immune cell infiltration. Each row represents an immune cell type as estimated by the method used by Danaher et al. Immune cells are natural killer (NK) cells, neutrophils, B cells, macrophages, CD4+ mature cells, regulatory T cells (Treg), CD56dim NK cells, total T cells (T cells), CD8+ T cells, cytotoxic cells, exhausted CD8+ T cells (exhausted CD8), dendritic cells (DC) and mast cells. Consensus clustering was performed. Each column represents a patient sample. Three immune infiltration clusters were identified: C1, C2 and C3. b Levels of gene expression of XCL1 and XCL2 for n = 120 samples are shown among the three immune subtypes as a box and whisker plot. Significance in each pairwise comparison is shown using the two-sided Wilcoxon rank-sum test. c IHC analysis revealed that the C2 immune subtype had significantly increased levels of CD8 (67 samples) and CD56 (75 samples) expression. Significance was determined using a two-sided Wilcoxon test; *p < 0.05, ***p < 0.001. d The survival analysis of all profiled immune cell types against overall survival for 102 samples is shown. The hazard ratio (HR) derived from the multivariate Cox regression model is shown as a whisker plot. The blue square indicates the HR value, and the error bars represent 95% confidence intervals. Significance is determined using a two-sided log-rank test (■ p < 0.1; * p < 0.05). e A Kaplan–Meier curve is shown for NK cell estimates against overall survival for our cohort (China, 102 samples) and TCGA (90 samples). Multivariant survival analysis was performed for the China cohort. HR and p-value derived from the log-rank test are shown. f The number of cases of the four transcriptomic subtypes is shown among the three immune subtypes C1, C2 and C3. Fisher´s exact test was used to test if there is any difference in the proportion of transcriptomic subtypes between different immune subtypes (**** p < 0.0001). g The scatter plot of expression levels between LGR6 and three NK cell markers, XCL1, XCL2 and CD160, is shown. Two-sided Pearson’s correlation coefficient and associated p-value are displayed. h IHC (Immunohistochemistry) staining of XCL1 and LGR6 from one patient, Sample 427, and IHC of CD160 and LGR6 from a different patient, Sample 341, are shown. The IHC results show that XCL1 and LGR6, CD160 and LGR6 are co-expressed in tumour cells. Furthermore, to provide a more comprehensive understanding of our findings, we included a larger visualisation of IHC results depicting CD160, LGR6, XCL1, and CD56 in both normal control and tumour samples for Sample 333 in Supplementary Fig. 11a. In b, c, the box bounds the interquartile range divided by the median, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box. Source data are provided as a Source Data file.We further performed IHC of immune cell markers of CD4, CD8 and CD56 (Supplementary Fig. 9), quantified and compared the IHC measurements among the transcriptomic and immune subtypes. The results demonstrated that the immunogenetic subtype and the C2 subtype indeed had the highest level of CD8+ and CD56+ cell infiltration, slightly less so for CD4+ cells (Fig. 2c, Supplementary Fig. 10a, b). It is noteworthy that the C3 cluster had a low level of CD56 by IHC, but high mRNA levels of XCL1, XCL2 and CD160. The mRNA expression of other NK markers, such as NKG7 and KLRC1, was the highest in the C2 cluster but low in C3 (Supplementary Fig 8a). This interesting but conflicting observation of NK marker genes warrants further investigation later.The prognostic values for all profiled immune cells were then assessed in our cohort using a multivariate Cox regression analysis, accounting for patient age, drinking and smoking history, tumour stage and grade, and the higher NK cell abundance was the strongest poor prognostic factor (Fig. 2d, e (China), log-rank test P = 0.019). This negative correlation with overall survival for NK cell markers was also seen in the TCGA cohort of 90 ESCC samples (Fig. 2e (TCGA), P = 0.015). The comparison between the four transcriptomic subtypes and three immune profile clusters demonstrated a non-random correlation (Fisher’s exact test, p = 9.37e−11). Fifteen of 31 cases (48.4%) from the C3 cluster were stemness tumours, while the differentiated and immunogenic subtypes were overrepresented in the C1 and C2 clusters, respectively (Fig. 2f). Of note, we also observed positive correlations of mRNA expression between LGR6 and XCL1 (r = 0.40, p < 0.0001), XCL2 (r = 0.39, p < 0.0001) and CD160 (r = 0.32, p = 0.0003) (Fig. 2g), suggesting a degree of certain association between stemness and NK cell estimates.To further investigate this correlation, IHC staining for XCL1, CD160 and LGR6 was performed in the matched serial sections of tumour tissues in order to determine the spatial composition and cell types that expressed these markers (Supplementary Fig. 11). Surprisingly, XCL1 and CD160 were predominantly expressed in tumour cells (Fig. 2h), with a few positive immune cells infiltrated in the stroma. We also observed a positive correlation between XCL1/2 gene expression and tumour cellularity based on sequencing data (Supplementary Fig. 8b). XCL1 or CD160 expression co-localised with LGR6 expression (Fig. 2h, Supplementary Fig. 8c), and co-expression within tumours was seen in 16.3% (16/98) of our cohort for XCL1 and LGR6, and 27.8% (25/90) for CD160 and LGR6 (Supplementary Data 6 and 7). When assessing all 98 samples with available IHC, the LGR6 and XCL1 IHC staining appeared to be significantly associated (co-expressed, two-tailed Fisher’s exact test, P < 0.0001), while assessing all 90 samples with IHC, LGR6 and CD160 staining was not significantly associated (P = 0.93, Supplementary Data 7).Of note, XCL1 was predominantly expressed in cancer cells showing adenocarcinoma morphology and dysplastic cells in the submucosa glands, while CD160 could be expressed in both squamous carcinoma and adenocarcinoma cells (Fig. 2h, Supplementary Fig. 8c) as well as in the proliferative and dysplastic cells of the submucosa glands (Supplementary Figs. 8d and 11). Interestingly, we observed that XCL1-expressing dysplastic cells in the submucosal gland in most of the cases are completely separated from the squamous cell carcinoma, suggesting that this subgroup of patients might concurrently have both squamous cell carcinoma and adenocarcinoma, or this adenosquamous carcinoma might be derived from submucosa gland epithelial cells. Our finding of tumour cells overexpressing NK cell markers XCL1/2 and CD160 correlating with the lowest level of immune cell infiltration suggests a tumour-intrinsic immune evasion mechanism of ESCC. Furthermore, we also found that both XCL1 and XCL2 had much more elevated expression in patient tumour samples compared to their matched normal (P < 0.0001), and within tumour samples, high XCL1 expression was significantly associated with worse overall survival (Supplementary Fig. 12). All these observations suggest that XCL1 expressed by tumour or dysplastic cells may have a tumour-promoting role in ESCC.Molecular characteristics of XCL1-high ESCC tumour cellsCharacterisation of tumour cells expressing NK marker genes, especially XCL1, is essential to uncover molecular mechanisms and pathways which may offer alternative therapeutic targets and candidate drugs for this subgroup of ESCC. We first utilised the Cancer Cell Line Encyclopaedia (CCLE) RNA-seq data to analyse the association of XCL1/2 with ESCC. Interestingly, EC cell lines had the highest level of XCL1 mRNA expression out of all cancer types profiled (Fig. 3a). Hierarchical clustering based on differentially expressed genes (n = 413, P < 0.01) between XCL1 high (n = 11) and low (n = 11) ESCC cell lines clearly separated the two groups (Fig. 3b, Supplementary Data 8). The GSEA results further showed that XCL1-high cells exhibited upregulation of drug metabolism cytochrome P450, retinol metabolism and biological oxidation pathways (false discovery rate, FDR < 0.0001), while asparagine N-linked glycosylation, N-Glycan biosynthesis, cytosolic tRNA aminoacylation and signalling by NTRK1 were highly over-represented in XCL1-low cells (Fig. 3c). It is worth noting that different subsets of genes contributed to the upregulation of drug metabolism by cytochrome P450 and retinol metabolism were seen in XCL-high cells and the metabolic subtype (Supplementary Fig. 13). We also observed a significantly positive correlation in mRNA expression between XCL1 and LGR6 for ESCC cell lines (Supplementary Fig. 12d, Pearson’s correlation r = 0.59, p = 0.004), further highlighting the association between NK marker XCL1 and the stemness signature in ESCC cell lines. Interestingly, XCL1-high cells also showed significantly downregulated cell cycle gene set enrichment scores compared to XCL1-low cells (Fig. 3d). This was further seen in the single cell data16, which showed that XCL1 positive epithelial cells (n = 515) had significantly lower cell cycle gene set enrichment scores than XCL1 negative epithelial cells (n = 32,944) (Fig. 3e, Wilcoxon rank test, P = 7.48e−06 for cell cycle checkpoints; P = 0.001 for cell cycle mitotic). Although mRNA expression levels of cell cycle-related genes were reduced in XCL1-high cells compared to XCL1-low cells, XCL1 overexpression in ESCC cells did not functionally affect the cell cycle (Supplementary Fig. 14). More work is needed to further elucidate its role in the cell cycle and other pathways associated with ESCC.Fig. 3: Characteristics of XCL1-high ESCC cells.a The expression of XCL1 across all 1,019 profiled cell lines in the Cancer Cell Line Encyclopaedia (CCLE) is shown. The expression was measured in log2 transformed RPKM. The whiskers extend to a maximum of 1.5 times the interquartile range beyond the box. b The heatmap of significantly differentially expressed genes between ESCC cells of XCL1 high and low groups is shown. c The top upregulated and downregulated pathways (GSEA) derived from the XCL1 high and low differential expression analysis is shown. The pathways were sorted based on the normalised enrichment score (NES). d The cell cycle gene sets were downregulated in XCL1-high cells compared to XCL1-low cells based on GSEA of RNA-seq data. e The violin plots depict the distribution of cell cycle gene set enrichment levels between 515 XCL1-positive cells and 32,944 XCL1-negative cells obtained from single-cell data collected by Zhang et al.16. The levels of cell cycle activity were compared between these two groups using Wilcoxon rank-sum test (**P < 0.01, ****P < 0.0001). The inset box bounds the interquartile range divided by the median, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box. f The cytotoxicity of 5-FU in a panel of human ESCC cell lines is shown. The cells are divided into XCL1 high (red) and low (blue) groups based on the CCLE separation. For each cell line we conducted three repeated experiments to determine the IC50, here log2 transformed mean IC50 scores and standard deviation are shown, the P value was calculated using two-sided Mann Whitney test. g The mean IC50 value of 5-FU and their standard deviation derived from three repeat experiments between control and XCL1 overexpressing cells of KYSE-150 (P = 3.11e−5), KYSE-180 (P = 6.5e−4) and KYSE-410 (P = 0.025) is shown. The IC50 difference was compared using a two-sided t-test. h The drug screening profiling between ESCC XCL1-high and low cells is shown, based on data generated by the Genomics of Drug Sensitivity in Cancer (GDSC) resource. The drugs that show significant differences in IC50 (Student’s t-test, P < 0.05) between the two groups are selected. High and low levels of resistance are indicated in red and blue, respectively. Source data are provided as a Source Data file.Given the upregulation of drug metabolism cytochrome P450 in XCL1-high ESCC cells, we next explored the drug sensitivity to fluorouracil (5-FU), the first-line chemotherapy drug for ESCC, between XCL1-high and -low cell lines. All XCL1-high cell lines tested exhibited a much higher IC50 than XCL1-low cells (Fig. 3f, Wilcoxon rank test, P = 0.007). Impressively, over-expression of XCL1 in XCL1-low cell lines, KYSE-150, KYSE-180 and KYSE-410, resulted in a significant increase in IC50 of 5-FU compared to their control cells (Fig. 3g, Supplementary Fig. 15a, b) while overexpression of XCL1 did not affect the cell proliferation (Supplementary Fig. 15d). We then explored the cancer drug sensitivity data of 367 compounds from the Genomics of Drug Sensitivity in Cancer (GDSC) resource26. Compared to XCL1-low cells, XCL1-high cell lines displayed a higher level of resistance to almost all drugs screened (Fig. 3h and Supplementary Data 9, IC50 z-score t-test p < 0.05). XCL1-low cell lines exhibited a significantly higher sensitivity to Wnt-C59, LGK974 (both targeting PORCN and WNT signalling) and CP724714 (targeting ERBB2 and RTK signalling). Deregulated pathways associated with XCL1-high cells may represent therapeutic targets for this subtype of ESCC with poor prognosis.The genomic landscape among molecule subtypes of ESCCGiven the heterogeneous transcriptomic and immune signatures observed in our cohort, we next examined whether the genomic landscapes differ among ESCC subtypes. Whole-exome sequencing of 103 samples with matched RNA-seq was performed and analysed (Supplementary Data 10). We identified, on average, 358 somatic mutations (range 21–3583) and 152 non-silent mutations (range 9–1242) per sample, respectively. Overall and subclonal mutation burden and tumour purity were not correlated (Supplementary Fig. 16a). The overall mutational load, the number of non-silent mutations, and somatic copy number profiling were similar among the four ESCC subtypes (Supplementary Fig. 16b). Three driver gene detection methods, MutsigCV27,28, dNdScv29, OncodriveFM30, were applied, and 10 ESCC driver genes were identified by at least two methods (Supplementary Fig. 16c), which had all been detected in ESCC previously5,7,8,9,31,32. Incorporating representative genomic alterations of the three molecular subtypes ESCC1/2/3 previously identified from 90 TCGA ESCCs13 and driver genes that have been implicated in a large series of Chinese ESCCs5,32, we found that although the alteration frequencies (i.e., mutations, amplifications and deletions) of most significant genes in ESCC were similar among our subtypes, the stemness type had a higher frequency of NOTCH1 and EP300 alterations, affecting 39% and 26% of stemness tumours, respectively (Fig. 4a, Fisher’s exact test, P = 0.018 and 0.008 for NOTCH1 and EP300, respectively). Interestingly, no alterations in NOTCH1 and EP300 were detected in the metabolic subtype, and no ZNF750 alterations were detected in the differentiated subtype. Stemness tumours seemed to have the highest overall frequencies of ESCC1 and ESCC2 alterations (74% and 65% of cases, respectively). Across our cohort, although the highest overall frequency was observed for ESCC1 alterations (61%), other subtype alterations also occurred substantially, 50% for ESCC2 and 52% for ESCC3 (Supplementary Fig. 16d).Fig. 4: The landscape of genomic alterations in ESCC.a The genomic landscape of ESCC driver genes and previously reported ESCC subtype-specific genes is shown among four transcriptomic subtypes in our cohort. ESCC1/2/3 denotes the ESCC subtypes identified by the TCGA 2017 EC study. Additional significant genes reported from previous large Chinese ESCC cohort studies were also included. For copy number aberrations, only amplifications and deletions were included. Cases of copy gain or loss were not counted. b The genomic landscape of significant ESCC genes across XCL1-high and low ESCC cell lines. c The overall survival of EP300-mutated (Mut) versus wildtype (Wt) cases. d The overall survival of EP300 and/or CREBBP mutated (Mut) versus wildtype (Wt) samples. e The EP300 gene expression levels for 102 samples were compared between EP300-mutated and wild-type samples using the Wilcoxon rank sum test to assess the significance of expression differences between groups. (**P < 0.01). Two cases with amplification or deletion only were excluded. The box bounds the interquartile range divided by the median, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box. f Reference and alternative allele counts and percentages between DNA WES and RNA-seq for three EP300-mutated cases, 369, 390 and 463, with missense and splice site mutations. Fisher’s exact test was performed to determine the allelic imbalance between DNA and RNA, *P < 0.05, ***P < 0.001. Source data are provided as a Source Data file.Next, we investigated the genomic alterations of ESCC functional genes identified above across ESCC cell lines. Interestingly, in XCL1-high cells, alterations in EP300 occurred in 4/11 (36%) cases, compared to zero cases in XCL1-low cells (Fig. 4b), corresponding to our result of the highest EP300 alternation frequency in the stemness group. Whereas alterations in NOTCH1, FAT1 and KDM6A seemed to be more prevalent, 64%, 55% and 45%, in XCL1-low cells, respectively, than in XCL-high cells, 18%, 9% and 9%, respectively (Fig. 4b). EP300 is a histone acetyltransferase, and its mutation has been shown to be associated with poor outcome in ESCC and HNSCC33,34. We also observed significantly poorer overall survival for patients with EP300 mutations in our cohort (Fig. 4c, HR = 5.248, P = 8e−06). When combined with CREBBP mutations, the EP300/CREBBP mutation status was still predictive of worse survival (Fig. 4d, HR = 2.229, P = 0.02). Although CREBBP mutations were not enriched in the stemness subtype, alterations in EP300 and CREBBP tended to co-occur (n = 4) only in stemness samples (Fig. 4a). Interestingly, EP300-mutated patient samples (n = 8, 7.8%) had significantly higher EP300 expression levels than EP300-wildtype samples (Fig. 4e, Wilcoxon rank test, P < 0.01). Out of eight EP300 mutations, three single nucleotide mutations (2 missense mutations, c.4312 T > C, c.4355 C > G and one splice site mutation, c.4617+1 G > A) showed significantly elevated levels of the alternative (mutated) allele in RNA- compared to the DNA-level (Fig. 4f). While for two codon-affecting mutations, one nonsense (c.3244 C > T) and one frame-shift deletion (c.1914_1915del), the opposite pattern was observed, with decreased levels of the alternative allele in RNA than in DNA (Supplementary Data 11).
EP300 mutations and overexpression and their relationship with stemness and NK marker genes in ESCCWe next examined whether EP300 mutations and expression had any associations with stemness and NK marker genes. We first performed differential expression analysis between EP300-mutated and wildtype samples, followed by GSEA against representative (i.e., upregulated) gene sets of four transcriptomic subtypes and XCL1-high genes derived from cell lines. We found that the gene set representative of stemness subtype was upregulated in EP300-mutated compared to wildtype samples (NES = 1.36, FDR = 0.05), while all gene sets of other three subtypes were massively downregulated in EP300-mutated samples (Fig. 5a, b, FDR < 0.001, Supplementary Fig. 17a). Although there was no upregulation of XCL1-high genes, we observed significantly increased gene expression of CD160 in EP300-mutated samples (Fig. 5c, P = 0.003). Interestingly, the EP300 gene expression was also the highest in the stemness subtype, and there was a positive correlation in expression between EP300, XCL1/2 and CD160 (Supplementary Fig. 17b, 17c).Fig. 5: EP300 mutations, overexpression in ESCC and pathway functional mutation enrichment.a GSEA normalised enrichment score (NES) of EP300 Mut (mutated) versus Wt (wildtype) samples (green bars) and EP300 high versus low expression samples (red bars), against representative (upregulated) gene sets of four transcriptomic subtypes and upregulated genes in XCL-high ESCC cell lines (XCL_up). The values of FDR were shown within the bars. b GSEA plots of ‘stemness_up’ and ‘differentiated_up’ gene sets for the EP300 Mut (mutated) versus Wt (wildtype) comparison, and the ‘stemness_up’ and ‘XCL1_up’ gene sets for the EP300 high versus low-expression comparison. FDR q-values were shown. c Gene expression of CD160 between EP300 mutated and wildtype samples. Wilcoxon rank-sum test was used to compare the level between groups. d Pathway functional mutation enrichment adjusted for tumour mutation burden and the correlation between functional mutation enrichment ratio and tumour cellularity are plotted together for each hallmark gene set. P-values derived from the Kruskal–Wallis test comparing the enrichment ratio among the four subtypes were used for y-axis values. For significant hallmark gene sets, they were coloured to represent the subtype which had the highest enrichment for this gene set. e Significance P-values (−log10 transformed) comparing the enrichment scores among the four subtypes against P-values comparing GSVA values among the four subgroups are plotted for each hallmark gene set. For significant hallmark gene sets that passed both significance thresholds (P < 0.05), they were coloured to represent the subtype which had the highest enrichment for that gene set in both functional mutation and pathway expression levels. f The intra-tumour heterogeneity for 102 samples, measured as the Shannon density, is shown across the four transcriptomic and immune subtypes, using the Wilcoxon rank-sum test. g The survival analysis of the Shannon density index against overall survival is shown using a multivariate analysis. The P-value derived from the log-rank test was shown, along with the hazard ratio and 95% confidence interval. All significance is shown in the figure, *P < 0.05, **P < 0.01, and ****P < 0.0001. In c, f, the box bounds the interquartile range divided by the median, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box. Source data are provided as a Source Data file.Next, focusing on EP300-wildtype samples, we selected the top 20 EP300-high and 20 EP300-low expression samples, and found that the stemness gene set and XCL1-high genes were significantly upregulated in the EP300-high group (Fig. 5a, b, ‘stemness up’ gene set, NES = 2.49, FDR < 0.0001; XCL1-high gene set, NES = 1.47, FDR = 0.014), whereas the differentiated gene set was greatly downregulated (NES = −3.47, FDR < 0.0001). Interestingly, we found that the metabolic gene set was significantly upregulated in EP300-high samples (NES = 2.71, FDR < 0.0001), while the immunogenic gene set was not notably affected by EP300 expression levels (Fig. 5a). Taking all these together, our results demonstrate a strong positive association between EP300 mutations, overexpression and stemness/NK marker XCL1 related signatures, suggesting that EP300 may have a role promoting tumours to become a more aggressive subtype in ESCC.Functional mutation enrichment in pathways among ESCC subtypesTo further investigate whether specific pathway-related mutations were associated with transcriptomic subtypes, we performed functional mutation enrichment analysis adjusted for sample mutation load using the 50 cancer hallmark gene sets (see Methods). We identified functionally relevant mutations in Wnt/β-catenin signalling which were significantly more enriched in the stemness samples, while functional mutations in inflammatory response, angiogenesis and hypoxia were highly enriched in the immunogenic tumours (Fig. 5d, Kruskal–Wallis test, P < 0.05, Supplementary Fig. 18a). Of note, the hallmark gene set expression profile using GSVA also confirmed the functional differences identified among the four subtypes (Supplementary Fig. 18b). We then evaluated whether the pathway functional mutation enrichment affected the expression of pathway genes, and found that mutation enrichment in Wnt/β-catenin signalling, inflammatory response and hypoxia were positively correlated with pathway gene expression activities for the corresponding ESCC subtypes (Fig. 5e), suggesting that high level of Wnt signalling expression in the stemness group could be a consequence of functional mutations in regulators of the Wnt pathway, and these functional mutations were the most enriched in stemness samples. This is also the case for the high pathway activity of inflammatory response and hypoxia for the immunogenic subtype. Interestingly, the stemness subtype also seemed to have the highest proportion of nonsense mutations in Wnt pathway genes. The missense mutations in the Wnt signalling pathway were highly deleterious, as predicted by SIFT35 and PolyPhen-236, and the stemness group seemed to have the highest proportion of deleterious Wnt mutations (93%), significantly higher than that of the differentiated subtype (60%, Fisher’s exact test, P = 0.035, Supplementary Fig. 18c).Clonality analysis among molecule subtypes of ESCCGiven the well-documented interplay between immune infiltration and tumour clonal evolution37,38, we finally investigated the level of intratumoural heterogeneity (ITH, measured by the Shannon diversity index) based on variant allele frequencies of all mutations in each tumour among the immune and transcriptomic subtypes. Our results showed the greatest ITH in the C3 cluster, whereas in C2 where the immune infiltration was the highest, the diversity was greatly reduced (Fig. 5f). In addition, the highest ITH presented in the stemness subtype while the immunogenic subtype had the lowest ITH (Fig. 5f). The negative correlation between ITH and immune infiltration was further confirmed by IHC of CD8 and CD56 (Supplementary Fig. 10c). Our results suggest that the immune microenvironment strongly influences the tumour evolution and (sub)clonal architecture. The stemness / C3 group with the highest level of immune exclusion also had the highest tumour (sub)clonal diversity. Importantly, high ITH was also a significant marker of poor prognosis in ESCC (Fig. 5g, multivariate analysis, hazard ratio HR = 2.214, log-rank P = 0.007).

Hot Topics

Related Articles