Dichotomous intronic polyadenylation profiles reveal multifaceted gene functions in the pan-cancer transcriptome

Alterations in cellular mTOR activity contribute to the dichotomous pattern of intronic APAIn our previous study, we discovered that the activation of mTOR is linked to the widespread occurrence of 3’-UTR APA events, resulting in increased expression of transcript isoforms with shortened 3’-UTRs throughout the transcriptome10. To explore another type of APA occurring within intron regions, namely, intronic APA events, we devised a workflow that leverages existing RNA-seq datasets and genome annotations, specifically RefSeq24 and UCSC25. This workflow enables the quantification of intronic APA and full-length transcript isoforms, allowing us to compute the truncation ratio (TR) by comparing the expression of the APA isoform to the total transcript amount (Fig. 1a).Fig. 1: Bidirectional profile of intronic APA (alternative polyadenylation) events in response to changes in mTOR signaling in cells.a A schematic representation of the profiling of intronic APA events using RNA-seq data from cell lines. The RefSeq and UCSC gene structures were merged, and the RNA-seq data were quantified based on the collected structures. The chi-square test was used to determine significant intronic APA events between the control and the case based on the truncation ratio (TR), which was calculated as follows: TR = [quantity of intronic APA transcript] / [quantity of total transcript (intronic APA + full-length)]. The intronic APA events with a chi-square test p < 0.05 and TR difference >0.1 were considered significant events. b Scatter plots of the TRs of genes with low and high mTOR expression in cells. The analyses included wild-type (WT; low mTOR) and Tsc1-/- (high mTOR) MEFs as well as the breast cancer cell lines MCF7, BT549, and MDA-MB-361. Cells were treated with 100 nM Torin 1 for 24 h to inactivate mTOR signaling in these cells. c, d Examples of RNA-seq read alignments of genes showing enriched intronic APA events in high-mTOR-Tsc1-/- MEFs and low-mTOR-WT MEFs, respectively. The alignments are color-coded to highlight the regions with intronic APA events. e Real-time quantitative PCR (RT‒qPCR) validation of genes showing dynamic intronic APA events upon changes in mTOR signaling in cells. The analyses included WT, Tsc1-/-, and Tsc1-/- cells treated with Torin 1 (100 nM, 24 h). The Y-axis scale is presented on a log scale. Four technical repeats were conducted, and Student’s t tests were performed for statistical analysis. The data are presented as the mean (SD). f Validation of protein expression of intronic APA mRNAs in WT and Tsc1-/- MEFs. (Left panel) Full-length (FL) and truncated (TR) isoforms of SIN3B and AGAP3 were detected using western blotting. (Right panel) Confirmation of FL and TR isoforms using RNAi. RNAi-mediated knockdown of SIN3B and AGAP3 FL or TR isoforms in Tsc1-/- MEFs. g A mass spectrometry approach, namely, selected reaction monitoring (SRM), was used to detect TR isoforms in the Tsc1-/- proteome. A peptide for TR PDXDC1 is highlighted as an example, with the peptide sequence originating from exonized intron sequences in the intronic APA transcript highlighted in the yellow box. The gene structure of the intronic APA transcript of Pdxdc1 is also shown. Fragmented ion spectra for the C-terminus of PDXDC1 from Tsc1-/- cell extracts and synthesized peptide are shown as an example.In our analysis, we utilized RNA-seq data obtained from two sets of mouse embryonic fibroblasts (MEFs): wild-type (WT) cells with basal mTOR activity and Tsc1-/- cells with hyperactivated mTOR10. We calculated the TR for various genes and identified significant genes by applying specific criteria (chi-square t test p < 0.05 and TR difference >0.1). These significant genes are color-coded, with blue representing low mTOR activity and red indicating high mTOR activity (Fig. 1b). Interestingly, in contrast to the observed increase in 3’-UTR shortening APA events in the Tsc1-/- transcriptome10, we observed distinct intronic APA events in both the WT and Tsc1-/- transcriptomes. Specifically, there were 188 events in WT and 253 events in Tsc1-/- MEFs (as shown in Fig. 1b and Supplementary Table 1). This dichotomous pattern of intronic APA events was similar between Tsc1-/- cells treated with Torin 1 (low mTOR activity) and mock-treated cells (high mTOR activity), with 251 events in the Torin 1 group and 90 events in the mock treatment group (Supplementary Table 2 and Supplementary Fig. 1a).To investigate the consistency of the dichotomous intronic APA pattern across species, we conducted similar experiments in human breast cancer cell lines MCF7, BT549, and MDA-MB-361. Intriguingly, we observed a consistent intronic APA pattern in all the tested breast cancer cell lines (Fig. 1b and Supplementary Table 3). Specific examples of RNA-seq read alignments and real-time quantitative polymerase chain reaction (RT‒qPCR) analyses also confirmed the presence of dichotomous intronic APA events in both high- and low-mTOR environments (Fig. 1c−e and Supplementary Fig. 1b, c). To validate the role of mTOR, we performed RNAi-mediated knockdown of the mTOR gene in Tsc1-/- MEFs, resulting in a replicated intronic APA pattern similar to that observed with pharmacological inhibition of mTOR activity (Supplementary Fig. 1d). Moreover, the datasets obtained from the three breast cancer cell lines revealed unique intronic APA events, with some overlap observed among the different cellular models (Supplementary Fig. 1e). Finally, when we conducted KEGG pathway analyses using the gene list of intronic APA events, we identified both unique and overlapping enrichment of biological pathways (Supplementary Fig. 1f). These findings suggest that intronic APA events are dynamic and have the potential to shed light on specific and common biological pathways in various cellular and biological models.To determine the translational implications of intronic APA transcripts, we conducted western blot experiments, focusing on AGAP3 and SIN3B as representative examples. These proteins were chosen because intronic APA resulted in significant alterations in their protein lengths, and the antibodies were specific to the N-terminal regions of these proteins, enabling us to distinguish between full-length and truncated forms. Consistent with our RNA-seq data, our western blot results showed increased expression of C-terminal truncated AGAP3 and SIN3B proteins in Tsc1-/- cells compared to WT MEFs, whereas the levels of full-length proteins remained unchanged (Fig. 1f). Intronic APA introduces “exonized” intron sequences into the 3’-most exon of the transcribed gene, which has the potential to give rise to C-terminal peptide sequences derived from the open reading frame through these exonized intron sequences (Supplementary Fig. 1g). Indeed, many of these translated peptide sequences from intronic APA transcript isoforms are documented in the RefSeq database24.To validate the presence of these peptides in the proteome, we employed selected reaction monitoring (SRM), a targeted peptide detection method based on liquid chromatography-tandem mass spectrometry (LC‒MS/MS) (Supplementary Fig. 1h). Using this approach, we tentatively identified some of these peptide sequences (76 peptides in 40 proteins) in the WT and Tsc1-/- cell extracts. This identification was further supported by synthesized peptides, with 10 out of 41 peptide sequences validated (Fig. 1g and Supplementary Fig. 1i). In summary, our findings suggest that intronic APA plays a role in shaping the C-terminome characteristics of the mTOR-regulated proteome.Pan-Cancer data analyses reveal distinct tumor- and normal tissue-enriched intronic APA profilesGiven the frequent dysregulation of mTOR in various cancers21,22,23,24, we hypothesized that the distinctive intronic APA pattern might also manifest in TCGA datasets. To delve into intronic APA profiles within TCGA datasets, we established a workflow aimed at examining intronic APA events and their clinical relevance across ten different cancer types, employing RNA-seq data from TCGA (Fig. 2a). We scrutinized transcript expression profiles from a comprehensive pool of 6099 samples encompassing both tumor and normal tissues across these diverse cancer types (Supplementary Fig. 2a). Through a comparative analysis of the expression profiles of intronic APA isoforms and full-length transcripts utilizing the GENCODE.v23 annotation, we identified significant intronic APA events (with a stringent p value threshold of <0.01) and illustrated them alongside their expression levels (Fig. 2b and Supplementary Fig. 2b). Using breast invasive carcinoma (BRCA) as a representative example, we identified significant intronic APA events in 3504 genes among 1061 tumor samples and 2635 genes among 109 normal samples (Fig. 2b). Our observations aligned with the findings from our prior cell line experiments (Fig. 1b) and provided evidence that widespread intronic APA events were present in both tumor and normal tissues across all analyzed cancer types (Fig. 2b and Supplementary Fig. 2b). Importantly, a substantial portion of these intronic APA events (e.g., 2070 out of 2633 cases in normal tissues and 2662 out of 3504 cases in BRCA tumors) did not exhibit differential gene expression. This observation underscores the absence of a correlation between differential gene expression and intronic APA events, underlining the limitations of conventional differential gene expression analyses in identifying intronic APA events.Fig. 2: Discrete profile of intronic APA events in TCGA data.a A workflow depicting the pan-cancer data analyses for intronic APA events. b Differential expression analyses of genes with annotated intronic APA events in normal tissues and tumor samples. Three representative TCGA datasets (TCGA-BRCA (breast invasive carcinoma), TCGA-KIRC (kidney renal clear cell carcinoma), and TCGA-LUAD (lung adenocarcinoma)) are shown. The x-axis shows the significance of intronic APA events calculated using −log10(p value), whereas the y-axis shows the fold changes in gene expression in tumors compared with normal samples. Red dots indicate the genes showing significant intronic APA events conserved in 80% or more of tumor samples (i.e., 80% or more of tumor samples have higher TRs than the mean TR of normal tissue samples). The blue dots indicate the genes showing significant APA events for which 80% or more of the tumor samples had TRs lower than the mean TR of the normal tissue samples. c H2AZ2 and LRRFIP1 genes showing intronic APA events in BRCA tumors are presented as examples. The left panel displays the expression profiles of intronic APA (int APA; red) and full-length (FL; blue) transcripts across tumor samples and normal tissues. The middle panel presents the TRs of the corresponding genes in an individual sample using a box plot. The cyan bar indicates the median TR in tumor and normal samples. The right panel shows the distribution of H2AZ2 and LRRFIP1 TRs across the samples. The fraction of samples indicates the percentage of tumors that display a higher TR compared to the median of TR in the normal samples or vice versa. d CXCL12 and DST genes showing intronic APA events in normal tissues in BRCA are presented as examples. Each panel shows the same data analyses as described in (c). e Scatter plots for intronic APA events in BRCA, KIRC, and LUAD. The TR means for genes with significant intronic APA events are color-coded. Genes showing intronic APA events in more than 88% of the samples are color-coded as blue (normal) or red (tumor). Genes with intronic APA events in 80−88% of the samples are shown in cyan (normal) or orange (tumor). f Overall distribution of intronic APA events in the TCGA pan-cancer cohort and corresponding normal tissue data. g Heatmap of the KEGG pathways enriched in intronic APA events across 10 cancer types. The KEGG pathways that are common to two or more cancers are displayed. The color scale represents −log10(p value) for pathways enriched in tumors and log10(p value) for pathways enriched in normal tissues: red-colored KEGG pathways are enriched in tumor samples, and blue-colored KEGG pathways are enriched in normal tissues. h Significant intronic APA genes associated with the ribosome or oxidative phosphorylation pathway in tumor samples according to the pan-cancer data. The scale of the circle indicates [the TR mean in tumors]/[the TR mean in normal tissues]. i Significant intronic APA genes associated with the cell cycle or focal adhesion pathways in normal tissues according to the pan-cancer data. The scale of the circle indicates [the TR mean in normal tissues]/[the TR mean in tumors].The intronic APA events depicted in Fig. 2b and Supplementary Fig. 2b represent significant events observed across all samples, including individual variations. However, specific genes within the BRCA dataset such as H2AZ2 (H2A histone family member V) and LRRFIP1 (leucine-rich repeat flightless-interacting protein 1) exhibited a significant increase in TR in most tumor samples compared to normal tissues (Fig. 2c and Supplementary Fig. 2c). In contrast, the expression of genes such as CXCL12 (C-X-C motif chemokine ligand 2) and DST (dystonin) exhibited an increase in TR in most normal tissues compared to tumor samples (Fig. 2d and Supplementary Fig. 2d). These examples illustrate that certain genes display intronic APA events consistently across both BRCA tumor samples and normal tissue collections. Furthermore, we identified conserved intronic APA events enriched in 80% or more of the tumors and normal tissues for each cancer type (Fig. 2b, e, Supplementary Fig. 2e, and Supplementary Table 4).Our analysis of pan-cancer data revealed a prevailing pattern of discrete intronic APA profiles in both tumor and normal tissues across all investigated cancer types (Fig. 2f and Supplementary Table 4). We further examined the KEGG pathways associated with these APA profiles in each cancer type and generated a heatmap of the enriched pathways across the ten cancer types. Significantly, certain KEGG pathways, such as ribosome and oxidative phosphorylation, were consistently enriched in tumor samples of five or more cancer types, whereas KEGG pathways, such as cell cycle and focal adhesion, were commonly enriched in five or more normal tissue types (Fig. 2g). Additionally, we observed that some enriched KEGG pathways were unique to the tumor and normal tissues of each cancer type (Supplementary Fig. 2f). Intriguingly, although these KEGG pathways displayed conservation across different cancer types, the genes associated with these pathways varied in each cancer. For instance, the ribosome pathway was enriched by intronic APA events in six different cancer types, yet the profile of ribosomal genes differed among these cancer types (Fig. 2h). Similar observations were made for other KEGG pathways enriched by intronic APA events, including the cell cycle, oxidative phosphorylation, and focal adhesion (Fig. 2h, i). In summary, these findings suggest that mRNA truncation through intronic APA is a common characteristic observed in both cancer and normal tissue transcriptomes. This observation implies the potential significance of intronic APA events in targeting shared cellular pathways that may have relevance in cancer biology.Discrete intronic APA profiles reveal dynamic reorganization of protein functional domains in cancer proteomesIn our comprehensive analysis of cancer transcriptomes, we observed a substantial number of genes displaying intronic APA events. To investigate whether these events carry unique molecular signatures in cancer transcriptomes, we compiled a dataset encompassing 5400 significant intronic APA events occurring in more than 80% of normal tissues or tumors for each cancer type (Fig. 2e, Supplementary Fig. 2e, and Supplementary Table 4). Among these events, 2991 were specific to individual cancer types, whereas 2409 were common across two or more cancer types (Fig. 3a). Notably, more than 10% of these shared events (279 intronic APA events) were conserved in five or more cancer types. Among these conserved events, 37 were found in either tumor or normal tissues, 118 were exclusive to tumors, and 124 were exclusive to normal tissues (Fig. 3a). Furthermore, we identified unique intronic APA events specific to each cancer type (Supplementary Fig. 3a).Fig. 3: Intronic APA events in pan-cancer data and their associated molecular characteristics.a Frequency profile of intronic APA events in TCGA data. The number of intronic APA events is displayed by the occurrence in each cancer. b Heatmap of collective genes displaying a significant intronic APA event enriched in either tumor or normal samples across five or more cancer types. The color scale represents the differences in the mean TR between tumor and normal tissues. Select genes are highlighted in a zoomed-in heatmap as an example. Red and blue represent significant intronic APA events in tumors and normal tissues, respectively. c Two examples of intronic APA events from BRCA TCGA data. RNA-seq read alignments of two matched tumor and normal pairs are shown along with a schematic of the registered Pfam domains. Yellow boxes highlight the locations of intronic APA events. d Registered Pfam domains lost in tumors or normal tissues due to enrichment of intronic APA events. A Venn diagram showing the overall distribution of Pfam domains affected by intronic APA in the pan-cancer dataset. The heatmaps show the list of representative Pfam domains lost in tumors (red) or normal tissues (blue). The Pfam domains in the red text represent overlapping domains between tumors and normal tissues. The scale of the heatmap was calculated as follows: log2[# of APA events + 1]. Significant intronic APA events with domain changes in the exclusive exons of the full-length transcripts were considered. e Schematic representation of Pfam domain swapping in tumors and normal tissues by intronic APA events. Zinc finger domain (zf-C2H2) proteins and serine/threonine protein kinase (PKinase) proteins are shown as examples. f Differential gene expression for genes showing zf-C2H2 and PKinase domain swapping in BRCA and KIRC. The x-axis presents the significance of intronic APA events calculated as −log10(p value). g Heatmap of pan-cancer GO terms enriched in Pfam domains lost in normal tissues and tumor samples. The scale of the heatmap was calculated as log2[# of APA events + 1]. The GO terms common to both normal tissues and tumor samples are highlighted in violet font. h A representation of common and exclusive Pfam domains in cancer and normal tissues. Pfam domains associated with nucleic acid binding and protein binding are shown.Our analysis highlighted ADAMTS2 (a disintegrin-like and metalloprotease with thrombospondin type 1 motif 2) as one of the most consistently conserved intronic APA events across all 10 normal tissues, suggesting that the truncated ADAMTS2 transcripts (ENST00000274609.5 and ENST00000518335.3) are more abundant in normal tissues than in tumor tissues (Fig. 3b, c). Conversely, PHF19 (PHD finger protein 19) exhibited increased expression of the intronic APA transcript (ENST00000312189.10) in tumor samples from nine different cancer types (Fig. 3b, c). These intronic APA events in genes such as ADAMTS2 and PHF19 generate C-terminal truncated proteins. For example, ADAMTS2 produces two truncated isoforms, one (ENST00000274609.5) of which lacks functional domains like ADAM_spacer1, TSP1, and TSP1_ADAM (thrombospondin type 1 domain), which are found in the full-length ADAMTS2 protein (Fig. 3c). Similarly, the tumor-enriched PHF19 intronic APA transcript generated a truncated PHF19 protein lacking one PHD finger domain and the Mtf2_C (polycomb-like MTF2 factor 2) domain (Fig. 3c). This finding indicates that widespread intronic APA events across the transcriptome can result in a broad-scale reduction in protein functional domains. This reduction may lead to the loss of conventional protein functions. However, it also opens the possibility of gaining new functions through the remaining functional domains in the truncated proteins.To explore the global changes in protein functional domains associated with our pan-cancer analysis, we mapped the missing mRNA regions due to intronic APA events to the Pfam database for functional domain identification26. Our analysis revealed that many Pfam domains were absent in both tumors (254 domains) and normal tissues (317 domains) (Fig. 3d), with 34 and 52 domains, respectively, being common to five or more cancer types (the heatmap in Fig. 3d and Supplementary Table 5). Surprisingly, a significant portion of these missing Pfam domains (152 domains) overlapped between tumors and normal tissues, many of which were common to five or more cancer types (highlighted in the red font in the heatmap of Fig. 3d).One notable functional domain that is missing across all proteomes in both tumor and normal samples is the Cys2His2-type zinc finger domain (zf-C2H2). In this case, a group of zinc finger protein (ZNF) genes within tumors and normal tissues lost the zf-C2H2 domain, retaining only the Krüppel-associated box (KRAB) domain (Fig. 3e). Importantly, most ZNF genes identified in this analysis did not exhibit differential expression between tumor and normal tissue samples. This finding suggests that traditional differential gene expression analyses are insufficient for identifying such functional loss of ZNFs in cancer and normal tissue proteomes (Fig. 3f). These findings also imply that this zf-C2H2 domain-containing protein family employs intronic APA to reconfigure the inherited functionality of zinc finger proteins in both tumor and normal tissues without altering their expression levels.Similarly, multiple Pfam domains, such as protein kinases (PKinase) and tryptophan-aspartate repeats (WD40), exhibited comparable restructuring of functional domains in both tumor and normal tissue samples (Fig. 3e, f and Supplementary Fig. 3b). To further investigate the association between this intronic APA-driven domainomics atlas and biological functions, we employed Gene Ontology (GO) analysis as a reference. Similar to our findings in the Pfam domain analysis, we discovered that numerous GO terms linked to the affected Pfam domains are common to both tumor and normal tissue samples (Fig. 3g and Supplementary Table 6).For instance, the GO term related to nucleic acid binding was enriched by Pfams such as RNA recognition motif 1 (RRM_1), zf-C2H2, DEAD (DEAD/DEAH box helicase), and Krüppel-associated box (KRAB) (Fig. 3h). This finding suggested that genome-wide transcriptional regulation might also be orchestrated by the intronic APA-driven reorganization of KRAB domains in both cancer and normal tissue proteomes. Nevertheless, there are Pfam domains specific to the nucleic acid binding GO term in either tumors (zf-CCHC (zinc knuckle) and R3H) or normal tissues (Piwi and DDE_1 (endonuclease)). This highlights Pfam domains unique to either tumor or normal tissue that represent distinct molecular pathways in pathological and nonpathological states (Fig. 3h). A similar pattern of Pfam domain redistribution was observed for the protein binding GO term (Fig. 3h). In summary, these results reveal an unexpected finding of intronic APA-driven domainomics that influences the regulation of biological functions through proteome-wide restructuring of functional domains. Importantly, this reorganization occurs independently of differential gene expression.Intronic APA patterns are linked to clinical variablesIn prior investigations of pan-cancer data, researchers identified signatures that were common to various cancers as well as those specific to individual cancer types. These findings were based on differential gene expression and alternative polyadenylation (APA) events occurring in the 3’-UTR27,28. Our analysis of intronic APA data within the TCGA dataset revealed distinctive molecular characteristics associated with the reprogramming of protein functional domains and the regulation of biological pathways through intronic APA. Notably, we observed no significant correlation between differential gene expression and intronic APA across either tumor or normal tissue samples (Fig. 2b and Supplementary Fig. 2b). This led us to explore whether the molecular signatures derived from intronic APA events could be linked to clinical parameters. Specifically, we focused on genes exhibiting significant differences in the TR and assessed their clinical relevance, particularly concerning hormone receptor phenotypes (ER, HER2, PR, and triple negative) in breast cancer.We observed that the TRs of certain genes, such as SNX5 and NGEF, strongly correlated with the estrogen receptor (ER) phenotype in breast cancer. This association was particularly notable when differential gene expression failed to distinguish between tumor samples based on hormone receptor phenotypes (as shown in Fig. 4a and Supplementary Fig. 4a). Similarly, the TRs of genes such as TP53RK and CNOT6L were linked to progesterone (PR) and human epidermal growth factor (HER2) receptor phenotypes, respectively (Fig. 4a). Notably, the TRs of several genes, including SYNGR1, GTF2IRD2, and FAM120C, exhibited increased levels in the tumor samples of triple-negative breast cancer patients, whereas genes such as TVP23C, ICAM3, and RIMKLB displayed decreased TRs in triple-negative breast cancer tumor samples (Fig. 4a and Supplementary Fig. 4a).Fig. 4: Intronic APA is associated with clinical variables.a Boxplots present four examples where intronic APA events are associated with hormone receptor phenotypes but not with the corresponding gene expression levels in BRCA data. The significance of these associations was assessed using unpaired t tests, and the p values were as follows: SNX5 TR p = 6.24e-18, gene expression (GE) p = 0.529; TP53RK TR p = 1.02e-16, GE p = 0.851; CNOT6L TR p = 4.65e-6, GE p = 0.922; SYNGR1 TR p = 1.05e-27, GE p = 0.701. b Violin plots illustrating two exemplary genes demonstrating significant intronic APA events but not significant differential gene expression in cancer stages. c Kaplan‒Meier (KM) plots illustrating the correlation between the TR of PHF19 and the disease-free rate or survival rate of HNSC and KIRC patients. d KM plots for the high- (red line) and low-risk (blue line) groups generated based on gene expression, TR of intronic APA events, and the combination of gene expression and TR. The upper three KM plots represent case 1, and the lower three KM plots represent case 2. The p values were determined using the log-rank test. e Prediction results of hormone receptor phenotypes based on gene expression, TR of intronic APA events, and the combination of gene expression and TR. SVM was used for the prediction task, and the mean AUC of 100 repeats (i.e., splitting of trainings, validation, and test sets) is reported.Furthermore, we found that the TRs of genes was associated with disease stage in various cancer types (Fig. 4b and Supplementary Fig. 4b). Overall, there was an increase in the TR (e.g., WFDC6, GATS, FMNL3, COQ4, and MEOIC) as the cancer stage progressed, whereas a decrease in the TR was noted for PARVA (Fig. 4b and Supplementary Fig. 4b). Notably, the differential expression of these genes did not demonstrate any significant associations with these clinical parameters. These results indicate that the TRs of genes are linked to disease phenotypes and unveil previously unnoticed molecular signatures.In light of these findings, we conducted further investigations into the correlation between TR and long-term clinical outcomes, such as disease-free survival time and survival rate. Our analyses revealed that certain genes, such as PHF19 and VRK3, which are truncated in 9 and 5 different cancers, respectively, exhibit a negative correlation between the TR and disease-free survival time or survival rate in specific cancer types (Fig. 4c and Supplementary Fig. 4c). To evaluate the prognostic significance of intronic APA events and gene expression profiles, we employed a Cox proportional hazards model with an elastic net penalty29. Initially, we utilized only the TR values of intronic APA events or gene expression profiles as molecular covariates to generate low- and high-risk patient groups. We then assessed the significance of our findings using Kaplan‒Meier (KM) plots and the log-rank test. In case 1 (the first row in Fig. 4d), intronic APA exhibited superior prognostic power compared to gene expression alone in predicting patient survival probability (Fig. 4d). Conversely, in case 2 (the second row in Fig. 4d), the performance of intronic APA was weaker than that of gene expression (Fig. 4d). Given the variable performance of intronic APA as a standalone variable, we asked whether intronic APA events combined with gene expression covariates exhibit an improved ability to predict survival. In both cases where intronic APA events showed mixed performance compared to gene expression, the combination of intronic APA and gene expression significantly improved the ability to predict patient survival compared to the prediction generated using either covariates of intronic APA events or the gene expression signatures alone (Fig. 4d).We also explored the potential of enhancing clinical outcome predictions in breast cancer by integrating intronic APA data into gene expression analysis. To do this, we applied a conventional classification method, support vector machine (SVM), to investigate the role of intronic APA events as a molecular signature in predicting clinical outcomes in breast cancer patients. Our analyses revealed that when gene expression and intronic APA were integrated, they outperformed gene expression or intronic APA alone, particularly in predicting outcomes related to estrogen receptor (ER), human epidermal growth factor receptor 2 (HER2), and progesterone receptor (PR), as indicated by the higher area under the curve (AUC) score. Remarkably, in regard to triple-negative breast cancer, the performance of the integrated transcriptomic signatures was superior to that of gene expression alone, with intronic APA showing the most significant improvement in prediction performance (Fig. 4e). In summary, our findings suggest that intronic APA events can function as a valuable component of clinical signatures, offering a substantial contribution to the prediction of survival and cancer outcomes. This contribution goes beyond the capabilities of commonly used genomic data analyses, such as gene expression alone.Intronic APA highlights the importance of C-terminomics in the cancer proteomeTraditionally, intronic APA has been viewed as a process leading to the premature termination of transcription, resulting in the production of truncated mRNA molecules that can subsequently impact the length of proteins30. This phenomenon has subsequent implications for domainomics, potentially affecting the entire proteome. Our data, illustrated in Fig. 3c, d, indicate that intronic APA can cause significant alterations in domainomics in both tumor and normal tissues. However, it is important to note that not all cases of intronic APA result in substantial changes in protein length, as exemplified by the case of ADAMTS2 (the second transcript isoform, ENST00000518335.3), which maintains all Pfam domains despite intronic APA (as shown in Fig. 3c).To gain insights into how transcriptome-wide intronic APA events influence protein length within both the cancer and normal tissue proteomes, we examined the relative position of the last exon containing the termination codon in truncated transcripts compared to fully annotated full-length transcripts using data from various cancers and normal tissues. The overall distribution of these last exons, stemming from transcript truncation, exhibited a bimodal pattern in the current genome annotation (Supplementary Fig. 5a). Notably, this bimodal distribution of intronic APA events was consistent across both tumor and normal tissue data analyses. However, the enrichment of APA events varied significantly between the two pathological conditions. In normal tissues, intronic APA events tend to cause earlier termination of transcription (up to 40% of the length of full-length transcripts); however, in tumors, termination due to intronic APA yields transcripts that closely resemble that of full-length transcripts (Fig. 5a and Supplementary Table 7).Fig. 5: Intronic APA renders diverse C-terminome features.a The histogram displays the distribution of relative positions of the 3’-last exon of intronic APA transcripts to the full-length transcript in the same gene in the pan-cancer dataset. Enriched KEGG pathways corresponding to early or late termination of transcripts are presented. b Comparison of full-length vs. truncated proteins grouped by the relative position of intronic APA events. The gray shaded area denotes the length of truncated proteins that fall within ±10% of the corresponding full-length proteins. c The histogram shows the distribution of intronic APA positions in BRCA and KIRC. d A schematic representation of Pfam domains that are lost by intronic APA and enriched in select KEGG pathways. Yellow boxes where the open reading frames of intronic APA transcripts end are linked to the positions of the truncated proteins in the diagram. e Differential gene expression for select genes enriched in the KEGG pathways shown in (a). The selected genes are color-coded by cancer type where intronic APA events are statistically significant.We conducted KEGG pathway analysis on intronic APA events belonging to each group, classified by the relative length of transcriptional termination in tumors and normal tissues. This analysis highlighted specific biological pathways. Interestingly, the focal adhesion and cell cycle pathways were enriched in the group of intronic APA events occurring in earlier exons in normal tissues, whereas the oxidative phosphorylation pathway exhibited enrichment in the same group characterized by earlier terminations in tumors. Conversely, the ribosome pathway was enriched in the group of intronic APA events occurring in later exons (beyond 80% of transcript length) in tumors (Fig. 5a and Supplementary Table 7).Although earlier termination of transcription results in truncated mRNAs, these truncated mRNAs contain the essential elements required for translation into proteins. To understand how these uneven distributions of terminal exons affect protein length in the cancer proteome, we calculated the lengths of the truncated and corresponding full-length proteins and classified them into three different groups based on the relative ranges of truncation positions: 0.0−0.4, 0.4−0.8, and 0.8−1.0. The scatter plots depicting protein lengths show that intronic APA events causing truncation of up to 40% of mRNA length mostly yield significantly truncated proteins in both tumors and normal tissues (as observed in the left plots of Fig. 5b). As the relative position of intronic APA events extends into the range of 0.4−0.8 truncation, the trend of producing substantially truncated proteins gradually diminishes, and longer proteins start to appear (middle plots in Fig. 5b). Intronic APA events that lead to similar mRNA lengths (relative truncation position ranges between 0.8 and 1.0) do not significantly affect protein length; many truncated proteins fall within the gray area, indicating ±20% variation in the length of corresponding full-length proteins (right plot in Fig. 5b). It is also worth noting that numerous genes produce much longer proteins due to intronic APA events, regardless of the relative truncation position (as evident in the bottom half area below the diagonal gray region in Fig. 5b). Further examination of these findings in individual cancer types confirmed the presence of similar disproportionate distributions of relative intronic APA positions between tumors and normal tissues (Fig. 5c and Supplementary Fig. 5b).We also noticed that specific conserved KEGG pathways, such as focal adhesion, cell cycle, and ribosome, which were highlighted in pan-cancer and normal tissue intronic APA events (as shown in Fig. 2g−i), also appeared in the data analysis related to early or late-terminating intronic APA events in tumors and normal tissues (Fig. 5a). Given this overlap, we explored whether genes within these pathways demonstrated any connection between protein length and the reorganization of Pfam domains, which could impact protein function. Intriguingly, genes associated with focal adhesion and cell cycle pathways produced truncated proteins that lost their C-terminal functional domains due to intronic APA, suggesting potential compromises in the functionality of these proteins (Fig. 5d). Similar observations were made for genes linked to the oxidative phosphorylation pathway (Fig. 5d). However, in the case of genes associated with ribosomes, the C-terminal sequences were altered without affecting the arrangement of known functional domains (Fig. 5d).Furthermore, most of the genes presented in Fig. 5d did not exhibit differential gene expression in the context of 10 different cancer types (Fig. 5e), which aligns with our previous observations (Fig. 3f). Consequently, our findings suggest that intronic APA is employed differently in tumor and normal tissue gene expression and in the regulation of biological pathways. In normal tissues, diverse biological pathways are influenced by an interruption of transcriptional elongation (Fig. 5a−d). In contrast, in tumors, intronic APA seems to primarily modify the C-terminal amino acid sequence by altering the choice of the terminal exon, resulting in proteins of similar length without significant alterations to their functional domains.JMJD6 C-terminal variants generated by intronic APA confer opposing functionsThe findings presented in Fig. 5 raise an important question about the biological significance of C-terminal sequence exchanges through intronic APA events. The fact that such exchanged peptide sequences are often similar in length and possibly unstructured in 3-dimensional protein structure analyses may lead to the underestimation of their significance. However, our observations regarding JMJD6 (jumonji domain-containing 6, arginine demethylase and lysine hydroxylase) indicate that intron 6 APA-driven short isoform expression is almost exclusively observed in tumor samples, whereas both long and short isoforms are expressed in normal tissues (Fig. 6a). Furthermore, our pan-cancer data analysis revealed that JMJD6 exhibits a significant intronic APA event that is common to seven different cancer types (Fig. 6b). Notably, several studies have investigated the role of JMJD6 in cancer biology but have suggested contradictory models for its function in cancer pathogenesis19,31,32,33,34,35,36. We hypothesize that one reason for these contradictory findings is the different isoform usages in these studies. To test this idea and explore the functional differences between the two possible isoforms of JMJD6, we overexpressed short or long isoforms of GFP-tagged JMJD6 in the MCF7 breast cancer cell line and assessed their effects on colony formation and cell migration. Our results demonstrated that while overexpression of the JMJD6 long isoform impaired colony formation and migration, overexpression of the JMJD6 short isoform promoted colony formation and enhanced migration (Fig. 6c, d, e). These findings provide evidence that the two JMJD6 isoforms exhibit distinct functions in MCF7 cancer cells due to changes in amino acid sequences in their C-termini. Therefore, these results strongly suggest that alterations in the C-terminal amino acid sequences of the two isoforms result in the multifaceted functionality of the gene and highlight the potential role of the intron APA-driven C-terminome in the regulation of disease pathogenesis.Fig. 6: An intronic APA-driven JMJD6 C-terminal change is pathogenic.a Exemplary RNA-seq read alignments for JMJD6 in BRCA data. Two pairs of patient tumor samples and corresponding normal tissue samples are shown. b TRs of JMJD6 in 7 types of cancer. p values for each cancer type are as follows. BRCA: 7.435e-17, COAD: 2.017e-4, HNSC: 5.634e-3, KIRC: 1.383e-8, LUAD: 9.592e-11, LUSC: 9.482e-12, PRAD: 1.544e-4. c western blot analyses of JMJD6 overexpression in MCF7 breast cancer cells. GFP alone (mock), GFP-JMJD6 long, and GFP-JMJD6 short were overexpressed in MCF7 cells, and their effects on cancer cell behaviors were measured. Endogenous JMJD6 expression was also examined in the same cell extracts. Beta-tubulin was used as a loading control for the western blot analyses. Anchorage-independent growth assessed using soft agar assays (d) and migration assays (e) in mock, GFP-tagged JMJD6 long isoform, or GFP-tagged JMJD6 short isoform-expressing MCF-7 cells. Representative images are shown. The quantification of the images is shown in the bar graph. Scale bar = 100 nm. The data are shown as the mean (SD). **p < 0.01, two-tailed Student’s t test.

Hot Topics

Related Articles