Development and validation of a novel liquid-liquid phase separation gene signature for bladder cancer

Classification of LLPS-related gene samples into two subtypes based on NMF clustering molecular typingThe TCGA-BLCA cohort comprised 433 RNA sequencing samples, including 414 tumor and 19 normal samples. Differential expression analysis identified 579 LLPS-related genes (|log 2 FC|>1, FDR < 0.05) between conventional bladder tissue and BLCA patients. Volcano plots illustrated the differential expression, with 364 upregulated and 215 downregulated genes (Fig. 1A). Clustering of 406 bladder cancer samples was performed using the NMF R package (Fig. 1B,C). Evaluation of cophenetic correlation, silhouette distribution, and residual sum of squares (RSS) determined the optimal number of clusters as two, designated as C1 and C2 subtypes. Progression-free survival (PFS) and overall survival (OS) significantly differed between C1 and C2 subtypes (p < 0.05), with the C2 group exhibiting a poorer prognosis compared to the C1 group (Fig. 1D,E). Molecular subtyping results were also compared with previous studies (Fig. 1F).Fig. 1(A) Volcano plot illustrating differentially expressed genes in BLCA using TCGA data. (B) Consensus map clustered via the NMF algorithm. (C) Rank (2–10), cophenetic correlation, residual sum of squares (RSS), and silhouette distribution plots. (D) Progression-free survival (PFS) exhibited significant differences between C1 and C2. (E) Overall survival (OS) showed notable disparities between C1 and C2. (F) Alluvial plot depicting the distribution of C2 and C1 across two molecular subtypes.Analysis of immune infiltration in LLPS-related genesThe prognosis for BLCA patients differed markedly between the two subtypes, reflected in the immune cell infiltration data between C1 and C2. MCPcounter scores indicated substantial differences in cytotoxic lymphocytes, B lineage, monocytic lineage, CD8+ T cells, endothelial cells, neutrophils, fibroblasts, NK cells, T cells, and myeloid dendritic cells across both subtypes (Fig. 2A-J).Fig. 2Significant variations observed in the Immune scores of cells of the liquid-liquid phase separation (LLPS). (A) Cytotoxic lymphocytes (B) B lineage (C) Monocytic lineage (D) CD8 + T cells (E) Endothelial cells (F) Neutrophils (G) Fibroblasts (H) NK cells (I) T cells (J) Myeloid dendritic cells.Construction and evaluation of gene signatures in the TCGA training groupThe TCGA differential gene expression file was integrated with survival data using the “limma” package, resulting in a risk file comprising 406 cases. Samples were randomly divided into a testing set (120 cases) and a training set (286 cases), ensuring no significant differences between the groups. Univariate Cox proportional hazards regression analysis on the training set identified genes with significant expression (P < 0.05). Subsequently, the LASSO regression model was developed, and cross-validation determined the point of least error (lambda.min) (Fig. 3A,B). Based on LASSO regression results, a multivariate Cox proportional hazards model identified LLPS-related genes with significant differences. Ultimately, a risk score formula incorporating nine genes was derived.$$\begin{aligned} {\text{Risk score = }} & {\text{ 0}}{\text{.568*MTHFD1L + 0}}{\text{.438*P4HB }} \\ & +{\text{ 0}}{\text{.197*PDGFRA – 0}}{\text{.317*OAS3 + 0}}{\text{.358*AHNAK}} – 0.419 \\ & *{\text{PDXK}} – {\text{0.320*IGSF8 + 0}}{\text{.429*FASN + 0}}{\text{.268*NT5DC3}} \\ \end{aligned}$$The ROC curve was used to evaluate the accuracy of the RS model in the training set. The model demonstrated an area under the curve (AUC) greater than 0.7 over 1, 3, and 5 years (Fig. 3G). Based on the median RS, the sample was divided into HRG and LRG. The KM survival analysis showed that LRG had a better prognosis compared to HRG (Fig. 3C).Fig. 3(A) LASSO coefficient profiles were generated using TCGA data. (B) The optimal penalization coefficient (λ) was identified via threefold cross-validation based on partial likelihood deviance. (C–F) Construction and validation of the nine-gene risk score (RS) related to LLPS for BLCA using Kaplan–Meier (KM) curves in different cohorts: (C) TCGA training cohort, (D) GEO cohort, (E) entire TCGA cohort, and (F) TCGA testing cohort. (G–J) Establishment and validation of the nine-gene RS signature associated with LLPS for BLCA with 1, 3, and 5-year ROC analyses across different cohorts: (G) TCGA training cohort, (H) GEO cohort, (I) entire TCGA cohort, and (J) TCGA testing cohort.Validation of prognostic models in different cohortsThe robustness of the LLPS-related genetic model was assessed by applying the same model and coefficients from the training set to compute risk scores for each sample. BLCA patients were stratified into low-risk and high-risk groups based on median risk values. Survival curves for the GEO cohort (165 cases), the entire TCGA cohort (406 cases), and the TCGA testing cohort (120 cases) were plotted using the survival and survminer packages (Fig. 3D-F). While the survival curves for the GEO and entire TCGA cohorts showed significant differences, the testing cohort did not exhibit notable variations. Consequently, the analysis indicated that HRG had a significantly poorer long-term prognosis compared to the LRG.Subsequent ROC analysis evaluated the prognostic classification of risk scores and assessed the classification efficiency for 1, 3, and 5-year survival. These analyses aligned with the performance observed in the TCGA training cohort. Utilizing the timeROC package, ROC curves (Fig. 3H-J) revealed that the areas under the 1, 3, and 5-year ROC curves for the GEO and entire TCGA cohorts exceeded 0.6, while the TCGA testing cohort surpassed 0.5. This suggested the constructed model predicted 1, 3, and 5-year survival with high accuracy.The relationship between OS and potential variables was examined using univariate and multivariate Cox regression models, incorporating gender, age, grade, pathological grade, TNM stage, and RS (Table 1). The results demonstrated that age (HR: 1.029, 95% CI: 1.012–1.965, p < 0.0001), TNM stage (HR: 1.613, 95% CI: 1.323–1.965, p < 0.0001), and RS (HR: 1.339, 95% CI: 1.257–1.426, p < 0.0001) significantly impacted OS.Table 1 Univariate and multivariate Cox regression models are developed to investigate the correlation between the clinical prognosis and RS.Risk model assessment for prognosis predictionThe risk model was evaluated against previously reported BLCA feature models by comparing survival and ROC curves to validate its prognostic accuracy. Three reports were selected through a targeted literature review18,19,20 (Fig. 4). Comparative analysis of prognostic models for bladder cancer indicated that the current model exhibited the largest area under the ROC curve (Fig. 4A-D) and the smallest p-value (Fig. 4E-H). The C-index and RMS for each model were calculated, revealing that the developed risk model had the highest values in both metrics (Fig. 4I-J), highlighting its superior performance in predicting long-term prognosis.Fig. 4(A) ROC curves of the constructed model. (B–D) ROC curves of three previously published gene signatures. (E) KM curves of the constructed model. (F–H) KM curves of three previously published gene signatures. (I) C-index of all four prognostic risk models, with the LLPS model exhibiting the highest value. (J) RMS time curve of all four prognostic risk models, indicating a 60-month overlap.Enrichment analysis between subtypesA correlation analysis was performed between typing results from the TCGA cohort and published immunotyping results. GO functions and KEGG pathway enrichment annotations were applied to 406 sample risk files. This analysis identified the top five pathways enriched in the HRG, based on NES (normalized enrichment score) and q-value (adjusted p-value) (Fig. 5A,B). The pathways were listed below;$$\begin{gathered} {\text{KEGG}}\_{\text{ECM}}\_{\text{RECEPTOR}}\_{\text{INTERACTION }}\left( {{\text{NES }} = {\text{ 2}}.{\text{51}}0{\text{5}},q = {\text{ 4}}.{\text{67E}} – 0{\text{9}}} \right), \hfill \\ {\text{KEGG}}\_{\text{FOCAL}}\_{\text{ADHESION }}\left( {{\text{NES }} = {\text{ 2}}.{\text{4974}},q = {\text{4}}.{\text{67E}} – 0{\text{9}}} \right), \hfill \\ {\text{KEGG}}\_{\text{REGULATION}}\_{\text{OF}}\_{\text{ACTIN}}\_{\text{CYTOSKELETON }}\left( {{\text{NES }} = {\text{2}}.{\text{2715}},{\text{ q}} = {\text{4}}.{\text{67E}} – 0{\text{9}}} \right), \hfill \\ {\text{KEGG}}\_{\text{PATHWAYS}}\_{\text{IN}}\_{\text{CANCER }}\left( {{\text{NES }} = {\text{ 1}}.{\text{9946}},q = {\text{3}}.{\text{85E}} – 0{\text{7}}} \right), \hfill \\ {\text{KEGG}}\_{\text{MELANOMA }}\left( {{\text{NES }} = {\text{2}}.{\text{2135}},{\text{ q }} = {\text{3}}.{\text{18E}} – 0{\text{5}}} \right), \hfill \\ {\text{GOBP}}\_{\text{EXTERNAL}}\_{\text{ENCAPSULATING}}\_{\text{STRUCTURE}}\_{\text{ORGANIZATION }}\left( {{\text{NES }} = {\text{2}}.{\text{7154}},q = {\text{2}}.{\text{31E}} – {\text{ }}0{\text{8}}} \right), \hfill \\ {\text{GOBP}}\_{\text{KERATINIZATION }}\left( {{\text{ NES }} = {\text{ 2}}.{\text{6598}},q = {\text{2}}.{\text{31E}} – {\text{ }}0{\text{8}}} \right), \hfill \\ {\text{GOBP}}\_{\text{KERATINOCYTE}}\_{\text{DIFFERENTIATION }}\left( {{\text{NES }} = {\text{ 2}}.{\text{6184}},q = {\text{2}}.{\text{31E}} – {\text{ }}0{\text{8}}} \right), \hfill \\ {\text{GOBP}}\_{\text{SKIN}}\_{\text{DEVELOPMENT }}\left( {{\text{NES }} = {\text{2}}.{\text{5581}},~q = {\text{2}}.{\text{31E}} – 0{\text{8}}} \right), \hfill \\ {\text{GOCC}}\_{\text{INTERMEDIATE}}\_{\text{FILAMENT }}\left( {{\text{NES }} = {\text{2}}.{\text{5114}},q = {\text{2}}.{\text{31E}} – 0{\text{8}}} \right) \hfill \\ \end{gathered}$$Fig. 5(A) The top 5 pathways gene signature assessment in the HRG using KEGG analysis. (B) The top 5 pathways gene signature assessment in the HRG using GO analysis.Clinical, genetic, and immune cell correlation analysisBox plots for clinical correlation analysis were generated using the ggpubr R package. RS varied across clinical subgroups of BLCA patients, showing higher values in patients over 65 years of age, with high grade, stage III-IV, and high TNM staging (Fig. 6A-D). Kaplan–Meier (KM) survival curves for stage I-II and III-IV subtypes further validated the model (Fig. 6E,F), indicating that risk characteristics were reliable predictors across clinical subgroups.To examine the correlation between RS and target genes, expression levels of key pathway target genes were extracted and analyzed. A positive correlation was observed between RS and LLPS-related genes, including PHGDH, MMP9, ADIPOQ, and NAMPT (Fig. 7A).In the TCGA-BLCA cohort, immune cell fractions were calculated for each sample and their correlation with RS was evaluated. The analysis revealed a positive correlation between RS and both endothelial cells (p < 0.05) and fibroblasts (p < 0.05) (Fig. 7B).Fig. 6(A–D) RS comparison of BLCA samples with their corresponding clinical data of BLCA: (A) Age (B) Grade (C) Stage (D) T1-T4. (E, F), Assessment of the nine-gene risk model in various stage groups in LRG and HRG using OS KM curves.Fig. 7(A) Association between RS of BLCA samples and expression levels of representative genes for generic pathway targets in oncology. (B) Association between RS of BLCA samples and immune scores.

Hot Topics

Related Articles