Identification and immune landscape of sarcopenia-related molecular clusters in inflammatory bowel disease by machine learning and integrated bioinformatics

Identification of DESRGs in IBDWe investigated the expression landscape of SRGs in samples with IBD and observed differential expression patterns. Specifically, using a significance level of p < 0.01, we successfully identified a comprehensive set of 87 DESRGs that exhibited significant variations in expression levels (Fig. 2A,B). This finding suggested that genes associated with sarcopenia may play a role in the development of IBD. Furthermore, to provide valuable insights into the genomic landscape of DESRGs, we mapped the position of these genes onto chromosomes, revealing that, with the exception of FOXO4, AR, DMD, ACE2, and ATP6AP2 on the X chromosome, the other genes were located on autosomes (Fig. 2C). Subsequently, we performed Spearman correlation analysis and observed complex synergistic or antagonistic effects among these DESRGs (Fig. 2D).Figure 2Identification of DESRGs in IBD. (A and B) Using p < 0.01 as the criterion, 87 DESRGs were selected. Boxplots were used to visualize the expression pattern of DSERGs in normal and IBD tissues. (C) The location of 87 DESRGs on chromosomes. (D) Correlation analysis of 87 DESRGs. Red and green colors represented positive and negative correlations, respectively. The correlation coefficients were marked with the area of the pie chart.Enrichment analyses of DESRGs and assessment of immune landscape in patients with IBDTo gain further insights into the signaling pathways and biological processes associated with DESRGs, we performed gene enrichment analyses. The GO enrichment analysis revealed the involvement of DESRGs in diverse biological processes, encompassing responses to extracellular stimuli and nutrient levels, as well as hypoxia. Additionally, DESRGs were found to positively regulate cytokines, influence smooth muscle proliferation, and mediate DNA transcription processes. The results were presented in Fig. 3A. KEGG pathway analysis demonstrated significant enrichment of DESRGs in pathways such as apoptosis, HIF-1 signaling pathways, Jak-Stat signaling pathways, and FoxO signaling pathway (Fig. 3B). Furthermore, through the utilization of DO enrichment analysis, we have successfully identified a correlation between DESRGs and various age-related ailments such as atherosclerosis, osteoporosis, and cerebral infarction. The comprehensive depiction of these findings can be observed in Fig. 3C.Figure 3Enrichment analyses of DESRGs and assessment of immune landscape in patients with IBD. Dot plots showed the results of GO enrichment analysis (A) and KEGG pathway enrichment analysis (B) of DESRGs. (C) DO enrichment analysis of DESRGs. (D) The correlation between infiltrating immune cells and DESRGs. Red to blue indicateed the change from high to low correlation. The asterisks represented the statistical p value (*P < 0.05; **P < 0.01; ***P < 0.001). (E) Boxplot showed the immune cell infiltration difference between the normal and IBD groups. Blue represented the normal group; red represented the IBD group.We proceeded with an investigation to explore the connection between the immune landscape of individuals diagnosed with IBD and a specific set of 87 genes known as DESRGs. As expected, our findings demonstrated a significant positive correlation between most DESRGs and neutrophils, M1 macrophages, and dendritic cells. Conversely, these DESRGs displayed a negative correlation with CD8 + T cells, regulatory T cells (Treg), and memory B cells (Fig. 3D). Furthermore, we investigated the immune landscape in patients with IBD compared to normal samples. Significantly elevated proportions of plasma cells, monocytes, M1 macrophages, neutrophils, resting NK cells, and activated mast cells were observed in patients diagnosed with IBD. Conversely, healthy individuals displayed an increased proportion of native B cells, memory B cells, CD8 + T cells, resting CD4 + T cells, and M2 macrophages as illustrated in Fig. 3E. These findings highlighted the intricate interplay between DESRGs and specific immune cell populations in the context of IBD.Identification of IBD clusters mediated by DESRGsBased on the expression profiles of DESRGs, we performed NMF clustering analysis to classify the 541 IBD samples into three distinct molecular clusters, denoted as C1 C2 and C3 (Fig. 4A). C1 comprised a total of 151 samples, C2 consisted of 189 samples, and C3 encompassed 201 patients diagnosed with IBD. To further investigate the differences among these clusters, we employed principal component analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (tSNE) (Fig. 4B,C), which clearly demonstrated significant distinctions among three clusters. We then examined the expression patterns of DESRGs across different clusters. The heatmap (Fig. 4D) revealed that several genes, including ILK, IL10, BEST1, NFKBIA, OSM, NLRP3, TNF, IL1A, IL6, IFNG, CD38, CD274, MMP3, PFKFB3, ANGPTL2, DKK3, CAV1, TWIST1, HSD11B1, S100B, TGFB1, CLU, DKK1 and others, were highly expressed in C2. Conversely, SLC12A2, OPA1, AIFM1, NLN, FBXO32, REN, CST3, FOXO4, VDR, FZD5, MFN2, EIF4EBP2, PPARGC1A, GPT, CMAS, PIK3CB, SERPINB6, SOD1, HFE, CALCR, SMUG1, PLIN2, MSRA, ACE2 exhibited high expression in C1 and C3 clusters. Cibersort algorithm was employed to assess the level of immune cell infiltration among three clusters. Notably, the proportions of immune cells in 18 out of the 22 types showed significant variation across clusters (Fig. 4E). As shown in Fig. 4F–H, GSVA analysis identified a notable enrichment of C2 in various pathways, including the complement and coagulation cascade, NOD-like receptor signaling pathway, toll-like receptor signaling pathway, cytokine receptor interaction, B-cell receptor signaling pathway, natural killer cell-mediated cytotoxicity, and chemokine signaling pathway. Furthermore, it demonstrated significant enrichment in autoimmune disorders such as systemic lupus erythematosus, graft-versus-host disease, and type 1 diabetes. The C3 cluster assumed a crucial role in the metabolic pathways of various nutrients, including steroids, fatty acids, amino acids, and lipids.Figure 4Identification of IBD clusters mediated by DESRGs. (A) NMF clustering analysis of 541 IBD samples based on 87 DESRGs. PCA (B) and t-SNE (C) analyses for the transcriptome profiles of IBD clusters, showing a remarkable difference on transcriptome among different clusters. (D) Heatmap showed the expression of DESRGs in diverse clusters. Red represented high expression of related genes; blue represented low expression of related genes. (E) Boxplot showed the immune cell infiltration difference among different clusters. Blue represented the C1; red represented the C2; yellow represented the C3. The asterisks represented the statistical p value (*P < 0.05; **P < 0.01; ***P < 0.001). (F–H) GSVA enrichment among different IBD clusters. The heatmap was used to visualize these biological processes. Red represented activated pathways and blue represented inhibited pathways (F) IBD cluster 1 versus IBD cluster 2. (G) IBD cluster 1 versus IBD cluster 3. (H) IBD cluster 2 versus IBD cluster 3.Screening key gene modules by WGCNATo identify key gene modules in patients with UC and CD, we employed the WGCNA algorithm to construct co-expression networks. In the UC cohort, we first calculated the gene expression variances within the meta cohort and selected the top 25% of genes with the highest variation for subsequent analysis. Using a soft threshold of β = 9, we established a scale-free network and identified six distinct co-expression modules through dynamic cutting. We then assessed the correlation between module expression and clinical traits. Notably, the green module exhibited the strongest association with UC (cor = 0.47, P = 2e-27) (Fig. 5A). Based on predefined criteria (GS > 0.4 and MS > 0.8), we identified 71 key genes within the green module (Fig. 5B).Figure 5Screening key gene modules by WGCNA. (A) Module–trait relationships in UC. Each row represented a module; each column represented a clinical status. (B) Scatter plot between module membership in green module and the gene significance for UC. 71 key genes were selected in brown module with GS > 0.4 and MS > 0.8 as criteria. (C) Module–trait relationships in CD. Each cell contained the corresponding correlation and p-value. (D) Scatter plot between module membership in green module and the gene significance for CD. 143 key genes were selected in green module with GS > 0.4 and MS > 0.8 as criteria. (E) Module–trait relationships in IBD clusters mediated by DESRGs. Each cell contained the corresponding correlation and p-value. (F) Scatter plot between module membership in blue module and the gene significance for C2. (G) Venn diagram was used to visualize 11 common key genes.For patients with CD, we utilized the WGCNA algorithm and set a soft threshold of 12 (R2 = 0.9) to construct a scale-free network. A total of seven distinct co-expression modules were identified through dynamic cutting. A heatmap displaying the correlation between module eigengenes and clinical phenotype (CD and normal) revealed that the green module was highly correlated with CD (cor = 0.74, P = 5e-37) (Fig. 5C). Subsequently, we identified 143 key genes within the green module (Fig. 5D). To investigate key gene modules in the three identified IBD clusters, we used β = 10 and R2 = 0.9 as the optimal soft threshold parameters to construct scale-free networks. Among the six gene modules with distinct colors, the blue module exhibited the closest correlation with the IBD cluster (cor = 0.78, P = 2e-112) (Fig. 5E). We extracted a total of 53 key gene modules from these clusters. Finally, by examining the intersection of the three phenotypically related gene sets, we identified 11 overlapping key genes. These genes were visualized using a Venn diagram (Fig. 5G) for subsequent analysis. Finally, by conducting an in-depth analysis of the overlapping gene sets associated with the phenotype, we have successfully identified a total of 11 pivotal genes that exhibit intersection. These genes encompassed SEC14L1, EPB41L4B, HGF, PDPN, TGFBI, OSMR, PHLDA1, RAB31, PLAU, IGFBP5 and TIMP1.Screening characteristic genes by machine learning algorithmsTo assess the diagnostic value of the 11 key genes, two different machine learning algorithms, namely LASSO and RF, were employed to narrow down the selection of target genes. By applying the LASSO algorithm with tenfold cross-validation, we obtained 11 IBD-related feature genes (Fig. 6A,B). The 11 key genes were then evaluated using the RF algorithm, resulting in the identification of 8 characteristic genes with importance scores greater than 10 (Fig. 6C,D). In addition, we employed a filtering criterion of fold change greater than 2 and an adjusted p-value less than 0.05 to identify DEGs between IBD and normal tissues using the limma package. A total of 198 DEGs were identified, with 58 exhibiting downregulation in IBD tissues and 140 showing upregulation in IBD tissues. The intersection of 198 DEGs and the key genes identified by two machine learning techniques resulted in a set of four marker genes, namely TIMP1, PLAU, PHLDA1, and TGFBI, for subsequent analyses (Fig. 6E).Figure 6Screening characteristic genes by machine learning algorithms. (A) Identification of the optimal penalization coefficient lambda (λ) in the LASSO model with tenfold cross-validation. (B) LASSO coefficient profiles of 11 key genes. (C) The impact of the quantity of decision trees on the rate of inaccuracies. The horizontal axis displayed the number of decision trees, while the vertical axis denoted the inaccuracy rate. (D) The importance of 11 key genes identified by RF. (E) The Venn diagram was used to visualize the four common genes obtained through machine learning algorithms and differential analysis.Construction and validation of the diagnostic modelWe utilized the createDataPartition function to randomly divide 641 IBD and normal samples into training and internal validation groups with a 7:3 ratio. Leveraging the 4 marker genes identified in the previous study, we constructed a logistic regression model in the training set. The subsequent ROC curve demonstrated that the logistic regression model based on the 4 feature genes exhibited excellent predictive performance for diagnosing IBD, with an AUC of 0.916 (95% confidence interval [CI]: 0.878–0.947) (Fig. 7A). In the internal validation group, the AUC was 0.883(95% CI: 0.813–0.945) (Fig. 7B), confirming the stability and reliability of the model. We utilized GSE179285 as an independent validation group to assess the effectiveness of our diagnostic model for IBD. Notably, our developed model demonstrated exceptional precision in predicting IBD in this external validation cohort, as illustrated in Fig. 7C. This resulted in a remarkable AUC value of 0.912 (95% CI: 0.843–0.968). To visualize the logistic regression model, we employed a nomogram (Fig. 7D), which facilitated the prediction of the probability of IBD diagnosis based on the expression levels of each characteristic gene. Calibration curve analysis revealed minimal discrepancy between the actual IBD risk and the predicted risk (Fig. 7E). Furthermore, DCA curves and clinical impact curves demonstrated that our nomogram exhibited high accuracy and could serve as a valuable tool for clinical decision-making (Fig. 7F,G).Figure 7Construction and validation of the diagnostic model. (A) ROC curve for diagnosing IBD in the training cohort. (B) ROC curve for diagnosing IBD in the internal validation cohort. (C) The ROC curve was employed for the diagnosis of IBD in the external validation cohort. (D) Nomogram to predict the diagnostic value of the 4 model genes for IBD. (E) Calibration curves in the training set. The x-axis represented the predicted probability from the nomogram, and the y-axis was the actual probability of IBD. (F) DCA in the training set. The y-axis represented net benefits, calculated by subtracting the relative harm (false positives) from the benefits (true positives). The x-axis calculated the threshold probability. (G) Clinical impact curves of nomogram. The y-axis represented the number of people with high risk. The x-axis calculated the threshold probability. The red lines represented the number of individuals identified as high risk (IBD) by the model at the corresponding probability threshold. The blue lines represented the number of individuals who, at that same probability threshold, were classified by the model as high risk and actually experienced an outcome event (IBD).Ustekinumab-mediated modulation of 4 marker genesUstekinumab, a monoclonal antibody, exerts its action on T cells, natural killer cells, and antigen-presenting cells by targeting the p40 subunits of interleukin-23 and interleukin-1237. It has already gained approval as a primary therapeutic option for both the induction and maintenance treatment of CD and UC. In our investigation, we employed the GSE112366 dataset to examine the impact of Ustekinumab on the expression profiles of the aforementioned 4 model genes. As depicted in Fig. 8A–D, when compared to normal samples, individuals with moderate to severe CD exhibited significantly elevated expression levels of the 4 model genes prior to Ustekinumab treatment. However, following an 8-week Ustekinumab induction therapy, the expression levels of these genes saw a notable decrease, with 3 genes—TIMP1, PHLDA1, and TGFBI—displaying no statistical difference compared to normal samples. After 44 weeks of Ustekinumab maintenance therapy, the levels of PLAU and PHLDA1 expression exhibited a notable decrease compared to their initial levels before treatment (Fig. 8E,F). However, there were no statistically significant disparities observed in the expression of TIMP1 and TGFBI after undergoing treatment for 44 weeks (Fig. 8G,H). This leads us to propose a hypothesis that by modulating the expression of these model genes during both the induction and maintenance phases of Ustekinumab treatment for CD, it may be possible to mitigate the occurrence of sarcopenia.Figure 8Ustekinumab may treat CD by regulating the expression of sarcopenia-related key genes. (A–D) The relative expression levels of 4 model genes in the terminal ileum mucosa of normal, control, pre-Ust (CD patients before 8 weeks of Ustekinumab induction therapy) and after-Ust (CD patients after 8 weeks of Ustekinumab induction therapy). (B) The relative expression levels of 4 model genes in the terminal ileum mucosa of normal, control, pre-Ust (CD patients before 44 weeks of Ustekinumab maintenance therapy), after-Ust (CD patients after 44 weeks of Ustekinumab maintenance therapy). The asterisks represented the statistical p value (*P < 0.05; **P < 0.01; ***P < 0.001).scRNA analysis of colon lamina propria immune cells in patients with active UCWe extracted 11 samples of active inflammation from patients with UC using dataset GSE162335 for scRNA analysis. Figure 9A illustrated the expression characteristics of these 11 samples. Following data filtration, we collected 15,316 immune cells from the lamina propria of the colon in individuals with inflammatory UC. Subsequently, data normalization was performed, and we employed the “VST” method to identify 1,500 HVGs, with the top 10 displayed in Fig. 9B. Using the t-SNE method, we identified a total of sixteen distinct cell clusters, and the marker genes for each of these clusters can be found in Supplementary Table S2. By applying established methods from previous studies38, we identified seven types of cells, including B cells, CD4 + T cells, CD8 + T cells, adipocytes, hematopoietic stem cells (HSCs), monocytes, and NK cells (Fig. 9C). Notably, B cells, CD4 + T cells, and CD8 + T cells were the most abundant among these. Subsequently, we conducted an in-depth analysis of the expression distribution of 4 key genes across various immune cells (Fig. 9D). Our findings revealed that nearly all keys genes exhibited robust expression in monocytes (Fig. 9D), implying a potential involvement of monocytes in mediating the process of sarcopenia in patients with IBD.Figure 9scRNA analysis of colon lamina propria immune cells in patients with active UC. (A) The violin diagram was used to visualize the genes features, counts, and mitochondrial gene percentages of 11 inflamed UC samples. (B) The gene scatter plot with the top 10 high variable genes. The red dots represented high variable genes, and the black dots represented other genes. (C) Cell annotation results for different immune cell types. (D) The dot plot was used to visualize the relative expression of 4 key model genes in different immune cells.

Hot Topics

Related Articles