Integrated machine learning and Mendelian randomization reveal PALMD as a prognostic biomarker for nonspecific orbital inflammation

DEG identification and construction of the modelDifferential analyses were performed using the limma package of R4.31 software. With pheatmap package to build the heatmap23. We integrated datasets GSE58331 and GSE105149 and performed batch effect correction. PCA confirmed the successful segregation of patients into risk-specific cohorts (Fig. 2a-b). Among the 314 DEGs, several exhibited significant differences. Specifically, certain genes clustered within the treatment group (NLRC5, CXCR4, DENND1C, ARHGAP9, ADAM28, RHOH, IL7R, SPOCK2, LCK, CD3D, ITGAL, CD3G, etc.) while others were prevalent in the control group (MGST1, TGFBR3, ENPP6, NECAB1, TRHDE − AS1, NPR3, PGM1, TMEM100, IGFBP6, PLIN1, TIMP4, etc.) (Fig. 2c). Notably, some DEGs were significantly upregulated (CXCL9, TCIRG1, PIGR, IGHM, HLA-DQA1, PROM1, etc.), whereas others were markedly downregulated (LARP6, HLF, MGST1, ADH1B, C2orf40, PGM1, TGFBR3, etc.) (Fig. 2d, Table S2). We employed LASSO and Cox regression analyses to establish a gene signature (Fig. 2e-f). The SVM-RFE algorithm was utilized to build a machine learning model, demonstrating an accuracy of 0.894 and an error rate of 0.106 (Fig. 2g-h). Random forest analysis identified key genes such as SRPX, ITM2A, PGM1, and HLF (Fig. 2i-j). Attempts to integrate key genes from all three algorithms revealed that LASSO and SVM-RFE provided the most stable model constructions. Ultimately, we identified 15 hub genes (Fig. 2k, Table S3).Fig. 2Principal Component Analysis and Model. (a,b) PCA results. (c) Heatmap of DEGs. (d) Volcano plot of DEGs. (e) LASSO regression. (f) Cross-validation. (g,h) Model accuracy and error rates. (i,j) Random forest analysis. (k) Venn diagram of key genes.DEG identification and visualizationWe visualized the expression of these 15 hub genes in both the NSOI group and the normal sample group. IRF8, IGK and CCL4 were significantly expressed in the NSOI group. AKR1C1, CORO2B, ENPP6, GPR146, HLF, IGSF10, MAP1B, PALMD, PGM1, PLA2G16, RHOBTB3, and TNS1 were significantly expressed in the control group (Fig. 3a). Additionally, these genes were plotted together for a comprehensive visual comparison (Fig. 3b). The ROC analysis of these genes demonstrated their high predictive accuracy: RHOBTB3 (AUC: 0.806), GPR146 (AUC: 0.907), IGK (AUC: 0.765), CCL4 (AUC: 0.813), PALMD (AUC: 0.824), CORO2B (AUC: 0.887), HLF (AUC: 0.945), TNS1 (AUC: 0.802), IGSF10 (AUC: 0.882), PGM1 (AUC: 0.911), PLA2G16 (AUC: 0.801), ENPP6 (AUC: 0.830), MAP1B (AUC: 0.842), IRF8 (AUC: 0.840), and AKR1C1 (AUC: 0.836) (Fig. 4).Fig. 3DEG Identification and Visualization. (a) Expression of 15 hub genes in NSOI and normal sample groups. (b) Co-expression of all hub genes in a single plot.Fig. 4ROC curves of the 15 hub genes. AUC curve is an algorithm to evaluate whether the diagnostic model is stable and accurate. All of the 15 genes here, except IGK (AUC: 0.765), are above 0.8, which indicates that the results we obtained for the 15 hub genes are stable and credible.Validation of hub genesTo enhance the confidence and predictive accuracy of our model, we validated the 15 hub genes using the GSE105149 dataset. This validation revealed significant differences in the expression of these DEGs (Fig. 5). The ROC analysis of the 15 hub genes in the GSE105149 dataset further confirmed their high predictive accuracy: MAP1B (AUC: 0.862), PGM1 (AUC: 0.938), IRF8 (AUC: 0.851), PLA2G16 (AUC: 0.839), GPR146 (AUC: 0.943), TNS1 (AUC: 0.861), CCL4 (AUC: 0.798), ENPP6 (AUC: 0.882), IGK (AUC: 0.857), HLF (AUC: 0.971), PALMD (AUC: 0.867), CORO2B (AUC: 0.919), AKR1C1 (AUC: 0.810), RHOBTB3 (AUC: 0.861), and IGSF10 (AUC: 0.923). These results confirmed the high reliability and accuracy of our model (Fig. 6).Fig. 5Expression of 15 hub genes in GSE58331 analysis.Fig. 6ROC curves of the 15 hub genes. All of the 15 genes here, except CCL4 (AUC: 0.798), are above 0.8, which indicates that the results we obtained for the 15 hub genes are stable and credible.Differential expression analysis and enrichment analysis of PALMDSpecific genes clustered distinctively in the high and low expression groups. Notable genes in the high expression group included MAOA, MTURN, FAM114A1, ANG, NOSTRIN, and SH3BGRL2, while the low expression group featured MARCO, SLC7A7, CAPG, APOE, APOC1, and PLAU (Fig. 7a-b). Additionally, we constructed a correlation matrix plot related to PALMD (Fig. 7c, Table S4). GO enrichment analysis identified 448 core targets, encompassing biological processes BP, MF, and CC. Key molecular functions included DNA-binding transcription activator activity, RNA polymerase II-specific (GO:0001228), DNA-binding transcription activator activity (GO:0001216), and enzyme inhibitor activity (GO:0004857). Key cellular components included the apical part of the cell (GO:0045177), basal part of the cell (GO:0045178), and the apical plasma membrane (GO:0016324). Significant biological processes involved the regulation of cell-cell adhesion (GO:0022407), leukocyte migration (GO:0050900), and leukocyte-mediated immunity (GO:0002443). KEGG enrichment analysis indicated that the overexpressed genes were primarily involved in cell adhesion molecules (hsa04514), salivary secretion (hsa04970), and Staphylococcus aureus infection (hsa05150) (Fig. 7d-e, Table S5a-b).Fig. 7Differential Expression Analysis and Enrichment Analysis. (a) Heatmap. (b) Volcano plot. (c) Correlation matrix diagram. (d) GO analysis illustrating the barplot, chord, circos, and cluster of selected genes’ logFC. (e) KEGG analysis illustrating the barplot, chord, circos, and cluster of selected genes’ logFC.
GSEA and GSVA analysis of PALMD
GSEA was employed to identify functional alterations among the DEGs of PALMD. In the high expression group of the GO analysis, functional enrichment was primarily observed in BP such as sensory perception of bitter taste, endoplasmic reticulum to Golgi vesicle-mediated transport, and cytoplasmic translation. Conversely, in the low expression group, functional enrichment was predominantly associated with adaptive immune response, leukocyte-mediated immunity, and lymphocyte-mediated immunity (Fig. 8a). In the KEGG analysis, the high expression group showed enrichment in pathways related to protein export, valine, leucine, and isoleucine degradation, and ribosomes. The low expression group demonstrated enrichment in pathways involved in allograft rejection, primary immunodeficiency, and graft-versus-host disease (Fig. 8b, Table S6). GSVA was utilized to further elucidate functional alterations among the DEGs of PALMD. In the GO analysis, the high expression group exhibited functional enrichment in biological processes such as regulation of CD40 signaling pathway and CD4 or CD8 positive alpha-beta T cell lineage commitment. CC like the B cell receptor complex and the cytoplasmic side of the plasma membrane were also enriched, along with MF such as secondary active sulfate transmembrane transporter activity and ligase activity forming carbon-carbon bonds (Fig. 8c). In the KEGG analysis, the high expression group showed enrichment in pathways related to asthma, cytokine-cytokine receptor interaction, autoimmune thyroid disease, and hematopoietic cell lineage (Fig. 8d, Table S7).Fig. 8GSEA and GSVA of PALMD. (a) GSEA of GO analysis. (b) GSEA of KEGG analysis. (c) GSVA of GO analysis. (d) GSVA of KEGG analysis. Red represents high expression and green represents low expression. The horizontal line in the figure goes to the right, indicating that the larger the logFC is, the more obvious the enrichment is.Immune landscape characterizationThe immunological environment plays a crucial role in the initiation and progression of NSOI. Notably, the risk-associated profiles revealed substantial differences in immune cell infiltration. Within the PALMD cohort, aDCs, APC co-inhibition, APC co-stimulation, B cells, CCR, and CD8 + T cells exhibited significant differences between the low and high-risk groups. However, Th2 cells and Type I IFN response did not show significant variance between the two groups (P > 0.05) (Fig. 9a). Within the immune cell landscape, naive B cells, resting CD4 memory cells, and resting dendritic cells were predominantly expressed in the treatment group. Conversely, monocytes, M0 macrophages, and activated mast cells were highly expressed in the control group (Fig. 9b). Additionally, we constructed an immune infiltration correlation matrix and heatmap to further elucidate these relationships (Fig. 9c-d). PCA successfully categorized patients based on immune profiles (Fig. 9e). A Lollipop plot was utilized to illustrate the expression patterns of correlation coefficients, highlighting resting mast cells, monocytes, M2 macrophages, activated NK cells, and regulatory T cells (Tregs) (Fig. 9f). Resting dendritic cells, resting mast cells, activated NK cells, and plasma cells were positively associated with PALMD, whereas naive B cells, activated dendritic cells, macrophages M0, macrophages M1, activated mast cells, activated CD4 memory T cells, and naive CD4 T cells were negatively associated with PALMD (Fig. 9g, Table S7).Fig. 9Immune Landscape Characterization. (a,b) Immune function and cells. (c) Correlation rectangle plot. (d) Heatmap. (e) PCA analysis. (f) Expression patterns of correlation coefficients.(g) Immune infiltration analyses.Construction of miRNA-LncRNA shared genes and gene regulatory networksWe queried three databases to identify 15 miRNAs and 43 lncRNAs associated with PALMD (Table S8a-b). The resulting network included 43 lncRNAs (such as CDR1-AS, RP11-54O7.17, GAS6-AS1, RP11-166B2.5, LINC00969, CTC-265F19.1, RP3-470B24.5, SLC8A1-AS1, FLJ16779, etc.) and 9 miRNAs (including hsa-miR-875-3p, hsa-miR-9-5p, hsa-miR-335-3p, hsa-miR-129-5p, hsa-miR-582-3p, hsa-miR-181a-2-3p, hsa-miR-2355-5p, hsa-miR-450b-5p, hsa-miR-539-5p) (Fig. 10, Table S8). To elucidate the underlying mechanisms of PALMD, we established an extensive gene regulatory network (Fig. 11). Within this network, PALM, PALM2AKAP2, EZR, TIMM13 demonstrated robust interactions with PALMD, known for its roles in inflammation and immunity. The intricate relationships within the gene regulatory network are detailed in Table S9.Fig. 10miRNA-LncRNA Shared Genes Network. Note: Red circles represent mRNAs, blue quadrangles represent miRNAs, and green triangles represent lncRNAs.Fig. 11Gene regulatory networks of PALMD.Mendelian randomization analysisIn examining the direct linkage between the PALMD and NSOI incidence, a forest plot was utilized for visual illustration, revealing a general symmetry in the data. Through sensitivity analysis employing the “leave-one-out” technique, it was determined that the omission of any individual SNP had a minimal effect on the results of the inverse variance-weighted (IVW) analysis, indicating that the remaining SNPs closely mirrored the overall dataset’s findings. To further authenticate our outcomes, MR-Egger regression analysis was conducted, bolstering the integrity and reliability of our results and the chosen analytical framework (Fig. 12a-d).Fig. 12Mendelian Randomization Analysis. (a) Correlation rectangle plot. (b) Heatmap. (c) The expression patterns of Correlation Coefficient. (d) SNP effect on Insomnia.

Hot Topics

Related Articles