Deciphering the role of HLF in idiopathic orbital inflammation: integrative analysis via bioinformatics and machine learning techniques

DEG identification and principal component analysis

We integrated GSE58331 and GSE105149 and corrected for batch effects. PCA confirmed that patients separated into risk-specific cohorts (Fig. 2a,b). A total of 314 DEGs were identified. In the heatmap, some genes clustered in the treat group (PPP1R1A, CAB39L, MTURN, MAOA2, NGFRAP1, CDR1, etc.) and others in the control group (ITGB2, CAPG, CHI3L1, SLAMF8, APOC1, TCIRG1, etc.) (Fig. 2c). Among the DEGs, TCIRG1, IGHM, CXCL9, PROM1, PIGR, HLA-DQA1, etc. were significantly up-regulated, whereas HLF, ADH1B, MGST1, LARP6, PGM1, C2orf40, TGFBR3, etc. were significantly down-regulated (Fig. 2d) (Table S1).

Figure 2. Principal component analysis. (a,b) PCA. (c) Heatmap. (d) Volcano plot.

Construction of the model

We derived candidate gene sets with different algorithms and intersected them to increase their credibility. Lasso regression identified 61 genes and the SVM-RFE algorithm yielded 22 genes; intersecting the two results gave 15 hub genes. A gene signature was established using lasso, Cox regression analysis, and the determination of an optimum value (Fig. 3a,b). SVM-RFE was used to build a machine learning model to validate the model's accuracy and reliability; its accuracy was 0.894 and its error was 0.106 (Fig. 3c,d). Random forest analysis identified several important genes, including SRPX, ITM2A, PGM1, HLF, etc. (Fig. 3e,f). We attempted to combine the key genes from all three algorithms to construct the model, but only LASSO and SVM-RFE yielded stable key-gene models.
Finally, we obtained 15 hub genes (Fig. 3g) (Table S3).

Figure 3. The development of the signature. (a) LASSO regression of the NSOI-related genes. (b) Cross-validation used to tune parameter selection in the LASSO regression. (c,d) Accuracy and error of this model. (e,f) Random forest analysis. (g) Venn diagram.

DEG identification and visualization

We visualized the expression of the 15 hub genes in the NSOI group and the normal sample group (Fig. 4) and also plotted all of them in a single graph for visual comparison (Fig. 5). ROC analysis of the 15 hub genes showed high accuracy: HLF (AUC: 0.945), PGM1 (AUC: 0.911), GPR146 (AUC: 0.907), IRF8 (AUC: 0.840), TNS1 (AUC: 0.802), PLA2G16 (AUC: 0.801), PALMD (AUC: 0.824), CCL4 (AUC: 0.813), IGK (AUC: 0.765), CORO2B (AUC: 0.887), IGSF10 (AUC: 0.882), AKR1C1 (AUC: 0.836), ENPP6 (AUC: 0.830), MAP1B (AUC: 0.842), RHOBTB3 (AUC: 0.806) (Fig. 6).

Figure 4. Expression of the 15 hub genes in the NSOI group and the normal sample group.

Figure 5. All hub genes plotted together in the same line plot.

Figure 6. ROC of the 15 hub genes. The AUC (area under the ROC curve) measures how accurately each gene discriminates between the groups. All 15 genes except IGK (AUC: 0.765) have an AUC above 0.8, which indicates that the results obtained for the 15 hub genes are stable and credible.

Validation of hub genes

GSE58331 was used for validation to increase confidence in the model and in the predictive accuracy of these hub genes. Notably, these genes showed significant differences in the GSE58331 analysis (Fig. 7). ROC analysis of the 15 hub genes in GSE58331 again showed high accuracy.
The AUC values were HLF (AUC: 0.971), PGM1 (AUC: 0.938), GPR146 (AUC: 0.943), IRF8 (AUC: 0.851), TNS1 (AUC: 0.861), PLA2G16 (AUC: 0.839), PALMD (AUC: 0.867), CCL4 (AUC: 0.798), IGK (AUC: 0.857), CORO2B (AUC: 0.919), IGSF10 (AUC: 0.923), AKR1C1 (AUC: 0.810), ENPP6 (AUC: 0.882), MAP1B (AUC: 0.862), RHOBTB3 (AUC: 0.861). These results also confirmed the high reliability and accuracy of our model (Fig. 8).

Figure 7. Expression of the 15 hub genes in the GSE58331 analysis.

Figure 8. ROC of the 15 hub genes. All 15 genes except CCL4 (AUC: 0.798) have an AUC above 0.8, which indicates that the results obtained for the 15 hub genes are stable and credible.

DEG identification of HLF

Differential analysis of single gene targets identified 218 DEGs. Some genes clustered in the high group (MGLL, KANK4, SH3BGRL2, TCEAL2, ZNF667-AS1, PPL, NET1, etc.) and others in the low group (VCAN, THBS2, FN1, POSTN, RARRES2, LOX, ADAMTS2, COL1A1, etc.) (Fig. 9a,b). We also constructed a correlation matrix plot related to HLF (Fig. 9c) (Table S4).

Figure 9. DEG identification of HLF. (a) Heatmap. (b) Volcano plot. (c) Correlation matrix diagram.

Enrichment analysis of DEGs of HLF

GO enrichment analysis revealed 221 core targets across BP, MF, and CC. The MF terms mainly involved DNA-binding transcription activator activity (GO:0001216), enzyme inhibitor activity (GO:0004857), and extracellular matrix structural constituent (GO:0005201). The CC terms mainly involved collagen-containing extracellular matrix (GO:0062023), apical part of cell (GO:0045177), and apical plasma membrane (GO:0016324). The BP terms mainly involved leukocyte migration (GO:0050900), ossification (GO:0001503), and negative regulation of immune system process (GO:0002683).
KEGG enrichment analysis revealed that the over-expressed genes were mainly involved in the Chemokine signaling pathway (hsa04062), Staphylococcus aureus infection (hsa05150), and Salivary secretion (hsa04970) (Fig. 10 and Table S5a-b).

Figure 10. GO and KEGG analyses for PMGs. (a) The GO circle illustrates the barplot, chord, circos, and cluster of the selected gene's logFC. (b) The KEGG barplot, chord, circos, and cluster illustrate the scatter map of the logFC of the indicated gene. Red represents high expression and green represents low expression. The horizontal line in the figure extends to the right; the larger the logFC, the more pronounced the enrichment.

GSEA analysis

GSEA was deployed to identify functional alterations across the DEGs of HLF. In the GO analysis, the high expression group was mainly enriched in BP cytoplasmic translation, BP energy derivation by oxidation of organic compounds, and BP Golgi vesicle transport, whereas the low expression group was mainly enriched in BP activation of immune response, BP alpha beta T cell activation, and BP adaptive immune response (Fig. 11a). In the KEGG analysis, the high expression group was mainly enriched in protein export, ribosome, and spliceosome, whereas the low expression group was mainly enriched in graft versus host disease, allograft rejection, and autoimmune thyroid disease (Fig. 11b) (Table S6).

Figure 11. GSEA analysis in PDE4B and PDE6D. (a) GO. (b) KEGG. Red represents high expression and green represents low expression. The horizontal line in the figure extends to the right; the larger the logFC, the more pronounced the enrichment.

GSVA analysis

GSVA was deployed to identify functional alterations across the DEGs of HLF.
In the GO analysis, the functional enrichment mainly involved CC perisynaptic extracellular matrix, CC synapse associated extracellular matrix, BP negative regulation of Toll-like receptor signaling pathway, BP positive regulation of defense response to bacterium, and MF NAD+ nucleosidase activity (Fig. 12a). In the KEGG analysis, the functional enrichment mainly involved cytokine-cytokine receptor interaction, graft versus host disease, asthma, and type I diabetes mellitus (Fig. 12b) (Table S7).

Figure 12. GSVA analysis in HLF. (a) GO. (b) KEGG.

Immune landscape characterization

The immunological environment plays a critical role in the initiation and progression of NSOI. The risk-associated profiles displayed stark differences in immune cell infiltration. Within the HLF cohort, aDCs, APC co-inhibition, APC co-stimulation, B cells, CCR, and CD8+ T cells differed significantly between the low- and high-risk groups, whereas Th2 cells and Type I IFN Response showed no significant difference (P > 0.05) (Fig. 13a). Among immune cells, B cells naive, T cells CD4 memory resting, and Dendritic cells resting were highly expressed in the treat group, whereas Monocytes, Macrophages M0, and Mast cells activated were highly expressed in the control group (Fig. 13b). We also constructed an immune infiltration correlation rectangle plot and heatmap (Fig. 13c,d). PCA again successfully categorized patients on an immune basis (Fig. 13e). A lollipop plot was created to display the correlation coefficients; Mast cells resting, Plasma cells, NK cells activated, and T cells CD8 were the most correlated immune cells (Fig. 13f). Mast cells resting, NK cells activated, Plasma cells, and T cells CD8 were positively associated with HLF, whereas T cells gamma delta, B cells naive, Macrophages M0, Macrophages M1, and Mast cells activated were negatively associated with HLF (Fig. 14) (Table S7).

Figure 13. Immune landscape characterization. (a) Expression of immune function. (b) Expression of immune cells. (c) Correlation rectangle plot. (d) Heatmap. (e) PCA. (f) Correlation coefficients.

Figure 14. Immune infiltration analyses.

Identification of common RNAs and construction of miRNAs-LncRNAs shared genes network

A search of three databases identified 80 miRNAs and 84 lncRNAs linked with NSOI (Table S7a-b). The miRNAs-lncRNAs-genes network was constructed by intersecting them with the shared genes (obtained by Lasso regression and SVM-RFE). The final network included 73 lncRNAs (RP11-10J21.4, RP5-894D12.5, TTLL10-AS1, AC069257.8, AC079779.7, etc.) and 27 miRNAs (hsa-miR-302a-3p, hsa-miR-708-3p, hsa-miR-106a-5p, hsa-miR-181a-5p, etc.) (Fig. 15) (Table S8).

Figure 15. miRNAs-LncRNAs shared genes network. Red circles are mRNAs, blue quadrangles are miRNAs, and green triangles are lncRNAs.
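
The hub-gene selection described under "Construction of the model" (intersecting the LASSO and SVM-RFE gene sets) and the per-gene ROC analysis can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the software they used is not stated here, the data are synthetic, and the function names (make_toy_data, select_hub_genes, single_gene_auc) and all parameter values are hypothetical. As a simplifying assumption, scikit-learn's LassoCV (linear LASSO on 0/1 labels) stands in for the penalised regression, and the number of genes kept by SVM-RFE is fixed by hand rather than tuned.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_toy_data(n_per_group=40, n_genes=30, n_informative=5, seed=0):
    """Synthetic expression matrix; only the first n_informative genes differ by group."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_group)             # 0 = control, 1 = disease
    X = rng.normal(size=(2 * n_per_group, n_genes))
    X[y == 1, :n_informative] += 2.0               # shift informative genes in the disease group
    genes = [f"G{i}" for i in range(n_genes)]      # hypothetical gene labels
    return X, y, genes


def select_hub_genes(X, y, genes, n_svm_genes=10, seed=0):
    """Return (LASSO genes, SVM-RFE genes, their intersection)."""
    X_std = StandardScaler().fit_transform(X)

    # LASSO: penalty strength chosen by 5-fold cross-validation;
    # genes with a non-zero coefficient are kept.
    lasso = LassoCV(cv=5, random_state=seed).fit(X_std, y.astype(float))
    lasso_genes = {g for g, c in zip(genes, lasso.coef_) if c != 0}

    # SVM-RFE: repeatedly fit a linear SVM and drop the gene with the
    # smallest weight until n_svm_genes remain.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_svm_genes, step=1)
    rfe.fit(X_std, y)
    svm_genes = {g for g, keep in zip(genes, rfe.support_) if keep}

    return lasso_genes, svm_genes, lasso_genes & svm_genes


def single_gene_auc(expression, y):
    """AUC of one gene used alone as a classifier, independent of direction."""
    auc = roc_auc_score(y, expression)
    return max(auc, 1.0 - auc)                     # down-regulated genes give auc < 0.5


X, y, genes = make_toy_data()
lasso_genes, svm_genes, hub_genes = select_hub_genes(X, y, genes)
for gene in sorted(hub_genes):
    auc = single_gene_auc(X[:, genes.index(gene)], y)
    print(f"{gene}: AUC = {auc:.3f}")
```

Taking the larger of AUC and 1 - AUC lets a down-regulated gene such as HLF score as highly as an up-regulated one, since a per-gene ROC analysis concerns discrimination between groups rather than the direction of change.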
