Exosome-related gene identification and diagnostic model construction in hepatic ischemia-reperfusion injury

Technology roadmap (Fig. 1).Fig. 1Technology roadmap. PCA principal component analysis, GSEA gene set enrichment analysis, GSVA gene set variation analysis, DEGs differentially expressed genes, ERGs exosome-related genes, ERDEGs exosome-related differentially expressed genes, GO Gene Ontology, KEGG kyoto encyclopedia of genes and genomes, LASSO least absolute shrinkage and selection operator, SVM support vector machine, ROC receiver operating characteristic, RBP RNA-binding protein, TF transcription factor, Es Exosome scores.Data collection and correctionAfter combining three HIRI datasets to obtain the CGD, (Fig. 2A,B) the different expression values of the datasets were compared using a distribution boxplot. A PCA plot (Fig. 2C,D) was constructed to compare the distribution of low-dimensional features for the datasets before and after removing batch effects. The batch effect of the samples in the HIRI dataset was largely eliminated after removal.Fig. 2Debatching of the dataset. (A,B) Boxplot plot of CGD before (A) and after (B) normalization. (C,D) PCA plots of CGD before (C) and after (D) batch effect removal. Red represents the dataset GSE12720, purple represents the dataset GSE14951, and blue represents the dataset GSE15480. PCA principal component analysis, GEO gene expression omnibus, CGD combined GEO datasets.Exosome-related differentially expressed genes associated with HIRIThe data from the CGD were divided into the HIRI and normal groups. The gene expression between the two groups was analyzed using the R package limma, to perform differential analysis for DEG identification. A total of 3,810 DEGs (|logFC| > 0; p < 0.05) were identified in the CGD, with 1,503 upregulated (logFC > 0; p < 0.05) and 2,307 downregulated (logFC < 0; p < 0.05) DEGs. A volcano plot was constructed using these results (Fig. 3A).Fig. 3Differential gene expression analysis. (A) Volcano map of differentially expressed genes between HIRI and Normal group in CGD. (B) Venn diagram of DEGs and ERGs in CGD. (C) ERDEGs in CGD. DEGs differentially expressed genes, ERGs exosome-related genes, ERDEGs exosome-related differentially expressed genes, GEO gene expression omnibus, CGD combined GEO datasets.DEGs were intersected with ERGs to identify ERDEGs, and the relevant results were used to generate a Venn diagram (Fig. 3B). The 61 ERDEGs are listed in Table S2. We analyzed the differences in the expression of ERDEGs in the CGD for different groups. The results were used to generate a differential ranking map (Fig. 3C), which was visualized using the R package pheatmap.GO and KEGG enrichment analysesThe 61 ERDEGs were subjected to GO and KEGG enrichment analyses to determine their biological impact on HIRI (Table 1). In terms of BPs, the ERDEGs were mainly enriched in wound healing, positive regulation of MAPK cascade, positive regulation of MAPK cascade, regulation of protein kinase B signaling, and protein kinase B signaling. In terms of CCs, the ERDEGs were enriched in focal adhesion, cell-substrate junction, melanosome, and pigment granule. In terms of MFs, the ERDEGs were enriched in GTPase activity, GTP binding, guanyl nucleotide binding, and guanyl ribonucleotide binding.Table 1 Results of GO and KEGG enrichment analysis for ERDEGs.The ERDEGs were also enriched in endocytosis, Salmonella infection, the PI3K-Akt signaling pathway, tight junctions, Pathogenic Escherichia coli infection, proteoglycans in cancer, lipid and atherosclerosis, regulation of the actin cytoskeleton, and other KEGG pathways. The results are also presented as bubble plots (Fig. 4A, B). Finally, a network diagram, including BP, CC, MF, and KEGG, was constructed (Fig. 4C, D). The results of GO and KEGG enrichment analyses of the combined logFC are shown in a bar chart (Fig. 4E, F).Fig. 4GO and KEGG enrichment analyses of the ERDEGs. (A,B) Bubble Diagram of GO and KEGG Enrichment Analysis for ERDEGs: BP, CC, MF and KEGG. GO terms and KEGG terms are shown on the abscissa. In the bubble plot, the size of the bubble corresponds to the number of genes, while the color represents the magnitude of the p value. A deeper red color indicates a smaller p value, and a bluer color signifies a larger p value. (C,D) Network Diagram of GO and KEGG Enrichment Analysis for ERDEGs, this diagram display BP, CC, MF (C), and KEGG (D). Red nodes represent terms, blue nodes represent molecules, and the lines represent the relationship between terms and molecules. (E,F) Bar graph of GO and KEGG enrichment analysis for ERDEGs. The criteria for GO KEGG) enrichment analysis were set at p.value < 0.05 and FDR value (q value) < 0.05. ERDEGs exosome-related differentially expressed genes, GO Gene Ontology, KEGG kyoto encyclopedia of genes and genomes, BP biological process, CC cellular component, MF molecular function.Two KEGG pathways (endocytosis (Fig. 5A) and proteoglycans in cancer (Fig. 5B)) in the KEGG enrichment results were visualized using the R package Pathview.These pathway maps utilized information from the KEGG Pathway Database15,16,17.Fig. 5KEGG enrichment analysis of the ERDEGs. (A,B) The pathway map of KEGG enrichment analysis for ERDEGs (Endocytosis (A), Proteoglycans in cancer (B)) was shown. ERDEGs exosome-related differentially expressed genes, KEGG kyoto encyclopedia of genes and genomes. These pathway maps utilized information from the KEGG Pathway Database15,16,17.GSEA and GSVAGSEA was performed to determine the effect of gene expression in the CDG on HIRI. The BPs involved and correlation between the affected CCs and MFs were calculated (Fig. 6A; Table 2). All genes from the CGD were significantly enriched in the TP53 (Fig. 6B), MAPK (Fig. 6C), TGF\(\:\beta\:\) (Fig. 6D), JAK − STAT (Fig. 6E), NFKB (Fig. 6F), and other biological functions and signaling pathways.Table 2 Results of GSEA for combined datasets.Fig. 6GSEA for the combined datasets. (A) Presentation of five biological function mountain maps by GSEA for geneset in CGD. (B–F) GSEA showed that genesets in CGD were significantly enriched in TP53 pathway (B), MAPK pathway (C), TGFbeta pathway (D), JAK-STAT pathway (E). NFKB pathway (F). (G) Heat map of GSVA results between different groups of CGD. In the heat map, blue represents down-regulation and red represents up-regulation. The threshold value of GSVA were p value < 0.05 and FDR value (q value) < 0.25. Red represents the HIRI group and blue represents the Normal group. The threshold value of GSEA were p.value < 0.05 and FDR value (q value) < 0.25. GSEA gene set enrichment analysis, GSVA gene set variation analysis, HIRI hepatic ischemia reperfusion injury, CGD combined GEO datasets.The GSVA results based on all genes in the CGD were used to determine the differences in gene sets (h.all.v7.4. symbols.gmt) between the normal and HIRI groups (Table 3). Bile acid metabolism, peroxisome, fatty acid metabolism, and heme metabolism differed significantly between the two groups (p < 0.05). Differential expression between the groups, according to the GSVA results, was analyzed and visualized using a heat map (Fig. 6G).Table 3 Results of GSVA for combined datasets.Diagnostic model for HIRIA univariate logistic regression model was generated based on the 61 ERDEGs to determine their diagnostic value in HIRI; 56 ERDEGs were statistically significant (p < 0.05; Table S3).Based on these, a LASSO shrinkage and selection operator regression model was constructed for the diagnosis of HIRI. LASSO regression model (Fig. 7A) and LASSO variable trajectory (Fig. 7B) were used for visualization. The LASSO regression model included 16 ERDEGs (model genes), ADRA1B, ANXA1, ARPC5L, DPP4, GTF2I, HNRNPA2B1, HSP90AA1, ICAM1, PTEN, RAB27A, RAB27B, RAB5A, TGFBR1, THBS1, TLR4, and TUG1. The SVM model was established using 56 ERDEGs and the SVM algorithm. The number of genes that yielded the lowest error rate (Fig. 7C) and highest accuracy (Fig. 7D) was 7. Finally, we identified seven ERDEGs, ICAM1, PTEN, THBS1, ARF6, HNRNPA2B1, ANXA1, and STAT3.Fig. 7Construction of the diagnostic model for hepatic ischemia-reperfusion injury. (A) Diagram illustrating the LASSO regression diagnostic model for ERDEGs inCGD. (B) Plot of variable trajectories of the LASSO diagnostic model. (C) Identification of the number of genes with the lowest error rate obtained by SVM algorithm. (D) Identification of the number of genes with the highest accuracy using the SVM algorithm. (E) Venn Diagram showing the intersection of genes obtained by both the LASSO and SVM algorithms. LASSO least absolute shrinkage and selection operator, SVM support vector machine, ERDEGs exosomes-associated differentially expressed genes, CGD combined GEO datasets.Five ERDEGs—ANXA1, HNRNPA2B1, ICAM1, PTEN, and THBS1—identified using the LASSO regression and SVM models were identified as key genes for subsequent research and are represented in a Venn diagram (Fig. 6E).Diagnostic model validationBased on the model genes from the CGD, the HIRI diagnostic model was validated using a nomogram (Fig. 8A). The effectiveness of the model gene (HNRNPA2B1) in the HIRI diagnosis model was significantly higher than that of the other variables.Fig. 8Diagnostic and validation analysis of HIRI. (A) Nomogram of model genes in CGD for the diagnostic model of HIRI. (B,C) Calibration curve plot (B) and DCA plot (C) of model genes in CGD for the Diagnostic Model of HIRI. (D) ROC analysis of linear predictors of Logistic regression models. (E) Heat map of GSVA results between High and Low groups of linear predictors of Logistic regression model. Blue represents down-regulation and red represents up-regulation in the heat map. The screening criteria of GSVA were p value < 0.05 and FDR value (q value) < 0.25. Red represents the High group of linear predictors of the Logistic regression model, and blue represents the Low group of linear predictors of the Logistic regression model. The ordinate of the Calibration Curve plot represents the net benefit, while the abscissa corresponds to the Threshold Probability. A value of AUC greater than 0.9 indicates a high level of accuracy. AUC area under the curve, DCA decision curve analysis, ROC receiver operating characteristic, GEO gene expression omnibus, GSVA gene set variation analysis, HIRI hepatic ischemia-reperfusion injury, CGD combined GEO datasets.A calibration curve was generated, and the performance of the HIRI model in predicting the actual outcomes was evaluated using the fit between the predicted and actual probabilities under different conditions (Fig. 8B). The calibration line deviated slightly from the ideal diagonal model, but was a relatively close fit. A DCA plot was generated to assess the clinical practicability of the HIRI diagnostic models based on the model genes from the CGD (Fig. 8C). The line of the model remained stable within a range and was higher than all positives and negatives. The model offered a higher net benefit, indicating its good performance. ROC curves of linear predictors for the logistic regression models in different groups from the CGD also revealed good diagnostic effects (Fig. 8D).To explore the difference in gene sets (h.all.v7.4.symbols.gmt) between the linear predictors in the high and low groups in the logistic regression model, GSVA was performed on all genes from the CGD. TNF-α signaling (NF-kB), inflammatory response, apoptosis, and others differed significantly between the high and low groups of linear predictors of the regression model (p < 0.05). Differential expression between the linear predictors in the high and low groups in the logistic regression model was analyzed and visualized using a heat map (Fig. 8E).mRNA-RBP, mRNA-TF and mRNA-drug interaction networksmRNA-RBP data from the ENCORI database were used to predict RBP interactions with the key genes. Using the Cytoscape software, we constructed an mRNA-RBP regulatory network (Fig. 9A), which included 54 RBP molecules and 83 mRNA-RBP interaction relationships (Table S4).Fig. 9Regulatory network of key genes. (A) mRNA-RBP regulatory network of key genes. (B) mRNA-TF regulatory network of key genes. (C) mRNA-drug regulatory network of key genes. (D) Key Genes predict the interaction network of genes with similar functions. Circular nodes represent genes, and the size is determined by the attributes and characteristics of the genes. Lines represent relationships, interactions or functional connections between genes, and line thicknesses represent strong associations or important interactions. Yellow is mRNA, purple is RBP, green is TF, and blue is Drug. TF transcription factor, RBP RNA-binding protein.The TFs binding to the five key genes were obtained from the ChIPBase database. In addition, the mRNA-TF regulatory network (Fig. 9B), comprising 4 key genes (ANXA1, HNRNPA2B1, ICAM1, THBS1) and 16 TFs (Table S5), was constructed and visualized using Cytoscape.Following analysis of the CTD, we identified potential drugs or compounds associated with the key genes. An mRNA-drug regulatory network was generated using the Cytoscape software (Fig. 9C). Four key genes (ANXA1, ICAM1, PTEN, and THBS1) and 17 drugs or molecular compounds were identified (Table S6). Finally, the GeneMANIA website was used to predict genes similar to our key genes, and an interaction network was generated to visualize their physical interactions, shared protein domains, and gene interactions (Fig. 9D).Differential expression analysis of key genes between the HIRI and normal groups in the CGDThe violin plot (Fig. 10A) revealed differences in the expression of the five key genes between the HIRI and normal groups in the CGD. Four key genes (ANXA1, ICAM1, PTEN, and THBS1) showed an highly significant difference between the two groups (p < 0.001). The expression of the other key gene, HNRNPA2B1, was highly significantly different between the two groups (p < 0.01). A correlation analysis was performed and a correlation heat map was generated according to the complete expression matrix of the five key genes in the CGD (Fig. 10B). The key genes, ICAM1 and ANXA1, and ICAM1 and THBS1, were positively correlated; and PTEN and ICAM1, and PTEN and THBS1 were negatively correlated.Fig. 10Differential expression analysis of key genes between the HIRI and normal groups in the CGD. (A) The group comparison diagram illustrates the expression patterns of key genes in the Normal versus the HIRI group in CGD. (B) The correlation heat map of the key genes in CGD. (C) Functional similarity analysis of Key Genes. (D) Chromosomal localization map of Key Genes in human. (E–I) ROC curve analysis of Key Genes ICAM1 (E), ANXA1 (F), PTEN (G), THBS1 (H), and HNRNPA2B1 (I) in CGD. The symbol *** is equivalent to P < 0.001 and extremely statistically significant. The symbol ** represents P < 0.01, indicating highly statistical significance. The symbol * represents P < 0.05, indicating statistical significance. The closer the AUC is to 1 on the ROC curve, the better the diagnostic performance. An AUC above 0.9 indicates high accuracy. An AUC between 0.7 and 0.9 suggests moderate accuracy. ROC receiver operating characteristic, AUC area under curve. Red represents the HIRI group in the dataset and blue represents the Normal group in the dataset, GEO gene expression omnibus, HIRI hepatic ischemia reperfusion injury, CGD combined GEO datasets.A functional similarity analysis was performed using the five key genes. GO terms, sets of GO terms, semantic similarity among gene products, and gene clusters were calculated using the R package GOSemSim. Functional similarity analysis between the five key genes was obtained and visualized using a boxplot (Fig. 10C). ANXA1 had the highest functional similarity among the key genes.The positions of the five key genes were visually represented in the human chromosome using a visual circle diagram (Fig. 10D). HNRNPA2B1, ANXA1, PTEN, THBS1, and ICAM1 was located on chromosome 7, 9, 10, 15, and 19, respectively.The ROC curves were plotted for each key gene (Fig. 10E–I). The expression differences of the key gene, ICAM1, had high accuracy between the groups (AUC > 0.9) and that of the other four genes (ANXA1, HNRNPA2B1, PTEN, and THBS1) also had considerable accuracy (0.7 < AUC < 0.9).Immune infiltration analysis (ssGSEA and MCPCounter)We assessed the correlation between the expression profiles of 28 types of immune cells in the HIRI and normal groups based on the CGD using ssGSEA. Based on immune infiltration analysis and group comparison boxplots, we analyzed and visualized the infiltration abundance of 28 types of immune cells in two groups from the CGD (Fig. 11A). The expression of 18 types of immune cells in the two groups in the CGD was significant (p < 0.05), including activated CD4 T cell, activated dendritic cell, CD56dim natural killer cell, central memory CD4 T cell, central memory CD8 T cell, eosinophil, immature dendritic cell, macrophage, mast cell, MDSC, memory B cell, natural killer T cell, neutrophil, plasmacytoid dendritic cell, regulatory T cell, T follicular helper cell, Type 1 T helper cell, and Type 17 T helper cell.Fig. 11Immune Infiltration analysis (ssGSEA and MCPCounter). (A) Group comparison graph for 28 types of immune cells in different groups in CGD by ssGSEA. (B) Heat map of correlation analysis between Key Genes and the infiltrating abundance of immune cells by ssGSEA. (C) Heat map of correlation analysis between Key Genes and immune cell infiltration abundance by MCPCounter. In the correlation heat map, the red circle represents the positive correlation between the genes and the infiltration abundance of immune cells. The larger the circle is, the stronger the correlation is. Blue circles represent the negative correlation between genes and the infiltrating abundance of immune cells, and the larger the circle, the stronger the correlation. The symbol ns is equivalent to P < 0.05,indicating no statistical significance. The symbol * is equivalent to P < 0.05, indicating statistical significance. The symbol ** is equivalent to P < 0.01, indicating a highly statistical significance. The symbol *** is equivalent to P < 0.001, indicating a extremely statistical significance. ssGSEA single-sample gene-set enrichment analysis, MCPCounter microenvironment cell populations-counter, HIRI hepatic ischemia reperfusion injury, CGD combined GEO datasets.The correlation heatmap (Fig. 11B) revealed the relationships between the infiltration abundances of 18 types of statistically significant immune cells (p < 0.05) and the expression of key genes. ANXA1 was positively correlated, whereas PTEN was negatively correlated with most immune cells.We used the MCPCounter algorithm to quantify the five key genes and the immune cell infiltration abundance in the two groups in the CGD (Fig. 11C). These genes were associated with ten immune cell types (B lineage, cytotoxic lymphocytes, endothelial cells, fibroblasts, monocytic lineage, myeloid dendritic cells, neutrophils, NK cells, T cells). In particular, ANXA1 was positively correlated with neutrophils and negatively correlated with T cells; HNRNPA2B1 was negatively correlated with the B lineage; and THBS1 was positively correlated with neutrophils.Exosome scoreBased on the expression of the five key genes in the HIRI group from the CGD, the Es of each patient with HIRI was obtained using ssGSEA.The HIRI group was divided into Es_High and Es_Low groups based on the median Es and the expression of different genes in the two groups was analyzed using the R package limma. We performed GSEA to explore the expression patterns, BPs, and relationship of affected CCs and MFs to understand the impact of all gene expressions in the CGD on HIRI (Fig. 12A). All genes in the CGD were significantly enriched in the IL18 (Fig. 12B), TP53 (Fig. 12C), MAPK (Fig. 12D), TGF\(\:\beta\:\) (Fig. 12E), JAK-STAT (Fig. 12F), and other biologically related functions and signaling pathways.Fig. 12GSEA for combined datasets. Five biological function mountain maps of GSEA in CGD were presented (B–F). GSEA showed that ERDEGs were significantly enriched in IL18 pathway (B), TP53 pathway (C), MAPK pathway (D), TGFbeta pathway (E). JAK-STAT pathway (F). The screening criteria of GSEA were p.value < 0.05 and FDR value (q value) < 0.25. GSEA Gene set enrichment analysis, ERDEGs exosomes-associated differentially expressed genes, GEO gene expression omnibus, CGD combined GEO datasets.Prediction of the protein domainWe obtained the protein structures of the five key genes from AlphaFoldDB (Fig. 13A–E; ANXA1 [Fig. 13A], HNRNPA2B1 [Fig. 13B], ICAM1 [Fig. 13C], PTEN [Fig. 13D] and THBS1 [Fig. 13E]). All five key genes had high confidence in their major domains (pLDDT > 90).Fig. 13Protein structure of the model genes. (A–E) Protein domains of Key Genes ANXA1 (A), HNRNPA2B1 (B), ICAM1 (C), PTEN (D) and THBS1 (E) are shown. When pLDDT < 50, the predicted structure has low confidence. When 50 < pLDDT < 70, the predicted structure has moderate confidence. When 70 < pLDDT < 90, the predicted structure has high confidence. When pLDDT > 90, the predicted structure has extremely high confidence. AlphaFoldDB AlphaFold protein structure database, pLDDT predicted local distance difference test.H/R modelTo validate our analysis, we established an H/R model in AML12 cells and examined the relative mRNA expression of the five genes. Consistent with the bioinformatics results, the mRNA expression of PTEN was significantly downregulated in H/R cells, whereas that of the other four genes were significantly upregulated (Fig. 14A–E). This result provides strong evidence of the involvement of these genes in HIRI. Overall, these key genes may contribute significantly to the pathogenesis of HIRI.Fig. 14Verification of different expression levels of the five key genes in the H/R model. (A–E) Relative mRNA levels of ANXA1 (A), HNRNPA2B1 (B), ICAM1 (C), PTEN (D) and THBS1 (E) between H/R and control group were shown in a vivo experiment. All data were calculated as mean ± SD. The symbol * is equivalent to P < 0.05, indicating statistical significance. The symbol ** is equivalent to P < 0.01, indicating a highly statistical significance. The symbol *** is equivalent to P < 0.001, indicating a extremely statistical significance.

Hot Topics

Related Articles