Identification of PANoptosis-related genes for idiopathic pulmonary fibrosis by machine learning and molecular subtype analysis

Core module genes of IPFIn the GSE33566 dataset, the “WGCNA” R package was used to analyze the genes of IPF and HC group-related modules. The dataset included 93 samples from IPF patients and 30 samples from healthy controls, with a dendrogram illustrating the clustering of samples in Fig. 2A. A soft threshold of 5 was applied, resulting in an R2 value exceeding 0.9, indicating the presence of a scale-free network of gene connectivity as depicted in Fig. 2B. After the modules were merged, a total of 17 co-expression modules were identified, as shown in Fig. 2C. The brown and green modules displayed the strongest positive (r = 0.39, p = 7e-06) and negative (r = -0.32, p = 4e-04) correlations with IPF as depicted in Fig. 2D. The brown module consisted of 451 genes, while the green module consisted of 461 genes. 912 genes from the brown and green modules were identified as core genes linked to IPF for further examination.Fig. 2WGCNA analysis of the GSE33566 dataset. (A) Sample clustering tree diagram. (B) Soft threshold analysis. (C) Genes with differential expression measured were clustered in the dendrogram. (D) Heatmap of the module–trait relationship. Each cell displays the corresponding correlation and p-value. IP: idiopathic pulmonary fibrosis; HC: healthy control.Differentially expressed genes between IPF and HCIn the GSE33566 dataset, DEGs between IPF and HC were screened by the “limma” package. The results revealed 40 upregulated genes and 74 downregulated genes(Table S2). The volcano plot illustrated the top ten upregulated and downregulated genes (Fig. 3A), while the heatmap visualized the top 30 upregulated and downregulated genes (Fig. 3B). The GO and KEGG enrichment analysis indicated that a majority of the DEGs were enriched in various biological processes, including cytosolic ribosome, leukocyte activation, ribosome, cellular response to cytokine stimulus, positive regulation of cell death, regulation of immune response, and cytokine. The findings suggested a notable association between the pathogenesis and progression of IPF and immune inflammation, as illustrated in Fig. 3C and D.Fig. 3Identification of differentially expressed genes in GSE33566. (A) Volcano map of the DEGs. Blue dots represented downregulated DEGs, red dots represented upregulated DEGs and grey dots showed genes with no significant difference. (B) Heatmap of the first 30 up-regulated and down-regulated DEGs. (C) GO enrichment analysis of the DEGs. (D) KEGG enrichment analysis of the DEGs.Identification of PDEGs and core diagnostic markers of IPFThe overlap of PANoptosis-related genes, DEGs of IPF, and the most relevant module genes of WGCNA were defined as PANoptosis-related differentially expressed genes (PDEGs), consisting of 9 genes: MMP9, FCMR, TNFRSF25, HSPA1A, ADM, NIBAN3, CCR7, CD19, IL7R (Fig. 4A). The relationships among PDEGs were investigated through Spearman correlation analysis. The findings revealed a strong positive correlation between NIBAN3 and CD19, FCMR, as well as a notable negative correlation between MMP9 and TNFRSF25, FCMR, IL7R (Fig. 4B).The Boruta algorithm was utilized to assess the significance of PDEGs, leading to the identification of the top five genes as MMP9, FCMR, HSPA1A, ADM, and NIBAN3 (Fig. 4C). Subsequently, SVM-RFE identified the four most significant targets as MMP9, FCMR, ADM, and NIBAN3 (Fig. 4D). The LASSO algorithm was then applied to reduce dimensionality, resulting in the selection of MMP9, FCMR, HSPA1A, and NIBAN3 as the key genes (Fig. 4E-F). Genes with a gini coefficient exceeding 5 were chosen as core genes through the RF approach (Fig. 4G). The final core diagnostic markers (MMP9, FCMR, and NIBAN3) were determined through the intersection of the four machine learning algorithms (Fig. 4H). GeneMANIA was employed to forecast genes that were functionally similar to the core diagnostic markers and to create a protein interaction network depicting genes that interacted with the core diagnostic markers. Each node in the diagram represented a gene, with various colors indicating different functions (Fig. 4I). A multivariate logistic regression model was used to develop a nomogram of core diagnostic markers and clinical data to predict the diagnosis of IPF(Fig. 4J). ROC curve analysis was conducted to determine the diagnostic performance of core diagnostic markers, with AUC = 76.8%, gender + age + core markers AUC = 93.3% (Fig. 4K).Fig. 4Identifying of core diagnostic markers of IPF. (A) Venn diagram of PDEGs. (B) Correlation map of PDEGs. (C-G) Core diagnostic markers screening in Boruta, SVM, LASSO and RF algorithms. (H) Venn diagram of four algorithms selected core diagnostic markers of IPF. (I) PPI network of core diagnostic markers. (J) Nomogram of core diagnostic markers and clinical data. (K) ROC of the predictive capacity of core diagnostic markers.Validation of core diagnostic markers expression in IPFWe validated the expression and diagnostic performance of three core diagnostic biomarkers in both training and validation datasets. Specifically, MMP9 was upregulated in IPF, while FCMR and NIBAN3 were downregulated in the training dataset GSE33566. The differences were statistically significant (Fig. 5A). Receiver operating characteristic (ROC) curve analysis revealed that MMP9 (AUC = 0.834), FCMR (AUC = 0.812), and NIBAN3 (AUC = 0.747) exhibited notable diagnostic values between IPF and HC groups (Fig. 5B). Furthermore, we confirmed the expression of the three genes in the validation set GSE93606, and the results were consistent with those from the training set. Although the expression of NIBAN3 in the validation set lacked statistical significance, its expression trend was consistent (Fig. 5C). ROC curve analysis demonstrated the reliability of MMP9 (AUC = 0.817), FCMR (AUC = 0.792), and NIBAN3 (AUC = 0.591) as core diagnostic biomarkers for IPF (Fig. 5D).Fig. 5Expression of the core diagnostic markers. (A) Expression of MMP9, FCMR, NIBAN3 in GSE33566. (B) ROC curve of MMP9, FCMR, NIBAN3 in GSE33566. (C) Expression of MMP9, FCMR, NIBAN3 in GSE93606. (D) ROC curve of MMP9, FCMR, NIBAN3 in GSE93606.Correlation analysis between core diagnostic markers and immune cellsWe examined the relationship between core diagnostic biomarkers and immune infiltrating cells by Spearman correlation analysis. Our findings revealed a statistically significant positive correlation between MMP9 and innate immune response cells, such as neutrophils, eosinophils, megakaryocytes, monocytes, and basophils (p < 0.001), as well as a significant negative correlation with adaptive immune response cells, including T cells, B cells, and Plasma cells (p < 0.001) (Fig. 6A). On the other hand, FCMR and NIBAN3 displayed divergent trends, showing a notable positive association with adaptive immune cell types such as T cells and B cells (p < 0.001), as well as a significant negative relationship with innate immune cells such as monocytes, neutrophils, and macrophages (p < 0.001) (Fig. 6B-C).Fig. 6Correlation analysis beween core diagnostic markers and immune cells. (A) Correlation analysis beween MMP9 and immune cells. (B) Correlation analysis beween FCMR and immune cells. (C) Correlation analysis beween NIBAN3 and immune cells.Subtypes driven by PDEGs expression in IPFThe ComBat algorithm was utilized for the integration and batch correction of blood transcriptomic sequencing samples from four datasets (GSE33566, GSE132607, GSE28042, GSE93606), leading to the formation of a substantial cohort consisting of 298 patients of IPF. The effectiveness of batch correction was assessed through principal component analysis (PCA) (Fig. 7A-B). Subsequently, cluster analysis was performed using the k-means clustering method within the ConsensusClusterPlus function, with the optimal clustering identified at k = 2 based on Cumulative Distribution Frequency (CDF) values and relative changes in the area under the CDF curve (Fig. 7C-D). PCA identified notable variations in transcriptomic expression between the two subtypes(Fig. 7E). Furthermore, the heatmap illustrated distinct expression patterns of PDEGs between the two subtypes. Specifically, NIBAN3, FCMR, IL7R, CD19, CCR7, and TNFRSF25 were found to be upregulated in cluster 1, while ADM, MMP9, and HSPA1A were upregulated in cluster 2 (Fig. 7F). We compared the survival differences between the two clusters and found that the overall survival rate of cluster 1 was higher than that of cluster 2 (p = 0.0041)(Fig. 7G, Table S3). Finally, we also analyzed the differences in clinical indexes-FVC, DLCO, gender, and age between the two clusters by Sankey diagram. The patients in the cluster 2 cohort exhibited moderate to severe FVC and severe DLCO, predominantly consisting of elderly male individuals(Fig. 7H, Table S4).To delineate the differences between the two subtypes of IPF, we performed differential gene expression analysis and enrichment study. A total of 203 DEGs were identified, with 137 genes showing upregulation and 66 genes showing downregulation(Table S5). Enrichment analysis of the DEGs indicated significant involvement in immune-inflammatory responses, such as leukocyte activation, inflammatory response, cellular response to cytokine stimulus, and positive regulation of immune response (Fig. 7I), as well as a notable enrichment in KEGG signaling pathways such as cytokine-cytokine receptor interaction, neutrophil extracellular trap formation, and T cell receptor signaling pathway (Fig. 7J), indicating a close association of both IPF subtypes with immune-inflammatory processes.Fig. 7PANoptosis molecular features of IPF subtypes. (A) Principal component distribution map before batch correction of four data series. (B) Principal component distribution map after batch correction of four data series. (C) Heatmap of consensus matrix at k = 2. (D) CDF curve of clustered samples. (E) PCA plot showing two subtypes after consensus clustering. (F) Heatmap showing gene expression differences between subtypes. (G) Kaplan-Meier survival curve between subtypes. (H) Sankey diagram showed the difference of clinical indexes between different subtypes. (I) GO enrichment analysis of the DEGs between subtypes. (J) KEGG enrichment analysis of the DEGs between subtypes.Characterization of PANoptosis-related subtypes in IPFThe ssGSEA algorithm was utilized to perform a comparative analysis of immune cell expression levels among different subtypes of IPF. The findings revealed distinct immune cell profiles within the identified clusters. Cluster 1 demonstrated adaptive immune cell compositions with elevated levels of T cells and B cells, while Cluster 2 exhibited innate immune cell profiles characterized by increased infiltration of Neutrophils, Natural Killer cells, Macrophages, and Mast cells (Fig. 8A). To assess the effectiveness of immunotherapy in two subtypes, the sensitivity was determined by examining the levels of various immune checkpoints, indicating that cluster 1 displayed heightened sensitivity to immunotherapy. This was supported by the elevated expression levels of BTLA, CD226, CD27, CD28, CD40, HLA-DQA1, TIGIT, and SIRPA in cluster 1 (Fig. 8B). The results might explain that Cluster1 had a higher survival rate than Cluster2.The GSVA pathway enrichment analysis revealed that cluster 1 was primarily associated with DNA damage and repair. The results were supported by GO enrichment in ribosomal subunits (Fig. 8C), Hallmark enrichment in DNA repair and oxidative phosphorylation pathways (Fig. 8D), KEGG enrichment analysis showing DNA replication (Fig. 8E), and Reactome enrichment in mitochondrial RNA degradation (Fig. 8F). On the other hand, Cluster 2 was predominantly linked to lipid cholesterol metabolism and vascular remodeling pathways, as evidenced by Hallmark enrichment in cholesterol homeostasis and angiogenesis pathways (Fig. 8D), KEGG enrichment analysis indicating glycerophospholipid metabolism (Fig. 8E), and Reactome enrichment in alpha-linolenic omegas and linoleic omega acid metabolism (Fig. 8F).Fig. 8Molecular features of subtypes. (A) Expression abundance of immune cells in different subtypes. (B) Sensitivity analysis of immune checkpoints in different subtypes. (C) GO analysis in different subtypes. (D) Hallmark gene sets analysis in different subtypes. (E) KEGG analysis in different subtypes. (F) Reactome gene sets analysis in different subtypes. * P < 0.01, * * P < 0.001和 * * * P < 0.0001.Construction of PANoptosis diagnostic models in IPFThree machine learning algorithms, specifically generalized linear model (GLM), random forest (RF), and support vector machine (SVM) were employed to develop diagnostic models for PANoptosis in IPF. The combined datasets were partitioned randomly into training and validation sets at a 5:5 ratio. The area under ROC curves were used to evaluate the performance of the models. The training set yielded AUC values of 100%, 81.84%, and 84.03% for RF, SVM, and GLM, respectively (Fig. 9A). In the validation set, the AUC values for RF, SVM, and GLM were 74.99%, 72.05%, and 73.39%, respectively (Fig. 9B). The results indicated that the machine learning models of RF, SVM, and GLM performed well in terms of diagnostic accuracy, with RF showing the highest performance.Fig. 9Construction of the PANoptosis diagnosis models for IPF patients. (A) AUC curves of the diagnosis models in the training set. (B) AUC curves of the diagnosis models in the validation set.Potential therapeutic drugs for IPF patients based on PDEGsArsenic trioxide and dinoprostone were obtained from the CTD database by searching for IPF and 9 PDEGs, providing new choices for the clinical treatment of IPF patients in the future (Table S6).

Hot Topics

Related Articles