Identification of immune patterns in idiopathic pulmonary fibrosis patients driven by PLA2G7-positive macrophages using an integrated machine learning survival framework

Association between abundant immune cell infiltration, key gene modules, and adverse prognosis in IPF patients: unsupervised clustering analysis across multiple cohortsFigure 1 illustrates the key workflows in our study. In total, three bulk transcriptome datasets, GSE7086620, GSE11014721, and GSE1066722 were used. The GSE70866 dataset included lung lavage fluid samples from 20 healthy individuals and 176 IPF patients, with recorded survival time and status. The GSE110147 dataset included lung tissue samples from 11 healthy individuals and 22 IPF patients. The GSE10667 dataset comprised lung tissue samples from 15 healthy individuals and 31 IPF patients. We commenced our investigation by exploring the existence of distinct immune subtypes within IPF patients. To achieve this, we employed an immune cell signature gene set as previously reported38 and calculated immune cell enrichment scores for each patient in the GSE70866 dataset using the ssGSEA method. Subsequently, we conducted an unsupervised clustering analysis26. The results uncovered two distinct subtypes (Fig. S1A,B) designated as Subtypes A and B among all patients in the GSE70866 dataset (Fig. 2A). This differentiation was further validated through t-SNE analysis, demonstrating a substantial separation between Subtypes A and B based on the reduced-dimension distribution of patients (Fig. 2A). Notably, Subtype B patients exhibited an overall higher degree of immune cell enrichment than Subtype A patients (Fig. 2B). We repeated these analyses on the GSE10667 (Fig. 2C,D) and GSE110147 datasets (Fig. 2E,F). The results validated the presence of two stable subtypes among IPF patients based on immune cell enrichment scores. Kaplan–Meier analysis in the GSE70866 dataset revealed that overall survival was significantly reduced for Subtype B patients than for Subtype A (Fig. 2G). This outcome underscores the substantial association between abundant immune cell infiltration and adverse prognosis in IPF patients.Fig. 1Workflow illustrating the comprehensive process employed in this study.Fig. 2Unsupervised Clustering Analysis Across Multiple Cohorts Reveals a Strong Association Between Abundant Immune Cell Infiltration Regulated by Key Gene Modules and Adverse Prognosis in IPF Patients. (A) Unsupervised clustering and t-SNE dimensionality reduction analyses were performed on IPF patients from the GSE70866 dataset based on immune cell infiltration scores. (B) Heatmaps displaying immune cell infiltration scores for Subtypes A and B in the GSE70866 dataset. (C) Unsupervised clustering and t-SNE dimensionality reduction analyses were conducted on IPF patients from the GSE10667 dataset based on immune cell infiltration scores. (D) Heatmaps displaying immune cell infiltration scores for Subtypes A and B in the GSE10667 dataset. (E) Unsupervised clustering and t-SNE dimensionality reduction analyses were performed on IPF patients from the GSE110147 dataset based on immune cell infiltration scores. (F) Heatmaps displaying immune cell infiltration scores for Subtypes A and B in the GSE110147 dataset. (G) Survival analysis of patients in Subtypes A and B in the GSE70866 dataset. (H) Soft threshold determination. (I) Identification of gene clustering modules. (J) Correlation analysis between gene modules and phenotypes, with pink modules exhibiting a high correlation with subtypes. (K) Correlation coefficient analysis between gene significance (GS) and module membership (MM). (L) KEGG enrichment analysis. (M) GO enrichment analysis. (N) KEGG enrichment analysis.To identify genes highly correlated with these two subtypes, we conducted a weighted gene co-expression network analysis (WGCNA) within the GSE70866 dataset. The soft threshold (β) was set to 5 (Fig. 2H), providing an appropriate power value for constructing a co-expression network. In total, 27 gene modules, each represented by a different color, were identified (Fig. 2I). We evaluated the correlation between each module and the clinical traits of patients, including age, gender, and the subtypes clustered based on the ssGSEA method. The pink module exhibited the highest correlation with the subtype (correlation coefficient = 0.71; Fig. 2J). The correlation coefficient between gene significance (GS) and module membership (MM) reached 0.88 (Fig. 2K), indicating the superior construction quality of the pink module, which included 351 genes. To identify hub genes that influence the formation of immune subtypes, we applied filtering criteria (GS > 0.5 and MM > 0.6) to these genes. Ultimately, we obtained a set of 122 hub genes. To further validate the relevance of these 122 hub genes to the immune system, we conducted enrichment analysis from various perspectives. KEGG enrichment analysis (Fig. 2L) revealed significant enrichment of several immune-related pathways, including Th1 and Th2 cell differentiation; cytokine–cytokine receptor interaction; NF-kappa B, JAK-STAT, and TNF signaling pathways; and antigen processing and presentation. GO enrichment analysis (Fig. 2M) yielded consistent results, with pathways like regulation of type II interferon production, leukocyte activation involved in immune response, regulation of leukocyte differentiation, and lymphocyte-mediated immunity exhibiting significant enrichment. Disease Ontology (DO) enrichment analysis (Fig. 2N) highlighted the significant relevance of these genes to various immune system-related disorders, including primary immunodeficiency disease, acquired immunodeficiency syndrome, autoimmune diseases of the endocrine system, human immunodeficiency virus infectious disease, and COVID-19. Collectively, we identified two entirely distinct immune subtypes within IPF patients. Subtype A patients displayed lower levels of immune cell enrichment and longer overall survival, whereas Subtype B patients exhibited higher immune cell enrichment and shorter overall survival. Furthermore, we identified 122 hub genes highly correlated with these two immune subtypes.Development of an optimal immune-related prognostic model using a multi-machine learning integration frameworkBefore constructing the prognostic model, we conducted a univariate Cox regression analysis on these 122 genes based on survival time and status, thereby identifying 60 prognosis-related genes (Fig. 3A). All of these genes were risk genes significantly associated with poor patient outcomes. Subsequently, to establish training and validation sets, we randomly divided all patients in the GSE70866 dataset into two groups at a 7:3 ratio. Statistical analyses showed no significant difference between the two groups in terms of gender and age (Table 1). The 60 prognosis-related genes were then subjected to a multi-machine learning integration framework. In the training set, we trained the prognostic model, which was subsequently validated in the validation set. Within this multi-machine learning integration framework, ten different machine learning algorithms were employed, namely RSF, Enet, Lasso, Ridge, stepwise Cox, CoxBoost, plsRcox, SuperPC, GBM, and survival-SVM. Additionally, these machine learning algorithms were subjected to combinatorial analysis. Specifically, the first machine learning algorithm was used for gene selection, and the second algorithm was employed for building the prognostic model. During this process, to enhance the model’s robustness across other cohorts, we considered the prognostic model ineffective if the number of genes included was fewer than two. Consequently, a total of 91 prognostic models were constructed, and C-index was calculated for each signature (Fig. 3B). After ranking the average C-index values calculated from the training and validation sets, we identified the top ten prognostic models (Fig. 3C). Among these models, the StepCox [both] + GBM model combination exhibited the highest C-index value in the validation set, reaching 0.681. This result demonstrated that this model combination displayed superior robustness and generalizability, rendering it the best immune-related risk model. Comprising 20 genes (CSGALNACT1, TNFSF14, PTPN7, RGL4, DPP4, CDK6, GZMB, ABLIM1, PBX4, ADA, HOPX, EIF4E3, ITM2C, GNG2, RHOH, MIAT, P2RY8, CD8A, ADAM19, and HMHA1), this model enabled us to calculate an immune-related risk score (IRS) for each patient. The optimal IRS cutoff value for patient stratification was determined using the “survminer” package. Survival analysis results indicated that patients in the high-IRS group had significantly reduced overall survival compared with those in the low-IRS group in both the training and validation sets (Fig. 3D). Furthermore, given that the GSE70866 dataset was originally derived from the merging of three center cohorts (Freiburg, LEUVEN, and SIENA)20, we conducted a concurrent analysis of the prognostic performance of IRS in different center cohorts. Consistent with previous findings, when data was re-stratified into the Freiburg, LEUVEN, and SIENA cohorts, patients in the high-IRS group continued to exhibit significantly reduced overall survival compared with those in the low-IRS group (Fig. 3D). This outcome further validated the robustness of the IRS. Receiver operating characteristic (ROC) curve analysis revealed that in the training cohort, the area under the curve (AUC) values were 0.933, 0.953, and 0.927 at 1, 2, and 3 years, respectively (Fig. 3E). In the validation cohort, the AUC values were 0.773, 0.677, and 0.596 at 1, 2, and 3 years, respectively (Fig. 3E). Across the Freiburg, LEUVEN, and SIENA cohorts, AUC values remained above 0.8 at 1, 2, and 3 years (Fig. 3E). Collectively, we developed the IRS and demonstrated its robust effectiveness, stability, and reliability.Fig. 3Development of an optimal immune-related prognostic model using a multi-machine learning integration framework. (A) Univariate cox regression analysis was employed to select prognostically relevant genes based on survival time and status in the GSE70866 dataset. (B) Presentation of 91 prognostic models integrated through machine learning, along with their corresponding C-index values. (C) Showing the top ten models with the highest average C-index values, StepCox [backward] + GBM was selected as the best model in the validation cohort. (D) Survival analysis of high- and low-IRS patients in different cohorts. (E) ROC analysis of high- and low-IRS patients in different cohorts.Table 1 Sample statistics for the training and validation groups.Inflammatory activation phenotype and metabolic disruptions in High-IRS patientsWe conducted a comprehensive analysis of the relationship between IRS and patient phenotypes and observed a significant positive correlation between IRS and various immune cell enrichment indices. Patients with high IRS exhibited elevated levels of immune cell enrichment, especially in macrophages, neutrophils, mast cells, and regulatory T cells (Fig. 4A). Additionally, a strong positive correlation was noted among several immune cell enrichment indices, suggesting that high-IRS patients might experience acute and intense immune system hyperactivation and immune cell infiltration (Fig. 4B). Moreover, the clustering of samples from the Freiburg, LEUVEN, and SIENA centers in the heatmap revealed uniform distribution patterns as IRS values changed (Fig. 4A). This observation implies the broad applicability of IRS across different centers. Gene set enrichment analysis (GSEA) results indicated that high-IRS patients displayed activation in adaptive immune responses (Fig. 4C,D), inflammatory reactions (Fig. 4E), and white blood cell chemotaxis (Fig. 4F). In terms of metabolic activity (Fig. 4G), IRS showed a significant positive correlation with amino acid metabolism, glycosaminoglycan biosynthesis, and arachidonic acid metabolism. This suggests disruptions in protein synthesis, energy metabolism, extracellular matrix composition or structure, lipid metabolism, and inflammation in high-IRS patients. Conversely, IRS exhibited significant negative correlations with biotin, glyoxylate, dicarboxylate, and butanoate metabolisms.Biotin is a vital coenzyme involved in various enzymatic reactions, and the negative correlation may indicate abnormalities in biotin metabolism in patients with high-IRS scores, potentially affecting metabolism and cellular functions. Glyoxylate and dicarboxylate metabolism, along with butanoate metabolism, are associated with organic acid metabolism, which is crucial for maintaining cellular metabolic balance. The negative correlation suggests metabolic abnormalities in organic acid metabolism among high-IRS patients. Collectively, our results demonstrate that high-IRS patients exhibit intense immune system activation responses and an inflammatory microenvironment. Furthermore, the IRS plays a role in disrupting multiple metabolic pathways.Fig. 4High-IRS Patients Exhibit an Inflammatory Activation Phenotype and Various Metabolic Disruptions. (A) Heatmap illustrating the relationship between IRS expression and immune cell infiltration in the GSE70866 dataset. (B) Correlation analysis between IRS expression and immune cell infiltration in the GSE70866 dataset. (C–F) GSEA. (G) Correlation analysis between IRS and different metabolic pathways in the GSE70866 dataset.Identification of a robust prognostic and diagnostic biomarker, PLA2G7, using IRSTo further assess the utility of IRS, we conducted a differential expression analysis between high- and low-IRS patients, identifying differentially expressed genes (|log2FC| > 1, P-value < 0.05; Fig. 5A). Our analysis recapitulated the expression of well-established genes associated with adverse IPF prognoses, such as SPP139–41 and MMP742,43, thus validating the effectiveness of the IRS. To identify novel biomarkers, we focused on upregulated differentially expressed genes. Initially, 128 differentially expressed genes associated with prognosis, all categorized as risk genes, were identified through univariate Cox regression analysis (Fig. 5B). Subsequently, we calculated the AUC values for these genes in distinguishing healthy samples from IPF samples in the GSE70866, GSE10667, and GSE110147 datasets (Fig. 5C). The top three genes with the highest average AUC values were SPP1, MMP7, and PLA2G7. The value of PLA2G7 in IPF was previously unknown. PLA2G7 exhibited AUC values exceeding 0.8 in all three datasets, with an average AUC of 0.889. Furthermore, the expression of PLA2G7 was significantly elevated in IPF patients compared with that in healthy samples in all three datasets (Fig. 5D–F). Survival analysis in the GSE70866 dataset revealed that patients with high PLA2G7 expression experienced a significant reduction in overall survival (Fig. 5G). Additionally, PLA2G7 expression displayed a significant positive correlation with various immune cell enrichment indices (Fig. 5H), such as macrophages, neutrophils, regulatory T cells, and mast cells, consistent with the results of IRS. Collectively, based on the IRS, we successfully identified PLA2G7 as a novel biomarker. PLA2G7 demonstrated high value for both prognosis and diagnosis, making it a novel candidate in the field of biomarker genes.Fig. 5Identification of a Robust Prognostic and Diagnostic Biomarker, PLA2G7, Using IRS. (A) Identification of differentially expressed genes in high-IRS patients. (B) Univariate cox regression analysis of differentially expressed genes to select prognostically relevant genes. (C) Calculation of ROC diagnostic capabilities for each gene in distinguishing IPF samples from healthy samples in different cohorts. (D) Differential expression of SPP1, MMP7, and PLA2G7 in healthy and IPF samples in the GSE70866 dataset. (E) Differential expression of SPP1, MMP7, and PLA2G7 in healthy and IPF samples in the GSE110147 dataset. (F) Differential expression of SPP1, MMP7, and PLA2G7 in healthy and IPF samples in the GSE10667 dataset. (G) Survival analysis of IPF patients with high and low PLA2G7 expression in the GSE70866 dataset. (H) The association between PLA2G7 gene expression levels and the abundance of immune cells.Validation of PLA2G7-positive macrophages driving inflammatory activation in IPF through single-cell sequencing analysisWe obtained single-cell sequencing data from the GSE128033 dataset in the GEO database44. This dataset comprised single-cell sequencing data from lung tissue samples of eight healthy individuals and eight IPF patients, fresh bronchoalveolar lavage fluid samples from a healthy individual, and frozen bronchoalveolar lavage fluid samples from another healthy individual. To avoid potential bias introduced by sample freezing, the frozen bronchoalveolar lavage fluid sample was excluded. Following quality control, we analyzed a total of 68,064 cells, including 22,121 cells from healthy individuals and 34,943 cells from IPF patients. Using common cell markers (Fig. 6A), we performed dimensionality reduction, clustering, and annotation of these cells, including macrophages, endothelial, T cells, smooth muscle cell (SMC), dendritic cell (DC), B cells, monocytes, epithelial cells, fibroblasts, and mast cells (Fig. 6B). We found that PLA2G7 expression was primarily localized to macrophages, with a significant increase in its expression in IPF patients (Fig. 6C). Subsequently, we conducted a differential analysis of all cell types between IPF and healthy samples (Fig. 6D). Although we observed varying gene expression profiles in different cell types within IPF, all cells exhibited elevated expression of genes associated with inflammatory phenotypes. For instance, macrophages displayed increased expression of SPP1, while endothelial cells exhibited increased expression of CXCL11 and CXCL10. T cells showed elevated expression of CCL4 and IFNG, monocytes and DCs displayed high expression levels of HLA family genes, and epithelial cells exhibited increased expression of MMP7 and S100A2. The expression of these genes was significantly correlated with inflammatory activation pathways. To further validate our results, we processed single-cell sequencing data from Li Wu et al.30 (https://ngdc.cncb.ac.cn/gsa/browse/CRA011039). In this dataset, the lung Cd45+ immune cells were isolated using fluorescence-activated cell sorting from naïve mice (day 0) and at various stages after bleomycin administration: acute inflammatory (D7), profibrotic (D14), and later fibrotic (D21) phases. After dimensionality reduction and cell annotation (Fig. 6E–G), we isolated macrophages (Fig. 6H) and examined the expression level of Pla2g7 (Fig. 6I). As the treatment time extended, the expression level of Pla2g7 gradually increased, and the number of Pla2g7-positive macrophages increased gradually (Fig. 6J). Additionally, we observed a gradual increase in Spp1 levels within macrophages (Fig. 6I), which is consistent with our previous results, further validating the high reliability of our findings.Fig. 6Validation of PLA2G7-positive macrophages driving inflammatory activation in IPF through single-cell sequencing analysis. (A) Heatmap depicting marker gene expression for cell annotation in different cell types. (B) t-SNE dimensionality reduction landscape of single-cell sequencing data. (C) Differential expression of PLA2G7 in IPF and healthy samples. (D) Differential analysis of different cell types in IPF and healthy samples. (E) t-SNE dimensionality reduction landscape of single-cell sequencing data samples from mice. (F) t-SNE dimensionality reduction landscape of cell types in the single-cell sequencing data from mice. (G) Heatmap displaying the marker gene expression in different cell types. (H) t-SNE dimensionality reduction landscape after re-extraction of macrophages in the single-cell sequencing data from mice. (I) Bubble plot exhibiting the expression levels of PLA2G7 and SPP1 in different samples. (J) t-SNE plot displaying the expression levels of PLA2G7 in different mouse samples. (K) In human single-cell sequencing data, violin plot depicting PLA2G7 expression in PLA2G7-positive and PLA2G7-negative macrophages. (L) In human single-cell sequencing data, differential analysis of PLA2G7-positive and PLA2G7-negative macrophages. (M) In human single-cell sequencing data, GSEA of PLA2G7-positive and PLA2G7-negative macrophages. (N) H&E, Masson’s trichrome staining and Multiplex immunofluorescence in a mouse model. (O) Counting of Cd68 + Pla2g7 + cells in a mouse model. (P) ELISA for Pla2g7 expression levels in alveolar lavage fluid and lung tissues of the mouse model.To delve deeper into the impact of PLA2G7 on macrophage function, we extracted macrophages from GSE128033 data for further analysis (Fig. S1C). Increased expression levels of PLA2G7 in macrophages from IPF patients are demonstrated in a violin plot (Fig. S1D). Based on whether macrophages expressed PLA2G7, we classified them into PLA2G7-positive and PLA2G7-negative cells (Figs. 6K, S1E). Subsequent differential analysis (Fig. 6L) revealed that PLA2G7-positive macrophages exhibited high expression of HAMP and CCR5, indicating the involvement of PLA2G7 in inflammation regulation. Elevated expression of KLK4 and TIMP3 in PLA2G7-positive macrophages suggested a correlation with cell proliferation and apoptosis regulation. The high expression of CCR5 and SLAMF7 indicated the involvement of PLA2G7 in promoting immune responses. We further conducted GSEA using the HALLMARK database of pathway gene sets for these two groups of cells (Fig. 6M). The results showed that PLA2G7-positive macrophages exhibited activation in apoptosis, DNA repair, and STAT3-related pathways. We established a mouse model with lung fibrosis by treating them with bleomycin. Using H&E staining and Masson’s trichrome staining, we observed damage in the mouse lung tissues (Fig. 6N, Figure S1F). In comparison to healthy mice, we observed a significant infiltration of macrophages in the lung tissues of bleomycin-challenged mice in this model, along with a notable increase in Pla2g7-positive macrophages (Fig. 6N,O). Furthermore, ELISA analysis of mouse bronchoalveolar lavage fluid and lung tissue samples (Fig. 6P) revealed a significant increase in Pla2g7 concentration compared to healthy samples in bleomycin-challenged mice. These findings strongly support the establishment of the bleomycin-challenged model and suggest a potential role for Pla2g7 in the development of bleomycin-challenged. In summary, through single-cell analysis, we identified a group of PLA2G7-positive macrophages that increased in number in IPF patients and exhibited an inflammatory activation state. This state is significantly associated with the inflammatory characteristics observed in these patients.Celecoxib exerts anti-inflammatory effects by Targeting PLA2G7To enhance the clinical relevance of our study, we conducted a small molecule sensitivity analysis using the Enrichr database (https://maayanlab.cloud/Enrichr/). We identified the top five drugs sensitive to PLA2G7, ranked by P-values. Celecoxib emerged as the most sensitive drug (Table 2). As a nonsteroidal anti-inflammatory drug, celecoxib is primarily used to treat pain and inflammation-related conditions, alleviating pain and swelling during the inflammatory process45. Furthermore, we employed computer simulation techniques for molecular docking, considering PLA2G7 as the protein receptor and celecoxib as the small molecule ligand. Our results demonstrated that celecoxib could stably bind within the protein pocket of PLA2G7 (Fig. 7A) with an affinity of − 9.0 kcal/mol (Fig. 7B). It formed hydrogen bonds with the Gly-154 (3.3 Å) and Gln-352 (2.3 Å) residues in the PLA2G7 protein (Fig. 7A). Surface plasmon resonance analysis revealed that celecoxib could bind to PLA2G7 in a dose-dependent manner (Fig. 7C). The KD value of celecoxib binding to PLA2G7 was 2.607 µM, indicating that this binding is very stable. Additionally, in molecular docking, we also analyzed the affinity between PLA2G7 and four other small molecules as well as the PLA2G7-specific inhibitor darapladib. Our results showed that celecoxib had the best affinity with PLA2G7, even better than darapladib. In addition, we note that both celecoxib and darapladib can bind to GLN-352, and thus their binding to PLA2G7 may be competitive with each other. Collectively, our findings substantiate the potential of celecoxib to exert anti-inflammatory effects by targeting PLA2G7.Table 2 Top five drugs that bind to PLA2G7 identified through drug sensitivity analysis.Fig. 7Celecoxib exerts anti-inflammatory effects by targeting PLA2G7. (A) Schematic representation of the binding mode of molecules with PLA2G7. (B) Affinity between small molecules and PLA2G7. (C) Surface plasmon resonance analysis of celecoxib and PLA2G7. Celecoxib, as the ligand, was dissolved and injected at concentrations of 0 µM, 0.19 µM (purple), 0.39 µM (black), 0.78 µM (blue), 1.56 µM (orange), 3.13 µM (green), and 6.25 µM (red).

Hot Topics

Related Articles