Investigating the molecular mechanisms underlying the co-occurrence of Parkinson’s disease and inflammatory bowel disease through the integration of multiple datasets

Identification of differentially expressed genes associated with IBD and PD

The sva package was used to correct batch effects in the datasets (GSE7621 and GSE75214). Probes were then mapped to gene names using the GPL570 and GPL6244 probe annotation files for the PD and IBD datasets, respectively. Redundant probes mapping to the same gene were removed, leaving 22,880 genes in the PD dataset and 41,115 genes in the IBD dataset. Differential gene expression analysis between PD patients and healthy controls, using thresholds of fold change ≥ 1 and p-value ≤ 0.05, identified 199 differentially expressed genes, of which 25 were upregulated and 174 were downregulated (Fig. 2A). A heatmap was generated to visualize the expression patterns of the top 10 upregulated and downregulated genes in PD patients and healthy controls (Fig. 2B). The same analysis between IBD patients and healthy controls identified 259 differentially expressed genes, of which 148 were upregulated and 111 were downregulated (Fig. 2C). A heatmap was generated to visualize the expression patterns of the top 10 upregulated and downregulated genes in IBD patients and healthy controls (Fig. 2D).

Figure 2. Identification of differentially expressed genes associated with IBD and PD. (A) Volcano plot showing the results of differential gene expression analysis between PD patients and healthy controls based on the integrated gene expression profiles from GSE7621. (B) Heatmap showing the expression changes of the top 10 upregulated and downregulated differentially expressed genes in PD compared to healthy controls. (C) Volcano plot showing the results of differential gene expression analysis between IBD patients and healthy controls based on gene expression profiles from GSE75214. (D) Heatmap showing the expression changes of the top 10 upregulated and downregulated differentially expressed genes in IBD compared to healthy controls.

Weighted gene co-expression network analysis (WGCNA)

Following data integration, the expression data for the PD dataset were preprocessed. The variance of each gene across samples was calculated, and the 75% of genes with the highest variance were retained for downstream analysis. Hierarchical clustering of samples was performed using the hclust function to identify outliers; one sample was identified as an outlier and removed from subsequent analysis (Fig. 3A). WGCNA requires an appropriate soft threshold so that the gene network approximates a scale-free topology. A soft threshold of 8 was selected, giving a scale independence of 0.8 and a mean connectivity close to 0, which indicates that the gene network approximated a scale-free network. Using this soft threshold, the weighted correlation coefficients between genes were calculated and converted into an adjacency matrix, from which the topological overlap matrix (TOM) was constructed. Hierarchical clustering was performed on the TOM to identify distinct gene modules, with a minimum module size of 30. A total of 20 modules were obtained, with the gray module containing genes not assigned to any module (Fig. 3B). Module eigengenes (characteristic genes) were extracted for each module, and the correlation between each module and the clinical characteristics (alive, dead, overall survival) was calculated from these eigengenes, with a focus on the correlation between modules and PD status. A heatmap of the correlations between modules and clinical characteristics showed that the blue module had the highest correlation with PD (Fig. 3C). To further investigate the relationship between the blue module and PD, a scatter plot of gene significance (GS) against module membership (MM) was generated.
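
The variance filter, soft-thresholded adjacency and TOM steps described above can be illustrated with a minimal NumPy sketch. This is not the WGCNA R package used in the study; the function names are hypothetical, an unsigned network is assumed, and only the soft threshold of 8 and the 75% variance cutoff are taken from the text.

```python
import numpy as np

def top_variance_genes(expr, keep_fraction=0.75):
    """Return sorted column indices of the most variable genes.

    expr is a samples x genes matrix; keep_fraction=0.75 mirrors the text.
    """
    variances = expr.var(axis=0, ddof=1)
    n_keep = int(round(keep_fraction * expr.shape[1]))
    return np.sort(np.argsort(variances)[::-1][:n_keep])

def soft_threshold_adjacency(expr, beta=8):
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)|^beta, zero diagonal."""
    cor = np.corrcoef(expr, rowvar=False)
    adj = np.abs(cor) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def topological_overlap(adj):
    """Unsigned TOM: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij is the sum over u of a_iu * a_uj and k_i is the connectivity
    (row sum of the adjacency matrix) of gene i.
    """
    connectivity = adj.sum(axis=1)
    shared = adj @ adj
    denom = np.minimum.outer(connectivity, connectivity) + 1.0 - adj
    tom = (shared + adj) / denom
    np.fill_diagonal(tom, 1.0)
    return tom
```

In such a sketch, modules would then be found by hierarchical clustering on the dissimilarity 1 - TOM, which is the step the minimum module size of 30 applies to.
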
The genes in the blue module showed a significant correlation with PD (r = 0.32, p = 2e−07) (Fig. 3D). GO enrichment analysis was performed on the co-expressed genes within the blue module. The enriched biological processes included positive regulation of cytokine production, leukocyte-mediated immunity and positive regulation of response to external stimulus. The enriched cellular components included secretory granule membrane, endocytic vesicle and endocytic vesicle membrane. The enriched molecular functions included peptide binding, immune receptor activity and peptide antigen binding (Fig. 3E).

Figure 3. WGCNA was performed on the integrated PD expression data to identify co-expressed gene modules. (A) Clustering tree obtained using the hclust function to identify sample outliers after preprocessing the PD expression data. (B) WGCNA results showing the dendrogram of co-expression modules and the module eigengenes. (C) Heatmap illustrating the correlation between co-expression modules and clinical characteristics (alive, dead, overall survival). (D) Scatter plot depicting the correlation between gene significance and module membership for the blue module. Genes with higher significance are more likely to be hub genes within the module. (E) Biological processes, cellular components and molecular functions enriched in the co-expressed genes within the blue module.

GSEA enrichment analysis

GSEA of differentially expressed genes in PD patients revealed downregulation of the following pathways: response to corticosterone, glucosamine-containing compound metabolic process, amino sugar catabolic process and regulation of lymphocyte chemotaxis (Fig. 4A–C,F). Conversely, the grooming behavior and positive regulation of glutamate secretion pathways were upregulated in PD patients (Fig. 4D,E).

Figure 4. GSEA was performed to identify enriched pathways in PD patients compared to healthy controls. (A–C, F) GSEA results depicting downregulated pathways in PD patients. (D, E) GSEA results illustrating upregulated pathways in PD patients.

Enrichment analysis of differentially expressed genes common to IBD and PD

First, we constructed a Venn diagram to visualize the overlap between the PD blue module genes, the IBD differential genes and the PD differential genes (Fig. 5A). This analysis revealed that 11 genes were shared among the three gene sets. Next, we explored the functional implications of these shared genes. GO enrichment analysis was performed across three domains: biological process, cellular component and molecular function. The enriched biological processes included acute inflammatory response, immunoglobulin-mediated immune response and Fc receptor signaling pathway. Enriched molecular functions included immune receptor activity, IgG binding and immunoglobulin binding (Fig. 5B). KEGG pathway enrichment analysis further revealed that these shared genes were significantly enriched in pathways such as osteoclast differentiation, Fc epsilon RI signaling pathway and leishmaniasis (Fig. 5C).

We also investigated the functions of the PD differential genes specifically. GO enrichment analysis showed that these genes were enriched in biological processes related to signal release, embryonic organ development and gland development. Enriched cellular components included actin cytoskeleton, neuronal cell body and cell-substrate junction. Enriched molecular functions included DNA-binding transcription factor binding and DNA-binding transcription activator activity (Fig. 5D). KEGG pathway enrichment analysis of the PD differential genes identified enrichment in TNF-alpha signaling via NF-κB, hypoxia and inflammatory response pathways (Fig. 5E).

Figure 5. GO and KEGG enrichment analyses were performed to identify enriched pathways and biological processes associated with the common differentially expressed genes between PD and IBD. (A) Venn diagram illustrating the overlap of genes in the blue module of PD, differential genes in IBD and PD differential genes. (B) GO enrichment analysis results for the common genes. (C) KEGG enrichment analysis results for the common genes. (D) Functional analysis results for PD differential genes. (E) KEGG enrichment analysis results for PD-related differential genes.

LASSO model building

Based on the WGCNA, we identified 19 candidate genes significantly associated with PD. Subsequent LASSO analysis was performed to select the optimal lambda value, resulting in a signature consisting of 23 genes (Fig. 6A,B). ROC curve analysis of the LASSO model in the IBD dataset gave an area under the curve (AUC) of 0.942 (Fig. 6C), indicating excellent discriminatory ability for IBD patients. In the PD dataset, the AUC was 0.75 (Fig. 6D), suggesting moderate discriminatory ability for PD patients. These findings indicate that the LASSO model composed of these 23 genes has potential as a diagnostic biomarker for both IBD and PD.

Figure 6. LASSO regression analysis was performed to identify a robust gene signature for PD. (A) Results of LASSO analysis displaying the selection of the optimal λ value for PD. (B) The construction of a signature consisting of 23 genes. (C) ROC curve analysis of the LASSO model in IBD data, demonstrating the model’s ability to distinguish between IBD patients and healthy controls. (D) ROC curve analysis of the LASSO model in PD data, showing the model’s ability to differentiate between PD patients and healthy controls.

Support vector machine and random forest model analysis

Using the LASSO model, we identified 23 genes associated with IBD and PD patients. We then constructed random forest (RF) and support vector machine (SVM) models independently.
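
As a reading aid for the AUC values reported above, the following minimal sketch computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen patient receives a higher model score than a randomly chosen control. The function name and the toy scores are hypothetical and are not taken from the study's data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels and model scores.

    Counts, over all (case, control) pairs, how often the case has the
    higher score; tied pairs contribute 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy example: one of the four case/control pairs is ranked the wrong way round.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Under this interpretation, an AUC of 0.75 means that three out of four patient/control pairs are ranked correctly, while an AUC of 1.0 means every patient scores above every control.
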
Analysis of the residual cumulative distribution and box plots for the LASSO, RF and SVM models showed that the RF model had smaller residuals than the SVM and LASSO models (Fig. 7A,B), suggesting that the RF model is the most suitable for our dataset. We further evaluated the importance of each gene and ranked the genes accordingly. The top 10 most important genes are presented in Fig. 7C. Finally, we validated the RF model independently in the IBD and PD datasets. ROC analysis demonstrated an AUC of 1.0 for IBD patients (Fig. 7D), indicating perfect discrimination, and an AUC of 0.992 for PD patients (Fig. 7E), indicating excellent discrimination.

Figure 7. Support vector machine and random forest model analysis. (A) Residual cumulative distribution comparing the LASSO, RF and SVM models. (B) Box plots comparing the LASSO, RF and SVM models. (C) Importance analysis of genes in the RF model, highlighting the top 10 important genes. (D) ROC curve analysis of the RF model in IBD patients. (E) ROC curve analysis of the RF model in PD patients.

Nomogram and decision tree analysis

Based on the gene importance rankings, we selected the top 5 genes (BTK, NCF2, CRH, FCGR3A and SERPINA3) to construct nomograms to aid the clinical diagnosis of IBD and PD patients (Fig. 8A,C). Calibration curves were generated to assess the accuracy of the nomogram models in predicting the positive rates of IBD and PD. The results showed good agreement between the predicted and actual positive rates (Fig. 8B,D). We further analyzed the expression of these 5 genes in IBD patients compared to healthy controls. BTK was downregulated in IBD patients, while NCF2, CRH, FCGR3A and SERPINA3 were upregulated (Fig. 8E). Similarly, we examined the expression of these 5 genes in PD patients compared to healthy controls. BTK, FCGR3A and SERPINA3 were upregulated in PD patients, while NCF2 and CRH were downregulated (Fig. 8F).

Figure 8. Support vector machine and random forest model analysis. Nomograms constructed using the top 5 genes to facilitate clinical diagnosis of (A) IBD and (C) PD patients. The nomograms provide a visual representation of the relationship between the gene expression levels and the probability of disease. Calibration curves illustrating the accuracy of the nomogram models in predicting the positive rates of IBD (B) and PD (D). (E) Expression analysis of the top 5 genes in IBD patients compared to healthy individuals. (F) Expression analysis of the top 5 genes in PD patients compared to healthy individuals.

In addition, we constructed decision trees using these 5 genes to differentiate IBD/PD patients from healthy controls. Both the IBD and PD decision trees required only the expression levels of BTK and NCF2 for accurate discrimination (Fig. S1).

Small molecule drug sensitivity analysis

To identify potential therapeutic interventions for PD/IBD patients, we used the Connectivity Map (CMap) database to analyze the effects of small molecule drugs. Differential genes were submitted to the CMap website to predict candidate drugs. Based on median_taus, 15 distinct perturbagens were selected, including compounds, overexpressed genes and knocked-down genes. The results suggest that, to improve therapeutic outcomes in PD/IBD patients, the expression of genes such as CTNNBIP1, SLC16A3, PTPN11 and FDX1L should be knocked out or downregulated. Additionally, the drugs RO-90-7501 and MST-312 may be beneficial. Conversely, the knockout or downregulation of genes such as CSNK1G2, IL6, CDK7 and MPDZ is associated with worse patient prognosis. Similarly, the overexpression of SHC1, PSEN1 and CORO1A and the use of the drug MK-1775 are also linked to poor outcomes (Fig. S2A). The three-dimensional structures of the small molecule drugs RO-90-7501, MST-312 and MK-1775 were analyzed using the PubChem database (Fig. S2B–D).

Consensus cluster analysis identifies PD subtypes

PD is a heterogeneous disorder with multiple pathobiological subtypes. To identify these subtypes, we used consensus clustering analysis to partition PD patients into distinct subgroups. Analysis of the cumulative distribution function (CDF) and consensus matrix revealed that K = 2 yielded the smallest CDF value and minimal correlation between groups (Fig. 9A,B). We therefore selected K = 2, dividing the PD patients into two subgroups (Fig. 9C,D). Differential gene expression analysis between the two subgroups identified 74 downregulated genes and 122 upregulated genes. Enrichment analysis of these differentially expressed genes revealed significant enrichment in biological processes related to cell junction assembly, autophagic structure homeostasis and vesicle-mediated transport in synapses. Enriched cellular components included presynapse, GABAergic synapse and distal axon. Enriched molecular functions included postsynaptic neurotransmitter receptor activity, GABA-A receptor activity and GABA receptor activity (Fig. 9E). KEGG pathway analysis identified enrichment in pathways such as GABAergic synapse, neuroactive ligand-receptor interaction and nicotine addiction (Fig. 9F).

Figure 9. Consensus clustering analysis was performed to identify distinct subgroups of PD patients based on their gene expression profiles. (A, B) Consensus clustering analysis results indicating the CDF for different values of K (number of subgroups) in PD patients. The CDF plot suggests that K = 2 is the optimal number of subgroups. (C, D) Consensus clustering heatmap showing the correlation matrix when K = 2. (E) Enrichment analysis results for differentially expressed genes between the two PD subgroups. (F) KEGG analysis results for differentially expressed genes between the two PD subgroups.
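
The idea behind the consensus matrix and its CDF can be sketched as follows. This is a generic illustration, not the pipeline used in the study: it assumes a plain k-means base clusterer, 80% subsampling of patients, and hypothetical function names, none of which are stated in the text.

```python
import numpy as np

def kmeans_labels(x, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns a cluster label for every row of x."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def consensus_matrix(x, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which two co-sampled patients co-cluster."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(x[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)

def cdf_area(cons, bins=100):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    values = np.sort(cons[np.triu_indices_from(cons, k=1)])
    grid = np.linspace(0.0, 1.0, bins + 1)
    cdf = np.searchsorted(values, grid, side="right") / len(values)
    return float(np.sum((cdf[:-1] + cdf[1:]) / 2.0 * np.diff(grid)))
```

Running `consensus_matrix` for several values of K and comparing the resulting CDFs is the kind of comparison summarized in Fig. 9A,B; a clean partition gives consensus values concentrated near 0 and 1.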
