Exploring diagnostic biomarkers of type 2 cardio-renal syndrome based on secreted proteins and bioinformatics analysis

Expression data collection and processing

Two microarray datasets of CKD, GSE32591 and GSE66494, were obtained from the GEO database [9]. Raw expression profiles of heart tissue (GSE1145, GSE5406, GSE21610, and GSE141910) and peripheral blood mononuclear cells (PBMC) (GSE59867) from patients with CHF were also retrieved from GEO. The expression profiles were integrated using the “ComBat” function of the “sva” package in R (version 4.3.2, https://www.r-project.org/).

Differentially expressed genes (DEGs) analysis

The CKD combined dataset and the CHF-related datasets were preprocessed by background correction, normalization, and gene symbol conversion. Differentially expressed genes (DEGs) in the CKD and CHF datasets were then identified using the “limma” package in R. DEGs were screened using the criteria of adjusted p-value < 0.05 and fold change > 1.2. The expression patterns of the DEGs were visualized as volcano plots with the “ggplot2” package and as heatmaps with the “pheatmap” package.

Weighted gene co-expression network analysis (WGCNA) and significant modules filtering

WGCNA was applied to uncover gene association patterns across samples and to identify potential biomarker genes or therapeutic targets, based on the interactions between gene sets and their relationship with phenotypes. In Step 1, genes with a median absolute deviation of 0 were filtered out. In Step 2, the “goodSamplesGenes” function was used to detect missing values, and samples exceeding a cutHeight threshold of 20,000 were excluded as outliers. In Step 3, an optimal soft threshold of 5 was determined using cex1 = 0.85 to establish a scale-free co-expression gene network.
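As a rough illustration of Steps 1 and 3, the following Python sketch filters out genes with zero median absolute deviation and then picks the smallest soft-threshold power whose scale-free fit reaches a cutoff. It is not the authors' R code. It assumes that the MAD is computed per gene across samples and that the reported 0.85 acts as the scale-free fit cutoff, neither of which the text spells out. The function names and all numbers in the toy inputs are invented for illustration.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of one gene's values across samples.
    (R's mad() also rescales by a constant; that does not change which
    genes have a MAD of exactly zero.)"""
    center = median(values)
    return median(abs(v - center) for v in values)

def drop_zero_mad_genes(expr):
    """Keep only genes whose MAD is non-zero.
    expr maps a gene name to its list of per-sample expression values."""
    return {gene: vals for gene, vals in expr.items() if mad(vals) > 0}

def pick_soft_threshold(fit_by_power, cutoff=0.85):
    """Return the smallest power whose scale-free fit index reaches
    the cutoff, or None if no candidate power does."""
    for power in sorted(fit_by_power):
        if fit_by_power[power] >= cutoff:
            return power
    return None

# Hypothetical toy expression matrix (made-up values, not study data).
toy_expr = {
    "GENE_A": [5.1, 5.1, 5.1, 5.1],   # constant, so MAD is 0
    "GENE_B": [2.0, 3.5, 1.2, 4.8],   # variable, so MAD is positive
    "GENE_C": [7.0, 7.0, 7.0, 9.0],   # one deviating sample, MAD still 0
}
kept = drop_zero_mad_genes(toy_expr)

# Hypothetical fit indices per candidate power (made-up values).
toy_fit = {1: 0.20, 2: 0.55, 3: 0.86, 4: 0.91}
power = pick_soft_threshold(toy_fit)

print(sorted(kept), power)
```

Note that GENE_C is dropped even though one sample differs: the MAD is a median, so a gene needs variation in more than half of its samples to survive this filter.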
After gene clustering, modules were obtained and similar modules were merged using MEDissThres = 0.25. Step 4 produced a heat map of module-trait relationships, and Step 5 computed module membership (MM) and gene significance (GS) values and generated scatter plots of the MM-GS correlation for each module [10].

Secreted proteins access

Secreted proteins were retrieved from the Human Protein Atlas database [11]. A total of 3947 genes encoding secreted proteins were obtained from the category “SPOCTOPUS predicted secreted proteins”.

The establishment of protein–protein interaction (PPI) network

Interactions between CHF-related secreted proteins and CKD key genes were used to construct a protein–protein interaction (PPI) network. The network was built from the STRING database [12] with a confidence score threshold of 0.4 and visualized in Cytoscape (version 3.9.0, https://cytoscape.org/). The Cytoscape plug-in MCODE was then used to identify significant subsets within the network. Genes in subsets scoring above 10 were designated as CRS2-related pathogenic genes for subsequent analyses.

Functional enrichment analysis

The biological functions and underlying mechanisms of the CRS2-related pathogenic genes were investigated by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis [13]. The genes were submitted to SangerBox [14] for this purpose. Enrichment was considered statistically significant at adjusted p < 0.05.
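To make the adjusted p < 0.05 criterion concrete, the sketch below runs a one-sided hypergeometric over-representation test per term and then applies a Benjamini-Hochberg correction. The text does not state which test or correction SangerBox applied, so both are assumptions here. The term names and counts are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes when n genes
    are drawn from a background of N genes, K of which carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical inputs: background size, query list size, and for each term
# (genes annotated in background, annotated genes found in the query list).
background, query = 20000, 50
toy_terms = {"TERM_1": (200, 8), "TERM_2": (500, 2), "TERM_3": (1000, 3)}

names = list(toy_terms)
raw = [hypergeom_sf(hit, background, annotated, query)
       for annotated, hit in (toy_terms[t] for t in names)]
adj = bh_adjust(raw)
significant = [t for t, p in zip(names, adj) if p < 0.05]
print(significant)
```

The correction matters because GO and KEGG analyses test many terms at once; filtering on raw p-values would let through far more false positives than the stated threshold suggests.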
The results of the functional enrichment analysis were visualized with a lollipop chart.

Connectivity map (cMAP) analysis

The cMAP [15] is a database of gene expression profiles that uses expression patterns to reveal associations among genes, diseases, and small-molecule compounds. The upregulated genes among the CRS2-related pathogenic genes were submitted to the cMAP online database to identify potential small-molecule drugs for the treatment of CKD, and the ten compounds with the highest enrichment scores were selected.

ML algorithms

Five ML algorithms, namely random forest (RF), eXtreme gradient boosting (XGB), support vector machine (SVM), generalized linear model (GLM), and the least absolute shrinkage and selection operator (LASSO), were used to identify candidate biomarkers and to develop a diagnostic model for CKD. First, the CKD combined dataset was randomly split in a 6:4 ratio into a training set (60%) and a test set (40%). Next, LASSO regression was used for feature dimensionality reduction on the CKD combined dataset. The five algorithms were implemented with the “glmnet”, “caret”, “randomForest”, “kernlab”, and “xgboost” packages in R. The candidate genes identified by the five algorithms were compared, and the genes in their intersection were designated as hub genes for the development of CHF-related CKD diagnostic models.

The establishment and evaluation of diagnostic nomogram model

The nomogram was established using the three hub genes with the “rms” package. Diagnostic performance for CKD was assessed with receiver operating characteristic (ROC) curves for each hub gene and for the nomogram.
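The quantity usually read off an ROC curve is the area under it (AUC). The sketch below computes it from the rank-based definition: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The source does not name the R package used for its ROC analysis, so this is an illustrative calculation rather than the authors' implementation. The labels and scores are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.
    labels: 1 for case (e.g. CKD), 0 for control.
    scores: marker level or model-predicted risk, higher means more case-like."""
    cases = [s for lab, s in zip(labels, scores) if lab == 1]
    controls = [s for lab, s in zip(labels, scores) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0      # case ranked above control
            elif c == d:
                wins += 0.5      # tie counts as half
    return wins / (len(cases) * len(controls))

# Hypothetical example: six samples with made-up scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a marker that ranks cases and controls at random, and 1.0 to perfect separation. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of the size typical of GEO datasets.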
Calibration curves and decision curve analysis (DCA) were used to evaluate the predictive performance of the nomogram in heart failure (HF) comorbid with CKD.

External verification of hub genes expression pattern and diagnostic efficacy

Two independent datasets (GSE180394 and GSE104954) containing CKD cases and controls were obtained from the GEO database. The expression patterns of the hub genes in the external datasets were shown as violin plots, and the diagnostic efficacy of the hub genes and the nomogram model was assessed by ROC analysis.

Correlation analysis of hub genes with CKD severity and disease progression

A dataset (GSE137570) containing clinical information on CKD patients, including age, gender, estimated glomerular filtration rate (eGFR), degree of renal tubulointerstitial fibrosis (TIF), and disease progression, was obtained from the GEO database. SangerBox was used to draw scatter plots of the correlation between the hub genes and the clinical features, and ROC analysis was used to assess the predictive efficacy of the hub genes and the model for eGFR, TIF, and CKD progression.

Immune infiltration measurement

The ssGSEA (single-sample gene set enrichment analysis) algorithm, run with the “GSVA” package, was used to estimate immune cell and immune function enrichment scores for each sample as a measure of relative infiltration abundance. Box plots generated with the “ggplot2” package were used to display significant differences in these scores. Spearman’s rank correlation coefficient was then used to assess the correlation between biomarker expression and the abundance of infiltrating immune cells and immune functions.

Patients’ samples collection

Serum samples from healthy controls and CHF patients with or without CKD were collected at Shaoxing Second Hospital. None of the CKD patients had received dialysis-related treatment such as peritoneal dialysis or hemodialysis.
Patients with diabetes, infectious nephropathy, drug-induced or allergic nephropathy, systemic lupus erythematosus, or other immune system diseases were excluded. The clinical characteristics of our cohort are shown in Table 1. The protocol for human samples was approved by the Ethics Committee of Shaoxing Second Hospital (Ethics batch number: 2023081), and all participants provided written informed consent. Serum COL3A1 levels were measured with an ELISA kit (Elabscience, Wuhan, China), and serum CD48 and LOXL1 levels were measured with ELISA kits (Animaluni, Shanghai, China), according to the manufacturers’ protocols.

Table 1 Clinical information in our cohort.

Statistical analysis

Statistical analysis was conducted using GraphPad Prism 9.5.1 (GraphPad Software Inc., San Diego, CA, USA, https://www.graphpad.com/). Measurement data are presented as mean ± standard deviation, and count data as number (percent). Inter-group comparisons of measurement data were performed using the unpaired Student’s t-test or ANOVA, and count data were compared using the Chi-square test. Statistical significance was defined as p < 0.05.
