DRCTdb: disease-related cell type analysis to decode cell type effect and underlying regulatory mechanisms

Overview of DRCTdb

DRCTdb provides multiple single-cell multiomics datasets covering nearly all available human tissues. We manually curated the data and added metadata for every dataset, including cell type, tissue, and experiment type. We further performed several downstream analyses, such as cell-cell communication, TF activity, and gene regulatory network analysis, and provide the resulting profiles within the web interface. In summary, DRCTdb provides 16 single-cell multiomics datasets across 28 tissues, encompassing 603 cell types (Fig. 2).

Fig. 2: Comparative summary of tissues, diseases, and cells in various datasets.
a Number of tissues summarized by different datasets. b Number of cell type-enriched genetic diseases in each dataset; the X axis represents dataset ID. c Number of cells in each dataset; color represents data modality and the X axis represents dataset ID.

Web design interface

DRCTdb provides a user-friendly web interface that allows users to explore the relationships between cell types and genetic diseases in different human tissues. DRCTdb contains five main functional interfaces.

The ‘Home’ page gives a brief description of the database and statistics of the data used in DRCTdb, including tissue number, cell number, and cell type number.

The ‘Search’ page provides an interactive table in which users can find datasets of interest by tissue and enriched disease (Fig. 3a).

Fig. 3: Web interface overview.
a Search page of DRCTdb. Users can click on the column names (Study name, Disease, Stage, Tissue, PMID) of the interactive table to sort it. b UMAP embedding plot of the selected dataset on the result page. c LDSC analysis results of the selected dataset, presented in an interactive table that displays the genetic diseases enriched in specific cell types. d Differentially accessible regions (differential peaks) for each cell type. The column ‘closed gene’ gives the gene associated with the peak and ‘region’ gives the peak annotation. e Top 10 transcription factors exhibiting the highest binding activity in each cell type.

After selecting datasets of interest, users can explore the enrichment of genetic diseases in specific cell types, as well as the mechanisms underlying this association. The left panel gives an overview of the selected dataset: the single-cell embedding plot (Fig. 3b), the statistical significance of associations between cell types and genetic diseases (DRCT) (Fig. 3c), cell type-specific differentially accessible regions (DAR) (Fig. 3d), and cell type-specific active transcription factors (TF activity) (Fig. 3e).

The right panel presents the results of downstream analyses that elucidate the regulatory mechanisms linking genetic diseases and their enriched cell types. Users select a cell type and disease pair with the ‘Choose Cell type’ and ‘Choose Disease’ buttons. Once both are selected, four specialized modules become available:

1. ‘Cell-Cell Communication’ module: a dynamic visualization of the intercellular interaction network among all cell types. By clicking on the “Figure” button, users can explore the detailed weights of the ligand-receptor interactions involved in these intercellular communications (Fig. 4a).
2. ‘SNP overlapped peaks’ module: shows the open chromatin regions in the selected cell type’s scATAC data that overlap with disease GWAS risk loci, highlighting accessible regions that carry disease-associated variants (Fig. 4b, Supplementary Fig. 1a).
3. ‘SNP overlapped genes’ module: using the scRNA data from the chosen cell type, identifies genes that overlap with disease GWAS risk loci, providing insights into the cellular mechanisms influenced by genetic risk factors (Fig. 4c, Supplementary Fig. 1b).
4. ‘Gene Regulatory Networks’ module: maps the transcription factor regulatory networks within the cell types related to a specific disease. Users can interactively explore the regulatory relationships between genes by dragging network nodes (Fig. 4d, Supplementary Fig. 1c).

Fig. 4: Core analysis modules.
a Cell-cell communication among genetic disease-related cell types; colors represent cell types. b GO enrichment of genes near SNPs that overlap accessible chromatin regions. c GO enrichment of genes that overlap with SNPs. d Gene regulatory network in disease-related cell types; blue represents genes, and each of the other colors represents a transcription factor.

For users interested in the functional role of a set of SNP-associated genes, DRCTdb provides an ‘Online Enrichment’ page, which performs GO enrichment analysis on a user-selected gene set. Users only need to upload an Excel file that includes a column of gene names. The ‘Online Enrichment’ tool then automatically determines the enriched Gene Ontology terms associated with the provided genes (Supplementary Fig. 1d).

DRCTdb also enables users to download all processed data and analysis results on the ‘Download’ page (Supplementary Fig. 1e). This page provides an interactive table for searching and downloading all the processed data from this study. Users can search for datasets by either tissue or enriched disease.
Once identified, they can download the processed single-cell RNA-seq and single-cell ATAC-seq data, along with annotated metadata, in h5ad format.

The ‘Tutorial’ page documents how to browse and download the analysis results.

In summary, DRCTdb provides a user-friendly platform that enables users to explore the results of integrative analysis of GWAS and single-cell multiomics data.

Case study

We present a case study to illustrate the usage and capabilities of DRCTdb. We selected a single-cell multiomics dataset of human pancreatic islets for our demonstration [14]. This dataset contains 95,109 cells with both gene expression and chromatin accessibility profiles. The analysis first reveals the disease-related cell types found in human pancreatic islets and then identifies the regulatory mechanisms linking the enriched genetic diseases and cell types.

First, we prepared the input for LDSC analysis, which identifies cell types related to genetic diseases by integrating cell type-specific accessible regions with GWAS summary statistics. Because LDSC requires cell type-specific accessible regions, we used the Wilcoxon rank-sum test to identify differentially accessible regions (DARs) in the nine cell types of the human pancreatic islets dataset. We identified a total of 5938 DARs across the nine cell types, with 2972 DARs located in promoter regions, 449 DARs in coding regions, and 2517 DARs in intron and intergenic regions (Supplementary Data 1).

Next, we used scBasset to identify the transcription factors with higher binding activity in these DARs. Our analysis revealed that YY1 and REST exhibited the highest binding activity in beta cells.
Previous studies have reported that these transcription factors play important roles in the growth and development of beta cells (Supplementary Data 1) [15,16].

Third, we conducted LDSC analysis to identify the cell types enriched for genetic diseases, using the cell type-specific open chromatin regions described above together with GWAS summary statistics. The analysis revealed a significant association of T2D (type 2 diabetes) with beta cells and delta cells, in line with previous research (Fig. 5a) [13,17].

Fig. 5: Case study of single-cell multiomics data from human pancreatic islets.
a Bar plot of the association between T2D (type 2 diabetes) and cell types based on single-cell multiomics data. The dotted line represents the p-value cutoff. b Cell-cell communication among T2D-related islet cell types; colors represent cell types. c GO enrichment of genes near T2D risk SNPs that overlap accessible chromatin in beta cells. d GO enrichment of genes that overlap with T2D risk SNPs. e Gene regulatory network of risk genes in beta cells; cyan represents genes, and each of the other colors represents a transcription factor.

Several mechanisms may lead to the association between cell types and genetic diseases, including the cellular niche, transcriptomic regulation, and epigenomic regulation. We therefore investigated the regulatory relationship of beta cells and delta cells with T2D through cell-cell communication and gene regulatory network (GRN) analysis. We performed cell-cell communication analysis among the T2D-enriched cell types (beta cells and delta cells). This analysis identified several significant ligand-receptor interaction pairs, including BMP5 and ACVR1, which previous studies have reported may regulate beta cell growth and development in T2D (Fig. 5b) [18].

To further decode the mechanisms linking beta cells and T2D, we integrated the scRNA-seq and scATAC-seq data of beta cells to construct a disease-related GRN. We first selected disease-related features, defined as the genes and accessible regions that overlap with GWAS risk SNPs. We identified 21,148 accessible regions and 1870 genes that overlapped with T2D-associated SNPs in beta cells, and then performed enrichment analysis for these genes and accessible regions (Supplementary Data 1). The SNP-overlapped genes are enriched for nucleobase-containing compound catabolic process and positive regulation of cellular catabolic process, while the SNP-overlapped accessible regions show enrichment for cell growth and small GTPase mediated signal transduction (Fig. 5c, d). Using these 1870 genes and 21,148 accessible regions, we constructed a disease-related gene regulatory network to reveal the regulatory mechanisms linking T2D and beta cells. By visualizing this network, we identified ATF3 and TCF4 as key nodes, indicating that they may have a regulatory role in beta cells leading to T2D (Fig. 5e) [19]. ATF3 and TCF4 were identified as risk genes for T2D in GWAS analysis [20]. Furthermore, previous studies have reported that ATF3 can induce beta cell stress, while TCF4 is known to cause maturity-onset diabetes of the young [21,22]. These results support the reliability of our pipeline and the usefulness of the database.

Overall, this case study revealed the genetic disease-related cell types and their underlying regulatory mechanisms. We identified several ligand-receptor pairs and transcription factors that have previously been reported to regulate T2D. These findings demonstrate the reliability of the DRCTdb analysis pipeline and highlight the utility of the database. Consequently, we believe that DRCTdb will enhance our understanding of genetic diseases and assist in the identification of potential therapeutic targets for genetic disease screening and treatment.
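
The feature-selection step of the case study, in which accessible regions are kept when they overlap GWAS risk SNPs, amounts to an interval lookup. The following is a minimal Python sketch of that idea, not the DRCTdb implementation: the function names, the half-open 0-based coordinate convention, the assumption that peaks on one chromosome do not overlap each other, and the peak and SNP records are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def snps_in_peaks(snps, peaks):
    """Map each SNP that falls inside a peak to that peak.

    Assumptions (illustrative only): peaks are half-open [start, end),
    coordinates are 0-based, and peaks on one chromosome do not overlap.
    """
    index = index_peaks(peaks)
    hits = {}
    for snp_id, chrom, pos in snps:
        intervals = index.get(chrom, [])
        # Index of the last peak whose start is <= pos (or -1 if none).
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0:
            start, end = intervals[i]
            if start <= pos < end:
                hits[snp_id] = (chrom, start, end)
    return hits


# Made-up peaks and SNPs, not taken from any real dataset.
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 90)]
snps = [
    ("snp_a", "chr1", 150),  # inside the first chr1 peak
    ("snp_b", "chr1", 200),  # at the excluded end of a half-open peak
    ("snp_c", "chr2", 50),   # at the included start of the chr2 peak
    ("snp_d", "chr3", 10),   # chromosome with no peaks
]
hits = snps_in_peaks(snps, peaks)
print(hits)
```

Sorting the peaks once per chromosome and using binary search keeps each lookup logarithmic in the number of peaks, which matters when tens of thousands of accessible regions are compared against a full GWAS SNP list.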
