Transcriptional profiling in microglia across physiological and pathological states identifies a transcriptional module associated with neurodegeneration

Human microglia transcriptome profiles (bulk RNA-seq datasets)We identified 21 published transcriptome studies focusing on human microglia (MGL, Fig. 1a and Supplementary Data 1) and selected the 12 containing purified microglial cells with relevant control samples and with available raw sequencing data for further analysis (Supplementary Data 1 and “Methods”), 10 datasets (6 bulk RNA-seq and 4 single-cell (sc) RNA-seq,) from 8 publications were included in the analysis (Figs. 1 and 2 and Supplementary Data 1).Fig. 1: Microglial gene expression.a Summary of datasets reviewed, processed, and used in this study. A total of 21 datasets were identified focusing on the human microglia transcriptome. After pre-processing and QC steps, 12 were selected and analyzed further. Subsequently, 8 of 12 were selected due to biological importance and relevancy. b Bar plot showing differentially upregulated (red) and downregulated (blue) genes in each dataset. Differential expression is determined by DESeq2 (false discovery rate (FDR) < 0.05 and fold change difference > 2). Error bars are standard errors of the mean. c Heatmap showing gene expression patterns of differentially expressed genes across datasets. Each row is a sample, each column is a gene. Black boxes mark sets of genes differentially expressed across multiple datasets.Fig. 2: Integrated sc RNA-seq analysis for microglia.a Uniform manifold approximation and projection (UMAP) plot showing microglia clusters for integrated microglia analysis of 18,713 cells from four datasets. Non-microglial cells were excluded from this analysis. Each dot represents a single cell. Colors correspond to different clusters. b As in (a), but, colors correspond to different conditions. c Heatmap of top differentially expressed genes in each cluster compared to all other clusters. d Top gene ontology terms for differentially expressed genes in each cluster compared to all other clusters.The six bulk RNA-seq datasets included: (1) purified microglia from human brain with no reported neurological disease, compared to other cell types in the cortex (Fig. S1a26); (2) purified microglia from human brain with no reported neurological disease compared to the whole cortex (Fig. S1b21); (3, 4) differentiating (stage 1, 10 days post-injection: maturing microglia) and differentiated (stage 2, 60 days post-injection: mature microglia) human induced pluripotent stem cell (iPSC)-derived microglia, compared to microglial progenitor cells (Fig. S1c27); (5) human iPSC-derived, lipopolysaccharide (LPS)-induced microglia compared to control microglia (Fig. S1d28); and (6) microglia purified from brains from patients with AD compared to control subjects (Fig. S1a26). After re-processing and differential expression analysis (“Methods”), differentially expressed genes (adjusted P-value < 0.01 and fold change > 2, Fig. 1b) were combined across datasets and the overlapping set (Fig. S1e) was evaluated to establish human microglial modules (Fig. 1c, Supplementary Data 2 and 3).First, we determined the upregulated genes between MGL vs other brain cells (dataset 126) and MGL vs whole cortex (dataset 221) datasets to identify genes upregulated in microglia compared to other brain cells (‘human microglial gene set’). From that set, we selected transcripts that were not differentially expressed in any other conditions (differentiation, LPS, or AD) to define an ‘MGL Core module’ (n = 480, Fig. 1c, box 1). ‘MGL Core module’ contains genes such as TREM2 and SYK. Some of the known microglia-related genes such as P2RY12, BIN2, and TYROBP were not assigned to the ‘MGL Core module’ despite their upregulated expression in datasets 1 and 2 due to downregulated expression in LPS treatment. Similarly, from the human microglial gene set, we selected the microglial genes significantly regulated in the differentiation data set (dataset 3 and 427) to identify a ‘Differentiation- MGL’ module (box 2 in Fig. 1c). We identified human microglial genes upregulated (box 3 in Fig. 1c) and downregulated (box 4) by LPS treatment (dataset 528). These two groups represent microglia-enriched genes that are regulated upon LPS treatment. Likewise, we identified human microglial genes upregulated (box 5) and downregulated (box 7 in Fig. 1c) in AD (dataset 626). We also identified upregulated AD genes that were not defined as microglial markers in dataset 1 or 2 (box 6). Finally, we used Euclidean distance to cluster the features (Fig. S1f) and reveal a group of transcripts most highly expressed in microglia (MGL Top expressed module, n = 830), which contains known microglial genes such as TREM2 and SYK, as well as P2RY12, BIN2, TYROBP, and APOE. In summary, combining several different datasets allowed the identification of a core set of microglial markers, and condition-specific subsets of transcripts.Sc transcriptomics identifies shared microglial states across conditionsWe identified sc or single-nucleus (sn) publicly available RNA-seq datasets from microglial cells from subjects with neurodegenerative diseases including AD, MS, and PD. We first identified sc and sn RNA-seq datasets either profiling purified microglial samples (microglia-specific Fig. S2a, b) from AD29 and MS19 patients, or reporting sc brain expression data from PD30 and MS31 patients, from which microglial data could be computationally derived (Fig. S2c, d). Each dataset was individually processed for quality control, normalization, dimension reduction, clustering, annotation, and marker identification using the Seurat package (Fig. S2). The two MS datasets were combined using Seurat (“Methods”, Fig. S2e). In the AD dataset (Fig. S2b), 13 clusters were identified, including some non-microglial clusters (e.g., 5, 9, and 10 due to lack of microglia marker genes expression) and AD-associated clusters (3 and 6). 10 clusters were identified in PD (Fig. S2c), including clusters 1, 2, and 3 containing PD-associated cells. The combined analysis of MS datasets (Masuda et al. 19 and Schirmer et al. 31) identified ten clusters (Fig. S2e).We then checked the expression of two microglia markers—CSF1R, which is broadly expressed in microglia, and P2RY12, which is expressed in homeostatic microglia32—to validate and further annotate microglial clusters in these datasets (Fig. S2f). CSF1R was expressed broadly in these cells as expected, regardless of their disease association. In contrast, P2RY12 was expressed less uniformly in disease-associated clusters. This helped us to further categorize diseased microglia in each dataset. For example, in the PD dataset clusters, 1, 2, and 3 contained diseased microglia, however, expression of P2RY12 was decreased only in clusters 2 and 3.In order to identify a microglial gene expression subcluster associated with multiple neurodegenerative diseases, we integrated all four datasets and further excluded cells not expressing microglial markers (see “Methods” for details). This integrated analysis identified 13 clusters (Fig. 2a) containing healthy and disease-associated cells (Fig. 2b). In order to check for microglial specificity, we (1) reviewed the expression of known microglial markers including P2RY12, BIN1, CSF1R, and SLCO2B1 (Fig. S3a); (2) checked for the absence of common B and T-cell markers (including TRAC, TRBC2, CD52, IL32, CD3D, CD25, CD52, CD3, CD8, CD56, CD1, CD21, CD25, CD27, CD138, and CD4); and (3) checked for the expression of MRC1 (a macrophage-specific marker33). Although some cells expressed MRC1 at high levels, we didn’t observe a macrophage-specific cluster (Fig. S3b). A few clusters (7, 10, 12, and 13) clustered distantly from the others. Among those, clusters 10 and 13 expressed known neuronal markers (Fig. S3c), and cluster 13 expressed known oligodendrocyte markers (Fig. S3d), pointing to possible contaminating cells.Cluster analysis (Fig. 2a, b) identified clusters 0, 1, 2, and 3, which included predominantly microglia from control samples from all 4 datasets (Fig. 2b and Fig. S3f), and a few clusters predominantly containing disease cells (cluster 4, 6, and 9). Cluster 4 contained cells from all 3 diseases and was defined as a cross-disease-associated microglia (CDAM) cluster (Fig. 2b). We annotated each cluster by identifying marker genes (Fig. 2c, see “Methods” for details) and gene ontology annotation (Fig. 2d). Immune response-related terms were significantly enriched in disease-associated clusters 4 (P = 1.3E-11) and 9 (P = 1.4E-7).Characterization of the CDAM clusterWe examined the transcripts associated with cluster 4 (CDAM), containing cells from patients with AD, MS, and PD, by comparing its gene expression with that of ‘core’ microglial clusters (0, 1, 2, and 3, Fig. 3a). Transcripts upregulated in CDAM compared to core clusters 0,1,2,3 (Fig. 3a, b and Supplementary Data 4) included Apolipoprotein E (APOE), known to be associated with AD, Leucine-rich repeat kinase 2 (LRRK2), known to be associated with an increased risk in PD, and glycoprotein NMB (GPNMB), known to be upregulated in disease-associated microglia in mouse models of neurodegenerative disease, and recently implicated in PD pathogenesis34. By contrast, P2YR12, a homeostatic/healthy microglia marker35, was downregulated in CDAM.Fig. 3: CDAM cluster.a Differential expression analysis comparing genes from CDAM and healthy-microglia-associated clusters. The top significant genes are highlighted on the volcano plot. b Log-2 expression levels of representative genes up-regulated (APOE, LRRK2, and GPNMB) and down-regulated (P2RY12) in CDAM (cluster 4). c Boxplot showing gene expression in CDAM for microglia modules identified in bulk-RNA-seq analysis in Fig. 1. Red dots mark median values for each comparison. P-values between the groups were determined by the Wilcoxon test. UP stands for upregulation and DN stands for downregulation. Error bars on each box represent the highest and lowest values excluding outliers. d Overlap between CDAM and publicly annotated microglial gene sets. All gene sets are defined in the MGEnrichment database. Bar plot showing odds ratio (OR) and -log10 FDR for the most significant overlapping gene sets for CDAM upregulated and downregulated genes compared to core clusters.We examined the overlap between the modules identified in the bulk RNA-seq and the clusters identified in the scRNA-seq analysis (Fig. 3c). We observed a significant overlap between disease-associated modules (AD-MGL upregulated (P = 1.5E-07) and AD-downregulated (P = 0.006)) and expression CDAM vs clusters 0–3. Overlapping genes include APOE, GPNMB, PARVG, TGFBI, and TMEM119. Genes that were upregulated in LPS microglia modules were also upregulated in CDAM.We finally compared the overlap between CDAM genes and microglial gene sets reported in the literature and collected in the MGEnrichment database, which contains over 100 gene sets from 26 publications36 (Fig. 3d and Supplementary Data 5). As expected, both CDAM upregulated (CDAM > core clusters) and downregulated (CDAM < core clusters) genes showed a significant overlap with microglia core signals, though the downregulated set showed much higher overlap compared to the upregulated set (OR, 8.1 vs 3.7 and -log10 FDR 68 vs 25). CDAM upregulated genes showed a significant overlap with publicly annotated neurodegenerative disease signature microglial genes (MGnD signature increased, DAM > HOM MG), as well as LPS and age-related microglial genes (Fig. 3d, lower panel). On the other hand, CDAM downregulated genes overlapped with homeostatic/healthy state markers (Fig. 3d, upper panel).We also annotated the CDAM module using the dataset reported by Geirsdottir et al. 37, a cross-species transcriptomic analysis of microglia. We observed a varied conservation pattern of the CDAM module across species (Fig. S4), supporting the notion that microglial genes relevant for neurodegeneration in humans might not be uniformly conserved across non-human species, and therefore conclusions from work in animal models might not readily transfer to human neurodegenerative conditions.In summary, a combined analysis of human microglial gene expression identified modules associated with microglial states, and an integrated sc analysis identified a cross-disease-associated cluster that showed high expression of disease markers and downregulation of homeostatic/healthy microglia markers.Microglial signature is enriched for common genetic variants associated with diseaseGWAS have identified many genetic variants associated with traits and diseases over the past decades. Our transcription-level analysis revealed gene expression clusters associated with healthy and diseased-state microglia. We asked whether these clusters had any overlap with genetic signals associated with neurodegenerative disease. We used GWAS summary statistics for AD38, MS39, and PD40 and assigned aggregated variants to genes using the MAGMA tool41, which provides gene-level and gene set-level enrichment analysis for GWAS studies by aggregating all the variants while controlling for confounding effects (Fig. 4a). We identified multiple microglial modules significantly enriched in AD and MS GWAS (Fig. 4b and Supplementary Data 6). Interestingly, genes that were upregulated in AD microglia (AD-MGL upregulated module) were significantly associated with all three GWAS, while genes that were upregulated in AD, but not identified as microglia markers (AD-others upregulated) were not (Fig. 4b). CDAM was also significantly enriched in both AD and MS GWAS. Within this module, we identified many genes implicated in AD GWAS (Fig. 4c) including BIN1, APOE, and PLCG2. We then extended our variant search to exome sequencing data from the UK Biobank (UKB42) for the CDAM genes that were identified as significant in the MAGMA analysis, including PLCG2, which is upregulated in microglia compared to other cells in the cortex and upregulated in CDAM (Fig. 4d, left panel). We identified three variants associated with AD for PLCG2 (Fig. 4d, right). The most significant variant was a missense variant (rs72824905, P = 7.3E-05) associated with a decreased risk of an AD-related trait (Alzheimer’s in the mother, OR = 0.87). This variant was previously shown as an AD protective variant in another study43. Interestingly, an additional rare intronic variant (rs748729362) was nominally associated with an increased risk of AD (P-value: 8.46E-05). The evidence from our transcriptional analysis further supports PLCG2 as a microglial gene with relevance for AD.Fig. 4: Common-variant enrichment analysis for microglial modules.a Schematic of common variant analysis. Microglial module enrichment for common variants (minor allele frequency (MAF) > 1%) was tested by MAGMA analysis using AD, MS, and PD GWAS summary statistics. b Summary of common variant enrichment analysis. Bubble plot showing enrichment P-values obtained from MAGMA analysis of each module for AD, MS, and PD GWAS studies. The size of the dot represents the number of genes in each module. P-values < 0.05 are shown in red. c Volcano plot showing genes in CDAM module that are significantly associated with the AD GWAS. d Bar plot showing PLCG2 gene expression changes in our datasets (left). Breakdown of PLCG2 variants associated with AD or related traits (right).Disease-associated rare genetic variation also affects the CDAM signatureWhile GWAS captures common variant associations efficiently, rare coding variation (especially if predicted damaging) could have stronger consequences on phenotypes. We mined exome sequencing data for rare (minor allele frequency (MAF) < 1%) coding variants and performed gene burden testing for genes with microglial expression (see “Methods”). We identified genes in each microglial module with nominally significant (P < 0.05) burden associations with AD, MS, and PD (Fig. 5a) and calculated the enrichment in each module (Fig. 5b). We only observed significant enrichment in AD ExWAS for disease-associated modules (AD- MGL upregulated, CDAM, CDAM UP, CDAM DN) (Fig. 5b and Supplementary Data 7). In CDAM, we identified 160 upregulated and 206 downregulated genes with nominally significant burden in AD ExWAS (Fig. 5c). For example, gamma Parvin (PARVG), encoding an actin-binding protein, is significantly upregulated in microglia compared to other cells in human cortex and upregulated in AD-MGL and CDAM (Fig. 5d, left panel), confirming reports of increased expression in AD mouse models and humans44. Although not reaching genome-wide statistical significance, a burden of rare LOF variants in PARVG is associated with protection from parental AD in the UK biobank (Fig. 5d, right panel). Taken together, these results point to a role for PARVG in AD and will need to be confirmed in a larger series.Fig. 5: Rare-variant enrichment analysis for microglial modules.a Schematic of rare variant analysis. Gene burden test was done for microglial genes for rare variants (MAF < 1%). Microglial module enrichment for rare variants was tested by Fisher’s Exact Test using AD, MS, and PD ExWAS summary statistics. b Summary of rare variant enrichment analysis. Bubble plot showing enrichment P-values obtained from MAGMA analysis of each module for AD, MS, and PD ExWAS studies. The size of the dot represents the number of genes in each module. P-values < 0.05 are shown in red. c Volcano plot showing genes in CDAM module that are significantly associated with AD ExWAS (rare variants). d Bar plot showing gene expression of PARVG in our datasets (left). Breakdown of gene burdens in PARVG associated with an AD-related trait (parental history of AD, right).Similarly, Interleukin-1 receptor-associated kinase-like 2 (IRAK2), which is a microglia marker gene, is downregulated in CDAM (Fig. S5, left panel), confirming a literature report implicating IRAK2 in AD45. A missense variant in IRAK2 is associated with increased risk for AD in siblings (OR = 8.9, P = 1.28E-06) suggesting that a decrease in gene expression, possibly caused by variants in IRAK2, is a risk factor for AD.

Hot Topics

Related Articles