Identifying genetic variants associated with chromatin looping and genome function

Interaction QTLs (iQTLs): a class of QTLs associated with 3D chromatin interactionsAs previous works showed the existence and importance of genetic variants that have a statistically significant association with gene expression (expression QTLs4, splicing QTLs61) or epigenomic features (chromatin accessibility QTLs62, histone QTLs, methylation QTLs63, tfQTLs57), here we set out to define interaction QTLs (iQTLs) that are associated with the intensity of a chromatin contact between two genomic regions (H3K27ac HiChIP contact counts in CD3+CD4+CD45RA+CCR7+ naïve CD4 T cells, in this study). For this we utilized sorted immune cells from a cohort of donors that were previously genotyped and profiled for gene expression as part of the DICE study (n = 91; Database of Immune Cell Expression, Expression quantitative trait loci and Epigenomics)8. For 30 donors from DICE, we performed HiChIP experiments to profile interactions enriched for H3K27ac mark (Supplementary Data 1) to define contacts between active regulatory regions at high-resolution (5kb bin size) for naïve CD4 T cells (see Fig. 1A, B and Methods). Using our previously developed HiChIP loop caller FitHiChIP64, we derive a total number of ~600k statistically significant HiChIP loops at 5kb resolution (see Methods; Supplementary Data 2, 3; Zenodo repository https://doi.org/10.5281/zenodo.13127086, file Complete_FitHiChIP_Loops_iQTL_Input.xlsx) across all 30 donors (union of loops). Among these 600k loops, 54% were identified in more than one donor with 40% shared among 2 to 10 donors (Supplementary Fig. 1A). Principal component analysis (PCA) (see Methods) with respect to the HiChIP contact counts did not reveal any outliers (Supplementary Fig. 1B), hence, we utilized data from all 30 donors for iQTL analysis.Fig. 1: Workflow of iQTL derivation.A Schematic of the cohort, cell type, and data used in iQTL analysis. B Workflow of iQTL derivation from donor-wise HiChIP data together with genotype and ChIP-seq peak information. Union of HiChIP loops (derived from FitHiChIP) together with the donor-wise genotype information are used to run RASQUAL analysis. Final set of iQTLs (and associated HiChIP loops) are derived by combining outputs from three RASQUAL models and the paired t-test analysis of allele-specific reads. These iQTLs are further characterized and categorized with respect to their overlap with reference eQTLs, hQTLs, fine-mapped GWAS SNPs, colocalized SNPs (between eQTL and GWAS statistics), TF binding motif annotations, regulatory regions (e.g., ChIP-seq peaks) as well as reference SNP annotations. C Schematic of SNPs tested for association with one HiChIP loop. For each loop between two 5kb anchors, SNPs within the 15 kb region surrounding each anchor (+/−1 bin from the anchor) are tested for association with the HiChIP loop. QC quality control, CC HiChIP contact counts, VCF variant calling format, eQTL expression quantitative trait loci, hQTL histone quantitative trait loci.Using ~600k FitHiChIP loops defined at 5kb resolution and considering bi-allelic SNPs within +/−1 bin of each loop anchor (15kb region per anchor; Fig. 1C), we then tested the association between SNP genotypes and loop strength as measured by HiChIP contact counts to map the interaction QTLs (or iQTLs) while accounting for covariates such as sequencing depth, sex, age group and race (see Methods). For this purpose, we employed RASQUAL, a Bayesian approach originally developed for mapping chromatin accessibility QTLs65. We started with the default model of RASQUAL that considers both genotype-dependent and allele-specific variation of contact counts. From the identified significant associations at 5% FDR, we applied a series of filters utilizing statistics from the genotype-dependent model, allele-specific model, paired t-test of allele-specific reads as well as concordance of trends for mean or median values (across donors) of HiChIP counts between genotype- and allele-specific estimates (see Methods, Fig. 1B, Supplementary Fig. 1C, D). This stringent filtering produced the final set of iQTL associations with 14,921 SNP-loop pairs, corresponding to 2292 loops and 9426 iQTL-associated SNPs (which we refer to as iQTLs when appropriate) across all the autosomal chromosomes (Supplementary Data 4). Among the 2292 HiChIP loops associated with the iQTL SNPs, the majority (1684 loops, ~73%) contain the iQTLs in their interacting anchors (5 kb) as opposed to the flanking bins that cover twice the size (+/− 5 kb) suggesting iQTL discovery is mainly driven by the assessment of exact loop anchors. To further characterize iQTL SNPs, we also assessed their overlap with enhancers predicted by activity-by-contact (ABC) model66. Around 40% (3765) of iQTLs were within 5 kb (14% or 1279 with exact overlap) of enhancer regions of the enhancer-to-gene (E2G) links predicted by the ABC model for the activated CD4 Naïve cells (see Methods).We next assessed the utility of HiChIP loops associated with iQTLs, and found that ~41% of these loops are significant in more than 10 donors and only ~16% (375 out of 2292) are significant exclusively in one donor (Supplementary Fig. 2A). Further, 102 loops (~5%) were observed as significant in all the donors suggesting that there are iQTLs that strictly associate with the strength of the looping rather than its presence or absence. The 2292 iQTL-associated HiChIP loops also have higher median contact counts and are statistically significant (FDR < 0.01) in higher number of donors compared to the complete set of FitHiChIP significant loops, which are used as the input of iQTL analysis (Supplementary Fig. 2B). The iQTL associated loops, however, are of shorter distance range (mean distance: 140 kb, median distance: 50 kb) compared to the complete set of FitHiChIP loops (mean distance: 220 kb, median distance: 90 kb) suggesting increasing statistical power for detecting meaningful variation in contact counts for shorter distance ranges. Aggregate peak analysis38,64 (APA) (see Methods) also confirmed higher enrichment of iQTL-associated loops compared to the complete set of FitHiChIP loops with respect to either the background HiChIP contacts (Supplementary Fig. 2C) or the CD4 T cell type merged donor Hi-C contacts provided in Shi et al.59 (Supplementary Fig. 2D). Pathway analysis (see Methods) reveals that the genes overlapping the anchors (interacting bins) of these iQTL-associated loops are highly relevant to the immune system processes and cell signaling among other biological processes (Supplementary Fig. 2E).To assess the overlap between iQTLs and more established QTL measurements such as eQTLs and histone modification QTLs (hQTLs), we utilized two studies profiling eQTLs in nCD4 T cells (DICE8 and ImmuNexUT7) and H3K27ac hQTLs from Blueprint WP10 Phase II data63. We observed that ~64% (5933 out of 9426) iQTL SNPs overlap with eQTLs or hQTLs (Fisher’s exact test p-value < 0.00001, OR = 1.43; see Methods; Supplementary Data 11A) while the remaining ~36% iQTL SNPs (3433 out of 9426) do not (Supplementary Fig. 3A, B). iQTLs common with hQTLs or eQTLs, however, showed high concordance among their effect sizes (iQTLs – H3K27ac hQTLs: Pearson correlation = 0.78, p-value < 2.2e−16; iQTLs – DICE nCD4 eQTLs: Pearson correlation = 0.65, p-value < 2.2e−16) (Supplementary Fig. 3C, D), suggesting that the changes in corresponding HiChIP contact counts are related with the H3K27ac levels and gene expression. To further validate whether iQTL effects are explained by the reference eQTLs or hQTLs, we accounted for LD (R2 > 0.8) between iQTL SNPs and these QTLs and observed that ~28% (2603 out of 9426) iQTLs are not in LD with these reference QTLs and only ~38% of iQTLs are in LD with the BLUEPRINT hQTLs (Supplementary Fig. 3E). Further, only for ~31% iQTL-associated HiChIP loops, corresponding iQTL SNPs overlap or exhibit strong LD (R2 > 0.8) with hQTLs where the hQTL-associated peaks overlap with one of the iQTL loop anchors (Supplementary Fig. 3F). Overall, iQTL-associated HiChIP loop anchors showed higher overlap with reference CD4 naïve DICE eQTLs or hQTLs compared to either FitHiChIP loop anchors not associated with iQTLs (OR: 1.39, Fisher’s exact test p-value < 0.00001) or random anchors (OR: 5.14, Fisher’s exact test p-value < 0.00001) (see Methods, Supplementary Data 11C). The same was true when we compared between the ChIP-seq peaks in the iQTL associated HiChIP loop anchors versus the rest, in terms of their overlap with reference eQTLs or hQTLs (OR: 2.46, Fisher’s exact test p-value < 0.00001) (see Methods, Supplementary Data 11D). These results indicate that a significant fraction of iQTLs is driven by the variation of 3D chromatin conformation without underlying changes in histone modification profiles and some without corresponding changes in gene expression in the profiled cell type.iQTLs overlapping with eQTLs in naïve CD4 T cells are enriched for fine-mapped GWAS SNPs and TF binding motifsTo understand the properties of iQTL-associated SNPs, we first tested their overlap with the reference nCD4 T cell eQTLs provided in two different eQTL databases for immune cell types, namely DICE8 and ImmuNexUT7. We observed that 5381 iQTLs (~57%) are also eQTLs of nCD4 (Fisher’s exact test p-value < 0.00001, OR: 1.88, see Methods; Supplementary Data 11B) with respect to either of these two databases (Supplementary Fig. 3G and Supplementary Data 4). In total, 1276 out of 2292 iQTL loops (~56%) are associated with these 5381 iQTLs which are also nCD4 eQTLs (Supplementary Figs. 2F, 3G, and Supplementary Data 4). As the HiChIP protocol returns high-resolution regulatory contact maps47, we reasoned that iQTLs should also be enriched for regulatory SNPs. Indeed, we observed that 1725 iQTLs (~32%) of this category fall within the regulatory regions defined by aggregate nCD4 H3K27ac ChIP-seq peaks, and ChIP-seq peaks for various CD4 T cell subsets and regulatory TFs provided in the ChIP-Atlas67 database (see Methods and Supplementary Data 4), and such a fraction of regulatory SNPs is substantially higher (OR: 3.16; Fisher’s exact test p-value < 0.00001; Supplementary Data 11E) than that of the reference DICE nCD4 eQTLs (~13% regulatory SNPs) (Supplementary Data 5). Out of the 5381 iQTLs in this category, 2163 (~40%) were within 5kb of ABC enhancers (725 or ~13.5% with exact overlap)66. We next checked whether iQTLs are enriched for transcription factor (TF) binding motifs (see Methods) supported by various techniques such as motif scanning by FIMO68,69, allele-specific motif enrichment by AME70 or evidence of allele-specific binding from ChIP-seq data by ADASTRA71,72 (https://adastra.autosome.org/). Considering iQTLs which are also nCD4 eQTLs, ~62% SNPs overlapped with TF binding sites derived by at least one of these approaches (compared to ~51% DICE Naïve CD4 eQTLs; OR: 1.58; Fisher’s exact test p-value < 0.00001; Supplementary Data 11F), out of which ~19% SNPs showed TF binding motif evidence from at least two approaches (compared to 1.3% DICE Naïve CD4 eQTLs; OR: 17.3; Fisher’s exact test p-value < 0.00001; Supplementary Data 11F; Supplementary Fig. 4A). Allele-specific motif enrichment analysis by AME on these iQTLs (see Methods) revealed the enrichment of regulatory TFs such as TCF3 (an E protein that plays essential roles in B and T cell development) and FOS (a subunit of the AP-1 TF that plays role in cell proliferation, differentiation and in response to stimuli such as cytokines) for all motif databases (JASPAR, cisbp and HOCOMOCO) and the TFs STAT3 (a lineage defining TF for Th17 and regulates B and T cell responses to stimuli), and REL (a member of NFKB family of TFs associated with ulcerative colitis and rheumatoid arthritis) for two motif databases (Supplementary Fig. 4B, Supplementary Data 4). De novo motif analysis using FIMO showed significant enrichment of regulatory TFs YY1, CTCF, IRF1 (interferon regulatory factor 1), SP1 (zinc finger TF, binds many promoters) and SP2 in all motif databases, and TFs STAT1 (response to pathogens, interferon signaling, Th1 differentiation), MYC (oncogene), ETS1 (a member of ETS family of transcriptional activators) in two motif databases (Supplementary Fig. 4C, Supplementary Data 4). These results suggest that iQTLs overlap binding sites of TFs that are important for T cell differentiation and function as well as TFs implicated in chromatin loop formation such as CTCF and YY1.To show that iQTLs prioritize putative causal or disease-specific eQTLs, we next assessed iQTLs and DICE nCD4 eQTLs by their overlap with fine-mapped GWAS SNPs from various immune diseases (Supplementary Data 6) provided in the CausalDB73 database (http://www.mulinlab.org/causaldb/browse.html). We observed that the iQTLs shared with eQTLs show significantly higher overlap (assessed by Fisher’s exact test) with the GWAS SNPs compared to the full set of eQTLs, particularly for the diseases Asthma, Crohn’s disease, Diabetes Mellitus, Allergic Rhinitis (p-value < 0.00001), psoriasis (p-value < 0.001), Ulcerative colitis, T1D, IBD, and multiple sclerosis (p-value < 0.05) (Supplementary Fig. 4D). In fact, the full set of iQTLs (eQTLs or not) also exhibit higher overlap with the fine-mapped GWAS SNPs for the diseases Asthma, Crohn’s disease (p < 0.00001), psoriasis (p < 0.001), Diabetes Mellitus, multiple sclerosis (p < 0.05) compared to the reference DICE nCD4 eQTLs (Supplementary Fig. 4D). In addition, we performed heritability analysis, which gives insight of the genetic contribution to the complex traits or diseases74. We compared iQTLs and DICE nCD4 eQTLs by their heritability enrichment for various immune diseases (Supplementary Data 7), by adapting the stratified LD score regression75 (S-LDSC) method to compute the heritability enrichment for a given set of SNPs (see Methods and https://github.com/ay-lab/S_LDSC_SNP). We observed that the complete set of iQTLs as well as iQTLs overlapping nCD4 eQTLs provided higher heritability enrichment and statistical significance compared to the complete set of eQTLs for the IBD, IBS, Crohn’s disease, and T2D. For asthma and SLE, however, the overall set of DICE eQTLs produced higher heritability enrichment (Supplementary Fig. 4E). Overall, these results suggest that iQTLs prioritize eQTLs associated with disease risk and higher enrichment of heritability for immune-related GWAS traits.Among the 1276 iQTL loops associated with iQTLs that are also nCD4 eQTLs, a high fraction (884 loops or ~69%) are only associated with such types of iQTLs (Supplementary Fig. 2F and Supplementary Data 4). An example of such an iQTL loop is a 40 kb loop between the genes IKZF3 and GSDMB within the 17q21.1 locus, a locus implicated in many immune-associated diseases (Schmiedel et al. 2016). We obtained 24 iQTLs (Supplementary Data 4A) associated with this loop (Fig. 2A), all of them being significant nCD4 eQTLs of GSDMB and 21 of them (like the SNPs rs2305479, rs7216389) are nCD4 eQTLs for the asthma risk associated gene ORMDL3 (Supplementary Fig. 5A). Genotype-dependent variation of HiChIP contact counts for these iQTLs show similar trends with gene expression, but do not exhibit significant genotype-dependent or allele-specific variation of 1D ChIP or HiChIP coverages (Fig. 2B, Supplementary Fig. 5A). These iQTLs are also relevant to various immune diseases – all 24 iQTLs belong to the fine-mapped (95% credible sets) GWAS SNPs (CausalDB73 database) for the trait Asthma; a few belong to the fine-mapped GWAS SNPs for the traits Type 1 Diabetes (T1D), Systemic Lupus Erythematosus (SLE), and Ulcerative Colitis (Supplementary Data 6). Further, 16 out of 24 iQTLs belong to the list of PICS76 fine-mapped SNPs for five different immune diseases (Supplementary Data 4A) as provided in ref. 10. The SNP rs56380902 (chr17: 38066372) is also shown to be associated with time-dependent activation of ORMDL3 in the single-cell eQTL study of memory T cells13. Another SNP rs56750287 (chr17:38062944) belongs to the list of expression-modulating variants (emVars), a set of regulatory GWAS SNPs validated by MPRA having potential regulatory effects in Jurkat T cells10 (Supplementary Data 4A). Although all 24 iQTLs are in a tight haploblock (R2 > 0.8) they are not linked to the lead eQTL rs17608925 (chr17:38082831, not an iQTL) of the genes GSDMB and ORMDL3 (Supplementary Fig. 5B) suggesting that gene expression variation may not have a 1-to-1 relationship with changes in chromatin looping. In support of this, we observed that the two SNPs we previously identified to overlap with CTCF binding sites in naïve CD4 T cells77, namely rs4065275 (chr17:38080865) and rs12936231 (chr17:38029120) were also in tight LD with iQTLs but not the lead eQTL (Supplementary Fig. 5B). These two SNPs were shown to switch the CTCF binding from the gene ZPBP2 to ORMDL3 intronic region in the risk haplotype (G for rs4065275 and C for rs12936231) compared to non-risk alleles (A for rs4065275 and G for rs12936231).Fig. 2: Examples and properties of iQTLs which are also eQTLs in Naïve CD4 T cells.A Example of iQTL SNPs (which are also nCD4 eQTLs) associated with a 40kb HiChIP loop in chr17 ORMDL3 locus. The color scale of green arcs and the values on top of them indicate mean normalized HiChIP contact counts (indicated by numbers) by genotype for one of these SNPs rs2305479. B Trends of genotype-dependent sequencing depth normalized HiChIP contact counts, ChIP-seq coverage, 1D HiChIP coverage, and allele-specific variation of ChIP and 1D HiChIP reads, for two iQTLs rs7216389 (top) and rs2305479 (bottom) associated with the 40kb HiChIP loop in A. For genotype dependent trends, X axis indicates different SNP genotypes, numbers in the formats (a / b) or (b) denote that the corresponding genotype is present in b donors, out of which a donors have this HiChIP loop as significant (by FitHiChIP). For allele-specific trends, X axis denotes different alleles, and numbers in the format [c] indicate that c heterozygous donors have nonzero reads for this allele. FDRDef denotes the statistical significance (FDR) of this iQTL SNP-loop pair using the default RASQUAL model. For genotype-specific plots, p-values are obtained from linear regression (ANOVA) while for allele-specific plots, p-values are computed by one-sided paired t-test. Boxplots contain mean (bigger black dots), median (middle lines), 25th, 75th percentiles (box) and individual samples (small circles or dots or symbols A* where * indicates numbers). C Example of iQTLs associated with a 40kb HiChIP loop in the CD247 locus, such that the iQTLs are also the lead eQTLs of CD247 in nCD4 (ImmuNexUT). Green arcs indicate mean normalized HiChIP contact counts by genotype for one of these SNPs rs2949661. Enhancers reported by the E2G links of the activity-by-contact (ABC) model for the activated naïve CD4 T cell type are also shown. D Similar to B, for the iQTL rs2949661 associated with the loop in C. E TF motif for IRF1 overlapping the SNP rs2995091 (according to FIMO). The circle denotes the position of the SNP/iQTL within the TF binding motif. nCD4: Naïve CD4, CC: contact count. Source data are provided as a Source Data file.Overall, iQTLs common to DICE nCD4 eQTLs (2471 iQTLs) are associated with 275 eGenes. Fine mapping of these eQTLs using the tool FINEMAP20 (see Methods) produced 111 fine-mapped eQTLs associated with 84 eGenes (~31% of 275 eGenes), suggesting that a high fraction of eGenes show correlated changes in gene expression and HiChIP contacts. An example of such an iQTL and a fine-mapped eQTL is rs3761847 (chr9:123690239), associated with a 50kb HiChIP loop between the genes TRAF1 and PHF19 (Supplementary Fig. 5C and Supplementary Data 4B), showing significant variation of allele-specific HiChIP contacts and 1D HiChIP reads but not for 1D ChIP-seq coverage (Supplementary Fig. 5D), and is also a lead eQTL of TRAF1. The TRAF1 gene is a member of the TNF receptor (TNFR) associated factor (TRAF) 1, which is an intracellular signaling protein involved in the regulation of immune responses, T cell activation, differentiation, and survival. Specifically, TRAF1 is highly expressed for the RA patients, while the gene PHF19 (PHD finger protein 19) is downregulated for the RA patients78. The iQTL rs3761847 overlaps with the reference CD4 Naïve H3K27ac and H3K4me3 ChIP-seq peaks, and also belongs to the list of PICS fine-mapped SNPs (Supplementary Data 4B) for the trait Rheumatoid arthritis (RA)10. In fact, its association with RA is also confirmed by the GWAS catalog1,79, Phenome-wide association study (PheWAS) (https://a2f.hugeamp.org/), and Open Targets Genetics framework80,81. A previous study82 highlighted that the allele G of the intronic SNP rs3761847 (chr9:123690239:G:A) is the risk allele and this allele is also associated with higher expression of TRAF1 (eQTL in nCD4 in both DICE and ImmuNexUT; Supplementary Fig. 5E). Interestingly, the G allele was associated with a weaker loop from the TRAF1 promoter region (Supplementary Fig. 5D) and 7 other iQTL SNPs that are within 15 kb distance of rs3761847 and are in near perfect LD were all also associated with weaker loops in the region (Supplementary Data 4B). Upon examination none of these seven iQTLs overlapped CTCF binding motifs but rs3761847 overlapped a motif for ZNF263, another zinc finger protein (ZNF) the binding sites of which are recently shown to be enriched at relocated cohesin upon CTCF and MAZ depletion in mouse embryonic stem cells83. Our motif enrichment analysis using FIMO, also highlighted ZNF263 binding sites as one of the motifs enriched for iQTLs that are also nCD4 eQTLs (Supplementary Fig. 4C, Supplementary Data 4), suggesting a potential insulator role for this ZNF protein in T cells.Similar to the iQTL rs3761847 which is also a lead DICE nCD4 eQTL of TRAF1, overall, we obtained 384 iQTLs which are also the lead eQTLs of one or more eGenes in either DICE or ImmuNexUT (Supplementary Data 8). As an example, all 10 iQTLs associated with the 40kb HiChIP loop in the CD247 locus (Supplementary Data 4C and Fig. 2C) are also the lead eQTLs (p-value = 6.8 * 10−7) of the CD247 gene (ImmuNexUT), and they are mutually in strong to moderate LD (R2 > 0.7) (Supplementary Fig. 5F). The CD247 encodes for the CD3 zeta chain and is part of the T-cell-receptor (TCR)-CD3 complex and hence critical in transmission of TCR-mediated signals in T cells. Genotype dependent trends of HiChIP contact counts for these 10 QTLs (e.g., rs2949661) are concordant with their eQTL trends of CD247 expression (Fig. 2D, Supplementary Fig. 5G) for different CD4 T cell subsets. Similar trends are observed for allele-specific ChIP-seq (statistically significant; p-value = 0.001, paired t-test) and 1D HiChIP coverage (Fig. 2D). All 10 iQTLs are also fine-mapped GWAS SNPs (CausalDB73 database) for the traits asthma and allergy (Supplementary Data 6). The SNP rs2949661 (chr1:167424924) in particular, is a regulatory SNP overlapping with reference nCD4 H3K27ac ChIP-seq peaks, ChIP-seq peaks for the TFs ETS1, BCL6, RUNX1, BRD4 and RNA-pol-II (ChIP-Atlas67) (Supplementary Data 4C). This SNP rs2949661 also overlaps with activated CD4 Naïve cell ABC enhancers66 associated with the gene CD247 via significant E2G links (Fig. 2C). Further, rs2949661 exhibits allele-specific binding of ETS1 (FDR = 7.2 * 10−4 as per the ADASTRA71,72 database) in Th1 cells. Lastly, rs2949661 is also shown to be associated with Sjogren’s syndrome by being an eQTL to both genes CD247 and cellular repressor CREG1 and by altering TAD interactions involving CD24784. Another iQTL rs2995091 (chr1:167434277) in near perfect LD with rs2949661 overlaps binding motif of IRF1 (activator of interferon alpha and beta transcription) (Fig. 2E), which is another TF with motifs enriched in this category of iQTLs (Supplementary Fig. 4C, Supplementary Data 4). We also obtained rs2995091 as one of the nCD4 eQTLs from a recent single-cell eQTL study15. Open Targets Genetics80,81 confirms promoter capture Hi-C loops45 between these iQTLs and CD247. Overall, our results suggest that, for some loci, iQTLs may find a refined subset of eQTLs where the effect on gene expression can be attributed to underlying changes in chromatin contacts. In many cases, this is likely mediated by changes in TF binding to the iQTL SNP or to other nearby SNPs that are in LD. The concordance between iQTL and eQTL effects may depend on which TFs are binding weaker/stronger to these SNPs. This iQTL subset is also enriched in fine-mapped GWAS SNPs for various immune diseases as supported by genome-wide analysis and the example loci highlighted above supporting their significance.Naïve CD4 T cell iQTLs pinpoint cell-specific eQTLs in other CD4 T cell subsets, revealing potential regulation of gene expression by chromatin interactionsUnderstanding the variation of eQTL effects according to different cell states or differentiation stages such as different CD4 T cell subsets11 or immune cell types15 or inferring activation or differentiation-dependent eQTLs12,77,85 help reveal the impact of genetic effects on complex traits. In view of this, we next cataloged 1143 (~12%) iQTLs that are not eQTLs in nCD4 but are eQTLs for other CD4 T cell subsets (8 cell types including stimulated CD4 T cells, Tfh, Th1, Th2, Th17, Th1/17, naïve and memory Tregs compiled from DICE and ImmuNexUT; see Methods, Supplementary Fig. 3G, Supplementary Data 4). In total, these 1143 iQTLs were associated with 482 distinct HiChIP loops. Around 34% (386) of these iQTLs overlapped with regulatory peaks (Supplementary Data 4), a percentage comparable to iQTLs which are nCD4 eQTLs, and much higher (OR: 3.41; Fisher’s exact test p-value < 0.00001; Supplementary Data 11E) compared to the complete set of DICE nCD4 eQTLs. Overall, 474 iQTLs (~41%) of this category overlap are within 5kb of ABC enhancers (198 or ~17% with exact overlap) of activated CD4 Naïve T cells. Around 60% of iQTLs in this category overlapped TF binding motifs supported by at least one of the techniques FIMO, AME or ADASTRA TF (OR 1.45, Fisher’s exact test p-value < 0.00001, compared to the complete set of DICE nCD4 eQTLs, Supplementary Data 11F) and ~20% had motif overlap supported by at least two methods (OR 18.5; Fisher’s exact test p-value < 0.00001, compared to the complete set of DICE nCD4 eQTLs, Fig. 3A, Supplementary Data 4, Supplementary Data 11F), both percentages similar to iQTLs that are nCD4 eQTLs (Supplementary Fig. 4A). These iQTLs, similar to iQTLs that are nCD4 eQTLs, showed enrichment of TF motifs (FIMO analysis) for zinc finger proteins such as ZNF263, SP1/2/3/4 and MAZ, transcriptional activators STAT1 and RUNX1, as well as interferon response factors IRF1/2/3 but not for CTCF or YY1 (Fig. 3B, Supplementary Data 4). Allele-specific enrichment analysis using AME also identified additional TF motifs such as ZEB1 (regulates T cell development differentiation) and REL (Supplementary Fig. 6A, Supplementary Data 4).Fig. 3: Properties of iQTLs which are not eQTLs in nCD4 T cell but are eQTLs in other CD4 T cell subsets.A iQTLs in this category with or without TF binding motifs (left); overlap of iQTLs with TF binding motifs from different approaches – de novo motif analysis using FIMO, allele-specific motif enrichment by AME, and ADASTRA TF database. For AME and FIMO, iQTLs overlapping TF binding motifs with respect to one or more of the three databases (cisbp, hocomoco or JASPAR) are considered. B TF motif enrichment by FIMO for these iQTLs, such that the motif is supported by at least two of the three motif databases cisbp, hocomoco or JASPAR. TFs with gene expression > 1 TPM having motifs with p-value < 1e-6 are plotted. C MICB locus having multiple iQTLs associated with multiple HiChIP loops, where the iQTLs are eQTLs of MICB in T helper subsets including Th1, Th2, Th17, Tfh and Tregs. The symbol * indicates that the corresponding SNP (rs4947321) is a fine-mapped eQTL of MICB in Treg memory (DICE database). ABC model enhancers, corresponding to the E2G links of the activated naïve CD4 T cell type, are also shown. Bottom three tracks show CD4 T cell ATAC-seq, CTCF log fold change over control ChIP-seq track (used as an input to C. Origami) and the impact score at 1 kb resolution obtained by C. Origami, where higher impact score indicates higher impact on 3D chromatin organization by perturbation of the corresponding loci. D Genotype-dependent variation of MICB expression for the SNP rs4947321 in the cell types CD4 Naïve, Treg Memory and Th17 (according to the DICE database). Boxplots indicate median (middle lines), 25th, 75th percentiles (box) and minimum and maximum values (whiskers). E Similar to D, for the SNP rs3828912 in the cell types Naïve CD4 and Treg Memory. F CTCF binding motif overlapping with the SNP rs3828912 in Spleen, according to the ADASTRA database. TF Transcription factor, nCD4 Naive CD4. Source data are provided as a Source Data file.To understand whether these iQTLs and the genotype-dependent variation of the associated loops in nCD4 are mechanistically linked to eQTL effects in other CD4 T cell subsets, we focused on the differentially expressed genes (DEGs; FDR < 0.05) upregulated (log2 fold change > 1) in various CD4 T cell subsets compared to nCD4 cells. Overlapping the considered set of iQTLs with eQTLs of these DEGs, we obtained 268 such iQTLs and 71 DEGs (Supplementary Data 9). An example DEG upregulated in Treg memory cell type is the MICB (MHC class I polypeptide-related sequence B) gene, which is a ligand for the NKG2D receptor. This MICB locus contains multiple HiChIP loops associated with a total of 90 iQTLs where none of them are eQTLs of nCD4 in either DICE or ImmuNexUT (Fig. 3C). However, 43 of these 90 iQTLs are eQTLs of MICB for Treg memory cells (Supplementary Data 4D). To showcase the importance of this MICB locus in regulating 3D genome organization for CD4 T cells, we applied the deep learning technique C. Origami86 which models 3D contact map from DNA sequence and cell-type-specific epigenomic tracks such as ATAC-seq and CTCF fold changes (see Methods). Using the pre-trained model, C. Origami also supports screening a specific region or locus by a fixed resolution (like 1Kb) to assess the potential perturbation impact of regulatory elements on 3D genome. Using a pre-trained C. Origami model on CD4 T cell Hi-C dataset provided in ref. 59 (see Methods), we applied perturbation screening of 1 kb resolution and observed that the locus containing iQTLs also exhibit high impact scores (indicating higher regulatory impact), thereby potentially impacts 3D genomic changes (Fig. 3C). The iQTL rs4947321 (chr6:31468091), in particular, is a fine-mapped eQTL of MICB (derived by FINEMAP20) in Treg Memory as well as an eQTL for various other CD4 T cell subsets including Tfh, Th1, Th2 and Th17, all with higher MICB expression for the non-reference allele G (Fig. 3D–E). This iQTL is in strong LD (R2 > 0.8) with 38 out of the remaining 42 iQTLs, which are also eQTLs of MICB in Treg Memory (Supplementary Fig. 6B). Two of such linked iQTLs rs3828912 and rs3828914, showing similar trends of MICB expression and HiChIP contacts (higher values for non-reference alleles) (Fig. 3E), overlap with T-cell-specific CTCF ChIP-seq binding sites87, and CTCF ChIP-seq peaks from spleen (ReMap DNA binding database88). These SNPs also overlap with the enhancers involved in the E2G links to the gene MICB, according to the ABC model66 (Fig. 3C). The SNP rs3828912 also shows allele-specific binding preference for CTCF for spleen and for pancreas tissue in the ADASTRA database (Fig. 3F). However, these iQTLs do not show significant genotype-dependent or allele-specific variation of 1D ChIP or HiChIP coverages (Supplementary Fig. 6C) suggesting that the effects of these SNPs are solely at 3D level. Multiple other iQTLs in this locus are GWAS SNPs for various immune diseases, such as Systemic lupus erythematosus or SLE (rs4947321, rs6930510, rs9267366, rs2534671, rs2534681), Multiple Sclerosis (rs3828912), Type 1 diabetes or T1D (rs4947321, rs2534671, rs2855807, rs2534681), and Rheumatoid Arthritis or RA (rs5027463) according to PheWAS (https://a2f.hugeamp.org/) and causalDB databases (Supplementary Data 6). These results suggest that iQTLs that are not eQTLs in the studied cell type may likely have underlying eQTL effects that will surface when the specific cell type is perturbed, activated or differentiated into different fates.iQTLs not eQTLs in any CD4 T cell subset are enriched for fine-mapped GWAS SNPsThe third category of iQTLs constitute 2902 SNPs (~31%) which are not eQTLs (either DICE or ImmuNexUT) in any CD4 T cell subset that is profiled (Supplementary Data 4). In terms of overlap with regulatory regions (~30%) (OR: 2.87, Fisher’s exact test p-value < 0.00001 with respect to the complete DICE nCD4 eQTLs, Supplementary Data 11E), and TF binding motifs (FIMO analysis: 61% with 20% supported by two databases, odd ratios of 1.51 and 18.45, respectively, with respect to the complete DICE nCD4 eQTLs; Supplementary Data 11F), this set of iQTLs were very similar to the two previously discussed categories above that are eQTLs in either nCD4 T cells or other CD4 subsets (Fig. 4A, Supplementary Data 4). In terms of enriched motifs from FIMO and allele-specific analysis using AME, the identified TFs were also very similar (e.g., ZNFs, IRFs, SP1/2/3/4) with YY1 coming up as significant from both approaches but without an enrichment for CTCF (Supplementary Fig. 7A, B). Out of 696 HiChIP loops solely associated with iQTLs which are not eQTLs in any CD4 T cells (Supplementary Fig. 2F), an example is a 60kb loop associated with 33 iQTLs within the RUNX3 locus (Fig. 4B and Supplementary Data 4E). These iQTLs are mutually in tight (R2 > 0.8) LD (Supplementary Fig. 7C), and they do not exhibit any genotype or allele-specific changes with respect to 1D ChIP-seq or HiChIP coverage (Fig. 4C). The RUNX3 gene is a transcriptional activator and regulates the differentiation of CD4+ T cells into various subsets (Th1, Treg) and is expressed across naïve cells and different T helper populations in human. Although not eQTLs in CD4 T cell subsets, majority of these iQTLs are associated with T1D and Atopic Dermatitis, as confirmed by CausalDB fine-mapped GWAS73, PICS76 fine-mapped SNPs provided in ref. 10 (Supplementary Data 4E, 6), and PheWAS studies (https://a2f.hugeamp.org/). These iQTLs, being proximal to the gene RUNX3, also exhibit high variant-to-gene (V2G) association with the gene, as reported by Open Targets Genetics platform80,81. The iQTLs rs10751776 and rs11249221, in particular, are eQTLs in myeloid dendritic cells (mDC) (Supplementary Fig. 7D, E). The rs10751776 SNP shows CTCF binding motifs according to both ADASTRA database and FIMO (Fig. 4D), while rs11249221 overlaps a binding motif for ZEB1 (Fig. 4E). These results suggest that iQTLs with no associated eQTLs in any CD4 cell subset may also be of interest in annotating GWAS variants that cannot be explained by other QTL modalities. The limited overlap of this category of iQTLs (13%) with H3K27ac hQTLs, in comparison to 28% overlap for all iQTLs, while having similar overlap with regulatory regions and TF binding motifs, supports this possibility.Fig. 4: Example of iQTLs which are not eQTLs in any CD4 T cell subsets.A iQTLs that are not eQTLs in any CD4 subset with or without TF binding motifs (left); overlap of iQTLs with TF binding motifs from different approaches – de novo motif analysis using FIMO, allele-specific motif enrichment using AME, and ADASTRA TF database. For AME and FIMO, iQTLs showing TF binding motifs with respect to any of the three databases cisbp, hocomoco or JASPAR are considered. B Example iQTLs rs10751776 and rs11249221 associated with a 60kb HiChIP loop involving the gene RUNX3. Green arcs indicate mean normalized HiChIP contact counts by genotype for rs10751776. ABC model enhancers, corresponding to the E2G links of the activated naïve CD4 T cell type, are also shown. C Trends of genotype-dependent ChIP-seq coverage, 1D HiChIP coverage, and allele-specific variation of ChIP and 1D HiChIP reads, for two iQTLs rs10751776 (left) and rs11249221 (right) associated with the 60kb HiChIP loop in B. For genotype dependent trends, X axis indicates different SNP genotypes, numbers in the format (b) denote that the corresponding genotype is present in b donors. For allele-specific trends, X axis denotes different alleles, and numbers in the format [c] indicate that c heterozygous donors have nonzero reads for this allele. For genotype-specific plots, p-values are obtained from linear regression (ANOVA) while for allele-specific plots, p-values are computed by one-sided paired t-test. Boxplots show median (middle lines), 25th, 75th percentiles (box) and individual samples (black dots or dots with symbols A* where * indicates numbers). D CTCF binding motif overlapping the SNP rs10751776 according to the JASPAR database (reverse complement). Note that this region does not have any CTCF binding (ChIP-seq signal) in naïve CD4 T cells. E The SNP rs11249221 overlaps ZEB1 binding motif, according to the JASPAR database. CC: contact count. Source data are provided as a Source Data file.iQTLs may be associated with concordant changes in chromatin contacts over a broad genomic locusAs previous QTL studies7,8 report eQTLs regulating expression of multiple genes, we similarly obtained 2090 iQTLs (~22%) associated with multiple HiChIP loops (Supplementary Data 4) potentially involving multiple genes or regulatory regions. We thus enquired whether there exist iQTLs associated with not only one loop but rather with multiple HiChIP loops in a broad locus, and devised a method to infer such QTLs (Fig. 5). Considering an example, the iQTL rs2247325 in the RNASET2 locus is associated with two HiChIP loops (Fig. 6A, B) while the SNPs rs11980001 and rs6464142 are associated with 6 and 5 HiChIP loops, respectively, in the NUB1 locus (Supplementary Fig. 9A, B, D). Multiple HiChIP loops associated with a common iQTL could share one common anchor (+/− 5 kb) but may span different genes or regulatory elements over a broad region with their other anchors. In view of this, we next enquired whether any iQTL X is associated with changes in chromatin contacts over a broad region R, and whether such changes are accompanied by changes in gene expression within that region. For an iQTL X and its associated HiChIP loop L, we define such a testable region R by adapting the connected component labeling algorithm (see ref. 89 and our previous implementation of merge filtering in FitHiChIP64). We utilized the spatial adjacency of the interacting bins of HiChIP loops (see Methods) to define the connected component and underlying HiChIP loops, and defined the testable region R using the union of interacting bins (denoted by the segments A and B in Fig. 5) of these HiChIP loops. If the segments A and B overlapped, the region R was defined by the complete span (A+B) of these regions, while for distinct segments, R was defined by disjoint intervals A and B. An iQTL X was defined as a connectivity-QTL for region R provided the complete set S of HiChIP loops (either within A+B or between the disjoint A and B regions) and the corresponding contact map showed significant variation associated with different genotypes of X (see Methods and Fig. 5). Overall, we obtained a set of 218 connectivity-QTLs across 29 loci (regions) (linear-regression p-value <10−5, Bonferroni adjusted p-value <0.005, and >0.1 connectivity difference between genotype aggregated contact maps; see Methods), where underlying HiChIP loops spanned from distances of up to 1.55 Mb (Supplementary Data 10). Majority (182 out of 218, or ~83%) of these connectivity-QTLs were eQTLs of nCD4, while only 26 iQTLs (~12%) were not eQTLs in any CD4 T cell subsets (Supplementary Data 4, 10) indicating that the changes of chromatin conformation over a broad region is highly likely to be accompanied by changes in gene expression. This was also consistent with the high concordance between the direction of eQTL effects and connectivity-QTL effects with 99% (180/182) of the cases the genotype corresponding to higher expression also associated with higher connectivity.Fig. 5: Workflow to derive connectivity-QTLs.For a given iQTL X and the associated HiChIP loop L, connected component labeling algorithm is adapted to derive the set of HiChIP loops S spatially proximal to the given loop L (inclusive). The union of interacting bins A and B for the set of loops S can either overlap or not. To decide whether X is a connectivity-QTL associated with S, observed / expected contact counts for all loops in S are combined per genotype and the trend is tested for statistical significance with additional filtering criteria on the differences between contact maps aggregated per genotype. Here, r1s and r1e denote the start and endpoints of the region A, while r2s and r2e denote the same for the region B.Fig. 6: Example of connectivity-QTLs.A Example of a connectivity-QTL rs2247325 associated with the connectivity of RNASET2 locus containing overall 4 iQTL-associated loops (numbered), two of which (numbered 1 and 2, shown in the track iQTL Loop) are associated with the iQTL rs2247325. Three tracks below iQTLs indicate the changes in overall HiChIP connectivity (color scales indicate the contact counts) according to the genotypes of rs2247325. ABC model enhancers, corresponding to the E2G links of the activated naïve CD4 T cell type, are also shown. Bottom three tracks show CD4 T cell ATAC-seq, CTCF log fold change over control ChIP-seq track (used as an input to C. Origami) and the impact score at 1kb resolution obtained by C. Origami, where higher impact score indicates higher impact on 3D chromatin organization by perturbation of the corresponding loci. B Genotype-dependent variation of sequencing-depth normalized HiChIP contacts and allele-specific variation of HiChIP reads for the iQTL rs2247325 and for the loops 1 (left) and 2 (right). FDRDef indicates statistical significance (FDR) of these SNP-loop pairs using the default RASQUAL model. In the genotype-dependent plots, numbers in the format (a / b) denote that the corresponding genotype is present in b donors, out of which a donors have this HiChIP loop as significant (by FitHiChIP). In the allele-specific plots, X axis denotes different alleles, and the underlying numbers indicate the number of donors having nonzero HiChIP read counts for the respective alleles. For genotype-specific plots, p-values are obtained from linear regression (ANOVA) while for allele-specific plots, p-values are computed by one-sided paired t-test. Boxplots show mean (bigger black dots), median (middle lines), 25th, 75th percentiles (box) and individual samples (smaller black dots or dots with symbols A* where * indicates numbers). C Genotype-dependent variations of the observed vs expected contact counts across all HiChIP loops within this locus (shown in A). Counts are stratified with respect to the SNP rs2247325. Here, X axis denotes different SNP genotypes, and the underlying numbers indicate the number of donors having that genotype. The p-values are obtained from linear regression (ANOVA). Boxplot shows median (middle lines), 25th, 75th percentiles (box) and individual samples (black dots). D Aggregated contact map (log2-transformed observed vs expected contact counts) for different genotypes of the SNP rs2247325. The numbers on top are the 95th percentile values of the respective contact maps. Obs: observed, Exp: expected, CC: contact count. Source data are provided as a Source Data file.Considering the 115kb RNASET2 locus that harbors SNPs associated with multiple autoimmune disease, we identified 4 HiChIP loops associated with 51 iQTLs (Supplementary Data 4), out of which the iQTL rs2247325 was found to be a connectivity-QTL (Fig. 6A, Supplementary Data 10). This SNP is also a strong eQTL of RNASET2 for various CD4 T cell subsets (Supplementary Fig. 8A) and an iQTL for two HiChIP loops both with significant trends for genotype-dependent and allele-specific contact counts (Fig. 6B). As a connectivity-QTL, rs2247325 shows a significant genotype-dependent variation of contact counts for the associated HiChIP loops (Fig. 6C) and for overall connectivity (all HiChIP contact counts) of this region with both effects in the same direction as the RNASET2 expression (Fig. 6D, Supplementary Fig. 8A). However, genotype or allele-specific variation of 1D ChIP and HiChIP coverages for this SNP are not significant suggesting that this connectivity-QTL directly affects the 3D genomic chromatin contacts (Supplementary Fig. 8B). Interestingly, the SNP rs2247325 is in only moderate LD (R2 ~ 0.65) with the rest of the 50 iQTLs that are tightly linked (R2 > 0.8) (Supplementary Fig. 8C). The SNP rs2247325 is a regulatory SNP and is also linked to the GWAS studies for various diseases such as Crohn’s disease, various autoimmune traits, hypothyroidism, diabetes, inflammatory bowel disease (IBD), etc. according to Open Targets Genetics80,81 and PheWAS (https://a2f.hugeamp.org/) analysis. Genome-wide perturbation screen predicted by the deep learning method C. Origami86 using the training model defined by reference CD4 T cell Hi-C data (provided in ref. 59) shows that the connectivity-QTL rs2247325 overlaps with loci with the highest impact score (Fig. 6A). The SNP also overlaps with the CD4 T cell ATAC-seq peak, and an activated CD4 Naïve cell ABC enhancer with an enhancer-to-gene link to RNASET266. Strong eQTL trend accompanied by substantial changes in chromatin contacts within this locus suggests a concerted genotype-dependent variation in regulation of RNASET2 consistent with the identification of this gene in many GWAS and post-GWAS functional analysis of immune cells.Another 6 connectivity-QTLs, including the previously mentioned SNPs rs11980001 and rs6464142, belong to a 200kb region involving NUB1 gene (Supplementary Fig. 9A) and show a significant genotype-dependent variation of HiChIP contact counts and aggregated contact maps (Supplementary Figs. 9B-G). All 6 connectivity-QTLs are in strong (R2 > 0.8) LD (Supplementary Fig. 9H), and are eQTLs of nCD4 (ImmuNexUT) for the genes CRYGN and WDR86, however with both genes having very low expression (<3 TPM on average). The iQTL rs11980001 is of particular importance (Supplementary Fig. 9A) since it is also an eQTL of the interacting gene NUB1 in naïve CD8 (ImmuNexUT) (Supplementary Fig. 9I), overlaps with nCD4 H3K27ac ChIP-seq peaks, and the genotype-dependent variation of HiChIP contact counts matches with the expression trend of NUB1 in naive CD4 DICE data (Supplementary Fig. 9J). The locus containing these SNPs also exhibits high impact scores from C. Origami and overlaps with CTCF log fold change tracks, thereby may be a potential regulator of 3D chromatin organization. This locus also overlaps with the ABC enhancers of activated naïve CD4 T cell having E2G links to both CRYGN and WDR86 (Supplementary Fig. 9A). Open Targets Genetics also confirms PCHi-C interactions between rs11980001 and NUB1 for naïve CD4 and CD8 cell types. Furthermore, this SNP is also identified as a single-cell eQTL for the memory T cells11, showing its potential importance in regulation of NUB1 across different T cell subsets.Both examples so far have highlighted connectivity-QTLs associated with a contiguous region. We then focused on connectivity-iQTLs in the KANSL1 locus (chr17) where the associated HiChIP loops are between two disjoint regions ~350 kb apart spanning 65kb (A) and 210kb (B), respectively (Supplementary Fig. 10A). This locus has been previously marked for containing fusion transcripts related to T-ALL leukemia90. The KANSL1 gene is also related to a variety of cellular processes including enhancer regulation, cell proliferation, and mitosis. Overall, we obtained 44 connectivity-QTLs in this locus, all of them in tight (R2 > 0.8) LD, showing significant genotype-dependent changes of HiChIP contact counts and aggregated contact maps (Supplementary Figs. 10B-D). Further, 38 of these 44 iQTLs are eQTLs not only for the nCD4 but for all 15 cell types in the DICE database (Supplementary Data 4, 10). The 17q21.31 locus with these QTLs also harbors a common inversion polymorphism spanning up to 1Mb that includes our A and B regions and all HiChIP loops connecting them. The 17q21.31 inversion polymorphism exhibits two primary haplotypes: non-inverted (H1) and inverted (H2), also known as the MAPT haplotypes. These are connected to diverse diseases including neurodegenerative, developmental, and immune-related disorders. Additionally, they are linked to normal variations in common traits such as brain size. Notably, recent cerebral cortex GWAS pinpointed this inversion as a top hit linked to diminished overall cortical surface area, particularly prominent in the visual cortex91,92. One of the connectivity-QTLs we identified, rs62071575 (also named rs554959222), was among the numerous SNPs reported for their high LD with the inversion, where the C allele was noted to tag this inversion (R2 of 0.94)93. This SNP was also an eQTL for KANSL1 gene (within the inverted region) and ARL17A gene (at the right boundary of the inversion) with inverted allele showing higher expression for both genes (Supplementary Fig. 10E) consistent with previous reports91,92. The genotype-dependent connectivity pattern of the overall region for this SNP revealed strong proximity of the two inversion endpoints especially for donors with C/C genotype confirming the 1D/genomic proximity or juxtaposition of these endpoints upon inversion. Although this is a genetic event rather than an epigenetically driven change in contact maps, this case confirms the validity of our connectivity-QTL mapping in identifying genotype-dependent patterns in chromatin contacts.

Hot Topics

Related Articles