Enhancer-promoter hubs organize transcriptional networks promoting oncogenesis and drug resistance

Contact for reagent and resource sharingFurther information and request for reagents may be directed to and will be fulfilled by the corresponding author, Robert B. Faryabi (faryabi@pennmedicine.upenn.edu).Experimental proceduresCell cultureAll of the data analyzed in this study for TNBC cell lines MB157 (ATCC, Cat# CRL-7721) and HCC1599 (ATCC, Cat# CRL-2331) and for T-ALL cell line CUTLL163 were taken from previous investigations (see subsequent section). For the purpose of the FISH experiments conducted in this study, DND41 (DSMZ, Cat# ACC525) GSI-sensitive and GSI-resistant cells were cultured and maintained as previously described22. Briefly, the DND41 cell line used for FISH analysis in this study was purchased from the Leibniz-Institute DSMZ-German Collection of Microorganisms and Cell Lines, and was grown in suspension with RPMI 1640 (Corning, cat# 10-040-CM) supplemented with 10% fetal bovine serum (Thermo Fisher Scientific, cat# SH30070.03), 2 mM L-glutamine (Corning, cat# 25-005-CI), 100 U/mL and 100 μg/mL penicillin/streptomycin (Corning, cat# 30-002-CI), 100 mM nonessential amino acids (GIBCO, cat# 11140-050), 1 mM sodium pyruvate (GIBCO, cat# 11360-070) and 0.1 mM of 2-mercaptoethanol (Sigma, cat# M6250). GSI-resistant DND41 cells were constantly cultured with GSI compound E (125 nM, Calbiochem, cat# 565790) and were periodically validated to have maintained the drug-resistant state by Western blotting for Notch intracellular domain 1 (NICD1), which is not present in cells constantly treated with GSI. Parental and resistant DND41 cells were used at a low passage number (<12) and subjected to regular (approximately every 6 months) mycoplasma testing and short tandem repeat (STR) profiling.For experiments involving ibrutinib-sensitive and ibrutinib-resistant Rec-1 MCL, Rec-1 cells from the Genentech cell bank were cultured in RPMI 1640 (Corning, cat# 10-040-CM) supplemented with 10% fetal bovine serum (Thermo Fisher Scientific, cat# SH30070.03), 2 mM L-glutamine (Corning, cat# 25-005-CI), 100 U/mL and 100 μg/mL penicillin/streptomycin (Corning, cat# 30-002-CI), 100 mM nonessential amino acids (GIBCO, cat# 11140-050), 1 mM sodium pyruvate (GIBCO, cat# 11360-070) and 0.1 mM of 2-mercaptoethanol (Sigma, cat# M6250). Ibrutinib-resistant cells were generated over prolonged period of time using ibrutinib (Selleckchem, cat# S2680) dose escalation until they were stable in cell culture media supplemented with 100 nM ibrutinib. Ibrutinib resistance was confirmed by 100-fold shift for ibrutinib IC50 between parental and resistant Rec1 lines. Parental and resistant Rec-1 cells were used at a low passage number ( < 12) and subjected to regular (approximately every 6 months) mycoplasma testing and short tandem repeat (STR) profiling. Parental and resistant cells, when used for RNA-seq, ATAC-seq, Hi-C, and ChIP-seq experiments, were treated with DMSO (parental) or ibrutinib (resistant) for 24 h following BCR crosslinking with IgM. Cells used for ChIP-seq and Hi-C assays were subsequently fixed with 1% or 2% formaldehyde, respectively. Fixed frozen cell pellets were stored at −80C and used when needed.Multi-omic assaysIn this study, we performed in-situ Hi-C, RNA-seq, H3K27ac ChIP-seq, and ATAC-seq on ibrutinib-sensitive and ibrutininb-resistant Rec-1 cells. Refer to the subsequent sections for descriptions on how these assays were performed. Excluding ibrutinib-sensitive and -resistant Rec-1 cells, multi-omic data for DND41, MB157, HCC1599, (untreated) Rec-1 cells, and CUTLL1 were obtained from previous investigations14,22,29,30,31. Specifically, in situ Hi-C, SMC1 HiChIP, ATAC-seq, RNA-seq, H3K27ac ChIP-seq, and H3K27me3 CUT&RUN data from DND41 GSI-sensitive and GSI-resistant cells were downloaded from GSE173872. SMC1 HiChIP data for MB157, HCC1599, and Rec-1 were downloaded from GSE116876 along with H3K27ac ChIP-seq data for GSI-washout MB157, Rec-1, and HCC1599 and RNA-seq data for GSI-washout MB157 and HCC1599. RNA-seq data for GSI-washout Rec-1 was downloaded from GSE59810. GSI-washout cells were treated with GSI (1 uM, Calbiochem) for 72 h before being washed and cultured in media containing DMSO for 5 h, at which time RNA-seq and ChIP-seq assays were run. For the purposes of this study, data collected from GSI-washout cells was considered equivalent to data collected from untreated cells for hub identification and analysis. For CUTLL1, in situ Hi-C and H3K27ac ChIP-seq data were downloaded from GSE115896, RNA-seq data was downloaded from GSE59810, and ATAC-seq data was downloaded from GSE216430.Oligopaint FISH probe synthesisDNA FISH Oligopaint probe libraries targeting three 50 Kb elements within the LEF1 hub and three 75 Kb elements in the IKZF2 hub were designed using OligoMiner64. Each of the six distinct probe sublibraries was amplified and isolated from a larger pooled library using several short oligonucleotide primers (RPL34 F primer: CTCGAATCGGTGTCGCATTC, R primer: TTGACGTTTGCGCCGAATAC; LEF1 promoter F primer: TCCGCCGTGTTATCGATTTG, R primer: ATTCAACGGCCCTCGATTTG; LEF1 enhancer F primer: TCATAATTCGGCGCTTGGTG, R primer: TGTATCGCGCGGTCAATTTC; IKZF2 promoter F primer: TCGCTACGCCGGTTGTAATG, R primer: ATTACCGCGACCGGTTGAAG; IKZF2 5ʹ F primer: CAGTTACCGGTCCGTCGATG, R primer: ACGTATCGTCCCGCAACATG; IKZF2 3ʹ F primer: TTGTCGCGATGCCATAGACG, R primer: AGCTCAATCGTCGCACGATC), as previously described22. Briefly, the pooled library was amplified via low cycle PCR and the T7 promoter was separately cloned into oligos within the six probe sublibraries of interest using primers specific to each probe sublibrary, which were purchased from IDT. Each probe was transcribed to RNA using a T7 RNA polymerase, and then RNA was reverse transcribed back into DNA oligos, which were purified and isolated for subsequent nuclear DNA hybridization. In order to visualize primary probes with fluorescence microscopy, secondary DNA probes conjugated with Alexa-488 (sequence: 5Alex488N/CACACGCTCTTCCGTTCTATGCGACGTCGGTG/3AlexF488N), Atto-565 (sequence: 5ATTO565N/ACACCCTTGCACGTCGTGGACCTCCTGCGCTA/3ATTO565N), and Alexa-647 (sequence: 5Alex647N/TGATCGACCACGGCCAAGACGGAGAGCGTGTG/3AlexF647N) reporters, which were also purchased from IDT, were used.3D Oligopaint DNA FISH on slidesCells were prepared for DNA FISH as previously described22. Briefly, GSI-sensitive and GSI-resistant DND41 cells were first incubated on poly-L-lysine-coated glass slides (ThermoScientific, cat# P4981-001) and fixed in a solution of 4% formaldehyde in PBS for 10 min. Cell membranes were permeabilized by submerging slides in 0.5% Triton X-100 in PBS for 15 min and then denatured by immersion in increasing concentrations of ethanol (70%, 90%, and 100%). Cells were further permeabilized with cycles of immersion in heated 2× SSCT/50% formamide before incubation with a hybridization buffer containing 100 pmol of each of the three Oligopaint probes targeting the locus of interest. Slides were then sealed with rubber cement and a coverslip. After incubating in a 37 °C humidified chamber for 16 h, the coverslip and probe hybridization solution were removed from the slides, which were once again cyclically submerged in 2×  SSCT and 0.2× SSCT in heated water baths to re-permeabilize cell and nuclear membranes. Another hybridization mix containing secondary probes conjugated to fluorophores was aliquoted onto each slide before slides were sealed with rubber cement and coverslips. After incubation in the 37 °C humidified chamber for 2 h, slides were submerged in 2×  SSCT. DAPI dye was then added to stain nuclei, and slides were submerged in 2× SSCT for a final time. Finally, mounting media (Invitrogen, Ref# 336936) was added to each slide and coverslips were sealed onto the slides using transparent nail polish. Slides were then imaged using a Leica SP8 confocal microscope (IKZF2 locus) with a 40× oil immersion objective or the Vutara VXL microscope (LEF1 locus) (Bruker) in the widefield, epi-fluorescence microscope setting.Rec-1 ibrutinib in-situ Hi-C106 parental or ibrutinib-resistant Rec-1 cells fixed with 2% formaldehyde were used as input for Hi-C assay performed with Arima Hi-C kit (Arima Genomics, cat#A510008) per the manufacturer’s instructions. Both samples passed Arima QC1 and were subsequently used for library generation with Accel-NGS 2S Plus DNA Library Kit (Swift, cat# 21024) and 2S Set A Indexing Kit (Swift, cat# 26148). After passing Arima QC2, samples were PCR amplified for 6 cycles and quality was inspected using D5000 on Agilent 4200 Tapestation. Libraries were paired-end sequenced (2x150bp) using NovaSeq.Rec-1 ibrutinib H3K27ac chromatin immunoprecipitation sequencing (ChIP-seq)ChIP-seq was performed using a previously published protocol14. 10  × 106 parental or ibrutinib-resistant Rec-1 cells previously fixed with 1% formaldehyde (Pierce, cat#28908) and quenched with 0.125 M Glycine (Fisher scientific, cat#AAJ1640736), were sonicated for 5.5 min on Covaris L220 with the following settings: PIP 350, DF 15%, CPB 200. Chromatin was cleared with recombinant protein G–conjugated agarose beads (Invitrogen, cat# 15920-010) and subsequently immunoprecipitated with H3K27ac antibody at 1:500 dilution (Active Motif, cat# 39133). Antibody-chromatin complexes were captured with recombinant protein G–conjugated agarose beads, washed with Low Salt Wash Buffer, High Salt Wash Buffer, LiCl Wash Buffer and TE buffer with 50 mM NaCl and eluted. Input sample was prepared by the same approach without immunoprecipitation. After reversal of crosslinking, RNase (Roche, cat# 10109169001) and Proteinase K (Invitrogen, cat# 25530-049) treatments were performed, and DNA was purified with QIAquick PCR Purification Kit (QIAGEN, cat# 28106). Libraries were then prepared using the NEBNext Ultra II DNA library Prep Kit for Illumina (NEB, cat# E7645S). Two replicates were performed for each condition. Indexed libraries were validated for quality and size distribution using a TapeStation 4200 (Agilent). Libraries were paired-end sequenced (2 × 50 bp) on Illumina NextSeq instrument.Rec-1 ibrutinib assay for transposase accessible chromatin (ATAC-seq)ATAC-seq assay was performed as previously described14. Briefly, 60,000 parental or ibrutinib-resistant Rec-1 cells were washed with 50 μL of ice cold 1 × PBS (Corning, cat# 21031CV), followed by 2 min treatment with 50 μL lysis buffer containing 10 mM Tris-HCl, pH 7.4, 3 mM MgCl2, 10 mM NaCl, and 0.1% NP-40 (Igepal CA-630). After pelleting nuclei, nuclei were resuspended in 50 μL of transposition buffer consisting of 25 μL of 2× TD buffer, 22.5 μL of molecular biology grade water, and 2.5 μL Tn5 transposase (Illumina, cat# FC-121-1030) to tag the accessible chromatin for 45 min at 37 °C. Tagmented DNA was purified with MinElute Reaction Cleanup Kit (QIAGEN, cat# 28204) and amplified with 5 cycles. Additional number of PCR cycles was determined from the side reaction and ranged from 8–9 total cycles of PCR. Two replicates were performed for each condition. Libraries were purified using QiaQuick PCR Purification Kit (QIAGEN, cat# 28106) and eluted in 20 μL EB buffer. Indexed libraries were assessed for nucleosome patterning on the TapeStation 4200 (Agilent) and paired-end sequenced (50 bp+50 bp) on HiSeq (Illumina).Rec-1 ibrutinib RNA sequencing (RNA-seq)Strand-specific RNA-seq was performed on parental and ibrutinib-resistant Rec-1 cells using SMARTer Stranded Total RNA Sample Prep Kit (Takara, cat# 634873) per the manufacturer’s instructions. Briefly, 24 h post treatment with DMSO and 100 nM ibrutinib followed by IgM stimulation for 5 min, parental and resistant cells were lysed with 350 μL RLT Plus buffer (QIAGEN) supplemented with 2-mercaptoethanol (Sigma, cat# M6250) and total RNA was isolated using the RNeasy Plus Micro Kit (QIAGEN, cat# 74034). RNA integrity numbers were determined using TapeStation 2200 (Agilent). 800 ng of total RNA was used, and libraries were prepared using the SMARTer Standard Total RNA Sample Prep Kit – HI Mammalian. Libraries were paired-end sequenced (38 + 38 bp) on a HiSeq. Three biological replicates were performed for each condition.Quantification and statistical analysisDefinition of regulatory elementsFor the purposes of this study, regions within 2.5 Kb of transcription start sites (TSSes) of expressed genes were considered to be promoters. H3K27ac ChIP-seq peaks that did not overlap with promoter regions were considered to be enhancers. For identification of hubs from Hi-C data, only enhancers and promoters that overlapped with ATAC-seq peaks were considered valid loop anchors.Gene annotationAll genes presented in this study were annotated in accordance with the Human Genome version 19 (hg19) GRCh37.75 assembly. 2,828,317 Ensembl transcripts from this assembly were downloaded, and the longest transcript from each Ensembl gene id (ENSG) was used to generate a list of transcription start sites (for promoter annotation) and gene position. 57,209 gene annotations were used for RNA-seq analysis after excluding rRNA and chrM genes.Rec-1 ibrutinib H3K27ac ChIP-seq and ATAC-seq data analysisRec-1 ibrutinib-sensitive and -resistant H3K27ac ChIP-seq reads were trimmed with Trim Galore (version 0.6.6) using parameters -q 5 –phred33 –fastqc –gzip –stringency 5 -e 0.1 –length 20 –paired. Ensembl GRCh37.75 primary assembly, which included chromosome 1-22, chrX, chrY, chrM and contigs was used for alignment of trimmed reads with BWA (version 0.7.13)65. BWA was run using the command bwa aln -q 5 -l 32 -k 2 -t 12 and paired-end reads were grouped using bwa sampe -P -o 1000000 -r. After grouping, reads which were considered duplicates from Picard (version 2.1.0) as well as reads that were matched with ENCODE blacklist regions or contigs were filtered out so that valid reads were kept and used for all subsequent analysis. This procedure was repeated for alignment of Rec-1 ibrutinib-sensitive and -resistant ATAC-seq reads. See the following section for ChIP-seq and ATAC-seq peak-calling protocols.Rec-1 ibrutinib Hi-C data analysisHiC-Pro (version v2.8.1)66 was used to process Rec-1 ibrutinib-sensitive and -resistant Hi-C raw reads for each sample using default parameters except LIGATION_SITE and GENOME_FRAGMENT were provided by Arima. Putative interactions identified from HiC-Pro were used as the basis for hub calling. Similar to Rec-1, CUTLL1 Hi-C data was processed using HiC-Pro (version v2.8.1) with default parameters.H3K27ac ChIP-seq and ATAC-seq peak callingPeak calling of all H3K27ac ChIP-seq and ATAC-seq data was performed similar to as previously described14. Briefly, fragment length of H3K27ac ChIP-seq reads was estimated with HOMER (version 4.8)67, and MACS was used to identify peaks with the parameters -q 1E-5 –shiftsize = 0.5∗fragment_length –format = BAM –bw = 300 –keep-dup = 1. After peak calling, H3K27ac signal over peaks was quantified and normalized to reads per kilobase per million mapped reads (RPKM). ATAC-seq reads were processed similarly to H3K27ac reads except MACS peak calling was performed with parameters -p 1E-5 –nomodel –nolambda –format = BAM –bw = 300 –keep-dup = 1. After peaks were called for each condition, BED files containing the union of peaks across relevant conditions (e.g. drug-sensitive, drug-treated, and drug-resistant) were created using the bedtools merge command. Note that for CUTLL1 H3K27ac ChIP-seq and ATAC-seq data, a slightly less stringent q-value cutoff of 1E-4 was used for peak calling to obtain a comparative number of peaks between cell lines. Exact steps for each workflow can be found at https://github.com/faryabiLab/dockerize-workflows/tree/master/workflows.RNA-seq analysisBulk total transcript RNA-seq data from T-ALL DND41 and CUTLL1, MCL Rec-1 (including ibrutinib-sensitive and -resistant), and TNBC MB157 and HCC1599 cells was used to annotate expressed genes and analyze RNA enrichment over hub intervals. Alignment to Ensembl GRCh37.75 primary assembly and normalization with RPKM was performed as previously described22. RNA-seq for DND41, Rec-1, MB157, and HCC1599 was originally performed on at least three biological replicates for each cell line, and expressed genes were determined using a cutoff of RPKM > 1 in at least two of three replicates. For CUTLL1, a single RNA-seq experiment was analyzed with expressed genes determined using the same cutoff of RPKM > 1.Topologically associating domain (TAD) boundary and differential TAD boundary identificationTAD boundaries were identified in both DND41 and Rec-1 drug-sensitive and drug-resistant cells from SMC1 HiChIP and Hi-C data as previously described22. Briefly, the cooltools insulation function (https://cooltools.readthedocs.io/en/latest/) was applied to a .cool file converted from HiC-Pro66 valid pairs output using the hicpro2higlass.sh function followed by the hic2cool command-line tool with default options. Using a window size of 100 Kb and a bin size of 5 Kb, insulation scores were calculated. Valid, adjacent boundaries were defined as those with total reads between them exceeding the 75th percentile. Differential Hi-C TAD boundaries between drug-sensitive and drug-resistant conditions were categorized as boundaries with absolute log2 fold change in insulation score greater than 0.75.Differential compartment identificationHi-C data was used to detect differential compartments between drug-sensitive and drug-resistant DND41 and Rec-1. Compartments were initially identified in each condition from the first principle component (PC1) of Hi-C data, which was calculated with the Homer v4.1167 runHiCpca.pl method using H3K27ac ChIP-seq data to avoid arbitrary sign assignment. Next, the getHiCcorrDiff command was used to calculate each compartment’s correlation difference between drug-sensitive and drug-resistant conditions. To identify differential compartments switching from A to B in the drug-resistant state, the findHiCCompartments command was used with a correlation difference threshold (-corr) and a PC1 threshold of at least 0.4 for DND41 and 0.65 for Rec-1. For detection of compartments switching from B to A in the drug-resistant state, the same findHiCCompartments command was used with the addition of the -opp flag.Enhancer-promoter hub identification and analysisSpatial interaction pre-processing overviewEnhancer-promoter hubs in T-ALL DND41, T-ALL CUTLL1, MCL Rec-1, TNBC MB157, and TNBC HCC1599 were identified using Hi-C or SMC1 HiChIP contact frequency data annotated for enhancers and promoters. Specifically, SMC1 HiChIP data was used for detection of hubs in DND41, HCC1599, MB157, and Rec-1 cells whereas Hi-C data was used for detection of enhancer-promoter hubs in CUTLL1, GSI-sensitive and -resistant DND41 cells and ibrutinib-sensitive and -resistant Rec-1 cells. A high-level description of the data filtering process to create the inputs for our hub identification program is as follows: in order to generate data tables of valid enhancer-enhancer (EE), enhancer-promoter (EP), or promoter-promoter (PP) interactions in each condition, Hi-C/HiChIP loop anchors were first filtered by intersecting them with BED files of H3K27ac ChIP-seq peaks (i.e. enhancers) and active transcription start sites (TSS) from RNA-seq (i.e. promoters). For this analysis, peaks were defined as the 5 Kb or 10 Kb centered around the summit of the original protein enrichment signal. Anchors that were annotated as active TSSes with H3K27ac peaks were considered to be promoters. Once spatial interactions between putative promoter and/or enhancer elements were isolated, they were assigned normalized contact frequency scores and further filtered (using a user-defined score cutoff for Hi-C) to create the final data table for input into the hub-calling algorithm. This process of interaction filtering was slightly different depending upon the source assay of contact frequency data (i.e. Hi-C vs. SMC1 HiChIP) and upon other genomic data (e.g. ATAC-seq) that was readily available for analysis.SMC1 HiChIP spatial interaction pre-processingFor SMC1 HiChIP data, valid EE/EP/PP input interactions into the hub-calling pipeline were detected from raw reads by filtering significant interactions identified from FitHiChIP v11.068 to only include those between enhancers and active promoters. Briefly, SMC1 HiChIP reads were processed with Hi-C Pro version v2.5.066. High-confidence loop anchors were identified from FitHiChIP using Hi-C Pro’s allValidPairs file as input, a significance cutoff of q = 0.05 or p = 0.05 (see next paragraph), coverage bias regression for normalization, an interaction type of all to all / IntType=4, and default values for the remaining options. Anchors of significant FitHiChIP interactions were divided into two BED files and separately intersected with a BED file containing the union of H3K27ac ChIP-seq peaks and actively transcribed TSSes, where active genes were defined as those with >1 RNA-seq RPKM in at least two thirds of the replicates across the conditions of interest. After valid promoter and enhancer anchors were identified, the original list of FitHiChIP interactions was filtered to keep only interactions between valid enhancer and/or promoter elements for input to the hub-calling pipeline.For filtering interactions from Rec-1, MB157, and HCC1599 with FitHiChIP, a significance threshold of q < 0.05 was used to yield over 100,000 significant interactions in each cell type. However, since FitHiChIP identified less than 25,000 significant interactions from DND41 SMC1 HiChIP data with a significance cutoff of q < 0.05, we decided that a significance threshold of p < 0.05 was more appropriate for filtering interactions in this cell line. To ensure stringent filtering of DND41 SMC1 HiChIP data given this lower threshold, FitHiChIP significant interactions (p < 0.05) for DND41 were subject to further filtering. Specifically, BED files containing anchors of significant FitHiChIP interactions, ATAC-seq peaks, SMC1 ChIP-seq peaks, and the union of H3K27ac ChIP-seq peaks and actively transcribed TSSes were intersected to create a list of high-confidence, accessible regulatory element anchors. Only significant FitHiChIP interactions between two of these anchors were considered valid and used for DND41 hub calling. Since enhancers were not filtered by accessibility in Rec-1, MB157, and HCC1599 cells, they were defined more stringently as H3K27ac ChIP-seq peaks of > 500 bp.In-situ Hi-C spatial interaction pre-processingFor Hi-C data, valid EE/EP/PP interactions were filtered from Hi-C reads by quantifying the number of contacts between accessible enhancers and accessible, actively expressed promoters. First, a data table of all possible cis interactions between accessible enhancers and accessible, active promoters was generated for the conditions of interest. BED files containing active, accessible promoters were created by intersecting a BED file of actively transcribed TSSes, where active genes were defined as those with >1 RNA-seq RPKM in 2 out of 3 replicates for either drug-sensitive or drug-resistant cells, with a BED file containing the union of ATAC-seq peaks from drug-sensitive and drug-resistant conditions. BED files of accessible enhancers were created by intersecting a BED file containing the union of H3K27ac ChIP-seq peaks from drug-sensitive and drug-resistant cells with a BED file containing the union of ATAC-seq peaks from drug-sensitive and drug-resistant cells. A matrix containing all possible combinations of cis linkages (maximum length of 2 Mb) between these accessible enhancer and promoter elements was then constructed and used to normalize Hi-C reads for a given condition. Specifically, Hi-C reads were processed with Hi-C Pro66, rearranged into BEDPE file format, and intersected with the aforementioned BED file of ATAC-seq union peaks such that only interactions between pairs of accessible anchors were kept. These contacts were then mapped onto the matrix containing all possible enhancer and promoter interactions, and the number of ATAC-filtered Hi-C reads overlapping with each possible EE/EP/PP interaction was counted. Next, these linkage counts were summed and normalized to contacts per 100 million, and interactions with a normalized contact frequency of >3 (for DND41 and Rec-1) or >5 (for CUTLL1) were considered valid interactions for hub calling.Hub-calling pipelineWe adapted an implementation of matrix-free divisive hierarchical spectral clustering (https://github.com/faryabiLab/hierarchical-spectral-clustering), which was originally developed for single cell RNA-seq21, to identify enhancer-promoter hubs from the enhancer-promoter connectivity graph. We reasoned that this approach can overcome limitations associated with heuristic global optimization-based community detection methods such as Louvain-based algorithms to efficiently identify enhancer-promoter hubs, which we define as groups of densely connected enhancers and promoters with high intra-group and sparse inter-group interactions. Briefly, our matrix-free divisive hierarchical spectral clustering uses all of the information embedded in the enhancer-promoter connectivity graph at each partitioning to create a tree of enhancer and/or promoter clusters by recursively bi-partitioning the input spatial interactions between genomic elements. Time and memory efficiency is achieved by replacing factorization of the normalized Laplacian matrix at each iteration with direct calculation of the second left singular vector corresponding to the second largest singular value of a new matrix derived from the sparse connectivity matrix21. To enable simultaneous detection of large and small interacting enhancer/promoter clusters and to avoid creating arbitrary small clusters, our approach uses Newman-Girvan modularity69 as a stopping criterion for recursive cluster bi-partitioning. Using modularity as a stopping criterion instead of an optimization parameter also bypasses limitations associated with heuristic global optimization-based clustering such as Louvain-based algorithms70,71.To this end, our approach produces a hierarchy of nested enhancer-promoter clusters where each inner node is a cluster at a given scale and each leaf node is the finest-grain cluster such that any additional partitioning would be as good as randomly splitting connected enhancers/promoters. Importantly, this divisive hierarchical spectral clustering approach maintains relationships among clusters at various levels. As a result, the nested cluster structure can be used for clustering tree pruning, which provides flexibility in analysis and interpretation of enhancer-promoter hubs when hub topologies are a priori unclear.To prevent the possibility that a functionally relevant cluster is divided into multiple clusters for downstream analysis, we used parameters cluster-tree -c dense -n 2 -s for all the analysis, and defined hubs as self-contained networks of connected regulatory elements. We next categorized hubs by the largest contiguous genomic interval covered by their enhancer/promoter anchors, and computed their within-hub spatial interaction counts and enhancer/promoter counts. We also annotated hubs with the expressed genes contained within their contiguous intervals (Supplementary Data 1). To ensure that we identified hubs rather than just a few interacting elements, we removed clusters containing fewer than 6 spatial interactions among less than 4 enhancer/promoters for downstream analysis. For each cell type, distributions of hubs were plotted on the basis of their interaction count and enhancer/promoter count. The interaction count cutoff for hyperinteracting hubs was determined by calculating the point of tangency on the elbow of the curve comparing hub rank to interaction count as previously described14.Hub vs. random loci RNA enrichment analysisScatterplots of median RNA enrichment over hub loci and random representative hub loci were created for each cancer type to analyze transcriptional activity of hubs. Lists of random representative hub loci were generated by iteratively calling the bedtools shuffle command with parameters -chrom -noOverlapping -g hg19.genome on input BED files containing observed hubs. This bedtools command generates a list of random loci that contains an identical number of loci as the input list of observed hubs, where each random locus on the output list has identical length and chromosomal localization as each observed hub in the input list. As noted in the bedtools documentation, the process of generating random representative loci using this command could very infrequently skip loci violating the noOverlapping criteria. Median RNA enrichment was then quantified over each list of random representative loci to create a single data point on the output scatterplot, and this process was repeated several thousand times (5000 times for all total hub analyzes and 10,000 times for all hyperinteracting hub analyzes) to enhance the accuracy of the simulated comparison. Additionally, comparisons of transcription levels between regular and hyperinteracting hubs were performed without genomic span normalization.SMC1 HiChIP vs. Hi-C hub similarity Venn diagramsIn order to compare DND41 and Rec-1 hubs identified from Hi-C with DND41 and Rec-1 hubs identified from SMC1 HiChIP (respectively), Venn diagrams were used to depict the similarity of hubs on the basis of their genomic overlap. These diagrams indicate the percentage of distinct, non-overlapping HiChIP hubs (relative to the total number of HiChIP hubs) and Hi-C hubs (relative to the total number of Hi-C hubs) as well as the percentages of overlapping HiChIP and Hi-C hubs reported both in terms of the total number of HiChIP hubs and the total number of Hi-C hubs. Note that the sizes of the Venn diagram circles are proportional to the number of hubs between those conditions.All hub and hyperinteracting hub similarity matricesSimilarity matrices of all hubs and hyperinteracting hubs identified from Rec-1, HCC1599, MB157, and DND41 SMC1 HiChIP were created by comparing the genomic positions of hubs between each pair of cancer types. For example, to compare SMC1 HiChIP hyperinteracting hubs between DND41 and Rec-1 cells, the number of non-overlapping (where overlap is defined as ≥ 1 bp) hyperinteracting hubs in each cancer was calculated by using bedtools intersect –v on two input BED files of hyperinteracting hubs in each cancer type. The similarity score for the comparison was then calculated using the following formula: similarity = 100 – (average percentage of distinct hubs). Since no more than 5% of within-condition hubs or hyperinteracting hubs for SMC1 HiChIP were overlapping, this methodology of similarity analysis was considered to be minimally biased.The ten shared (i.e. overlapping) hyperinteracting hubs across T-ALL DND41, MCL Rec-1, and TNBC MB157 and HCC1599 were identified by finding the intersection of the four SMC1 HiChIP hyperinteracting hub lists using bedtools intersect, and manually filtering the resulting overlapping intervals based upon their connectivity structure and size. Specifically, overlapping regions were only considered to be conserved hyperinteracting hubs if there were valid SMC1 HiChIP contacts over the overlapping region in all four cell lines and if the region spanned more than ~75 Kb (i.e. size of the smallest hyperinteracting hub across the four cell types).TAD and hub boundary similarity analysisHub boundaries were defined as the 5 Kb upstream and downstream from the start position and end position of hubs (respectively) resulting in 2 h intervals, where h is the number of hubs identified. Valid TAD boundaries from DND41 and Rec-1 SMC1 HiChIP or Hi-C data were identified as described in the previous section and intersected with the aforementioned HiChIP or Hi-C hub boundaries (respectively) to produce Venn diagrams depicting hub/TAD boundary overlap. Additionally, pile-up plots of Hi-C hub and Hi-C TAD boundaries were separately generated using coolpup.py with options –pad 250000 –local and graphed with plotpup.py72. Note that few boundaries in each condition were automatically excluded from pile-up plots by coolpup.py.Hyperinteracting vs. regular hub overlap with super-enhancersSuper-enhancers were identified in each cell line by applying previously described methods73 to H3K27ac ChIP-seq peaks for CUTLL1, DND41, Rec-1, MB157, and HCC1599 in R. Super-enhancers were intersected with hub genomic locations and stratified by type (hyperinteracting or regular). Super-enhancer and hub similarity was determined by calculating the proportion of regular and hyperinteracting hubs overlapping with 0, 1, or 2+ super-enhancers.Hyperinteracting vs. regular hub genomic length violin plotsGenomic lengths of hubs were calculated as the difference between the start coordinate of the farthest upstream regulatory element in the hub and the stop coordinate of the farthest downstream regulatory element in the hub for DND41, Rec-1, MB157, and HCC1599 hubs identified from SMC1 HiChIP data. These genomic distances were stratified by hub type (i.e. hyperinteracting vs. regular) and plotted in R using ggplot2’s geom_violin function.Hyperinteracting vs. regular hub highly expressed gene plotsHighly expressed genes were considered to be genes within the top 2.5% quantile of gene expression, as defined from RNA RPKM in DND41, Rec-1, MB157, and HCC1599. The genomic coordinates of these highly expressed genes were intersected with hyperinteracting and regular hubs, and the number of highly expressed genes overlapping each hyperinteracting or regular hub was plotted in R using ggplot2’s geom_violin function.Identification of structural variants in MCLStructural variations in ibrutinib-sensitive and ibrutinib-resistant Rec-1 were identified from ICE-balanced Hi-C data using the predicts command of EagleC v0.1.9 with default options74. Hi-C matrices were prepared as .cool files with resolutions 10 Kb, 50 Kb, and 100 Kb, as required by the EagleC pipeline.Differential Enhancer-Promoter Hub Identification and AnalysisDifferential hub-calling pipelineDifferential enhancer-promoter hubs were identified from the hub-calling output for two different conditions (i.e. two BED-like files containing hub genomic intervals and interaction counts). In order to compare hubs in each condition on the basis of interaction count, the bedtools merge command was used to find the union of hubs across the two conditions. For union hubs that were created from ≥3 overlapping precursor hubs, new interaction counts were calculated by summing the number of interactions of the individual precursor hubs in each condition. The union hub lists also contained de novo lost/gained non-overlapping hubs. Finally, the log2 fold change in interaction count over union hubs was calculated, where log2 fold change = log2(case interaction count) − log2(control interaction count), and union hubs were annotated by the expressed genes contained within their contiguous genomic intervals.Differential hub analysisFor each set of union hubs, a volcano-like plot comparing the drug-sensitive interaction counts and log2 fold change in interaction count of each union hub was graphed. These plots were used to illustrate genome-wide changes in hub interactivity, and to highlight large differential hubs of interest for downstream analysis. Hubs were considered to become more interacting in the drug-resistant state either if log2FC(interaction count) ≥1 or if they increased from <6 interactions to ≥6 in the drug-resistant condition (i.e. de novo gained hubs). Similarly, hubs were considered to become less interacting in the drug-resistant state either if log2FC(interaction count) ≤ −1 or if they decreased from ≥6 interactions to <6 interactions in the drug-resistant condition (i.e. lost hubs). In order to best visualize all of the hubs in the volcano-like plot for DND41 GSI-sensitive vs. GSI-resistant hubs, the log2 fold change of all hubs in the plot was adjusted using a pseudocount such that log2 fold change = log2[(GSI-resistant interaction count + 1)/(GSI-sensitive interaction count + 1)]. This pseudocount adjustment was not necessary to create the volcano-like plot of differential hubs in ibrutinib-sensitive vs. ibrutinib-resistant Rec-1. Instead, for visual clarity, the two outlier hubs with >1500 interaction counts in ibrutinib-sensitive and -resistant cells were excluded from the plot. RNA enrichment, H3K27ac enrichment, and ATAC-seq enrichment were then examined over these differential hubs. Gene ontology and pathway enrichment analysis over the expressed genes contained within differential hubs was also performed.Differential compartment and differential hub similarity Venn diagramsDifferential A to B and B to A compartments were identified as described in the previous section and intersected with differential hubs that lost or gained interactivity in the drug-resistant condition, respectively. For similarity analysis, the overlap between the full genomic length of differential hubs and differential compartments was calculated and plotted with Venn diagrams.Differential TAD and differential hub boundary similarity Venn diagramsDifferential TAD boundaries losing insulation or gaining insulation were identified as described in the previous section and intersected with boundaries of differential hubs that lost or gained interactivity in the drug-resistant condition, respectively. For similarity analysis, the overlap between the differential hub and differential TAD boundaries was calculated and plotted with Venn diagrams.Genomic feature intersection/similarity analyzesIntersection analysis to determine the linear genomic overlap between distinct populations of enhancer-promoter hubs and between hubs and other genomic features (e.g. TADs, compartments, super-enhancers, etc.) was performed using either the intersect command of bedtools75 or the findOverlaps command of R library GenomicRanges. Unless otherwise noted, for hub populations and/or genomic features that overlapped such that one hub in list A intersected more than one hub/feature in list B (or vice versa), bedtools was used to determine the degree of overlap between the two lists. For genomic features and hubs that overlapped such that at most one hub in list A intersected one feature in list B (and vice versa), GenomicRanges was used to determine the degree of overlap between the two lists.All Venn diagrams illustrating the output of feature intersection analysis were created in R using the eulerr package such that each segment’s size was proportional to the number of features that it represented. For Venn diagrams illustrating intersection analyzes that had at least one feature in one list intersecting more than one feature in the other, the sizes of the diagram segments were set such that they reflected the number of exclusive features in each set and the average degree of overlap, where average overlap = (total number features – total number exclusive features)*0.5. All Venn diagram segments were manually labeled with the actual number and percentage of exclusive and overlapping features in each list.Gene ontology (GO) and pathway enrichment analysisGO enrichment analysis was conducted using the enrichment function of Metascape76 or the PANTHER enrichment function of the Gene Ontology Knowledgebase77,78,79. For hyperinteracting hubs in each condition, the list of expressed genes contained within the intervals of hyperinteracting hubs was used as input for analysis of GO molecular function, biological process, and/or cellular component annotation enrichment. Metascape was used for GO enrichment analysis of SMC1 HiChIP hyperinteracting hubs in Rec-1, HCC1599, and MB157 while PANTHER was used for GO enrichment analysis of SMC1 HiChIP hyperinteracting hubs in DND41 and Hi-C hyperinteracting hubs in CUTLL1 because the number of expressed genes in T-ALL hyperinteracting hubs exceeded the gene limit for Metascape. Metascape was also used for GO enrichment analysis of DND41 and Rec-1 hyperinteracting hubs identified from Hi-C data. For GO enrichment analysis of differential DND41 and Rec-1 hubs, expressed genes in drug-resistant cells that were located within hubs that gained interactivity or expressed genes in drug-sensitive cells that were located within hubs that lost interactivity were used as inputs to Metascape. In addition to GO analysis, pathway enrichment analysis was also performed on these expressed genes within differential hubs in Metascape using the Hallmark Gene Set, Reactome Gene Set, Biocarta Gene Set, Canonical Pathway, WikiPathway, and KEGG Pathway annotations.For discussion of Metascape GO/pathway enrichment output in this paper, significant ‘summary’ annotations (p < 0.01) were presented (as opposed to significant ‘non-summary’ annotations) given that the ‘summary’ annotations represented overarching groups of GO terms and therefore limited redundancy in annotation reporting. Reported p-values from Metascape analyzes were computed using hypergeometric tests. For all Metascape hyperinteracting hub GO plots (Figs. 3g, 4g, and S4m), the top 10 most significantly overrepresented (i.e. enriched) ‘summary’ GO molecular function annotations from expressed genes in SMC1 HiChIP hyperinteracting hubs were ranked by p-value and plotted. For presentation of PANTHER GO enrichment output, we plotted the top 10 most significantly overrepresented (i.e. enriched) GO molecular function annotations ranked by p value, with a comprehensive list of significant GO annotations (including both underrepresented and overrepresented GO terms) contained in Supplementary Data 2. We also excluded the significantly underrepresented ‘molecular function’ GO term from Figs. 2g and S4h because this annotation could lead to confusion between underrepresented and overrepresented terms. Also note that the ‘binding’ GO term in these figures is not immediately relevant for direct gene annotation given its breadth as a parent GO annotation. Reported p-values from PANTHER analyzes were computed using Fisher’s exact test.DNA FISH analysisDNA FISH image analysis was performed similar to a previously described protocol22. Briefly, DAPI signal was used for manual nuclei segmentation, with 1548 GSI-sensitive allelic interactions and 533 GSI-resistant allelic interactions analyzed for the IKZF2 locus and 1319 GSI-sensitive allelic interactions and 640 GSI-resistant allelic interactions analyzed for the LEF1 locus. For each manually segmented nucleus, spots indicative of probe signal were manually thresholded. Centroid positions for each spot in xy were found by fitting a Gaussian. X, Y, and Z coordinates were extracted, and pairwise Euclidean distances between nearest neighbors were calculated. Representative FISH cell images were taken using a Leica SP8 63× oil immersion objective and photo brightness, contrast, and smoothing was adjusted in ImageJ to facilitate probe visualization.Genomic data visualizationNormalized reads from ATAC-seq, H3K27ac ChIP-seq, H3K27me3 CUT&RUN, RNA-seq, SMC1 HiChIP, and/or Hi-C over selected hubs were visualized using the R package Sushi (version 1.18.0)80. Specifically, bedgraph (.bg) files of reads normalized using reads per million from ATAC-seq, H3K27ac ChIP-seq, H3K27me3 CUT&RUN, and RNA-seq data were created with command genomecov, or by converting RPM-normalized BigWig (.bw) files into.bg format using the UCSC tools (version 329) BigWigToBedGraph81. These .bg files were visualized using the Sushi command plotBedgraph. For Hi-C data, normalized EE/EP/PP interactions exceeding the contact frequency cutoff of 3 (for DND41 and Rec-1) or 5 (for CUTLL1) were plotted using the Sushi command plotBedpe with intensity equivalent to normalized contact frequency. For SMC1 HiChIP data, EE/EP/PP FitHiChIP significant interactions (p < 0.05 for DND41 and q < 0.05 for Rec-1, MB157, and HCC1599) were plotted with intensity equivalent to –log10(X), where X is the minimum FitHiChIP p-value (for DND41) or q-value (for Rec-1, MB157, and HCC1599) for the interaction. Z-scored contact maps were created to visualize Hi-C interaction heatmaps over selected hubs. These plots were generated by first applying z-score transformation to 25Kb resolution VC_SQRT normalized Hi-C contact maps for each chromosome as previously described82 using the R command loessFit with parameters iter = 100, span = 0.02. The Sushi command plotHiC was used to plot these transformed maps.Data presentation & statistical analysisAll analysis and quantification of ATAC-seq and ChIP-seq output was performed using peak-called data with normalized RPKM measurements. Median RNA enrichment over observed spatial hubs and hyperinteracting hubs was compared to median RNA enrichment over random representative loci using empirical p-values conservatively calculated as (n + 1)/(r + 1), where r is the total number of replicates of random representative loci lists (i.e. 5000 or 10,000) and n is the number of replicates in which theoretical median RNA enrichment exceeded observed median RNA enrichment over hubs/hyperinteracting hubs. Two-tailed Wilcoxon Rank Sum tests were used for comparisons of regular hubs and hyperinteracting hubs and for differential hub RNA-seq/ATAC-seq/H3K27ac ChIP-seq enrichment analyzes with the wilcox.test function in R (version 4.2.1), and plot axes were abridged for the purpose of visualization. Comparisons of RNA-seq differential gene enrichment between DND41/Rec-1 drug-sensitive/drug-resistant cells for selected genes were evaluated with two-sided Student’s t-tests (n = 3). Finally, statistical values for the comparison of probe distances in cumulative distribution plots were calculated using a Kolmogorov–Smirnov test. All p values less than 1E-5 were rounded up one decimal place (e.g. p = 8E-7 becomes p = 1E-6) for reporting in the text and figures. Unless otherwise noted, all box and whisker plots are shown without outliers to facilitate data visualization.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles