Decoding the genomic landscape of chromatin-associated biomolecular condensates

ChIP-seq data collection and processing
ChIP-seq data of CAPs were collected from Cistrome Data Browser50 and filtered using the quality control procedures described in a previous study51. In brief, only ChIP-seq datasets that satisfied at least four of the five quality control metrics (sequence quality, mapping quality, library complexity, ChIP enrichment, and signal-to-noise ratio) available in Cistrome Data Browser were kept. When multiple qualified ChIP-seq datasets were available for a given CAP in the same cell type, the qualified datasets were ranked by the quality control metrics and the highest-ranked dataset was selected. We downloaded ChIP-seq peak files (BED format) and signal track files (bigWig format) from Cistrome Data Browser. Although Cistrome Data Browser stores narrow peaks called by MACS252 for all CAPs, peak window sizes of distinct CAPs can differ substantially. Therefore, to obtain accurate occupancy regions for each CAP, especially CAPs with broad peaks, we first called broad peaks from the signal track using the "bdgbroadcall" module of MACS2 (v2.1.3) with default parameters and then merged adjacent peaks within 5 kb. For each CAP, if more than 1000 re-called peaks were wider than 5 kb, we replaced the original narrow peaks with the re-called broad peaks as the accurate occupancy regions.

Condensation-related annotation for proteins
Human and mouse proteins with reported LLPS capacity were collected from four databases: DrLLPS6, LLPSDB5, PhaSepDB (two versions, v1 and v2)3 and PhaSePro4. DrLLPS collects all proteins potentially involved in LLPS, including scaffolds, regulators and clients; however, we regarded only scaffolds as LLPS proteins because DrLLPS contains too many regulators and clients. To create an annotation of LLPS proteins, we merged all LLPS proteins from the different sources. Notably, because the number of collected mouse LLPS proteins (61) was much lower than that of human LLPS proteins (437), we also considered mouse orthologs of human LLPS proteins as mouse LLPS proteins. Component proteins of MLOs in human and mouse were collected from DrLLPS and PhaSepDB (v1 and v2). Proteins assigned to the same MLO in different sources were merged to form a comprehensive list of component proteins for that MLO. Similar to LLPS proteins, mouse orthologs of human proteins assigned to the same MLO were regarded as component proteins of that MLO in mouse. Pairwise protein-protein interactions were collected from three databases, BioGRID53, MINT54 and IntAct55; only physical associations were kept. Intrinsically disordered regions of proteins were predicted by MobiDB-lite (v1.0)56. This optimized method uses eight different predictors to derive a consensus, which is then filtered for spurious short predictions in a second step. A protein was regarded as containing intrinsically disordered regions if more than 15.3% of its sequence was predicted to be disordered by MobiDB-lite; the threshold of 15.3% corresponds to the 20th percentile of the disordered-region fractions of known human LLPS proteins. RNA-binding proteins were predicted by TriPepSVM (v1.0)57, a method for de novo prediction based on short amino acid motifs, with parameters "-posW 1.8 -negW 0.2 -thr 0.28".

Genome-wide RNA-binding strength
We used genome-wide signals of R-ChIP data, an in vivo R-loop profiling approach using catalytically dead RNase H158, to quantify genome-wide RNA-binding strength in K562 cells.
Raw sequencing reads from GSE9707258 were first aligned to the human reference genome using the default "--local" mode of Bowtie2 (v2.3.5.1)59. Low-mapping-quality reads (mapping quality <30) and duplicates were discarded. Signal tracks were then generated using the "genomecov" command in Bedtools (v2.28.0) and normalized to reads per million mapped reads (RPM).

Motif scan
Motif scans were performed using FIMO (v5.0.5)60 against the JASPAR CORE 2020 vertebrates database61 with the parameter "--max-stored-scores 1000000". Motifs with p-values < 1 \(\times\) 10−5 were used for the following analyses.

CondSigDetector workflow
The framework consists of three steps: data processing, co-occupancy signature identification and condensation potential filtration.

In the first step, CondSigDetector defines the mouse (mm10) or human (hg38) genome, \({B}_{{mm}10}\) or \({B}_{{hg}38}\), as a sequence of consecutive 1 kb bins \(B=\left\{{b}_{1},{b}_{2},\cdots,{b}_{n}\right\}\), where each \({b}_{i}\) represents the \(i\)-th 1 kb bin and \(n\) is the total number of 1 kb bins in the genome. It defines the set of CAPs as \(C=\left\{{c}_{1},{c}_{2},\cdots,{c}_{m}\right\}\), where each \({c}_{j}\) represents the \(j\)-th CAP and \(m\) is the total number of CAPs. It then generates an occupancy matrix \(O\) with dimension \(n\times m\), where each element \({O}_{i,j}\) represents the occupancy event of the \(j\)-th CAP at the \(i\)-th bin and is defined as:

$${O}_{i,j}=\left\{\begin{array}{ll}1, & \text{if CAP } j \text{ has a peak within bin } i\\ 0, & \text{otherwise}\end{array}\right.$$
(1)
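As an illustration of this step, the following minimal Python sketch builds the binary occupancy matrix from per-CAP peak files. It is not the released CondSigDetector code, and the inputs `chrom_sizes` (chromosome lengths) and `peak_files` (CAP name to BED path) are hypothetical.

```python
# Minimal sketch (not the released CondSigDetector code) of building the binary
# occupancy matrix O (1 kb bins x CAPs) defined in Eq. (1). `chrom_sizes` and
# `peak_files` are hypothetical inputs.
import numpy as np

BIN_SIZE = 1000  # 1 kb bins

def bin_offsets(chrom_sizes):
    """Map each chromosome to the genome-wide index of its first bin."""
    offsets, total = {}, 0
    for chrom, size in chrom_sizes.items():
        offsets[chrom] = total
        total += size // BIN_SIZE + 1
    return offsets, total

def occupancy_matrix(chrom_sizes, peak_files):
    """Return O (n bins x m CAPs, int8) and the ordered list of CAP names."""
    offsets, n_bins = bin_offsets(chrom_sizes)
    caps = sorted(peak_files)
    O = np.zeros((n_bins, len(caps)), dtype=np.int8)
    for j, cap in enumerate(caps):
        with open(peak_files[cap]) as bed:
            for line in bed:
                chrom, start, end = line.split()[:3]
                if chrom not in offsets:
                    continue
                first = int(start) // BIN_SIZE      # first bin touched by the peak
                last = (int(end) - 1) // BIN_SIZE   # last bin touched by the peak
                O[offsets[chrom] + first: offsets[chrom] + last + 1, j] = 1
    return O, caps
```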
CondSigDetector then applies filters to the occupancy matrix \(O\) to refine the data. It excludes CAPs with fewer than 500 occupancy events to eliminate the effect of low-quality ChIP-seq data, and removes bins with too many occupancy events (occupied by more than 90% of CAPs) to avoid sequencing bias. Bins within ENCODE Blacklist genomic regions are also removed.

Identifying co-occupancy signatures directly from the entire occupancy matrix \(O\) is a complicated task and can result in the loss of low-frequency signatures in the local context. To address this issue, CondSigDetector iteratively segments the entire occupancy matrix into sub-matrices, with each sub-matrix containing the high-frequency co-occupancy events associated with a given CAP in each iteration. The segmentation in each iteration has two aspects: identifying the segment of CAPs showing specific co-occupancy with the given CAP, and identifying the segment of bins showing high co-occupancy of these CAPs. In each segmentation iteration, CondSigDetector selects a focus CAP \({c}_{f}\) and identifies other CAPs \({C}_{{segment}}\subset C\) that are highly co-occupied with \({c}_{f}\). The identification of \({C}_{{segment}}\) proceeds as follows: for each CAP \({c}_{j}\in C\), CondSigDetector uses its occupancy events \({[{O}_{1,j},\, {O}_{2,j},\ldots,\, {O}_{n,j}]}^{T}\) to classify the occupancy events of \({c}_{f}\), \({[{O}_{1,f},\, {O}_{2,f},\ldots,\, {O}_{n,f}]}^{T}\), and calculates an \({F}_{1}\) score as a measure of the co-occupancy level with \({c}_{f}\), denoted as \({\beta }_{j}\). The top \(q\) CAPs ranked by \({\beta }_{j}\) are kept as \({C}_{{segment}}\), where \(q=50\) by default. CondSigDetector then selects bins \({B}_{{segment}}\subset B\) that are frequently occupied by the CAPs in \({C}_{{segment}}\). The selection of \({B}_{{segment}}\) proceeds as follows: for each bin \({b}_{i}\in B\), CondSigDetector calculates an occupancy score \({\delta }_{i}\) to evaluate the occupancy level of the CAPs in \({C}_{{segment}}\) as:

$${\delta }_{i}={\sum }_{j=1}^{q}{\gamma }_{j}{O}_{{ij}}$$
(2)
where \({\gamma }_{j}\) denotes the \(z\)-score-normalized \({\beta }_{j}\). Only the \(p\) bins with \({\delta }_{i} > 0\) are kept as \({B}_{{segment}}\). The sub-matrix in each iteration, \({O}_{{segment}}\), is defined as:

$${O}_{{segment}}={\left[{O}_{{ij}}\right]}_{{b}_{i}\in {B}_{{segment}},\,{c}_{j}\in {C}_{{segment}}}$$
(3)
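A condensed Python sketch of the segmentation step above (Eqs. 2-3) is shown below. It assumes the occupancy matrix `O` from the previous sketch and uses scikit-learn's F1 score; variable names are chosen for illustration and are not taken from the CondSigDetector source.

```python
# Illustrative segmentation around a focus CAP f (Eqs. 2-3); `O` is the binary
# occupancy matrix (n bins x m CAPs). Not the authors' implementation.
import numpy as np
from scipy.stats import zscore
from sklearn.metrics import f1_score

def segment(O, f, q=50):
    n, m = O.shape
    # beta_j: F1 score when CAP j's occupancy is used to classify the focus CAP's occupancy
    beta = np.array([f1_score(O[:, f], O[:, j]) for j in range(m)])
    c_segment = np.argsort(beta)[::-1][:q]   # top q co-occupied CAPs (C_segment)
    gamma = zscore(beta[c_segment])          # z-score-normalized beta (gamma_j)
    delta = O[:, c_segment] @ gamma          # occupancy score per bin, Eq. (2)
    b_segment = np.where(delta > 0)[0]       # bins with delta_i > 0 (B_segment)
    return O[np.ix_(b_segment, c_segment)], c_segment, b_segment   # Eq. (3)
```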
In the second step, each sub-matrix \({O}_{{segment}}\) is classified into promoter and non-promoter contexts. Promoters are defined as the regions from 3 kb upstream to 3 kb downstream of transcription start sites. CondSigDetector builds a biterm topic model22 for each sub-matrix to learn specific collaborative patterns of CAPs, termed co-occupancy signatures. The topic model is a widely used machine learning model for discovering latent topics in a particular set of documents; it assumes that each document can be described as a mixture of a small number of topics, where a topic is a distribution over words. CondSigDetector lets \(D=\{{d}_{1},\, {d}_{2},\cdots,\, {d}_{p}\}\) be a collection of "documents", where each document \({d}_{i}\) corresponds to the occupancy events at the \(i\)-th bin \({O}_{i}\), and lets \(W=\{{w}_{1},\, {w}_{2},\cdots,\, {w}_{q}\}\) be the "vocabulary", where each word \({w}_{j}\) corresponds to a CAP \({c}_{j}\). The latent topics learned across "documents" can then be regarded as specific collaborative patterns of CAPs across the genome.

The biterm topic model is a probabilistic topic model designed to find topics in collections of short texts, and the goal is to learn: (1) \({\theta }_{i,t}\), the probability of topic \(t\) occurring in document \({d}_{i}\); and (2) \({\varPhi }_{t,j}\), the probability of word \({w}_{j}\) belonging to topic \(t\). We implemented the topic model in CondSigDetector using source code from the previous study22. Finally, the biterm topic model generates two probability distributions: a matrix \({G}_{k\times q}\) representing the occurrence probability of the \(q\) words across \(k\) topics and a matrix \({G}_{p\times k}\) representing the occurrence probability of the \(k\) topics across the \(p\) documents.

The topic number \(k\) is a crucial parameter in topic modeling, as it affects the topic distribution. CondSigDetector empirically learns 2 to 10 topics for each context and then applies an automatic strategy to select the optimal topic number, as described in a previous study62. The selection principle is that the optimal topic number should distinguish documents with different topics as much as possible. Hence an optimal topic number should satisfy the following two criteria:

The occurrence probability of each topic in different documents should be as different as possible, which is measured by the specificity score (\({{SS}}_{k}\)) calculated for all topics under a certain topic number \(k\):

$${{SS}}_{k}=\log \left(\frac{1}{k}\mathop {\sum }\limits_{j=1}^{k}\frac{{\sigma }_{j}}{{{\mu }_{j}}^{2}}\right)$$
(4)
where \({\sigma }_{j}\) and \({\mu }_{j}\) are the variance and mean, respectively, of the \(j\)-th column of \({G}_{p\times k}\). A higher specificity score indicates a better choice of topic number.

The fewer topics that occur in each document, the better. This is measured by the purity score (\({{PS}}_{k}\)) calculated for all topics under a certain topic number \(k\):

$${{PS}}_{k}=\log \left(\frac{1}{p}\mathop {\sum }\limits_{i=1}^{p}{\sigma }_{i}\right)$$
(5)
where \({\sigma }_{i}\) is the variance of the \(i\)-th row of \({G}_{p\times k}\). The larger the purity score, the better the choice of topic number.
Finally, we defined the combination score (\({{CS}}_{k}\)) as a weighted average of the specificity score and the purity score:

$${{CS}}_{k}=\alpha {{SS}}_{k}+\left(1-\alpha \right){{PS}}_{k}$$
(6)

where \(\alpha\) is calculated as

$$\alpha=\frac{{{PS}}_{k}}{{{SS}}_{k}+{{PS}}_{k}}$$
(7)

We selected as optimal the topic number \(k\) (from 2 to 10) with the highest combination score.
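For concreteness, a minimal sketch of this selection is given below, assuming a dictionary `models` mapping each candidate topic number to its document-topic matrix \({G}_{p\times k}\) as a NumPy array; it is our illustration of Eqs. (4)-(7), not code from CondSigDetector.

```python
# Sketch of automatic topic-number selection (Eqs. 4-7). `models` is a
# hypothetical dict {k: G_pk}, where G_pk is the p x k document-topic matrix
# learned by the biterm topic model for candidate topic number k.
import numpy as np

def combination_score(G_pk):
    ss = np.log(np.mean(G_pk.var(axis=0) / G_pk.mean(axis=0) ** 2))  # specificity, Eq. (4)
    ps = np.log(np.mean(G_pk.var(axis=1)))                           # purity, Eq. (5)
    alpha = ps / (ss + ps)                                           # weight, Eq. (7)
    return alpha * ss + (1 - alpha) * ps                             # combination, Eq. (6)

def select_topic_number(models):
    """Return the candidate k (2-10) with the highest combination score."""
    return max(models, key=lambda k: combination_score(models[k]))
```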
After selecting the optimal topic number \(k\), CondSigDetector interprets the learned topics as co-occupancy signatures. We determined the component CAPs of each co-occupancy signature based on the matrix \({G}_{k\times q}\), which represents the occurrence probability of the \(q\) CAPs in the \(k\) co-occupancy signatures. For each signature \(t\), a CAP \({c}_{j}\) was considered a component if \(Z({G}_{t,j}) > \lambda\), where \(Z\) is the \(z\)-score normalization function and \(\lambda\) is a threshold set to 1.3 by default. A 1 kb bin \({b}_{i}\) was defined as a signature-positive site if it was occupied by more than 80% of the component CAPs. In this way, we generated the component CAPs \({C}_{{pos}}\) and signature-positive sites \({B}_{{pos}}\). Co-occupancy signatures with fewer than 3 components and fewer than 200 signature-positive sites were discarded.
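The component-calling rule can be summarized in a few lines of Python. The sketch below assumes the topic-CAP matrix `G_kq` (\(k\times q\)) and the sub-matrix `O_segment` as NumPy arrays, with thresholds matching the defaults stated above; names are illustrative.

```python
# Sketch of deriving component CAPs (C_pos) and signature-positive bins (B_pos)
# for one signature t; not the released CondSigDetector code.
import numpy as np
from scipy.stats import zscore

def signature_members(G_kq, O_segment, t, lam=1.3, occ_frac=0.8):
    comps = np.where(zscore(G_kq[t, :]) > lam)[0]   # CAPs with Z(G_tj) > lambda
    frac = O_segment[:, comps].mean(axis=1)         # fraction of components occupying each bin
    pos_bins = np.where(frac > occ_frac)[0]         # bins occupied by >80% of components
    return comps, pos_bins
```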
In the third step, CondSigDetector screens out CondSigs from all co-occupancy signatures based on the condensation potential of each signature. To evaluate the condensation potential of each signature, we quantify the association between condensation-related features and signature presence across genome-wide bins. Intuitively, the higher the condensation-related feature values of the occupancy events at signature-positive bins, the higher the condensation potential of the signature. We conduct receiver operating characteristic (ROC) curve analysis to compare the distribution of condensation-related feature values at signature-positive versus signature-negative bins and thereby measure the enrichment of condensation-related features at signature-positive bins. In the ROC analysis, the positive set comprises the signature-positive bins of the given signature and the negative set comprises the signature-negative bins. Signature-positive bins were defined in the step above, and signature-negative bins \({B}_{{neg}}\) are defined using the following two criteria:

Comparability with the signature-positive bins. As the signature-positive bins are occupied by at least 80% of \({C}_{{pos}}\), we required that signature-negative bins be occupied by at least \(h\) CAPs, with \(h=0.8\times |{C}_{{pos}}|\);

Differentiation from the signature-positive bins. We required that signature-negative bins lack co-occupancy of \({C}_{{pos}}\); specifically, the number of occupied \({C}_{{pos}}\) CAPs must be < 2.
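A short sketch of how the two criteria above translate into a bin filter is given below, assuming the genome-wide occupancy matrix `O`, the component CAP column indices `comps`, and the signature-positive bin indices `pos_bins`; all names are ours.

```python
# Sketch of signature-negative bin selection; not the authors' implementation.
import numpy as np

def negative_bins(O, comps, pos_bins):
    h = 0.8 * len(comps)                       # h = 0.8 x |C_pos|
    total_occ = O.sum(axis=1)                  # CAPs occupying each bin
    comp_occ = O[:, comps].sum(axis=1)         # component CAPs occupying each bin
    mask = (total_occ >= h) & (comp_occ < 2)   # comparable, yet lacking C_pos co-occupancy
    mask[pos_bins] = False                     # defensively exclude signature-positive bins
    return np.where(mask)[0]
```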

For each signature, six condensation-related features are calculated from the co-occupancy events of \({C}_{{segment}}\) at signature-positive bins \({B}_{{pos}}\) and signature-negative bins \({B}_{{neg}}\):

$$F_{{LLPS}}=\frac{\text{Number of occupied CAPs with reported LLPS capacity}}{\text{Total number of occupied CAPs}}$$

$$F_{{MLO}}=\frac{\text{Number of occupied CAPs co-occurring in the same MLO}}{\text{Total number of occupied CAPs}}$$

$$F_{{IDR}}=\frac{\text{Number of occupied CAPs with predicted IDRs}}{\text{Total number of occupied CAPs}}$$

$$F_{{PPI}}=\frac{\text{Number of occupied CAP pairs with protein-protein interactions}}{\text{Total number of CAP pairs}}$$

$$F_{{RBP}}=\frac{\text{Number of occupied CAPs predicted as RBPs}}{\text{Total number of occupied CAPs}}$$

$$S_{{RBS}}=\text{RNA-binding strength at the bin}$$

A signature is identified as a CondSig if at least three of the six condensation-related features exhibit a positive association with the presence of the signature, as measured by the area under the ROC curve (AUROC). The criteria for this identification are an AUROC greater than 0.6 for individual features and a mean AUROC greater than 0.65 for the top three features. In the final stage, CondSigs within the same cell type are pooled, and duplicated CondSigs are discarded. The redundancy of two CondSigs is measured by the extent of overlap between their top five components, ranked by their probability of occurrence within each CondSig. We computed the Jaccard index for each pair of CondSigs; if the Jaccard index indicated high redundancy (a value greater than 0.25), we compared the mean AUROC of the two CondSigs and discarded the one with the lower mean AUROC.

Comparison of BTM and HDP
We built HDP and BTM models on the entire occupancy matrix separately and compared the quality of the learned topics. HDP determines the topic number automatically, whereas BTM requires a given topic number. Therefore, we first built an HDP model that generated k topics, and then built a BTM model to generate topics with the given topic number k. The quality of each learned topic was evaluated by the coherence score of its top five words, a common quality metric for topic models22,63. HDP modeling was implemented using the Python package "tomotopy".

Clustering of component CAPs
We performed k-means clustering of component CAPs in mESC or K562 according to their potentials for self-assembly (PS-Self) or interaction with partners (PS-Part) to undergo phase separation. A recent study employed two machine-learning models, the SaPS and PdPS models, to estimate these potentials and provided SaPS and PdPS ranking scores (ranging from 0 to 1) for the human and mouse proteomes. We used the SaPS and PdPS ranking scores of component CAPs in mESC or K562 for k-means clustering. The number of clusters was set to 4, and the initial cluster centroids were set to (0.8, 0.8), (0.8, 0.4), (0.4, 0.8) and (0.4, 0.4), corresponding to the four clusters "both Self and Part", "Self-only", "Part-only" and "none", respectively.

Annotation for charged amino acid blocks
We calculated the NCPR (net charge per residue) using a 10-residue sliding window with a step size of 1. This calculation considered both positively charged amino acids (R, K and H) and negatively charged amino acids (D and E). Windows with NCPR greater than 0.5 or less than -0.5 were defined as charged amino acid blocks, and overlapping blocks were merged.
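An illustrative Python scan for such blocks is given below; the window size, charge assignments and |NCPR| cutoff follow the description above, while the function itself is our sketch rather than the authors' script.

```python
# Sketch of charged amino acid block annotation: 10-residue window, step 1,
# R/K/H = +1, D/E = -1, |NCPR| > 0.5 defines a block, overlapping blocks merged.
CHARGE = {"R": 1, "K": 1, "H": 1, "D": -1, "E": -1}

def charged_blocks(seq, window=10, cutoff=0.5):
    blocks = []
    for start in range(len(seq) - window + 1):
        ncpr = sum(CHARGE.get(aa, 0) for aa in seq[start:start + window]) / window
        if abs(ncpr) > cutoff:
            end = start + window
            if blocks and start <= blocks[-1][1]:   # overlaps the previous block
                blocks[-1][1] = end                 # extend / merge
            else:
                blocks.append([start, end])
    return [tuple(b) for b in blocks]
```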
Identification of CondSig-positive and -negative peaks and domains
CondSigDetector identified CondSigs and assigned genome-wide 1 kb bins to each CondSig. For the evaluation of chromatin properties and disruption effects of CondSigs, we defined CondSig-positive and -negative peaks/domains for each component CAP to examine its chromatin properties and disruption effects within the identified CondSigs. To determine CondSig-positive and -negative peaks for a given CAP, we classified its ChIP-seq peaks as CondSig-positive or -negative based on whether they overlapped with sites where the CAP was identified as a component of any CondSig. To determine CondSig-positive and -negative domains, we transformed the peaks of the given CAP into domains by merging adjacent peaks no more than n kb apart. For component CAPs whose accurate occupancy regions were narrow peaks (see the ChIP-seq data processing procedure above), we set n = 5; for component CAPs whose accurate occupancy regions were broad peaks, we set n = 10. Domains of each component CAP were then classified as CondSig-positive or -negative domains based on overlap with CondSig-positive peaks.

To ensure a fairer comparison, two additional criteria were applied to refine the identification of CondSig-positive and -negative peaks so that they were matched in terms of chromatin accessibility or the number of co-occupied CAPs. Both refined CondSig-positive and -negative peaks were required to overlap with ATAC-seq peaks or to have occupancy events of more than 10 CAPs. These refined CondSig-positive and -negative peaks were then transformed into refined CondSig-positive and -negative domains in the same way.

3D chromatin contact analysis
Public Micro-C data in mESC, ChIA-PET data against SMC1 in mESC, and ChIA-PET data against RNA Pol II in K562 were used in this study. Micro-C contact matrices from 2.6 billion reads were downloaded from GSE13027533, and boundary strength at 400 bp resolution calculated by Cooltools64 was used for the following analysis. SMC1 ChIA-PET data in mESC were downloaded from GSE5791136 and processed with ChIA-PET265. RNA Pol II ChIA-PET loops were downloaded directly from ENCSR880DSH37.

Definition for target genes of CondSig-positive sites
To define target genes of CondSigs, the positive sites of all identified CondSigs were merged into a total set of CondSig-positive sites. Genes whose promoter overlaps with the CondSig-positive sites, or which have long-range chromatin contacts with those sites, were defined as target genes. These long-range chromatin contacts were determined using ChIA-PET data from the corresponding cell type; in this study, SMC1 ChIA-PET data in mESC and RNA Pol II ChIA-PET data in K562 were used. To rule out the possibility that higher burst frequencies are attributable to stronger epigenetic modifications at CondSig-positive sites, target genes of all CondSig-positive and all CondSig-negative sites matched for histone modifications or chromatin accessibility were also defined. Here, all CondSig-negative sites (i.e., the total set of CondSig-negative sites) were specified as any 1 kb genomic bins occupied by at least two CAPs and not identified as CondSig-positive. Both CondSig-positive and -negative sites were first intersected with ChIP-seq peaks of H3K4me3 or H3K27ac, or with ATAC-seq peaks, and their target genes were then defined in the same manner as mentioned above.
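The target-gene assignment can be sketched as follows, assuming pre-parsed inputs: `tss` mapping each gene to its TSS coordinate, `pos_bins` mapping each chromosome to CondSig-positive intervals, and `loops` holding ChIA-PET anchor pairs. All names and data structures are illustrative, not the pipeline's actual data model.

```python
# Sketch of target-gene definition: promoter (TSS +/- 3 kb) overlaps a
# CondSig-positive site, or a ChIA-PET loop links the promoter to one.
def overlaps(a_start, a_end, b_start, b_end):
    return a_start < b_end and b_start < a_end

def target_genes(tss, pos_bins, loops, flank=3000):
    """tss: {gene: (chrom, pos)}; pos_bins: {chrom: [(start, end), ...]};
    loops: [((chrom, start, end), (chrom, start, end)), ...]."""
    targets = set()
    for gene, (chrom, pos) in tss.items():
        p_start, p_end = pos - flank, pos + flank
        # (1) direct overlap between the promoter and a CondSig-positive site
        hit = any(overlaps(p_start, p_end, s, e) for s, e in pos_bins.get(chrom, []))
        # (2) long-range contact: one anchor on the promoter, the other on a positive site
        for a1, a2 in loops:
            for prom_anchor, other in ((a1, a2), (a2, a1)):
                if (prom_anchor[0] == chrom
                        and overlaps(p_start, p_end, prom_anchor[1], prom_anchor[2])
                        and any(overlaps(other[1], other[2], s, e)
                                for s, e in pos_bins.get(other[0], []))):
                    hit = True
        if hit:
            targets.add(gene)
    return targets
```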
Cell culture
Mouse embryonic stem cells (mESC), C57BL/6 strain, were purchased from ATCC (SCRC-1002) and cultured on a feeder layer of mitomycin C (Stemcell, 73272)-treated mouse embryonic fibroblasts (MEF) in tissue culture flasks coated with 0.1% gelatin. The cells were grown in complete mESC medium, composed of EmbryoMax DMEM (Millipore, SLM-220-B), 15% (v/v) fetal bovine serum (Hyclone, SH30070.03), 0.1 mM nonessential amino acids (Millipore, TMS-001-C), 1% (v/v) nucleosides (Millipore, ES-008-D), 2 mM L-glutamine (Millipore, TMS-002-C), 0.1 mM β-mercaptoethanol (Millipore, ES-007-E), and 1000 U/mL recombinant LIF (Millipore, ESG1107).

Cell treatment
1,6-Hexanediol (Sigma, 240117) was dissolved in complete mESC medium at 15% (w/v) to make a stock solution; similarly, 2,5-hexanediol (Sigma, H11904) was prepared at 15% (v/v) in the same medium. mESC were detached with trypsin, pelleted by centrifugation, and resuspended in complete mESC medium. The resuspended cells were transferred to a gelatin-coated flask and cultured in a 37 °C incubator for 1 h to remove the feeder cells. The supernatant cells were collected and washed twice with PBS. After resuspending the cells in medium, either the 1,6-hexanediol or the 2,5-hexanediol stock solution was added to a final concentration of 1.5%. The dishes were returned to the incubator immediately for 30 min, and the treated cells were immediately used for the CUT&RUN assay.

CUT&RUN
The CUT&RUN assay was conducted on 0.2 million cells per sample using the Hyperactive pG-MNase CUT&RUN assay kit (Vazyme, HD102) with slight modifications to the manufacturer's protocol. Briefly, cells were harvested and incubated for 10 min at room temperature with Concanavalin A-coated magnetic beads, which had been activated prior to use. The ConA bead-bound cells were then collected on a magnet and resuspended in 100 µl of antibody buffer containing either 2 µl of DDX21 (Proteintech, 10528-1-AP, lot # 00088037), 4 µl of CTR9 (Bethyl Laboratories, A301-395A, lot # 4), 4 µl of SUPT6H (Novus Biologicals, NB100-2582, lot # 2 A), 1.5 µl of SS18 (Cell Signaling Technology, 21792 (D6I4Z), lot # 1), 1.5 µl of EP300 (Santa Cruz, sc-48343 (F-4), lot # A1323), or 0.5 µl of ELL3 (generously gifted by Prof. Chengqi Lin, Southeast University, China) primary antibody. The samples were incubated at 4 °C overnight on a rotator. The next day, cells were washed twice with Dig-wash buffer and resuspended in 100 µl of premixed pG-MNase enzyme solution before incubation at 4 °C for 1 h with rotation. The cells were then washed twice with Dig-wash buffer, resuspended in 100 µl of premixed CaCl2 solution, and incubated for 2 h on ice. After stopping the reaction, the cut chromatin was released from the cells by incubation at 37 °C for 30 min without agitation. After centrifugation at 13,400 g for 5 min, the supernatant was collected and DNA was purified using FastPure gDNA mini columns.
The libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit (NEB, E7645) with a modified amplification program: 98 °C for 30 s; 15 cycles of 98 °C for 10 s and 65 °C for 17 s; final extension at 65 °C for 2 min; and hold at 4 °C.

Single-cell RNA-seq
Single-cell RNA sequencing (scRNA-seq) libraries were prepared from 6000 mES cells, either wild type or treated with 1.5% 1,6-hexanediol or 1.5% 2,5-hexanediol for 30 min, and from K562 cells (National Collection of Authenticated Cell Cultures, TCHu191), either wild type or treated with 10% 1,6-hexanediol for 20 min. The libraries were created using the Chromium Single Cell 3' Library and Gel Bead Kit V3.1 (10x Genomics, Catalog No. PN1000268) to generate single-cell gel beads in emulsion (GEMs). Following preparation, the libraries were sequenced on the Illumina NovaSeq 6000 platform in 150 bp paired-end mode.

Immunofluorescence staining
CTR9 antibody was labeled with the Mix-n-Stain CF488 antibody labeling kit (Sigma, MX488AS20), while SUPT6H and ELL3 (Sigma, HPA028938) antibodies were labeled with the Mix-n-Stain CF568 antibody labeling kit (Sigma, MX568S20), according to the manufacturer's instructions. For the co-immunofluorescence study of SUPT6H/CTR9/SUPT5H, mESC were grown as described above on pre-coated coverslips and fixed with 4% paraformaldehyde solution (Beyotime, P0099) at room temperature for 10 min. Permeabilization was performed with 0.5% Triton X-100 (Sigma-Aldrich, 93443) in PBS for 10 min. Cells were blocked with IF blocking solution (Beyotime, P0102) for 1 h at RT and subsequently incubated with SUPT5H primary antibody (Santa Cruz, 133217 (D3), lot # G1217) diluted 1:100 in QuickBlock dilution buffer (Beyotime, P0262) at 4 °C overnight. After three washes, cells were incubated with Alexa Fluor 594 goat anti-rabbit secondary antibody (ThermoFisher, A11037) diluted 1:1000 in PBST for 1 h at RT. After three additional washes with PBST, cells were labeled with both CF488-conjugated CTR9 (1:250) and CF568-conjugated SUPT6H (1:200) antibodies at RT for 2 h. After three washes with PBST, the coverslips were mounted onto glass slides with Vectashield medium with DAPI (Vector Laboratories, H-1200) and sealed with nail polish. Similarly, for the co-IF experiment of SS18/EP300/ELL3, blocked mESC were incubated with SS18 (1:400) and EP300 (Santa Cruz, 32244 (NM11), lot # H1921; 1:200) primary antibodies, followed by incubation with Alexa Fluor 488 goat anti-rabbit (ThermoFisher, A11008) and Alexa Fluor 594 goat anti-mouse (ThermoFisher, A11032) secondary antibodies at 1:1000 for 1 h at RT, and then labeled with CF568-conjugated ELL3 antibody (1:200) for 2 h. Images were acquired using a Zeiss LSM 710 confocal microscope with a 100× oil objective and ZEN acquisition software.

Fluorescence recovery after photobleaching (FRAP)
FRAP assays were conducted using the FRAP module of the Leica SP8 confocal microscopy system. CTR9 and SUPT6H endogenously tagged with EGFP in mESC were bleached using a 488 nm laser beam. mScarlet-SUPT5H overexpressed in mESC was bleached using a 561 nm laser beam. Similarly, mESC overexpressing SS18-EGFP or EGFP-ELL3 were bleached using a 488 nm laser beam. Bleaching targeted a specific circular region of interest (ROI) using 100% laser power, and time-lapse images were collected.
Fluorescence intensity was measured using Fiji, with background intensity subtracted and values normalized to pre-bleaching time points.

CUT&RUN and single-cell RNA-seq data processing
CUT&RUN reads were first processed with TrimGalore (v0.6.0) to trim adapters and low-quality reads. Trimmed reads were then aligned to the mouse genome build mm10 or the human genome build hg38 using Bowtie2 (v2.3.5.1)59 with the parameters "--no-mixed --no-discordant --no-unal". Low-mapping-quality reads (mapping quality <30) and duplicates were discarded. Biological replicates that passed quality control were then pooled. For each CAP, the reads in each condition were down-sampled to the same number, determined by the minimum read count of that CAP across conditions: 40 million for DDX21 and SS18, and 50 million for the others. CUT&RUN peaks were called by MACS2 (v2.1.3)52. Signal tracks were generated using the "genomecov" command in Bedtools (v2.28.0) and normalized to reads per million mapped reads (RPM). Single-cell RNA-seq data (10x Genomics) were processed with DrSeq2 (v2.2.0)66, and transcriptome-wide transcriptional burst kinetics were inferred using the model from a previous study45.

Statistics and reproducibility
Statistical analysis was performed using Python, and statistical details are given in the figure legends and Source Data. No statistical method was used to predetermine sample size. No data were excluded from the analyses. This study did not include complex treatment conditions; all cells were randomly assigned to each group for imaging and sequencing. All samples were prepared blinded.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
