Z-DNA formation in promoters conserved between human and mouse are associated with increased transcription reinitiation rates

DeepZ predictions in human and mouse genomes based on the common set of omics features

Here we applied the same DeepZ pipeline as described in [25], using human [24] and mouse [15] genome ChIP-seq data for training. The schematic of the DeepZ approach is presented in Fig. 1A (see Methods, Supplementary Fig. 1). For omics data we took all available HM, TF, DNase accessible sites, CC and RP (see full list in Supplementary Table 1). We selected only experiments available for both genomes. In total, 544 features were used (Fig. 1B, C), including 65 HM, 466 TF and CC, 3 methylation maps, 8 RP, a map of dinucleotide energy transitions from B- to Z-form, and DNase hypersensitivity sites. We generated whole-genome annotations of Z-DNA regions for the human and mouse genomes (Supplemental Data 1–2), which comprise 30,083 segments in human and 17,569 in mouse.

Figure 1. DeepZ predictions in human and mouse genomes based on the common set of omics features. (A) General schema of the DeepZ approach, with PR curves showing model performance. (B) Number of common and unique features used in the DeepZ model. (C) Distribution of features over functional groups. (D) Distribution of Z-DNA regions over genomic regions. (E) Whole-genome distribution of DeepZ predicted Z-flipons in the human genome. (F) Whole-genome distribution of DeepZ predicted flipons in the mouse genome. In (E,F) regions conserved between human and mouse are highlighted in blue. Bar graphs in (E,F) present the distribution of conserved regions, DeepZ and conserved DeepZ predictions over genomic regions.

The potential Z-DNA forming sites comprise ~ 3 Mb in the human genome compared to ~ 2.6 Mb in the mouse genome; because of the different genome sizes, this amounts to ~ 0.1% of cumulative genome length in both genomes (Supplementary Table 2). The distributions of the DeepZ predicted Z-DNA over genomic regions for mouse and human are given in Fig. 1D.
In both genomes the distribution is qualitatively the same, with enrichment in promoters, exons, 5′UTR and 3′UTR.

Since the DeepZ distribution over genomic regions is shifted towards regulatory regions (promoters, exons, 5′UTR, 3′UTR; Fig. 1D), we verified DeepZ performance metrics on each genomic region separately. The results for promoters are concordant with those found in other regions of the genome (Table 1 and Supplementary Table 3).

Table 1. DeepZ performance metrics for promoter regions.

To verify how groups of features contribute to model performance, we performed an ablation analysis with gradient boosting used to assess group feature importance (see “Methods” and Supplementary Fig. 2). We confirmed that information only from DNA sequence is not sufficient to predict Z-DNA formation, and that each omics feature group improves model performance as measured by the F1 metric. We further assessed the effects of different thresholds to distinguish between flipons that form Z-DNA under physiological conditions and those that do not. All the results presented are based on the use of threshold 3 for both mouse and human genomes (see Methods and Materials).

Z-flipons are enriched in conserved human-mouse regions

We found significant enrichment of the predicted DeepZ regions in the conserved regions between the human and mouse genomes (Fig. 1E, F, Supplementary Data 3–4). The enrichment is 2.6-fold in human (p < 0.001, permutation test) and 1.6-fold in mouse (p < 0.001, permutation test), comprising 20% (human) and 10% (mouse) of all DeepZ predicted Z-DNA regions (Supplementary Table 4). The number of genes with Z-DNA from regions conserved in vertebrate clades is about three times larger in human (7188 genes) than in mouse (2310 genes) (Supplementary Table 4), likely reflecting the differences in the nature and size of the training sets we used. From both lists, 966 genes are human and mouse orthologs with predicted Z-DNA sites in the body of the gene.
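
The fold enrichment and permutation p-values quoted for conserved regions can be illustrated with a minimal sketch. It is not the pipeline used in the study: the shuffling scheme (relocating each predicted segment uniformly along a single chromosome while keeping its length), the function names and the toy coordinates are all assumptions made for illustration.

```python
import bisect
import random


def count_overlaps(segments, regions):
    """Count segments that overlap at least one region.

    `regions` must be sorted, non-overlapping, half-open (start, end) intervals.
    """
    starts = [start for start, _ in regions]
    hits = 0
    for seg_start, seg_end in segments:
        # Index of the last region that starts before the segment ends.
        i = bisect.bisect_left(starts, seg_end) - 1
        if i >= 0 and regions[i][1] > seg_start:
            hits += 1
    return hits


def permutation_enrichment(segments, regions, chrom_len, n_perm=1000, seed=0):
    """Return (observed overlaps, fold enrichment, one-sided permutation p-value).

    Assumed null model: every segment is moved to a uniformly random position
    on a chromosome of length `chrom_len`, keeping its length.
    """
    rng = random.Random(seed)
    observed = count_overlaps(segments, regions)
    null_counts = []
    for _ in range(n_perm):
        shuffled = []
        for start, end in segments:
            length = end - start
            new_start = rng.randrange(0, chrom_len - length + 1)
            shuffled.append((new_start, new_start + length))
        null_counts.append(count_overlaps(shuffled, regions))
    mean_null = sum(null_counts) / n_perm
    fold = observed / mean_null if mean_null > 0 else float("inf")
    # The +1 terms keep the estimate away from zero, so the smallest
    # reportable p-value is 1 / (n_perm + 1).
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, fold, p_value


# Hypothetical toy data: ten 100-bp predictions inside one conserved block.
conserved = [(1000, 2000), (50000, 51000)]
predicted = [(1000 + 100 * i, 1100 + 100 * i) for i in range(10)]
observed, fold, p_value = permutation_enrichment(predicted, conserved, 100000)
```

With this estimator, reporting p < 0.001 requires at least 1000 permutations. A genome-wide analysis would also need to shuffle within chromosomes and respect unmappable regions, which this sketch ignores.
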
GO analysis reveals enrichment of human and mouse orthologs with conserved Z-flipons in regulation of metabolic process (546 genes, FDR e−36), regulation of transcription by RNA polymerase II (220 genes, FDR e−21), response to stimulus (495 genes, FDR e−09), binding to protein and nucleic acids (758 genes, FDR e−21), alternative splicing (614 genes, FDR e−12), chromatin organization (77 genes, FDR e−08), MAPK signaling pathway (37 genes, FDR e−06), with location in nucleus (590 genes, FDR e−45) and nuclear lumen (423 genes, FDR e−37) (Supplementary Table 5). The full list of the most-enriched pathways and processes is given in Supplementary Table 6.

Common patterns of transcription factors, regulators and histone marks enriched around Z-flipons conserved in human and mouse

We aimed to find common TF, CC, RP and HM (hereafter referred to as omics features) that are enriched in regions around conserved Z-flipons, to assess the association of Z-DNA forming regions with active transcription. We performed this analysis initially at the genome-wide level, then in promoter regions, CpG- and non-CpG-promoters. We found that the majority of TFs (86%, 404 out of 470 in human and 93%, 439 out of 470 in mouse) and HMs (68%, 46 out of 68 in human and 72%, 49 out of 68 in mouse) showed statistically significant enrichment (p < 0.001, permutation test) around conserved Z-regions in promoters (Supplementary Table 7).

We selected the top-20 features that are both most frequently co-localized and significantly enriched (p < 0.001, permutation test) with conserved Z-flipons in each category for both human and mouse; the combined feature importance plot is presented in Fig. 2. In line with our previous studies [15,25,28], we observed enrichment of Z-flipons in promoters. The result for Z-forming sequences in CpG-promoters is unsurprising as alternating d(CG) d(C1–3G1–3) are prone to Z-DNA formation [26].
We also observed an enrichment of Z-prone sequences in non-CpG promoters.

Figure 2. Conserved patterns of transcription factors and histone marks enriched around Z-flipons in human and mouse genome-wide. (A) Enrichment of transcription factors around Z-flipons for the entire genome, promoters, CpG-promoters and non-CpG promoters. (B) Enrichment of histone marks around Z-flipons.

Features that are enriched with conserved Z-flipons in promoter regions, both in human and mouse, are exactly those transcription factors, chromatin remodelers and epigenetic regulators that are enriched in transcriptionally active genes and associated with regions of open chromatin, where DNA conformation is less restrained. For genes presented in Fig. 2, significant Gene Ontology (GO) enrichment was found for the terms “Regulation of transcription by RNA polymerase II” (GO:0006357, FDR 6.08e−25); “Chromatin organization” (GO:0006325, FDR 1.08e−18); “Chromatin remodeling” (GO:0006338, FDR 1.32e−09); “Positive regulation of transcription, DNA-templated” (GO:0045893, FDR 2.99e−23); “Regulation of gene expression, epigenetic” (GO:0040029, FDR 1.40e−05); “Developmental process” (GO:0032502, FDR 1.33e−08); and “Cell differentiation” (GO:0030154, FDR 2.71e−05).

Histone marks that are enriched with conserved Z-flipons in promoter regions in both human and mouse include many acetylated marks indicating active promoters and transcription activation (H3K12ac, H3K14ac, H3K18ac, H3K56ac, H3K122ac, H4K5ac, H4K8ac, H4K12ac), H3K4me1/me2/me3, which are indicators of active transcription, and some specific markers associated with different chromatin states: H2A.Z, H2AXS139ph, and H2AK119Ub.

The same analysis showed that the HM and TF features identified by DeepZ in non-conserved DNA regions are quite variable, highlighting the differences in the evolutionary selection of mouse and human genomes (Supplementary Fig. 3).
We also noted that many other HM and TF showed no association with Z-DNA formation, further confirming the specific nature of DeepZ predictions.

Clusters of common conserved Z-flipons between human and mouse reveal functional groups of LINEs, embryonic development, and neurogenesis

To further analyze DeepZ predictions, we applied UMAP clustering of Z-flipons defined as vectors of omics features. We combined human and mouse data sets and extracted features mapping to experimental and predicted Z-DNA regions (Fig. 3A). As expected, the UMAP shows that incorporating information obtained from many different cell lines and collected under many conditions helped DeepZ overcome the limitations of a single experiment by enabling better discrimination of signal from noise. The approach exploited the differences in the training sets used for the human and mouse models. Whereas the human ChIP-seq set was enriched for active Z-flipons that bound Zα, the mouse dataset identified inactive flipons that were derepressed by treatment with curaxin. Together, the datasets helped validate flipon features common to both human and mouse genomes and also improved the mapping of both active and suppressed flipons in each species.

Figure 3. UMAP clustering of Z-flipons based on vectors of common omics features. (A) UMAP clustering of experimental and DeepZ predicted Z-flipons in human and mouse genomes. (B) Cluster of common human and mouse genome. (C) Distribution of Z-flipons over genomic regions, regulatory elements, and transposons.

Figure 3B, C shows how Z-flipons are distributed over genomic regions. The clusters with conserved, overlapping human and mouse flipons are highlighted in Fig. 3B. Interestingly, the UMAP generates clusters that correspond to differences between genomic regions that were previously annotated in other ways, even though this prior information was not included in the training data.
For example, SINE and LINE repeats form a separate cluster in the map. These regions were strongly enriched for Z-DNA formation when cells were treated with the curaxin CBL0137, which derepresses these regions by altering chromatin structure, enabling their transcription (light blue in Fig. 3A) [15]. We did not find enrichment of the ALU family of human SINE in our analysis, even though they contain Z-DNA forming sequences [1]. Most likely they were absent from our training set, as these sequences are routinely cleaned out by the pipeline used for processing experimental data.

The UMAP clusters observed are associated with different feature vectors (Fig. 4, Supplementary Table 8). Feature importance analysis of these vectors allows their assignment to processes involved in embryonic development and morphogenesis, RNA Polymerase II and RNA Polymerase III dependent transcription, heterochromatin formation associated with LINEs, chromatin binding complexes, negative regulation of transcription and cellular component organization. The LINE cluster that appears in the mouse genome after CBL0137 treatment is enriched for MECP2 (methyl-CpG binding protein 2), a reader of DNA methylation that is a feature of the heterochromatin present in intergenic regions.

Figure 4. UMAP clusters of human and mouse Z-flipons with one top marker feature highlighted for each cluster. The first 5 marker features are given in the boxes.

GO analysis of orthologs with Z-flipons from the common cluster (Supplementary Table 9) revealed that genes with conserved Z-flipons are enriched in development and differentiation processes. Interestingly, Z-flipons with a strict purine-pyrimidine alternation were clustered together with morphogenesis-related genes that are evolutionarily old. This alignment suggests the early evolutionary selection of Z-flipons composed of GT- or GC-repeats.
Notably, the GO enrichment we observe mirrors that found for Z-flipons bound by conserved microRNAs that produce phenotypic variation during development [29].

Many of the enriched genes are involved in neurogenesis and, in particular, are related to synapse organization and function (Supplementary Table 9). Examples of flipons in the human and mouse Wnt Family Member 5A (WNT5A) gene, which is involved in the presynapse assembly pathway, are given in Supplementary Fig. 4. This gene harbors many Z-flipons that are detected by three methods (DeepZ, Z-DNABERT, chemical footprinting) and that are located in the 5′UTR, near splice sites, in exons, and at alternative promoters. Such analyses independently confirmed that Z-flipons have conserved functional roles in both genomes and are marked by shared omics features.

Functional Z-flipons at promoters are consistent with a role in transcription

We found significant enrichment of conserved Z-flipons in alternative and bidirectional promoters. We then explored in more detail cases where DeepZ predicted Z-DNA regions were also detected in both human and mouse orthologs by two other methods, KEx and Z-DNABERT. The results we show in these figures are representative of findings genome-wide. We supply tracks to enable other researchers to evaluate genome regions that they study (Supplemental Data 1–4).

Z-flipons at alternative promoters

Many genes have alternative promoters, but the mechanism of their activation and how they control gene expression are not clear. Our analysis revealed that, depending on the size of the upstream region, the Z-flipons predicted by DeepZ show a 30-fold enrichment within 10 bp and a sevenfold enrichment within 1 kb of the TSS in alternative promoters (p < 0.001, permutation test) (Supplementary Table 10). Here we highlight different cases with alternative promoters that are very close to transcription start sites or located at a distance from the main promoter.
The alternative PLEKHA7 promoters overlap with Z-flipons detected by three different methods (DeepZ, KEx and Z-DNABERT) (Fig. 5A,B). The alternative promoters in both genomes have many noticeable columns of omics signals (Fig. 5C,D). Because DeepZ was trained on broad ChIP-seq data, the width of a DeepZ prediction is comparable to ChIP-seq peak widths and does not have the higher resolution possible with KEx and Z-DNABERT. The column of omics features at the main promoter (right column in Fig. 5C,D) has Z-flipons confirmed by three methods. DeepZ predicted two Z-flipons around this promoter based on two peaks of omics features corresponding to tandem promoters. There are species differences. For example, the human Z-flipon present in a distant alternative promoter (left column in Fig. 5C,D) is composed of CA-repeats (GT-repeats on the opposite strand), while in mouse the equivalent Z-flipon lacks this sequence. The splicing graph (Fig. 5A,B) also shows that there are alternative promoters near start sites for this gene that are not conserved between human and mouse.

Figure 5. Z-flipons at alternative promoters of PLEKHA7 orthologs in human and mouse genomes. (A,B) Alternative splicing graph for PLEKHA7 in (A) human and (B) mouse genomes. (C,D) Region of PLEKHA7 in human (C) and mouse (D) genomes with Z-flipon signals detected by three methods (DeepZ, KEx, Z-DNABERT) and signals from omics features enriched in Z-flipons in both human and mouse genomes. Omics features are aggregated signals from all tissues, as they were used in the DeepZ model. (E,F) Selected omics features for human and mouse genomes for neural tissue type. Features that are common to human and mouse are highlighted in yellow.

Since DeepZ was trained on aggregated experimental signals from different tissues, the regions mapped to the genome do not capture tissue-specific or species-specific differences. We find many examples where feature distributions differ between mouse and human.
An example from neural tissue is presented in Fig. 5E,F. We can see that, despite the differences, many proteins and histone marks are common to both species. These include MYC, BRD4, SMARCA4, CTCF and SUZ12, all regulators involved in chromatin organization. The presence of both H3K9ac and H3K9me3 indicates that this promoter is being turned on and turned off in neural tissue, and this is conserved between human and mouse.

We can also observe that the complexity of marks for tissue-specific gene expression is greater than suggested by our aggregated maps. This complexity reflects differences in genomic structure at a particular locus in each lineage and can differ by species (Supplementary Fig. 4). For the TMEM51-AS1 transcript, we see that the pattern for blood tissue type differs from other tissues, with a bidirectional promoter present in mouse but not in human. Such variation is not apparent from the “All Tissues” track (Supplementary Fig. 4G). Similar results were obtained for data generated from liver tissue (Supplementary Fig. 5). The patterns observed in blood tissues are complex, with binding of different TF occurring at one or two of the two, three, or even four separated peaks. The variable coding of bidirectional and alternative promoters allows gene expression to change in a tissue- and context-specific manner through their effects on Z-flipon conformation.

Mapping of marks around transcription start sites at bidirectional promoters

We made anchor plots for human bidirectional promoters to further explore the connection between Z-flipons, histone marks and transcription start sites (Fig. 6). Bidirectional promoters are of interest as the energy available to flip sequences to Z-DNA is highest when RNA polymerases transcribe each DNA strand in an opposite direction. As a result, negative supercoiling that is generated 5′ of each polymerase accumulates in the region between the two TSS.
The prediction then is that active marks of transcription should be highest in these regions, and that these marks should be enhanced in those promoters containing Z-DNA forming elements compared to the non-Z-DNA set. We see such an outcome in the anchor plots centered on bidirectional promoters. The upper part of each panel shows promoters with predicted Z-DNA elements and the lower part is for non-Z-DNA promoters. The difference in the vertical length of each set is because Z-DNA containing promoters are less frequent than non-Z-DNA promoters. The relative proportion of bidirectional promoters that are enriched for a particular mark is higher for promoters with Z-flipons than for those promoters without, with permutation p-values < 0.001 (as given in Fig. 2). For example, H2A.Z marks are present at most Z-DNA containing promoters, but at a much smaller fraction of non-Z-DNA promoters. Overall, the analysis revealed 130-fold enrichment within 500 bp and 85-fold within 1 kb from TSS of DeepZ-predicted Z-flipons in bidirectional promoters (p < 0.001, permutation test).

Figure 6. Anchor maps of feature colocalization around transcription start sites of bidirectional promoters with and without Z-flipons.

When examining particular examples of bidirectional promoters, we observed cases where Z-flipons were detected with three methods (KEx, DeepZ, Z-DNABERT) (Supplementary Table 10). The Z-DNA forming segments align best with active HM at each promoter. Marks for TF and CC are enriched on either side of the bidirectional promoter region, as seen for the bidirectional promoter of TMEM51-AS1 (Supplementary Fig. 5) and of BRCA1-NBR2 (Supplementary Fig. 6).

Association of Z-flipons with transcription reinitiation rate

How then do Z-flipons affect gene expression? Early models proposing sequence- and Z-DNA-specific TF currently lack experimental support [30].
To address this question further, we examined the association of Z-flipons with parameters that measure transcription kinetics, using the data generously provided by the Cramer laboratory [31]. We tested the initiation rate per cell, the elongation rate per cell, and polymerase pause duration, exploring all human regions with DeepZ and Z-DNABERT predictions (Fig. 7 and Supplementary Table 11).

We note that the initiation frequency is actually a measure of transcription reinitiation rather than initiation, which depends on a different set of pioneering transcription factors to activate gene expression [32].

Figure 7. (A) UMAP clusters of human and mouse Z-flipons predicted by different experimental and in silico methods. (B) DeepZ is highlighted over Z-DNABERT to show the colocalization of predictions. (C) Z-DNABERT is highlighted over DeepZ to show the colocalization of predictions. The black dots represent the genes from regions conserved between mouse and human genomes that have transcription data provided by the Cramer laboratory and that are analyzed in Fig. 8.

Our analysis of these regions revealed significant differences in the distributions of reinitiation frequencies for promoters with conserved DeepZ predicted Z-flipons, as compared to promoters without Z-flipons (p = 5.62e−5, Kolmogorov–Smirnov test, Fig. 8A). The reinitiation frequency trends higher for conserved Z-flipon promoters. We also see a similar difference in reinitiation for conserved non-CpG promoters (p = 1.76e−2, Kolmogorov–Smirnov test, Fig. 8B) and for conserved CpG-promoters with Z-flipons (p = 1.2e−4, Kolmogorov–Smirnov test, Fig. 8C). There was no difference in pause duration or elongation rate associated with Z-flipons.

Figure 8. (A,B) Transcription initiation rate for different types of promoters with or without Z-flipons. (A) Transcription initiation rate for promoters with conserved DeepZ-predicted Z-flipons is higher than the rate for promoters without Z-flipons.
(B) Transcription initiation rate for promoters with conserved Z-flipons, but lacking CpG islands, is higher than the rate for promoters that lack Z-flipons and CpG islands. (C) Transcription initiation rate for CpG-promoters with Z-flipons is higher than for CpG promoters without Z-flipons. (D) Transcription initiation rate for promoters with conserved Z-DNABERT-predicted Z-flipons is higher than the rate for promoters without predicted Z-flipons. (E) Transcription initiation rate for promoters with Z-flipons identified by both DeepZ and Z-DNABERT is higher than for all other groups. (F) Transcription initiation rate for promoters containing Z-flipons identified by both DeepZ and Z-DNABERT is higher compared to those lacking any predicted Z-DNA forming elements. (G) Schematic representation of the mechanism of action of Z-flipons in resetting the transcription initiation complex. The negative supercoiling generated by the RNA polymerase can be used to either wrap DNA around histones or for the assembly of preinitiation complexes. The outcome depends on the action of chromatin remodelers and mediator proteins.

We verified the results with Z-DNABERT predictions that are based on an independent algorithm and an independent training set (p = 2.35e−2, Kolmogorov–Smirnov test, Fig. 8D). We also confirmed the results in a set with overlapping DeepZ and Z-DNABERT predicted Z-flipons that we compared to promoters lacking Z-flipons (p = 1.52e−3, Kolmogorov–Smirnov test, Fig. 8E) and to any promoter excluded from the overlapping DeepZ and Z-DNABERT set (p = 7.70e−4, Kolmogorov–Smirnov test, Fig. 8F). We extended the analysis by using statistical tests with different underlying assumptions, applying the t-test, median-test and variance-test to all groups in Fig. 8. With all these approaches we confirmed the significance of the relationship between conserved DeepZ predicted Z-flipons and higher reinitiation rate (Supplementary Fig.
7).

We also examined DeepZ predictions by plotting Z-DNABERT scores for each promoter against the reinitiation rate, dividing the graph into four quadrants (Supplementary Fig. 8). In quadrant 1, promoters have a low reinitiation rate, while in quadrant 4 they have low Z-DNABERT scores. In quadrant 3, both measures are low, while in quadrant 2 both Z-DNABERT scores and reinitiation rates are high. We calculated the ratio of counts in quadrant 2 relative to quadrant 4 for both DeepZ and non-DeepZ promoters. Consistent with our previous analyses, the ratio was significantly higher for DeepZ promoters (Supplementary Fig. 8, p < 0.00001). The Z-DNABERT scores in quadrant 2 were within a relatively restricted band, with a slight upwards trend as reinitiation rate moves higher. The lack of correlation between these measures suggests that Z-DNA formation in itself does not solely determine the overall transcription rate. The results are consistent with a mechanism where Z-DNA formation is under selection to optimize reinitiation by resetting the promoter for reuse, with other factors determining pause release and reformation of pre-initiation complexes.

The plot of non-DeepZ promoters also reveals that there are functional Z-flipons, according to their Z-DNABERT scores, that are not classified as such by DeepZ. This outcome could reflect the use of threshold 3 for DeepZ to optimize the F1 metric. While we reduced false positive calls with this choice, we also increased the false negative rate. The result could also arise from limitations in the training sets available for calibrating DeepZ: some sets of tissue-specific promoters may not have been well represented in the data used to specify the model, and some promoters may not have been active at the time the ChIP-seq experiments were performed.

The proposed mechanism of action for how Z-DNA modulates transcription reinitiation is presented in Fig. 8G.
Here Z-DNA captures the negative supercoiling generated 5′ to an elongating polymerase [33]. The accumulated energy can then be used to turn a promoter either on or off by powering the assembly of the complexes required. The negatively supercoiled DNA can be absorbed either by chromatin remodelers as they wrap DNA around a nucleosome to suppress transcription, or by mediator proteins that promote the reassembly of preinitiation complexes [34]. These outcomes can be further tuned by topoisomerases that regulate local supercoiling [35] and potentially by small RNAs that alter promoter conformation [36]. Our data show enrichment in promoters with Z-flipons of topoisomerase I (odds ratio of 2.25 in human and 5.9 in mouse, p < 0.01) and topoisomerase 2B (odds ratio of 9.2 in human and 8.18 in mouse, p < 0.0001) relative to promoters without Z-flipons (Supplementary Table 7).

To further examine the role of Z-DNA in reinitiation, we checked for GTF2E2 localization to Z-flipons. The GTF2E2 data were not available at the time DeepZ was trained, so they provide information independent of that used to calibrate the model. The GTF2E2 protein product, Transcription Factor E (TFE), subunit B, is of interest given previous yeast experimentation showing the essential role of TFE in transcriptional reinitiation [37], and given that transcription-induced Z-DNA formation in yeast has been shown to occur in a promoter-specific fashion [38]. We observed an odds ratio of 23 for human (p-value < 2.2e−16, Fisher’s exact test, Fig. 8H) for GTF2E2 in DeepZ and Z-DNABERT promoters, a result consistent with involvement of Z-DNA in reinitiation of transcription. The finding is also supported by evolutionary analyses of the relationship between TFE and the Zα domain [39], a relationship that awaits experimental validation.
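
The odds ratios and Fisher's exact test p-values reported for topoisomerases and GTF2E2 are computed from 2×2 contingency tables (promoters with or without Z-flipons, with or without the factor bound). The sketch below shows that calculation using only the Python standard library. The function names are illustrative, and the counts are a textbook toy table, not data from this study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows: promoters with / without Z-flipons (assumed layout).
    Columns: factor bound / not bound.
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(p_value, 1.0)


# Toy table (hypothetical counts): 3 of 4 Z-flipon promoters bound,
# 1 of 4 non-Z-flipon promoters bound.
toy_or = odds_ratio(3, 1, 1, 3)
toy_p = fisher_exact_two_sided(3, 1, 1, 3)
```

For genome-scale counts, an established implementation such as `scipy.stats.fisher_exact` would normally be preferred; the hand-written version is shown only to make the calculation explicit.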
