Exploring the roles of RNAs in chromatin architecture using deep learning

In order to characterize the roles of caRNAs in 3D genome folding, we downloaded the genome-wide RNA-chromatin interactions in human foreskin fibroblast cells (HFFc6) and human embryonic stem cells (H1ESC) captured by iMARGI and the corresponding genome-wide DNA-DNA interactions captured by Micro-C from 4DN data portal (https://data.4dnucleome.org/) (Supplementary Table 1)20,46,47. We chose iMARGI data over other techniques that map genome-wide RNA-DNA contacts, because iMARGI has been performed in human cell lines that have rich transcriptomic and epigenomic datasets we could use to interpret the high-quality chromatin interaction data. We used HFFc6 for our primary analyses, and leveraged H1ESC for identifying cell-type differences.To disentangle the roles of nascent transcription versus trans-located caRNAs, both of which are involved in 3D genome organization, we broke down the human genome into 2048-bp bins and defined nascent transcripts as all the RNAs transcribed from a given 2048-bp DNA bin and trans-located caRNAs as all RNAs transcribed from at least 1 Mb away from the bin. We opted for 1 Mb to identify trans-located caRNAs instead of the 100 or 200 Kb used in previous studies39,48 in order to align with the window size of our predictive models and also to remove self-interactions for the vast majority of genes ( ~ 99.9%). As different types of caRNAs may engage in different trans-interactions (Fig. 1a) and contribute to different chromatin features, we further classified trans-located caRNAs into eight groups: snRNAs, snoRNAs, other small RNAs, lncRNAs, misc_RNAs, RNAs from protein coding genes, RNAs from other types of genes and RNAs from regions without known gene annotation.Fig. 1: Chromatin-associated RNAs preferentially bind open chromatin and many interactions occur in trans.a The proportion of trans-interactions for RNAs transcribed from each gene that is annotated as lncRNA (n = 17,673), snRNA (n = 727), protein-coding (n = 19,317), miRNA (n = 900) or snoRNA (n = 765). The center line and triangle within the box represent the median and mean value, respectively. The box represents the interquartile range (IQR), with whiskers set to 1.5 times the IQR. Outliers are shown in points. b The number of RNA-DNA interactions (log2 normalized count) within and across compartments (left panel) and SPIN states (right panel). The interaction frequencies were normalized to the size of compartments and SPIN states. c The abundance of all caRNAs or trans-located caRNAs at ATAC-seq peaks and their flanking regions. d The percentage of genome-wide cis-interactions (RNA-DNA distance 1 Mb or less) and trans-interactions (RNA-DNA distance > 1 Mb on the same chromosome or RNA and DNA on different chromosomes). e Histogram of RNA-DNA interaction frequencies as a function of genomic distance between DNA and RNA loci on the same chromosome. SPIN: Spatial Position Inference of the Nuclear genome, Interior_Act 1: Interior Active 1, Interior_Act 2: Interior Active 2, Interior_Act 3: Interior Active 3, Interior_Repr1: Interior Repressive 1, Interior_Repr2: Interior Repressive 2, Near_Lm1: Near Lamina 1, Near_Lm2: Near Lamina 2. Source data are provided as a Source Data file.CaRNAs interact with cis and trans located open chromatinTo explore how caRNAs are spatially localized inside the nucleus, we examined whether caRNAs identified by iMARGI preferentially interact with any parts of the genome. Similar to DNA-DNA interactions, we observed that RNAs tend to interact with DNA regions that share the same spatial or functional annotations as the loci from which they are transcribed, such as being in the same compartment or having the same Spatial Position Inference of the Nuclear genome (SPIN) state49 or chromatin state identified by chromHMM50,51 (Fig. 1b and Supplementary Fig. 1). Beyond that, caRNAs interact more frequently with DNA regions with high versus low transcriptional activity (Fig. 1b and Supplementary Fig. 1). This trend is confirmed by the enrichment of caRNAs at open chromatin regions (Fig. 1c). Interestingly, the enrichment was also observed for trans-located caRNAs (Fig. 1c), and the amount of trans-located caRNA attached to DNA regions was positively correlated with the region’s chromatin accessibility (Pearson’s R = 0.37).Considering that many RNA-DNA interactions across spatial or functional annotations may be from trans-interactions, we assessed the percentage of RNA-DNA interactions occurring in trans in HFFc6 both globally and for each annotated gene. We included caRNAs transcribed from the DNA loci on the same chromosome but at least 1 Mb away plus those encoded on different chromosomes. CaRNAs primarily interacted with proximal DNA regions (Fig. 1d), and of all the interactions on the same chromosome, over 90% spanned a distance of less than 1 Kb (Fig. 1e). Nevertheless, 38.38% of RNA-DNA interactions occurred in trans, including 6.14% within the same chromosome and 32.24% on different chromosomes (Fig. 1d). These results are quite different from DNA-DNA interactions from the Micro-C data, where trans-interactions across chromosome are less frequent (6.46% within the same chromosome and 11.92% on different chromosomes). This difference suggests that the proximity of most caRNAs to chromatin is not due to their being transcribed from DNA that is nearby in the 3D nucleus.Notably, we observed that the majority of the small ncRNAs and a number of lncRNAs and RNAs from protein-coding genes were engaged in trans-interactions (Fig. 1a and Supplementary Fig. 2). Given the well-established importance of several snRNAs and snoRNAs in nuclear structures, these results suggest that other ncRNAs and transcripts of some protein-coding genes may also regulate chromatin structures.
Trans-located caRNAs are particularly enriched at TAD boundariesTo investigate whether caRNAs play roles at particular landmarks within the 3D genome, we first used the iMARGI data to examine their abundance at TAD boundaries. Since most TAD boundaries are located in compartment A in HFFc6 (Fig. 2a) and tend to have higher chromatin accessibility compared to surrounding regions (Fig. 2b), we hypothesized that caRNAs would be enriched at TAD boundaries. In order to check whether nascent transcripts and trans-located RNAs follow similar patterns, we conducted separate analyses for each of them. As anticipated, trans-located caRNAs peaked at TAD boundaries and greatly decreased in flanking regions ( ± 50 Kb) (Fig. 2b). After categorizing TAD boundaries in HFFc6 and H1ESC based on their strength and cell type specificity (see Methods, Fig. 2b), we found that HFFc6 trans-located caRNAs are significantly less prevalent at TAD boundaries unique to H1ESC or with higher insulation strength in H1ESC than at TAD boundaries shared with or more prominent in HFFc6. Similar but weaker patterns were also observed for open chromatin signals (ATAC-seq) (Fig. 2b). These results suggest the potential involvement of trans-located caRNAs at TAD boundaries and their contribution to TAD dynamics across cell types. Additionally, strong TAD boundaries exhibited significantly higher ATAC-seq and trans-located caRNA signals than did weak boundaries, and the association between boundary strength and trans-located caRNA abundance held after normalizing to the corresponding ATAC-seq signals (Fig. 2c). These results further indicate that the accumulation of trans-located caRNAs at TAD boundaries is not solely driven by DNA accessibility.Fig. 2: Trans-located caRNAs are particularly enriched at TAD boundaries.a The percentage of HFFc6 TAD boundaries located in A and B compartments. b Chromatin accessibility (ATAC-seq, left panel), and the abundance of nascent transcripts (middle panel) and trans-located caRNAs (right panel) at TAD boundaries (center) and their flanking regions in HFFc6. c Chromatin accessibility, the abundance of nascent transcripts and trans-located caRNAs, and the abundance of trans-located caRNAs normalized to chromatin accessibility, at strong (n = 7175) versus weak (n = 3971) TAD boundaries. The center line and triangle within the box represent the median and mean value, respectively. The box represents the interquartile range (IQR), with whiskers set to 1.5 times the IQR. Outliers are shown in points. Two-sided Mann-Whitney U tests were used to evaluate the differences between strong and weak TAD boundaries. U statistics and p-values are shown in the plot. Source data are provided as a Source Data file.Unlike trans-located caRNAs that peaked at all HFFc6 TAD boundaries, nascent transcripts in HFFc6 mostly accumulated at TAD boundaries unique to HFFc6 (Fig. 2b). They also tended to be more frequent at strong TAD boundaries than weak ones (Fig. 2c). Overall, these results indicate that nascent transcripts could also contribute to the formation of TAD boundaries, particularly cell-type-specific ones, aligning with the enrichment of TAD boundaries at active promoters and the barrier function of RNAPs3,13.CaRNAs increase the accuracy of 3D genome folding predictionsTo learn how caRNAs contribute to 3D genome organization beyond TAD boundaries and in an unbiased way, we developed a deep learning framework called AkitaR. The models we implemented extend Akita40 to predict chromatin interaction maps by incorporating both DNA sequence and RNA features extracted from nascent transcripts or trans-located RNAs (Supplementary Fig. 3). Similar to the original Akita, AkitaR uses 1D convolution neural networks to learn representations from ~1 Mb DNA sequence segments. The learned representations at the resolution of 2048 bp were subsequently concatenated with the RNA features, and dilated convolution neural networks were used to learn long-range dependencies. Lastly, 1D representations were averaged to 2D and further processed by dilated 2D convolutional neural networks to predict the ~1 Mb x 1 Mb contact matrices at 2048 bp resolution (Fig. 3a). These were contact frequencies after observed-over-expected normalization and log transformation (see Methods).Fig. 3: Chromatin-associated RNAs contribute to accurate prediction of 3D genome folding.a The architecture of the models in the AkitaR framework. b Barplots of performance for different types of AkitaR models on the held-out test set of genomic regions. Pearson’s correlation and mean squared error (MSE) between experimental and predicted contact maps are used as performance metrics. Error bars represent the mean ± standard error of the mean for each model type independently trained five times. Two-sided Mann-Whitney U tests were used to evaluate differences between all pairs of models. Every comparison was significant (p-value < =0.05) except those labeled as not significant (ns). U statistics and p-values for the comparisons are shown in Supplementary Data 1. Individual data points are shown as dots. c Violin plot of Pearson’s correlation of insulation tracks from observed test set maps versus predicted maps (n = 413) for the best model of each type. The box represents the interquartile range (IQR), with whiskers set to 1.5 times the IQR. The dot within the box represents the median value. d Examples showing better prediction of contact maps with nascent transcripts (top panel) or trans-located RNAs (bottom panel). The 3D genome contacts with better prediction are highlighted with green rectangles. Source data are provided as a Source Data file.We also designed additional models as controls or for comparison with the iMARGI based models (Supplementary Table 1 and Supplementary Fig. 3). For instance, since caRNAs are enriched in open chromatin, one of these models combined DNA sequence with features from chromatin accessibility (ATAC-seq) or ATAC-seq plus trans-located caRNAs. To disentangle the expression level of RNAs from their DNA contact frequencies and from nascent transcription, we incorporated steady-state transcription (RNA-seq). A control model with randomized signals from a standardized normal distribution was also built to alleviate the possibility that the improved performance was solely due to more features as input. Natural log transformations were applied on the RNA or open chromatin features before model fitting.We found that all models with additional informative features achieved better predictions than the model with DNA sequence alone as input (Fig. 3b–d, Supplementary Data 1 and Supplementary Figs. 4–6). This is consistent with results from models that incorporate epigenetic features such as CTCF binding or histone modifications44. Of the three RNA features, trans-located caRNA signals led to the AkitaR model with the highest performance, closely followed by nascent RNA, and then steady-state transcription (Fig. 3b, d, Supplementary Data 1, Supplementary Figs. 4, 6). On the other hand, at some regions, nascent RNA signals contributed to more accurate predictions than trans-located RNA inputs did (Fig. 3d, Supplementary Fig. 5). These results suggest that all the RNA features carry useful information about 3D genome folding, particularly trans-located caRNAs, though nascent transcription is more helpful at some loci. Adding chromatin accessibility signals yielded better performance than adding RNA features did (Fig. 3b, Supplementary Data 1 and Supplementary Fig. 4). However, adding trans-located caRNA plus chromatin accessibility signals achieved even better performance than chromatin accessibility signals alone (Fig. 3b, Supplementary Data 1 and Supplementary Fig. 4), suggesting that RNA-DNA interactions provide additional information beyond marking open chromatin. In support of this hypothesis, we found that incorporating trans-located caRNAs into the models increased the correlation between predicted and observed insulation signals at TAD boundaries (Fig. 3c). Thus, deep learning clearly highlights the information that RNA-DNA interactions carry about chromatin organization.CaRNAs help predict cell-type-specific genome foldingSince RNAs, particularly ncRNAs, are often expressed in cell-type-specific ways52, we hypothesized that the performance boost provided by incorporating RNA features into the AkitaR models would be most notable in regions with cell-type-specific genome folding. To evaluate this hypothesis, we first identified test regions that showed the largest differences in chromatin organization between H1ESC and HFFc6 based on MSE (34 regions) or MSE plus stratum-adjusted correlation coefficient (SCC) and structural similarity index measure (SSIM) (109 regions; see Methods) (Supplementary Fig. 7). We then evaluated the performance of our models in these cell-type-specific regions, and found that they showed a notably larger performance gap between models with additional features and the model with DNA sequence alone as compared to the ensemble of all test regions (Figs. 3b and 4a, b, Supplementary Data 1, Supplementary Figs. 8, 9). This finding demonstrates the capability of the AkitaR models to capture dynamic chromatin organization. Since genome compartmentalization correlates with RNA-chromatin interaction20, we further evaluated model performance on cell-type-specific regions with compartment changes. Interestingly, the model with trans-located caRNA signals achieved similar or even better performance than the model with chromatin accessibility signals on cell-type-specific regions with a compartment transition from B in H1ESC to the more active A compartment in HFFc6 (Fig. 4a, b, Supplementary Data 1, Supplementary Figs. 8, 9). This suggests the potential association of trans-located caRNAs with compartment transitions. Furthermore, by visually checking the cell-type-specific regions with better predictions from the models incorporating trans-located caRNAs, we observed that trans-located caRNAs helped capture some cell-type-specific chromatin interactions better than all other RNA and ATAC-seq features (Fig. 4c, Supplementary Fig. 10), prompting us to explore where these interactions mapped and what caRNAs they involved.Fig. 4: Chromatin-associated RNAs help predict cell-type-specific genome folding.a Barplots of Pearson’s correlation and mean squared error (MSE) between experimental and predicted contact maps on the cell-type-specific subsets (MSE > 0.3) and cell-type-specific subsets (MSE > 0.3) with compartment changes from B compartment in H1ESC to A compartment in HFFc6. Error bars in the barplots represent the mean ± standard error of the mean for each model type independently trained five times. Two-sided Mann-Whitney U tests were used to evaluate differences between all pairs of models. Every comparison was significant (p-value < =0.05) except those labeled as not significant (ns). U statistics and p-values for the comparisons are shown in Supplementary Data 1. Individual data points are shown as dots. b Stratified Pearson’s correlation between experimental and predicted contact maps for the best model of each type on the held-out test set, cell-type-specific subsets of test regions identified by MSE (MSE > 0.3) and cell-type-specific subsets (MSE > 0.3) with compartment changes from B compartment in H1ESC to A compartment in HFFc6. c An example showing the contribution of trans-located RNAs to the prediction of some chromatin interactions. The regions with better prediction are highlighted with green rectangles. Source data are provided as a Source Data file.Nuclear landmarks associate with trans-located caRNAsTo identify the caRNAs that contribute to 3D genome organization and the DNA regions to which they associate, we used DeepExplainer53,54, which allowed us to quantify the importance of each RNA and ATAC-seq feature to the contact map predictions. DeepExplainer generates a score for each feature at each 2048 bp bin (see Methods). A negative score indicates that contact frequency decreases when the feature increases at that bin (e.g., caRNA is associated with loss of a chromatin loop or increased insulation at a TAD boundary); a positive score indicates that contact frequency rises when the feature increases. We observed that the distributions of contribution scores for the multiple RNA types were asymmetric, with slightly elongated left tails (Supplementary Fig. 11), hinting that RNA-DNA interactions may be more linked to insulation than to enhancing chromatin interactions. Though the absolute contribution scores of caRNA features showed moderate positive correlation with caRNA signals, many genomic bins with high caRNA signals received low contribution scores, indicating that our models learned where the caRNAs might contribute to genome folding (Supplementary Fig. 12). Trans-located caRNAs, which we already showed are enriched at TAD boundaries (Fig. 2b, c), tended to have higher absolute contribution scores at TAD boundaries than at their flanking regions (Fig. 5a). In contrast, the contribution scores of nascent transcripts were less elevated at TAD boundaries and remained high in flanking regions, as we might expect when there is active transcription within TADs.Fig. 5: Chromatin-associated RNAs are associated with TAD boundaries, loop anchors and nuclear structures.a The absolute contribution scores of nascent transcripts (left panel) and trans-located caRNAs (right panel) at TAD boundaries and their flanking regions. b Heatmap showing enrichment (log2 enrichment score) of genomic regions with top 1% (positive) and bottom 1% (negative) contribution scores of each type of caRNA across TAD boundaries, loop anchors, Spatial Position Inference of the Nuclear genome (SPIN) and chromHMM states. c Example of four RNAs that preferentially interact with genomic regions with high absolute contribution scores (top 1%, top 5%, top 10%, bottom 1%, bottom 5%, bottom 10%) rather than with regions with lower absolute contribution scores (middle 10%). d The proportion of RNAs transcribed from repetitive elements for the interactions between DNA and RNAs derived from unknown genes. e The proportion of the RNA sequences in the candidate interactions that might form R-loops originated from Alu sequences and the proportion of the corresponding DNA sequences were annotated as Alu elements. f The proportion of TAD boundaries (left panel) and loop anchors (right panel) having RNA-DNA interactions that could form R-loops. g An example locus showing candidate R-loops at Alu elements illustrating the contribution of trans-located RNAs to the prediction of chromatin interactions. The Alu elements that may form R-loops with RNAs at the loop anchors of a chromatin interaction that was specifically predicted by the model with trans-located caRNAs are shown in the track denoted Repetitive element. The candidate R-loops are numbered as 1, 2, 3 and 4 at the associated Alu elements. The best local alignment (with gaps) of the RNA and DNA sequences of the candidate R-loops (the complementary DNA sequences to RNAs are not shown) is shown. The nucleotides matching between RNA and DNA sequences are highlighted in red. Interior_Act 1: Interior Active 1, Interior_Act 2: Interior Active 2, Interior_Act 3: Interior Active 3, Interior_Repr1: Interior Repressive 1, Interior_Repr2: Interior Repressive 2, Near_Lm1: Near Lamina 1, Near_Lm2: Near Lamina 2. Source data are provided as a Source Data file.To test the hypothesis that AkitaR has learned a relationship between caRNA features and TAD boundary insulation, we performed two simulations. First, we generated 1000 random sequences of ~1 Mb and introduced TAD boundaries at randomly selected loci by inserting convergent CTCF motifs. Progressively adding 1 to 4 CTCF motifs to increase the insulation strength led the sequence model to predict a decrease in average contact frequency (Supplementary Fig. 13a), suggesting a possible negative correlation between insulation strength and average contact frequency. Next, we selected 60 regions from the held-out test dataset with relatively simple structures and gradually increased the caRNA signal of each RNA type at selected TAD boundaries. We observed an increase of insulation strength for most of the RNA types, and the majority of them showed a decrease of the average contact frequency, including lncRNAs (Supplementary Fig. 13b). These simulation results suggest that AkitaR has inferred a potential causal role of trans-located caRNAs in strengthening TAD boundaries.To further characterize the regions where caRNAs might shape chromatin organization, we ranked the contribution scores for each feature type and identified the genomic regions with scores in the top (positive contribution scores) or bottom (negative contribution scores) 1%. We found that the regions with bottom snRNA scores were preferentially located in nuclear speckles, aligning with their well-established roles in pre-mRNA splicing within nuclear speckles (Fig. 5b)55,56. Additionally, we observed that regions with top snoRNA scores were enriched in loci annotated as Interior_Repr2 (Interior Repressive 2) by SPIN (Fig. 5b), which was putatively associated with nucleoli49 where snoRNAs function57. These two expected associations validate the capability of AkitaR to capture the functional roles of caRNAs.Beyond these cases, we observed that genomic regions with bottom scores across RNA types were enriched at active chromatin, CTCF peaks, active promoters, enhancers, TAD boundaries and loop anchors (Fig. 5b). LncRNAs, snoRNAs and RNAs from unknown genes were particularly enriched at CTCF peaks, stable TAD boundaries between H1ESC and HFFc6, and shared TAD boundaries with higher insulation in HFFc6 (Fig. 5b), suggesting that these RNAs contribute to TAD boundaries, potentially by recruiting or stabilizing CTCF, in active chromatin. Interestingly, snoRNAs showed enrichment in nuclear speckles, consistent with the increasing evidence of the regulatory roles of some box C/D snoRNAs in alternative splicing58,59,60. On the other hand, regions with top scores were predominantly found in heterochromatin, particularly near lamina or at lamina associated regions (Fig. 5b), indicating that caRNAs play different roles at active and repressed chromatin, potentially via different mechanisms. CaRNAs from protein coding genes, however, showed very different patterns from all other caRNAs, with bottom rather than top scoring bins being enriched in compartment B and in particular at TAD boundaries with elevated insulation in HFFc6. Extending this analysis to the top and bottom 5% regions produced similar patterns, indicating that the enrichments are robust to the contribution score threshold (Supplementary Fig. 14).In contrast to these nuanced patterns that differ across caRNA types and between active versus repressed chromatin, ATAC-seq features were significantly enriched in active chromatin, regardless of whether they had top or bottom scores (Supplementary Fig. 15). These findings suggest that AkitaR captures differences between caRNA-DNA interactions and chromatin accessibility, motivating us to explore specific trans-located caRNAs. To further disentangle the independent effects of caRNAs beyond their association with open chromatin, we identified specific DNA regions where trans-located caRNAs have high absolute contribution scores and ATAC-seq features do not (absolute normalized contribution score > 0.25, |fold change | >5) (Supplementary Fig. 16). These regions showed similar chromatin and SPIN state enrichments as the top and bottom scoring regions more generally (Supplementary Fig. 16), confirming the contribution of trans-located caRNAs to chromatin features beyond chromatin accessibility.CaRNAs may form trans R-loops with Alu sequencesTo identify the caRNAs with the largest contributions to AkitaR’s chromatin map predictions, we ranked them based on their association with DNA regions that have high absolute contribution scores (top or bottom 5% for each RNA type; Supplementary Data 2). We observed that the top 10 RNAs were all highly prevalent in HFFc6 (Supplementary Fig. 17 and Supplementary Data 2). These included multiple ncRNAs previously known to play roles in chromatin structures, such as lncRNAs MALAT1 and NEAT1 and snRNAs RNU2-2P, RNU12 and RN7SK16,26,29,35,61,62 (Fig. 5c, Supplementary Fig. 18 and Supplementary Data 2). Interestingly, all top 10 snoRNAs are C/D box snoRNAs, including SNORD47, SNORD79 and SNORD27 (Supplementary Fig. 18). Beyond these, many RNAs without previous evidence for roles in chromatin organization stood out, such as lncRNAs RNY5, RPPH1, POLG-DT and differentially expressed lncRNAs between H1ESC and HFFc6, such as THBS1-IT1 and ENSG00000260772. In addition, these lncRNAs were preferentially associated with regions with high absolute contribution scores compared to the regions with low scores (Fig. 5c, Supplementary Fig. 18). Since the pattern of enrichment of these caRNAs mirrors that of MALAT1 and NEAT1, we hypothesize that these caRNAs also play mechanistic roles in 3D genome organization.To explore the caRNAs that might shape genome structure over chromatin accessibility, we investigated the caRNAs (top10) that were preferentially associated with genomic regions having higher absolute trans-located caRNA contribution compared to chromatin accessibility. We identified a list of RNAs that was nearly identical to the one identified genome-wide (Supplementary Data 2). However, some RNAs were found to preferentially interact with these differentiated regions but were not enriched in top or bottom scoring regions overall. These included the ncRNAs ZNRF3-AS1, MIR6726 and MIR4796, which not only showed higher interaction ratios with the differentiated regions but also interacted with more than one of these DNA regions (Supplementary Data 3 and Supplementary Fig. 19). These caRNAs are high-confidence candidates for contributing to nuclear structures in specific ways beyond being generally associated with accessible chromatin.Since genomic regions with bottom contribution scores from RNAs of unknown genes were enriched at TAD boundaries and loop anchors, we further explored the interactions between DNA and RNAs derived from unknown genes. We found that around 37% of these RNAs were transcribed from repetitive elements (Fig. 5d). Since Alu sequences were proposed to promote long-range enhancer-promoter interactions, possibly through R-loops25,63, we aligned the sequences of each pair of DNA-RNA trans interactions in HFFc6 using the local alignment function of the python module pairwise2 in search of potential candidates for R-loop formation. Around 0.3% of trans interactions were considered as candidates by exhibiting over 80% identity between RNA and DNA sequences plus continuous, uninterrupted perfect matches exceeding 10 base pairs. We found that 32% of the RNA sequences in these candidate interactions originated from Alu sequences, and 93% of the DNA sequences were annotated as Alu elements (Fig. 5e). Furthermore, these candidate interactions tended to increase at stable TAD boundaries, TAD boundaries having higher insulation strength in HFFc6 or unique to HFFc6 in contrast to TAD boundaries with higher strength in H1ESC or unique to H1ESC (Fig. 5f). The same trend was also observed for loop anchors (Fig. 5f), aligning with the roles of Alu sequences in long-range enhancer-promoter interactions. More importantly, both loop anchors of the cell-type-specific interaction that was captured by the models with trans-located caRNAs but no other models in Fig. 4c could form trans R-loops at Alu sequence loci (Fig. 5g). This provides a potential mechanism for loop formation at loci with trans Alu RNA-DNA interactions, demonstrating the capability of the AkitaR model to capture these interactions and generate testable, mechanistic hypotheses.

Hot Topics

Related Articles