LoDEI: a robust and sensitive tool to detect transcriptome-wide differential A-to-I editing in RNA-seq data

The goal of LoDEI is to detect biologically relevant differences in A-to-I editing between two sets of RNA-seq samples (Fig. 1). To identify differential A-to-I editing, the detection method of LoDEI can be separated into two high-level steps:Fig. 1: Schematic overview of differential A-to-I editing index calculation.a Two sets of samples S and \(S^{\prime}\) with different conditions undergo RNA-sequencing followed by standard RNA-seq data analysis of QC checking and read alignment. b and c Next, differential A-to-I editing signals are calculated by LoDEI’s sliding window approach (b) yielding the final output table (c).

1.

A sliding window estimates the A-to-I editing signal for each set and calculates the difference between the editing signals of the two sets.

2.

To separate calculated A-to-I editing signal differences caused by true A-to-I events from noise, the same sliding window approach is applied to non-A-to-I mismatches (all non-A  → G mismatches), generating the data to allow calculating q values empirically.

This paper proceeds with a precise description of the A-to-I editing signal calculation and q-value estimation used by LoDEI, followed by an analysis of A-to-I and non-A-to-I signals in various datasets to demonstrate the general applicability of an empirically q-value calculation for differential A-to-I signal detection. Finally, we first assess LoDEI’s A-to-I editing detection performance by comparing findings with results of the global A-to-I detection as provided by the AEI, and second, we compare the differential A-to-I editing detection of LoDEI with results from the site-specific differential A-to-I detection tools REDIT and JACUSA2.Calculating the A-to-I editing difference between two sets of samplesThe general idea to account for the editing behavior of ADAR1 and keeping as much positional information as possible is addressed by estimating A-to-I editing using a sliding window approach. Having two sets of samples of RNA-seq data, LoDEI first estimates the A-to-I editing signal for each sample within a window individually. The mean of the estimated editing signals of a window of all samples of a set is calculated, and the difference between the means of the windows of two sets defines the change in RNA-editing (Fig. 1).More formally, let S and \(S^{\prime}\) be two sets of samples s of RNA-seq data of two different conditions. The observed mismatch counts \({m}_{i,s}^{x\to y}\) of nucleotide x to y, and the sum of all matches and mismatches ci,s at genomic position i in sample s define the editing ratio$${r}_{i,s}^{A\to G}=\frac{{m}_{i,s}^{A\to G}}{{c}_{i,s}}.$$
(1)
We then define the editing signal es,w (Fig. 1b [1]) for a sample s and a sliding window w with genomic start and end positions k and l as$${e}_{s,w}^{A\to G}={\sum }_{i=k}^{l}{r}_{i,s}^{A\to G}.$$
(2)
Note, LoDEI uses non-overlapping windows with a default size of 51 nucleotides. Using the editing signals of all samples of a set S for a given window w, we can calculate$${z}_{S,w}^{A\to G}=\frac{1}{| S| }{\sum}_{s\in S}{e}_{s,w}^{A\to G},$$
(3)
where \({z}_{S,w}^{A\to G}\) is the mean of the editing signals of all samples of a set and serves as an estimate for the A-to-I editing of set S (Fig. 1b, [2]).After performing the same calculation for \(S^{\prime}\), the change of the editing signal \({\delta }_{S,S^{\prime},w}\) (Fig. 1b, [3]) between sets S and \(S^{\prime}\) is given by$${\delta }_{S,S^{\prime},w}^{A\to G}={z}_{S^{\prime},w}^{A\to G}-{z}_{S,w}^{A\to G},$$
(4)
and describes the difference of the A-to-I editing between the sets S and \(S^{\prime}\) for a given window w.Next, we propose an empirical q-value estimation for \({\delta }_{S,S^{\prime},w}^{A\to G}\) values based on non-A → G differences to detect true editing events and differentiate those from false positive events. A q value gives each δ value its own individual measure of significance, similar to a p value. A p value is a measure in terms of the false positive rate, whereas the q value is a measure in terms of the false discovery rate and the typical quantity of interest in genome-wide testing34. The q value provides a measure of significance for each window in such a way that filtering of windows with a q value ≤ 10% yields an overall FDR of 10% among the filtered windows.For the q value estimation, we exploit that most non-A → G differences are not generated by any editing processes and are likely to derive from various sources like PCR or NGS-induced errors and SNPs. The estimated q value of an A-to-I editing difference δA→G is the expected proportion of false positives among all differences as or more extreme than the current one. The key to estimating a q value for an observed change of the editing signal is to approximate the number of expected false positives.Considering A → G mismatches as a mixture of A-to-I editing events and noise, the eleven remaining non-A → G mismatches (e.g. A → C, G → A, etc.) serve as the basis for approximating expected false positives.Let Dx→y be the set of δx→y values calculated by LoDEI, where x and y symbolize a non-A →G mismatch used to calculate the differences. First, LoDEI generates an individual set Dx→y for each non-A → G mismatch. Each set approximates the number of expected false positives for a given value of δ and thus yields an individual q value estimate$$\hat{q}(\delta,{D}^{A\to G},{D}^{x\to y})=\frac{\,{{\rm{\#signals}}}\,\,\le \delta \,\,{{\rm{in}}}\,\,{D}^{{\rm{x\to y}}}}{\,{{\rm{\#signals}}}\,\,\le \delta \,\,{{\rm{in}}}\,\,{D}^{{\rm{A\to G}}}},$$
(5)
for each non-A → G set. The median of the eleven estimated q values is the final estimated q value for a given A-to-I difference δA→G. Note, for δ < 0 the number of signals ≤δ, and for δ > 0 the number of signals ≥δ is used to estimate the number of false positives.The stronger the calculated δ values by LoDEI, the smaller the number of those observed differences. This negative correlation can lead to strong variability of the estimated q values if the number of observed δ values is small. Typically, the q value decreases the stronger the values for δA→G get. For extreme values of δA→G, where the variability of the q value estimate increases, situations can arise where higher signals get higher q values again, while lower values of δA→G already had lower q values (Supplementary Fig. 8). These situations are caused by a limited number of available windows and not by underlying biology. Consequently, we assume increasing q values for strong values of δ is counterintuitive and avoid varying q value estimates by not allowing those estimates to increase again.Finally, LoDEI reports a BED-like output file including the differential editing signals of all windows, their genomic coordinates, and corresponding q values (Fig. 1c).LoDEI confirms known and reveals novel regulators of A-to-I editingTo demonstrate the general applicability of LoDEI’s differential editing index calculation and the empirical q value estimation, we analyzed differential A-to-I editing in four different human RNA-seq datasets with 27 samples produced by four different laboratories to cover a broad spectrum of protocols. The first two of the four datasets are known to show differential A-to-I editing and are used to show the general applicability of LoDEI’s approach. All datasets show a significant change in the gene expression in at least one of the genes of the ADAR family (Supplementary Table 2).The first analyzed RNA-seq dataset consisting of two samples per set is known to show a strong reduction of A-to-I editing upon siRNA-induced ADAR1 knockdown (KD) in the glioblastoma cell line U87MG when compared to a control group16. Other RNA-binding proteins are known to regulate A-to-I editing besides ADAR, and several publications could show an increase in A-to-I editing upon the reduction of RO60 (TROVE2)24,37,38. Hence, we used the RNA-seq dataset consisting of two control and three knockout samples derived from the RO60 knockout (KO) Lymphoblastoid cell line GM12878 as the second dataset17.After evaluating the general applicability of LoDEI on the ADAR KD and RO60 KO datasets, we applied LoDEI to novel datasets to search for differential A-to-I editing.Across all datasets, a consistent contrast between A → G and non-A → G differences can be observed. Non-A → G differences, such as G → A, show a different pattern compared to A → G differences (Fig. 2 left-column (a, d, g, j) versus middle column (b, e, h, k)). Strong editing differences are almost exclusively observed for A → G differences (Fig. 2 middle column (b, e, h, k), orange), and the shape of weak A → G differences resembles the shape of non-A → G differences represented by G → A values, supporting the general applicability of LoDEI’s window-based approach to describe differential A-to-I editing and the usage of non-A → G differences for the estimation of false positive signals.Fig. 2: Observed signal differences and empirically derived q values.Rows correspond to the ADAR1 KD (a–c), RO60 KO (d–f), MYCN-amp (g–i), and sncRNA7SL OE (j–l) datasets. The left and middle columns show Bland–Altman plots51 of δG→A values considered as noise (left column), and δA→G values considered as a mixture of signal and noise (middle column). The third column shows empirical q values. Orange lines show the signal cutoffs derived by LoDEI corresponding to a q value ≤ 0.1 in columns 1 and 2 and a q value of 0.1 in column 3.To further support LoDEI’s general applicability, we analyzed differential A-to-I editing in non-human datasets. We analyzed RNA-seq data from ADAR mutant and wildtype C. elegans worms and observed the same consistent contrast between A → G and non-A → G differences as in the human datasets (Supplementary Fig. 12). Strong editing differences are almost exclusively observed for A → G differences.LoDEI confirms known regulators of A-to-I editingIn the ADAR1 KD dataset, A → G differences calculated by LoDEI show a unidirectional, reduced A-to-I editing in ADAR1 KD cells. Concordantly, q values ≤ 0.1 only appear for negative δA → G values (Fig. 2c, orange line). Besides A → G differences, LoDEI detects a small number of T → C mismatches (Supplementary Table 1).Taken together, LoDEI can calculate differential A-to-I editing and confirms data showing a reduction of A-to-I editing upon ADAR1 knockdown.Applying LoDEI on the RO60 KO dataset reveals that all A → G differences with a q value ≤ 0.1 are exclusively detected with a positive sign (Fig. 2e, f; orange line/points), indicating that A-to-I editing is increased in RO60 KO cells. No other significant changes caused by other types of mismatches are found in this dataset. LoDEI confirms that RO60 represses A-to-I editing and that LoDEI is applicable for identifying factors or conditions regulating A-to-I editing.To demonstrate potential novel discoveries facilitated by LoDEI’s results, we selected a LoDEI window with differential A-to-I editing in the open reading frame of the SRP9 gene in the RO60 dataset. The analysis of the window in the IGV genome browser revealed two potential silent mutations and one differentially edited site potentially leading in about 30% of the mRNAs to a serine to glycine mutation resulting in an expression of an SRP9 protein isoform (Supplementary Fig. 9).LoDEI reveals novel regulators of A-to-I editingLoDEI reported expected negative as well as positive differential A-to-I editing from published datasets with known negative (ADAR1 KD) and positive effects on A-to-I editing (RO60 KO).Next, we applied LoDEI on a third dataset using previously published RNA-seq data of different neuroblastoma cell lines, a more challenging scenario compared to the analysis of data from the same cell line39. The eight samples were grouped into MYCN-amplified (SK-N-BE(1), LAN-1, IMR-5, CHP-212) and MYCN-non-amplified cell lines (SK-N-AS, SK-N-SH, SH-SY-5Y, SK-N-F1) to analyze the impact of MYCN-amplification (MYCN-amp) on A-to-I editing. First, similar to the ADAR1 KD and RO60 KO datasets, non-A → G differences, such as G → A, show a different pattern compared to A → G differences (Fig. 2g, h). All A → G differences with a q value ≤ 0.1 are exclusively detected with a positive sign (Fig. 2h, i; orange line/points), highlighting an increased A-to-I editing in MYCN-amp cells. To confirm these findings, we manually inspected detected differential A-to-I edited windows in the IGV genome browser verifying increased A-to-I editing in MYCN-amp cells (Supplementary Fig. 1). Another analysis of a LoDEI window in the MYCN-amp dataset, located in the open reading frame of the CHAF1A (Chromatin Assembly Factor 1 Subunit A) gene shows a differential editing site that is also a 3’ splice acceptor site. The editing of this site could affect the splicing of the CHAF1A mRNA and further suggest that the increase in A-to-I editing associated with MYCN-amplification could have an impact on CHAF1A expression (Supplementary Fig. 10). By applying LoDEI, we show for the first time that A-to-I editing is positively regulated in MYCN-amp cell lines.In the fourth dataset, comprised of 5 replicates per set, we tested the hypothesis that a small non-coding RNA with complementarity to reverse Alu elements modulates A-to-I editing40. The neuroblastoma cell line Kelly was transfected with a control RNA or the small non-coding RNA sncRNA7SL (piR-hsa-1254, DQ571003, piR31115) yielding an overexpression of sncRNA7SL (sncRNA7SL OE). A → G differences with a q value ≤ 0.1 are exclusively detected in the negative direction, demonstrating a unidirectional and reduced A-to-I editing in sncRNA7SL OE cells compared to control-treated cells (Fig. 2k, l; orange line/points, Supplementary Fig. 2). Besides unidirectional A → G differences, LoDEI detects more G → C differences compared to the number of A → G differences in the sncRNA7SL OE dataset (Supplementary Table 1). Taken together, reduced A-to-I editing is detected by LoDEI upon sncRNA7SL transfection.Global changes in A-to-I editing identified by LoDEI are supported by the Alu Editing IndexFrom a global perspective, A-to-I editing differences with q values ≤ 0.1 as identified by LoDEI show a unidirectional change in all four datasets (Fig. 2). All windows with a q value ≤ 0.1 in the ADAR1 KD and sncRNA7SL OE cells show a decrease, and the RO60 KO and MYCN-amp cells show an increase of A-to-I editing, underscoring the complex and dynamic regulation of A-to-I editing.Here, we compare these global trends detected by LoDEI with the results of the Alu Editing Index (AEI). The AEI reports a single value for each sample that summarizes the global amount of A-to-I editing of a sample and thus does not keep any positional information of the A-to-I editing events. We calculate the AEI for each sample and take the mean of all AEI values of a set as a representative of the A-to-I editing (Table 1, Supplementary Fig. 3).Table 1 Comparison of global trends of LoDEI and the Alu editing indexThe means of the AEI values are smaller for ADAR1 KD cells and sncRNA7SL OE cells compared to their control counterparts and support the global characteristics of a decrease of the A-to-I editing as detected by LoDEI (Table 1 rows 1 and 2). Both tools report a strong reduction in A-to-I editing in the ADAR1 KD cells and a small reduction in the sncRNA7SL OE cells. Similarly, both tools calculate an increase in A-to-I editing in RO60 KO and MYCN-amp cells (Table 1 rows 3 and 4).In the ADAR1 KD dataset, the observed difference of  −1.17 of the average AEI values is much stronger compared to the change of  −0.06 in the sncRNA7SL OE dataset (p values < 0.01). Consistently, LoDEI detects more windows with decreased A-to-I editing in the ADAR1 KD dataset (1624 windows with q ≤ 0.1, Table 1) compared to the sncRNA7SL OE dataset (64 windows with q ≤ 0.1).In contrast, the AEI differences of 0.09 and 0.18 indicate an increase—though not statistically significant—of the A-to-I editing for the RO60 KO and MYCN-amp cells. Likewise, LoDEI detects 114 and 217 differential A-to-I windows with increased (δA→G > 0) A-to-I editing and q values ≤ 0.1 in these datasets.In summary, the global trends detected by LoDEI are supported by the AEI. The number of found windows by LoDEI and the differences in the AEI values show a Pearson correlation of 0.99. Both tools identify an overall decrease of A-to-I editing in ADAR1 KD and sncRNA7SL OE cells and an increase in RO60 KO and MYCN-amp cells.LoDEI detects more A-to-I editing at the same FDR compared to alternative methods while maintaining signals found by single-site approachesTo the best of our knowledge, no window-based approach, including a statistical framework for differential A-to-I editing detection, exists besides LoDEI. Due to the lack of a window-based competitor, we used the single-site detections offered by REDIT and JACUSA2 to assess and compare the differential A-to-I editing detection performance. To obtain q values from REDIT and JACUSA2 for a direct comparison with results from LoDEI, we used a similar procedure as utilized in LoDEI. Therefore, REDIT and JACUSA2 were applied on G/A mismatches to approximate the number of false positives to finally calculate q values for detected differential A-to-I editing.Overall, LoDEI detects more A-to-I editing sites at any q value threshold for all datasets compared to REDIT and JACUSA2 (Fig. 3). REDIT and JACUSA2 only detect differential A-to-I editing in the ADAR1 KD dataset (Fig. 3a) and could not detect any differential A-to-I editing in the other three datasets at reasonable q values (Fig. 3b–d). LoDEI’s detected windows can contain both single and clusters of A-to-I editing sites (Supplementary Figs. 4 and 14).Fig. 3: Performance comparison of differential A-to-I site detection.Shown are the number of detected differentially edited A-to-I sites as a function of the q value threshold for the ADAR1 KD (a), RO60 KO (b), MYCN-amp (c), and sncRNA7SL OE (d) datasets.At a q value threshold of 0.05, LoDEI detects 1231 differentially edited non-overlapping windows containing 79% (959) of the 1219 single sites found by REDIT in the ADAR1 KD dataset, demonstrating that LoDEI detects the majority of single sites detected by REDIT. In contrast, 52% (639) out of the 1231 windows detected by LoDEI contain 1795 single sites that are exclusively found by LoDEI and missed by REDIT (Supplementary Figs. 5 and 6).With 2276 found sites at the same q value of 0.05, JACUSA2 detects more A-to-I sites than REDIT, and less than LoDEI. Of those 2276 JACUSA2 sites, 78% (1784) are overlapping with the 1231 windows detected by LoDEI showing that the majority of JACUSA2 sites are detected by LoDEI (Supplementary Fig. 5). The 26% (322) of windows uniquely found by LoDEI correspond to 818 sites missed by JACUSA2.Overall, more sites are found by LoDEI at the same FDR compared to REDIT and JACUSA2 in all datasets (Fig. 3). In experiments with strong A-to-I editing changes, LoDEI can detect differential A-to-I editing between single samples (Supplementary Figs. 15 and 16).LoDEI detects A-to-I editing at different genomic locationsIn the ADAR1 KD datasets the majority of all detected differential A-to-I sites by LoDEI, REDIT, and JACUSA2 are located in 3’UTRs for all datasets (Fig. 4). However, whereas LoDEI detects A-to-I editing in all genomic regions of the RO60 KO, MYCN-amp and sncRNA7SL OE datasets, REDIT and JACUSA2 do not detect any editing, suggesting a high sensitivity for A-to-I editing by LoDEI. Most differential edited sites detected by LoDEI are located in the 3’UTR, followed by sites in introns, exons, and in 5’UTRs. This order of detected locations remains identical for all datasets, but the relative occurrences differ slightly between the datasets (compare Fig. 4a, b vs. Fig. 4c, d).Fig. 4: Genomic locations of detected differentially edited A-to-sites.The number of differentially edited A-to-I sites found by LoDEI (red), REDIT (blue), and JACUSA2 (light blue) at a q value threshold of 0.1 at different genomic locations are shown for the ADAR1 KD (a), RO60 KO (b), MYCN-amp (c), and sncRNA7SL OE (d) datasets.Taken together, we demonstrate that LoDEI detects differential A-to-I editing with higher sensitivity in all regions of mRNAs when compared to REDIT and JACUSA2.Differential A-to-I events detected by LoDEI overlap with known editing sites in the REDIportalWe tested whether A-to-I events detected by LoDEI are listed in the REDIportal database to provide further evidence for real differential editing events. As of this writing, the REDIportal consists of around 16 million A-to-I sites based on 9642 human RNAseq samples from 31 tissues of the GTEx project23.In general, for small q value thresholds, most detected A-to-I events by LoDEI overlap with editing sites listed in the REDIportal, and the percent of overlap decreases with increasing q value thresholds (Fig. 5). At a q value threshold of 0.05, 75.9%, 78.1%, 65.3%, and 93.8% of A-to-I events detected by LoDEI in the ADAR1 KD, RO60 KO, MYCN-amp, and sncRNA7SL OE datasets respectively, are overlapping with the REDIportal database. Of note, since REDIT and JACUSA2 did not detect differential A-to-I editing in all but the ADAR KD dataset, only minor or no overlap of the REDIT and JACUSA2 datasets with the REDIportal database was found, except for the ADAR KD dataset.Fig. 5: Overlap of detected differentially edited A-to-I events with REDIportal A-to-I sites as a function of the q value.The percent of overlap of detected differentially edited A-to-I events found by LoDEI (red), REDIT (blue), and JACUSA2 (light blue) with REDIportal A-to-I sites are shown for the ADAR1 KD (a), RO60 KO (b), MYCN-amp (c), and sncRNA7SL OE (d) datasets.Next, we performed the REDIportal overlap analysis for the different genomic locations (Fig. 6). Again, only in the ADAR KD dataset a strong overlap with A-to-I sites available in the REDIportal database was found for all tools, however, in the RO60, MYCN, and sncRNA7SL datasets, with moderate changes in A-to-I editing (Fig. 2), only differential A-to-I editing sites detected by LoDEI are also found in REDIportal.Fig. 6: Overlap of detected differentially edited A-to-I events with REDIportal A-to-I sites for different genomic locations.The percent of overlap of detected differentially edited A-to-I events found by LoDEI (red), REDIT (blue) and JACUSA2 (light blue) at a q value threshold of 0.1 at different genomic locations with REDIportal A-to-I sites are shown for the ADAR1 KD (a), RO60 KO (b), MYCN-amp (c), and sncRNA7SL OE (d) datasets.Taken together, the majority of A-to-I sites detected by LoDEI are also found in REDIportal. Even sites detected by LoDEI but not detected by REDIT or JACUSA2 were found in REDIportal, supporting the high sensitivity and robustness of LoDEI.

Hot Topics

Related Articles