An updated resource for the detection of protein-coding circRNA with CircProPlus

Overview of the CircProPlus workflow and characteristics

We developed CircProPlus from CircPro by updating its functional modules and optimizing the overall workflow for better data mining. Like CircPro, CircProPlus implements an automated computational pipeline for de novo detection of protein-coding circRNAs (Fig. 1). The following major modifications were made to improve software performance:

(1) Module 1: De novo circRNA detection. CIRI2 is the default tool for circRNA detection, while other algorithms are available for users who wish to customize their circRNA analysis. Previously built genome index files are reused, which allows parallel computation.
(2) Module 2: Coding potential prediction. The newer CPC2 and CPAT are implemented to cross-check prediction results.
(3) Module 3: Ribo-seq read identification. Ribo-seq reads are allowed to align to any site of the circRNA other than the BSJ.

As a result, CircProPlus offers users more options for harnessing state-of-the-art algorithms. It also improves the detection of circRNAs bound by Ribo-seq reads, providing a much larger repertoire of translated circRNAs. Additionally, because genome indexes are reusable, repeated steps are avoided in each run and parallel computation becomes possible. Taken together, these improvements reduce computational load and boost the performance of CircProPlus by offering more flexibility, accessibility and reliability.

Figure 1. The workflow of CircProPlus. Three functional modules constitute the CircProPlus pipeline. BSJ, back-splicing junction.

Performance of CircProPlus in processing circRNA-seq data

To minimize the impact of linear RNAs, a poly(A)-depleted RNA library (RNase R treated) is highly recommended for efficient characterization of genome-wide circRNAs. We first tested the performance of CircProPlus using real circRNA-seq data and matched Ribo-seq reads from human breast tissue.
CIRI2 and CirComPara2 were each used within CircProPlus for circRNA detection. On the same hardware, the CIRI2 implementation yielded 20,777, 9754, 9110 and 18,153 translated circRNAs in the four samples, whereas CirComPara2 discovered 42,917, 31,455, 23,431 and 35,418 translated circRNAs in the corresponding samples (Fig. 2a, Tables S1 and S2).

Figure 2. The number of translated circRNAs detected by CircProPlus from circRNA-seq reads (RNase R treated) using the CIRI2 or CirComPara2 implementation. (a, b) Performance of CircProPlus was tested in human (a) and mouse (b) datasets. Paired t test.

Next, we applied CircProPlus to mouse sequencing data. As expected, with the CIRI2 implementation CircProPlus identified 2175 and 4541 translated circRNAs in the two embryonic stem cell (ESC) groups and 5055 and 7670 in the neural progenitor cell (NPC) groups (Fig. 2b, Tables S1 and S2). CirComPara2, by contrast, retrieved many more translated circRNAs when integrated with CircProPlus: 11,540 and 19,263 in the ESC groups and 20,438 and 28,785 in the NPC groups.

Performance of CircProPlus in analyzing total RNA-seq data

In deep sequencing of RNase R-untreated libraries, linear RNAs are highly enriched, leaving only a small fraction of circRNAs detectable. We next asked whether CircProPlus could still efficiently identify circRNAs from total RNA-seq data, an alternative potential source of circRNAs. With CIRI2 implemented within CircProPlus, 3237, 5268, 3786 and 5730 translated circRNAs were detected in the four human samples, compared with 12,194, 17,776, 15,581 and 20,167 when CIRI2 was replaced with CirComPara2 (Fig. 3a, Tables S3 and S4).

Figure 3. The number of translated circRNAs detected by CircProPlus from total RNA-seq reads (RNase R untreated) using the CIRI2 or CirComPara2 implementation. (a, b) Performance of CircProPlus was tested in human (a) and mouse (b) datasets.
Paired t test.

Besides human samples, mouse RNA-seq reads from RNase R-untreated libraries were also tested. Consistently, CIRI2-implemented CircProPlus discovered only 661 and 893 translated circRNAs in the ESC groups and 1721 and 2133 in the NPC samples, fewer than CirComPara2-based CircProPlus, which yielded 3527 and 4305 translated circRNAs in the ESC groups and 7779 and 7824 in the NPC samples (Fig. 3b, Tables S3 and S4).

Runtime and memory consumption

CircProPlus inherited the highly modularized framework of CircPro and removed redundancy in its computational design. For example, genome index files are generated only once. Meanwhile, CPC, the most time-consuming tool in CircPro, whose running speed depends on the online environment, has been upgraded to the much faster CPC2. As a result, CircProPlus with the default CIRI2 implementation finished each run within hours using a maximum of 32 threads. In addition, the running time of CircProPlus scaled almost linearly with the number of input reads (Fig. 4a,b). Peak memory, in contrast to running time, varied only mildly as the number of input reads grew.

Figure 4. Runtime of CirComPara2 with the CIRI2 implementation plotted against the number of processed reads from circRNA-seq (a) or RNA-seq (b) data using a maximum of 32 threads. Simple linear regression was performed.
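
The captions of Figs. 2 and 3 name a paired t test for comparing the two detection tools. As a rough illustration only, the sketch below computes the paired t statistic for the four human circRNA-seq samples reported in the text for Fig. 2a. It assumes that samples are paired by position (sample 1 with sample 1, and so on) and that the test was run within a single panel; the source does not state how the test was grouped, and the function name paired_t is a hypothetical helper, not part of CircProPlus.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic for b - a (hypothetical helper, not CircProPlus code).

    Returns (t, degrees_of_freedom, mean_difference).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [y - x for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    se = sd / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean_diff / se, len(diffs) - 1, mean_diff


# Translated circRNA counts per human sample, as reported in the text (Fig. 2a).
ciri2 = [20777, 9754, 9110, 18153]
circompara2 = [42917, 31455, 23431, 35418]

t_stat, df, mean_diff = paired_t(ciri2, circompara2)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {df}")
```

The p-value is left out because the Python standard library has no t-distribution function; a statistics package would be needed to convert t and df into a p-value.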
