RankCompV3: a differential expression analysis algorithm based on relative expression orderings and applications in single-cell RNA transcriptomics | BMC Bioinformatics

Performance on the null datasetWe tested the FPR using the null dataset of profiles of 80 cells lysed from frozen mouse embryonic stem cells under the same culture conditions, provided by Grün et al. [35]. The dataset was randomly divided into two subsets, each with 40 profiles. In 10 randomized trials with a false discovery rate (FDR) < 0.05 as the threshold, no or very few DEGs were detected by RankCompV3, Bimod, LR, MAST, DEsingle, Wilcoxon, edgeR, limma, and scDD algorithms. On average, 43.9, 7.5, 0, 0, 4.7, 0, 0, 0, and 77.5 DEGs were detected, respectively. The Monocle2 and SigEMD algorithms detected 132.6 and 1291 DEGs, which showed relatively high FPRs. These results show that the RankCompV3 method performs well in controlling the FPR in the negative dataset.Performance comparison with Squair et al.’s benchmark testSquair et al. performed a benchmark test on various DEG algorithms. It included the algorithms specially designed for single-cell mode and the algorithms designed for bulk profiles. The results from bulk RNA-seq profiles of the same samples were used as the ground truth to test the performance of the algorithms in single-cell and pseudo-bulk modes. Eleven gold standard datasets were used to evaluate the performance of RankCompV3 using the same benchmark protocol as in Ref. [26]. The results are listed in Table 2. In most datasets, the concordance between the pseudo-bulk and the ‘ground-truth’ is very high. The median AUCC of RankCompV3 in the pseudo-bulk mode is 0.45, which is higher than that of the best method in Ref. [26] (edgeR-LRT, median AUCC = 0.38) (Fig. 2). More importantly, the concordance is also high for RankCompV3 applied to single-cell profiles directly (median AUCC = 0.39), which is better than the best method for single-cell in the Ref. [26] (t-test and Wilcoxon rank-sum test, the median AUCC is 0.24) (Fig. 2).Table 2 Area under the concordance curve (AUCC) for RankCompV3 for pseudo-bulk analysis scheme and single-cell analysis scheme in the benchmark datasets from Squair et al. [26]Fig. 2Median AUCC for RankCompV3 and 14 other DEGs recognition algorithms for pseudo-bulk analysis scheme in the benchmark datasets from Squair et al.Only in two datasets, Angelidis2019_alvmac and Reyfman2022_pneumo, did RankCompV3 perform poorly. The data quality of Angelidis2019_alvmac dataset was very poor and many genes were not detected in either the bulk or single-cell profiles. For the Reyfman2022 datasets, no bulk profiles were provided, and the given bulk DEGs were used as the ‘ground-truth’.Furthermore, in comparison with the ‘ground-truth’ obtained with edgeR-LRT, our results also show strong concordance. The median AUCC metric is 0.36 for the pseudo-bulk method and 0.29 for the single-cell method.These results suggest that RankCompV3 can be effectively used for pseudo-bulk analysis, which can improve the performance of differential expression analysis in scRNA-seq data.Performance evaluation with the simulated single-cell datasetSince we could not obtain all the truly differentially expressed genes in a real dataset, we simulated multimodal scRNA-seq profiles with scDD for cells under two conditions with 75 cells in each to test the performance. The dataset contains 2000 DEGs equally distributed in four different modes and 18,000 non-DEGs equally distributed in two modes. The ROC curves are shown in Fig. 3A. RankCompV3 has an AUC of 0.865, which is in the middle-tier of all 12 algorithms. The AUCs of Monocle2, SigEMD, Bimod, LR, MAST, DEsingle, Wilcoxon, edgeR, DESeq2, limma, and scDD are 0.649, 0.915, 0.820, 0.713, 0.620, 0.949, 0.632, 0.951, 0.966, 0.837, and 0.963, respectively.Fig. 3Simulated single-cell dataset. A ROC curves of RankCompV3 and other algorithms in the simulated scRNA-seq dataset. The area under each ROC curve (AUC) is shown in the legend. B TNRs and TPRs for the non-differential and differential genes with different expression modes in the simulated scRNA-seq dataset. C The radar map shows the Accuracy, Specificity, True positive rate and Precision of each algorithmThe performances of the different algorithms are compared in Table 3 at a FDR < 0.05 threshold. RankCompV3 had a FPR of 0.018, a precision of 0.815, and an accuracy of 0.955, which were all better than or similar to the other algorithms. Compared to LR, MAST, and Wilcoxon, RankCompV3 showed a higher TPR while maintaining an extremely low FPR. Monocle2, SigEMD, and edgeR obtained extremely high TPRs, but they also contained a large number of false positives (FPs). Monocle2 had a TPR of 0.998, but its accuracy was only 0.122. This is because Monocle2 identified 16,383 DEGs among 20,000 genes, which introduced a larger FPR of 0.799 (Fig. 3C).Table 3 Detection performance of RankCompV3 and 11 other algorithms in the simulated scRNA-seq dataset (FDR < 0.05)We evaluated the genes of different modes separately and compared the TNRs of non-DEGs of the two different modes and the TPRs of DEGs of the four different modes, as shown in Fig. 3B. The results showed that the average TNR of the two modes of non-DEGs, EE and EP, was 0.982 using RankCompV3. The highest TPR of the four modes of DEGs, DD, DP, DB, and DM, was 0.962. For the DEG modes with no or low multimodality, DE and DM, the average TPR was 0.927. For the highly pleiotropic DEGs, DP and DB, the identification ability for DP was also very high (0.946), but it failed to detect DB genes. These results demonstrate that RankCompV3 strictly controls the FPR in the single-cell simulation dataset and has a good ability to detect DEGs of low multimodality and some DEGs with high multimodality.In the simulated dataset, we randomly generated 10 pseudo-bulk profiles for each group to identify DEGs. The mean number of detected DEGs from the 10 random experiments was 1737.5 (standard deviation, SD = 13.3), and the mean number of true DEGs was 1413.3 (SD = 3.1). These numbers show a slight improvement compared to those obtained with the single-cell profiles directly (1731 and 1411, respectively). The mean AUCC metric was 0.884 (SD = 0.01), showing a strong concordance between the pseudo-bulk and the single-cell methods.Performance evaluation in real scRNA-seq datasetAlthough the simulated dataset mimics numerous aspects of single-cell expression profiles, it cannot fully capture the complex characteristics of real data. To evaluate the performance of RankCompV3 on real data, we used the scRNA-seq dataset provided by Islam et al. [38], which consisted of the profiles of 48 mouse ES cells and 44 mouse embryonic fibroblasts. The top 1000 differential genes verified by RT-qPCR experiments were used as the gold standard DEGs (top1000).Figure 4A shows the number of DEGs identified by each algorithm, the true number of DEGs (the number of genes that intersect with the top1000 genes), AUCC, and precision. RankCompV3 identified 1045 DEGs, of which 242 were true DEGs (TPR = 0.242). Although the number of true DEGs is the smallest, RankCompV3 has the highest precision and accuracy. RankCompV3 has a strictly conservative FPR (0.037), while maintaining a higher accuracy (0.932) than the other 11 methods which show high FPRs, ranging from 0.122 to 0.873. Among the other 11 algorithms, limma, Monocle2, DEsingle, and edgeR have the highest TPRs, which are 0.761, 0.931, 0.797, and 0.753, respectively. However, these algorithms also have high FPRs, ranging from 0.330 to 0.873. As a result, their accuracies are relatively low, ranging from 0.150 to 0.651.Fig. 4Real scRNA-seq dataset. A Performance in identifying DEGs in the scRNA-seq test dataset of GSE29087 where the top1000 genes confirmed by RT-qPCR were used as the gold standard. The purple column represents the number of DEGs identified by each algorithm. The green area represents the number of intersection genes between the DEGs identified by the algorithm and the top1000 genes. The connected blue dots indicate TPRs and the connected orange lines indicate the area under the concordance curve (AUCC) metric between the top 1000 DEGs identified by the algorithm and the gold standard. B ROC curves in the scRNA-seq positive test dataset of GSE29087. The top1000 genes were used as the gold standard and the AUC values are shown in the legendSince the top1000 is a partial list of true DEGs which might cause higher FPRs in some algorithms, we further evaluated the performance with AUC and AUCC. The ROC curves are shown in Fig. 4B. The AUC of RankCompV3 is 0.662, which is comparable to the other 11 algorithms. The AUCs of Monocle2, SigEMD, Bimod, LR, MAST, DEsingle, Wilcoxon, edgeR, DESeq2, limma, and scDD are 0.699, 0.628, 0.700, 0.686, 0.670, 0.629, 0.674, 0.758, 0.598, 0.571, and 0.575, respectively. The AUCC in Fig. 4A measures the concordance between the top 1000 DEGs detected by each algorithm and the gold standard. EdgeR and Monocle2 obtained the best concordance scores (0.60), followed by RankCompV3 and DESeq2 (0.59).In conclusion, RankCompV3 can identify DEGs in scRNA-seq positive datasets. Compared to the high FPRs of many other methods, RankCompV3 may be more suitable for studies that require strict control of FPR.Effect of sample size to performanceTo investigate the dependence of performance on sample size, we used the scRNA-seq dataset of LTHSC of two age-group mice provided by Kowalczyk et al. [40]. Subsets with sizes of 10, 30, 50, and 70 were randomly sampled for each age-group, and the sampling experiments were repeated 10 times. DEGs identified by the same algorithm in the entire dataset were taken as the gold standard. We assessed the concordance of the gene list sorted according to the significance level between the entire dataset and the subsets. Figure 5 shows the trend of the AUCC for the complete gene list and the top 1000 genes. The AUCC for the top 1000 genes is more meaningful since they are concentrated with more DEGs. As the sample size increased, the TPR and AUCC for the complete gene list increased whereas the FNR decreased (Fig. 5, see also Supplementary file 1: Fig. S2). At a sample size of 30, RankCompV3 identified approximately half of the gold standard DEGs while the AUCC for the top 1000 genes reached a plateau. This indicates that most of the genes of interest have been correctly placed at the top of the gene list, despite some not being identified as DEGs.Fig. 5Performance dependence on sample size. The AUCC was calculated for all the genes in the left panel and for the top 1000 genes in the right panel. The genes were sorted according to their significance levels given by each method in each datasetApplication in weak-signal datasetsIn Misharin et al.’s study [44], they found that the differential expression of Siglecf can reliably distinguish Mo-AMs from TR-AMs in the bleomycin-induced early fibrosis. However, after 10 months of treatment with bleomycin, TR-AMs and Mo-AMs expressed similar levels of Siglecf and could not be distinguished by flow cytometry [41]. Only 330 DEGs were identified by two-way ANOVA (FDR < 0.05). With FDR < 0.05, RankCompV3 identified 5023 DEGs. SigEMD, Monocle2, edgeR, scDD, Bimod, DESeq2 and limma detected 2688, 1456, 1224, 720, 182, 111 and 79 DEGs, respectively. Other algorithms failed to detect or detected very few DEGs.Through functional analysis, we confirmed that many DEGs identified by RankCompV3 have been shown to be associated with the differentiation of Mo-AMs and TR-AMs and the development of pulmonary fibrosis. For example, Siglecf is a reliable marker to distinguish Mo-AMs from TR-AMs [44]. Vcam-1 is a TGF-β1 responsive mediator that is upregulated in idiopathic pulmonary fibrosis [45]. SPARC drives pathological responses in non-small cell lung cancer and idiopathic pulmonary fibrosis by promoting microvascular remodeling and the excessive deposition of ECM proteins [46]. FGF2 inhibits pulmonary fibrosis through FGFR1 receptor action [47]. Adam8 deficiency increases CS-induced pulmonary fibrosis [48]. Macrophages expressing SPP1 proliferate during pulmonary fibrosis [49]. Sparc, Fgfr1 and Adam8 were identified by RankCompV3 only. Many other algorithms failed to detect any of these DEGs, and only Monocle2 (Siglecf, Spp1, and Vcam1), SigEMD (Spp1), edgeR (Siglecf, Spp1, and Vcam1), and DESeq2 (Spp1) identified 1 to 3 of the above functional-meaningful genes.We performed pathway enrichment analysis (FDR < 0.05) for the DEGs identified by RankCompV3. The DEGs were enriched in 82 pathways, of which five were associated with pulmonary fibrosis (Fig. 6). Similarly, we performed pathway enrichment analysis on the DEGs identified by the other algorithms. Only five algorithms, Monocle2, SigEMD, DEsingle, edgeR, and DESeq2, recognized 1 to 3 of these functional pathways. Even when the significance threshold was relaxed to FDR < 0.2, the DEGs identified by Bimod and MAST were still not enriched in any of these pathways.Fig. 6Pulmonary fibrosis-related pathways enriched with DEGs identified by RankCompV3. The x-axis value (also shown next to the bars) indicates the number of DEGs and color indicates the FDR value of the enriched pathwayThe Wnt signaling pathway was specifically identified by RankCompV3. The Wnt signaling pathway is an important pathway promoting pulmonary fibrosis [50], and studies have shown that targeting the Wnt pathway is a potential new treatment option for fibrosis [51].In conclusion, RankCompV3 can detect weak biological signals that are functionally meaningful.

Hot Topics

Related Articles