Directional integration and pathway enrichment analysis for multi-omics data

Directional integration of multi-omics data

We developed directional P-value merging (DPM), a statistical method for multi-omics data fusion that prioritises genes across multiple omics datasets by integrating their P-values and directional changes such as fold-changes (FC) (Fig. 1A, Supplementary Fig. 1, Methods). DPM implements a user-defined constraints vector (CV) to specify directional associations between input datasets. For each gene, DPM computes a score based on the P-values and directional changes from the omics datasets. Genes showing significant directional changes that comply with the CV are prioritised, while the genes with significant but conflicting directional changes are penalised. DPM builds on our ActivePathways method18 and provides a directional extension of the empirical Brown's P-value merging method19,20. For a given gene, a directionally weighted score X_DPM is computed across k datasets as

$$X_{DPM} = -2\left(-\left|\sum_{i=1}^{j} \ln(P_i)\, o_i e_i\right| + \sum_{i=j+1}^{k} \ln(P_i)\right). \qquad (1)$$
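The following minimal Python sketch computes X_DPM for one gene and converts it into a merged P-value, as a numerical check of Eq. (1). It illustrates the formula under stated assumptions and is not the authors' implementation.

- It treats the input datasets as independent. The scaling factor c and degrees of freedom k' of the merged P-value (defined later in this section) are therefore fixed at c = 1 and k' = 2k, as in Fisher's method, instead of being estimated with the empirical Brown's method.
- o_i is the observed unit direction and e_i the CV entry of directional dataset i, as defined in the following paragraphs.
- Directionless datasets contribute only their P-values.
- All function names are hypothetical.

```python
import math
from typing import Sequence


def dpm_score(p_dir: Sequence[float], obs_dir: Sequence[int], cv: Sequence[int],
              p_nondir: Sequence[float] = ()) -> float:
    """X_DPM from Eq. (1).

    p_dir, obs_dir, cv: P-values P_i, observed unit directions o_i and CV entries e_i
    of the j directional datasets; p_nondir: P-values of the k - j directionless datasets.
    """
    directional = sum(math.log(p) * o * e for p, o, e in zip(p_dir, obs_dir, cv))
    nondirectional = sum(math.log(p) for p in p_nondir)
    return -2.0 * (-abs(directional) + nondirectional)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with an even number of degrees of freedom."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # accumulates (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def dpm_merged_p(p_dir, obs_dir, cv, p_nondir=()) -> float:
    """P'_DPM with c = 1 and k' = 2k, i.e. assuming independent datasets (no Brown correction)."""
    k = len(p_dir) + len(p_nondir)
    return chi2_sf_even_df(dpm_score(p_dir, obs_dir, cv, p_nondir), 2 * k)


# Same P-values with CV [+1, +1]: matching directions vs conflicting directions.
print(dpm_merged_p([0.01, 0.01], [+1, +1], [+1, +1]))  # ~0.00102, identical to Fisher's method
print(dpm_merged_p([0.01, 0.01], [+1, -1], [+1, +1]))  # 1.0, the two pieces of evidence cancel
```

The absolute value means that negating the whole CV (for example [−1, −1] instead of [+1, +1]) leaves X_DPM unchanged. With full directional agreement the score reduces to Fisher's statistic.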
Fig. 1: Directional integration of multi-omics data using DPM. A The DPM method combines gene significance and directions in multi-omics datasets for gene prioritisation and pathway analysis. Four inputs are required: (1) gene activities in input omics datasets quantified as P-values; (2) directional changes of genes such as fold-change (FC) values, used as positive (+1) or negative (−1) unit values, or zeroes for directionless data; (3) a user-defined constraints vector (CV) showing expected directional relationships between the omics datasets; and (4) gene sets of biological processes, pathways, or gene annotations. DPM combines gene P-values and directions with the CV using a data fusion approach, prioritising genes whose directions significantly agree with the CV and penalising those whose directions are inconsistent with the CV. Three examples of CVs are shown. B The integrated gene list is analysed for pathway enrichments using ranked hypergeometric tests in ActivePathways to identify the strongest pathway enrichments in top fractions of the ranked gene list and evaluate evidence from input datasets. C Enriched pathways are visualised as an enrichment map. The network shows enriched pathways where edges connect pathways that share many genes. Colours indicate the omics datasets that contribute most to pathway enrichments. Node outlines indicate pathways identified using directional or non-directional analyses.

To incorporate directionality into P-value merging, we compute sums of log-transformed P-values P_i that are weighted by directional information. Here, o_i is the observed directional change of the gene in dataset i. For example, in differential expression analysis, o_i is the direction of the gene's fold-change relative to a control condition. Directions are encoded as unit signs (i.e., +1 or −1) because effect sizes are generally not comparable between different omics datasets.
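Only the sign of each effect is used, so converting effect sizes into the unit directions o_i is a simple step. The minimal sketch below makes these assumptions:

- The inputs are log fold-changes, correlation coefficients, or hazard ratios.
- Exact zeros and missing values are mapped to 0 (no direction). This choice is ours and is not stated in the text.
- The helper names and gene names are hypothetical.

```python
import math
from typing import Optional


def unit_direction(value: Optional[float]) -> int:
    """Return the unit sign o_i of a directional statistic: +1, -1, or 0.

    Assumption: missing values (None/NaN) and exact zeros carry no direction.
    """
    if value is None or math.isnan(value) or value == 0:
        return 0
    return 1 if value > 0 else -1


def direction_from_hazard_ratio(hr: float) -> int:
    """HR > 1 (higher risk) maps to +1 and HR < 1 to -1, i.e. the sign of log(HR)."""
    return unit_direction(math.log(hr))


# Hypothetical log fold-changes: only the signs are kept, not the magnitudes.
log_fc = {"geneA": 2.1, "geneB": -0.4, "geneC": 0.0}
print({gene: unit_direction(v) for gene, v in log_fc.items()})
# {'geneA': 1, 'geneB': -1, 'geneC': 0}
```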
Besides log-FC values, directions may include correlation coefficients, log-transformed hazard ratio (HR) values from survival analyses, or other values used as unit signs. To obtain X_DPM, the weighted sum is multiplied by −2, in line with Fisher's method21.

The constraints vector CV defines the directional association e_i, which specifies how the direction of dataset i is expected to relate to the other input datasets. The CV thus defines the structure of the multi-omics analysis. A series of identical values (all +1 or all −1) prioritises genes that have the same observed directions in the corresponding datasets (e.g., transcript and protein expression). In contrast, mixed values in the CV (+1 and −1) prioritise genes with inverse directions in the corresponding datasets (e.g., DNA methylation and transcript expression).

The absolute value in the X_DPM formula makes the CV globally sign invariant (i.e., [−1, +1] ≡ [+1, −1] and [+1, +1] ≡ [−1, −1]). The CV [+1, +1] prioritises genes that are up-regulated or down-regulated in both datasets, and the CV [−1, −1] results in an equivalent analysis. In contrast, the CVs [+1, −1] and [−1, +1] prioritise genes upregulated in one dataset and downregulated in the other. Importantly, the CV is not limited to the central dogma or any other cellular logic. As a user-defined parameter, it can be configured to highlight genes and pathways with arbitrary directional relationships. An example of data integration with DPM is shown in Supplementary Fig. 1.

DPM can jointly analyse directional and directionless omics datasets. X_DPM sums scores over datasets (1 … j) with directional information and datasets (j + 1 … k) lacking directional information. Either part of the sum can be omitted if needed. Genes or proteins in directionless datasets are scored only by their P-values, and these datasets are encoded as zeroes in the CV. Examples include mutational burden tests, epigenetic annotations, or network topology analyses that provide P-values but no directional information.

We compute the merged P-value P'_DPM to reflect the joint significance of the gene across the input datasets given directional information. The merged P-value is derived from the cumulative χ² distribution as \(P'_{DPM} = 1 - \chi^{2}\left(\frac{1}{c}X_{DPM},\, k'\right)\). For more accurate significance estimation, we account for gene-to-gene covariation in omics data. We estimate the degrees of freedom k' and the scaling factor c from the input P-values using the empirical Brown's method20. In addition to DPM, we also provide directional extensions of the P-value merging methods by Stouffer22 and Strube23, based on the METAL method for genome-wide association studies24. We adapted METAL for joint analyses of directional and non-directional multi-omics datasets (Methods).

Our workflow includes four major steps.

First, we process upstream omics datasets into a matrix of gene P-values and another matrix of gene directions (Fig. 1A). Dedicated upstream processing of input omics datasets is required to obtain these values. We define a CV with directional constraints based on the overarching hypothesis, experimental design, or biological insights. We also collect up-to-date pathway information25 from databases such as GO2 and Reactome3. Other types of functional gene sets, such as disease genes or transcription factor targets, can be used as well.

Second, P-values and directions are merged into a single gene list of P-values using DPM or related methods22,23. This step alone is useful for multi-omics gene prioritisation.

Third, the merged gene list is analysed for enriched pathways using the ranked hypergeometric algorithm of the ActivePathways method18. This algorithm also determines which input omics datasets contribute most to individual pathways (Fig. 1B).

Finally, the resulting pathways are visualised as enrichment maps1,26 that reveal characteristic functional themes and highlight their directional evidence from omics datasets (Fig. 1C).

DPM provides a general and adaptable framework to explore understudied intersections of complex multi-omics datasets.

Benchmarking directional P-value merging

We evaluated DPM and the modified Strube method using synthetic data (Fig. 2A, B, Supplementary Data 1). Two input datasets of 10,000 genes were integrated in three directional configurations: all genes in directional agreement, all genes in directional conflict, or 50% of genes in directional conflict.

First, we simulated uniformly distributed P-values as negative controls to evaluate the false positive rate of DPM. We tested two scenarios in which the two sets of input P-values were either independent (Pearson r < 0.001) or strongly correlated with each other (r = 0.97). With full directional agreement, DPM found ~5% of merged P-values at P < 0.05 in both independent and correlated datasets. This matches the expected fraction of significant P-values in uniform data and indicates a well-controlled false positive rate. Because directional penalties were applied in DPM, the dataset with 50% directional conflicts showed proportionally fewer significant merged P-values. In contrast, the Strube method found two-fold more significant merged P-values when merging independent P-values, suggesting a higher false positive rate, whereas merging of correlated P-values was not inflated.

Fig. 2: Evaluating directional P-value merging (DPM) with simulated data. Two sets of 10,000 genes with simulated P-values and directional information were merged using DPM and the modified Strube method. Input P-values P1 and P2 were generated randomly from the uniform distribution (Uni) as negative controls, or from the exponential distribution (Exp) to reflect datasets with significant signals. Input P-values were generated independently or with strong correlations. Three types of directions were considered: directional agreement of all genes, directional conflict of all genes, and 50/50 mixed directions. Unadjusted P-values are shown. A Bar plots show significant merged P-values at various cut-offs. DPM finds the expected fraction of significant results in uniformly sampled data, while Strube's method shows inflated results when merging independent P-values. B Scatter plots show the distributions of input P-values. Points are coloured based on merged significance from the two methods (P < 0.05). N1 and N2 show the numbers of significant input P-values. The scatter plots suggest that DPM is more sensitive in directionally integrating genes in which the directional conflicts are not supported by significant P-values (yellow points).

Next, we integrated two independent omics datasets with significant signals. We simulated both input datasets using exponentially distributed P-values such that many significant genes were included (~26% at P < 0.05 or ~1% at FDR < 0.05). When all genes were in directional agreement, 39% of merged P-values were significant (P < 0.05). This higher fraction is expected because both input datasets contributed independent evidence to the merged P-values. With 50% directional conflicts, 22% of genes were significant, reflecting the directional penalties. Even with full directional conflicts, a small fraction of genes (5%) was significant (P < 0.05). Closer inspection of this subset showed that DPM prioritised genes with directional conflicts when the gene was strongly significant in one dataset while the conflicting evidence from the second dataset was not significant (Fig. 2B). This suggests increased sensitivity of DPM towards weaker effects.
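The negative-control part of this benchmark can be approximated with a short simulation. This sketch is our simplification, not the benchmark code:

- It draws independent uniform P-values for two datasets and uses the CV [+1, +1].
- Because the inputs are independent, it uses c = 1 and k' = 4 in place of Brown's empirical estimates.
- Function names, the random seed, and the gene count are hypothetical.

Under full agreement, X_DPM reduces to Fisher's statistic, so about 5% of genes fall below P = 0.05. Under full conflict, X_DPM = 2|ln P1 − ln P2| follows a χ² distribution with only two degrees of freedom, so fewer than 1% of genes pass the same cut-off.

```python
import math
import random


def dpm_two_datasets(p1: float, p2: float, o1: int, o2: int, e1: int = 1, e2: int = 1) -> float:
    """X_DPM from Eq. (1) for two directional datasets (j = k = 2)."""
    return 2.0 * abs(math.log(p1) * o1 * e1 + math.log(p2) * o2 * e2)


def chi2_sf_df4(x: float) -> float:
    """Upper tail of the chi-squared distribution with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def null_hit_rate(n_genes: int, conflict_fraction: float, alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of genes with merged P < alpha when both inputs are uniform (null) P-values."""
    rng = random.Random(seed)
    n_conflict = int(conflict_fraction * n_genes)
    hits = 0
    for g in range(n_genes):
        p1 = 1.0 - rng.random()  # uniform on (0, 1]; avoids log(0)
        p2 = 1.0 - rng.random()
        o2 = -1 if g < n_conflict else 1  # dataset 1 always has direction +1
        if chi2_sf_df4(dpm_two_datasets(p1, p2, 1, o2)) < alpha:
            hits += 1
    return hits / n_genes


for frac in (0.0, 0.5, 1.0):
    print(f"conflict fraction {frac:.1f}: {null_hit_rate(20000, frac):.3f}")
```

Because the same seed produces the same P-values for every conflict fraction, the three rates differ only because of the directional penalties.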
With two independent signal-containing datasets, the modified Strube method again showed a consistently higher rate of significant findings, suggesting inflated significance when merging independent P-values.

Finally, we integrated two correlated omics experiments with significant signals. We simulated exponentially distributed P-values with a large fraction of significant genes that were highly correlated between the two datasets (r = 0.97). DPM found fewer significant merged P-values than with independent datasets. This is expected because DPM adjusts for covariation of input P-values, which makes merging more conservative. DPM and the Strube method behaved similarly in integrating correlated datasets. In both cases, no significant results were found when all genes were in conflict, indicating that directional penalties were stronger in highly correlated input datasets. These benchmarks suggest that DPM is a statistically well-calibrated approach for directional integration of multi-omics data.

Analysing transcriptomic targets of HOXA10-AS lncRNA in glioma

We then studied real omics datasets using DPM. First, we analysed a previously published transcriptomics dataset in which the oncogenic lncRNA HOXA10-AS was profiled in knockdown (KD) or overexpression (OE) experiments in patient-derived glioblastoma (GBM) cells27. To identify target genes and pathways of the lncRNA, we prioritised genes that changed in opposite directions in the two experiments and penalised genes that were either upregulated or downregulated in both experiments (Fig. 3A). DPM revealed 2236 significant and directionally consistent genes (P < 0.05) (Fig. 3B, Supplementary Data 2). In addition, we found 773 genes that were penalised by DPM due to directional constraints, although these were identified in the reference non-directional analysis (P < 0.05).

Among prioritised genes, CPED1 was a top result found by DPM (P = 2.8 × 10⁻⁷). CPED1 was significantly upregulated in the HOXA10-AS KD experiment and downregulated upon OE (Fig. 3C), indicating a potential negative regulatory target of HOXA10-AS. CPED1 is a little-studied gene that encodes a cadherin-like protein with a PC-esterase domain. The tumour suppressor gene FAT1 was also prioritised, due to upregulation in HOXA10-AS OE and no significant change in KD, which exemplifies another mode of gene prioritisation in DPM. FAT1 encodes a cadherin protein and tumour suppressor that controls organ growth, cell polarisation, and cell-cell contacts and is involved in tumour invasion, metastasis, and drug resistance28,29. In contrast, the top directionally penalised genes included NEGR1, a neuronal growth regulator, and CACNA1H, a calcium voltage-gated channel. Both genes were either jointly upregulated or jointly downregulated in KD and OE experiments (Fig. 3C). NEGR1 and CACNA1H are involved in neuronal development and cell adhesion, respectively30,31.

Fig. 3: Directional integration of transcriptomics data from functional experiments of HOXA10-AS lncRNA in GBM cells. A We integrated differential gene expression data from HOXA10-AS knockdown (KD) and overexpression (OE) experiments from a previous study27 that compared sets of three replicates. DPM prioritised genes that showed different fold-change (FC) directions in KD and OE experiments and penalised genes with matching directions using the constraints vector (CV) [KD = −1, OE = +1]. B Scatter plot of merged P-values from directional analysis (DPM, Y-axis) and non-directional analysis (the Brown method, X-axis). Prioritised genes with directionally consistent changes are shown on the diagonal or closely below it (blue), while directionally penalised genes with conflicting directional changes are further below the diagonal (red). Unadjusted P-values are shown. C Examples of prioritised genes (top) and penalised genes (bottom). D Venn diagram of enriched pathways found with directional and non-directional analyses (family-wise error rate (FWER) < 0.05). E Enrichment map of pathways and processes from directional and non-directional analyses (FWER < 0.05). Pathways are shown as nodes in the network that are connected by edges if the pathways share many genes. Subnetworks represent functional themes. Node colours indicate dataset contributions (KD, OE, both, or combined-only). Node size reflects the number of genes per pathway. Node outlines show directionally prioritised pathways (spiky edges), directionally penalised pathways (dotted edges), or pathways found using both approaches (solid edges). Major groups of directionally prioritised or penalised pathways are grouped on the right. F Dot plots of significant genes involved in cell migration and oxygen response processes visualised with P-values and fold-change values from the HOXA10-AS transcriptomics study27. Asterisks indicate genes from the non-directional analysis that were penalised by DPM. Carets show known cancer genes from the COSMIC Cancer Gene Census database53.

Directional pathway analysis using DPM revealed 138 enriched GO processes and Reactome pathways (ActivePathways with DPM, FWER < 0.05) (Fig. 3D, E, Supplementary Data 3 and 4). The reference non-directional analysis found 219 pathways and processes (ActivePathways with Brown, FWER < 0.05). Six pathways were found only by DPM through directional information: vesicular transport, RAB geranylgeranylation, TGF-beta signalling, muscle development, DNA replication, and phospholipid biosynthesis. On the other hand, about 40% of the enriched pathways from the non-directional analysis (87/219) were excluded by DPM. These included cell motility, brain development, and oxygen response, and were lost due to directional disagreements in related genes such as DPP4, STC1, and ADGRL2 (Supplementary Fig. 2). Although these processes are central to glioma biology32,33,34, our analysis suggests that they are not directly regulated by HOXA10-AS, because related genes often showed directional conflicts in KD and OE experiments.

For example, the GO process ameboidal-type cell migration found in the non-directional analysis included 37 differentially expressed genes (FWER = 7.3 × 10⁻⁴). Eight of these genes were directionally inconsistent due to either upregulation or downregulation in both experiments (WNT11, SEMA3E, APOE, HAS2, EFNB1, ITGA2, DPP4, RHOJ) (Fig. 3F). Directional penalties on these genes led to loss of the pathway enrichment. Similarly, four oxygen-related processes were lost, such as the GO process response to oxygen levels (FWER = 0.0012), in which directional conflicts occurred in 6 of 23 enriched genes (Fig. 3F).

This analysis demonstrates the integration of transcriptomic data from two functional experiments on a target gene of interest. We expect that genes and pathways with opposite directional changes in KD and OE experiments are regulated by HOXA10-AS, an oncogenic lncRNA in glioma27. On the other hand, genes and pathways that are unidirectionally regulated in KD and OE experiments may respond to HOXA10-AS levels through feedback loops or post-transcriptional regulation. Alternatively, they may reflect a broader cellular response downstream of HOXA10-AS. Such genes and pathways can be prioritised using an alternative CV that favours matching gene directions (Supplementary Fig. 3). Integrating directional associations from functional experiments improves the resolution of gene prioritisation and pathway enrichment analysis.

Proteogenomic analysis of ovarian cancer for biomarker discovery

Next, we integrated cancer transcriptomics and proteomics data with patient overall survival (OS) in ten cancer types from the CPTAC project10 (Fig. 4A, Supplementary Fig. 4, Supplementary Data 5). First, we asked which genes were significantly associated with OS via transcript or protein expression, using Cox proportional-hazards (PH) regression with patient age, sex, and tumor stage as covariates. P-values and hazard ratios (HR) for transcript- and protein-level OS associations were integrated using DPM, such that genes with consistent OS associations were prioritised while inconsistent associations were penalised.

Fig. 4: Integrating ovarian cancer transcriptomes and proteomes with patient survival information for pathway and biomarker analyses. A We correlated mRNA (R) and protein (P) levels for each gene with patient overall survival (OS) in 169 ovarian serous cystadenocarcinoma (OV) samples using clinical covariates (patient age, patient sex, tumor stage) in Cox proportional-hazards (PH) models. We prioritised genes that showed matching OS associations with mRNA and protein levels and penalised genes with opposite OS associations using the constraints vector (CV) [R = +1, P = +1]. Unadjusted chi-square P-values and hazard ratio (HR) values from Cox-PH models were used for directional data integration and are shown in panels C, D, and H. B Scatter plot of merged P-values of OS associations in OV from directional analysis (DPM, Y-axis) and non-directional analysis (Brown, X-axis). Prioritised genes with consistent OS associations are shown on the diagonal or closely below it (blue), while directionally penalised genes are further below the diagonal (red). Unadjusted P-values are shown. C Log-transformed HR values of the top 100 genes prioritised or penalised by DPM. Prioritised genes associate with either higher or lower risk at both the mRNA and protein levels, while penalised genes have mixed risk associations with mRNA and protein expression. D Kaplan-Meier plots of OS associations of top genes. High mRNA and high protein levels of the top prioritised gene ACTN4 associate with worse prognosis. In contrast, mRNA and protein levels of the top penalised gene PIK3R4 show inverse OS associations. E Scatter plots of mRNA and protein expression of ACTN4 and PIK3R4. Spearman correlation coefficients and P-values from two-sided correlation tests are shown. The correlation trendline is shown with 95% confidence intervals. F Venn diagram of enriched pathways of OS associations with mRNA and protein levels from directional and non-directional analyses (ActivePathways, false discovery rate (FDR) < 0.05). G Enrichment map of pathways and processes with OS associations. The network shows pathways as nodes that are connected by edges and grouped into functional themes if the corresponding pathways share many genes. Major groups of directionally prioritised or penalised pathways are grouped on the right. H Dot plot of significant genes involved in mitochondrial translation. This process was penalised in the directional analysis because several genes showed inconsistent OS associations with mRNA and protein expression. Asterisks show directionally penalised genes.

We focused on the ovarian cancer dataset (OV) with 169 serous cystadenocarcinoma samples. DPM identified 907 significant genes (P_DPM < 0.05). In addition, 192 genes that were significant in the reference non-directional analysis (P_Brown < 0.05) were penalised due to inconsistent survival associations (Fig. 4B, Supplementary Data 6). Directionally prioritised genes had consistently positive or negative OS associations with protein and transcript expression, while penalised genes showed mixed OS associations (Fig. 4C).

The top prioritised gene ACTN4 (P_DPM = 5.4 × 10⁻⁹) encodes a cytoskeletal actin-binding protein and an emerging oncogene linked to poor prognosis in ovarian cancer35. Higher transcript and protein expression of ACTN4 was associated with worse prognosis in OV (Fig. 4D), and mRNA and protein levels of ACTN4 were highly correlated (Spearman ρ = 0.75, P < 2.2 × 10⁻¹⁶) (Fig. 4E). In contrast, the top penalised gene PIK3R4 showed inconsistent OS associations. Higher transcript expression was associated with worse prognosis, while higher protein expression was associated with improved prognosis, and transcript and protein expression levels were not correlated (Fig. 4D, E).
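The ACTN4 versus PIK3R4 contrast can be illustrated with a toy calculation. The sketch below makes these assumptions:

- It uses hypothetical P-values and hazard ratios, not values from the OV cohort.
- Directions are the sign of log(HR), with the CV [R = +1, P = +1].
- It assumes independence (c = 1, k' = 4) instead of applying Brown's empirical correction.
- The function names are hypothetical.

```python
import math


def sign(value: float) -> int:
    """Unit direction: +1, -1, or 0."""
    return (value > 0) - (value < 0)


def dpm_survival(p_mrna: float, hr_mrna: float, p_prot: float, hr_prot: float,
                 cv: tuple = (1, 1)) -> float:
    """Merged P-value of mRNA and protein OS associations (independence assumed: c = 1, k' = 4)."""
    o_mrna, o_prot = sign(math.log(hr_mrna)), sign(math.log(hr_prot))  # HR > 1 means higher risk
    x = 2.0 * abs(math.log(p_mrna) * o_mrna * cv[0] + math.log(p_prot) * o_prot * cv[1])
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)  # chi-squared upper tail, 4 degrees of freedom


# Hypothetical genes (not values from the OV cohort):
consistent = dpm_survival(0.001, 1.8, 0.002, 1.5)    # higher risk at both levels
inconsistent = dpm_survival(0.001, 1.8, 0.002, 0.6)  # higher risk (mRNA) vs lower risk (protein)
print(f"{consistent:.2e} {inconsistent:.2f}")  # 2.82e-05 0.85
```

The same input P-values give a strongly significant merged P-value when both hazard ratios point the same way, and a non-significant one when they point in opposite directions.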
PIK3R4 encodes a regulatory kinase subunit in the PI3K/AKT pathway, a central signalling network that controls cancer cell proliferation, survival, and metabolism36,37. The inconsistent survival associations of PIK3R4 expression suggest additional modes of regulation that remain masked in these transcriptomics and proteomics datasets.

Pathway analysis with DPM revealed 170 significant pathways and processes with multi-omics survival associations (ActivePathways FDR < 0.05). These included major functional themes of proliferation, focal adhesion, cell motility, immune cell activity, and development, and signalling pathways such as Hedgehog, Notch, and NFKB (Fig. 4F, G, Supplementary Data 7 and 8). Compared to the reference non-directional analysis, DPM penalised multiple pathways due to directional conflicts in OS associations with transcript and protein expression. For example, biological processes of protein translation and degradation, RNA modifications, and mitochondrial function were penalised. This is in line with previous reports of low correlations between transcript and protein expression levels of such genes38,39,40. The GO process mitochondrial translation, for instance, was identified in the non-directional analysis but penalised in the directional analysis, since several enriched pathway genes (8/33) had inconsistent OS associations with transcript and protein expression (Fig. 4H). This analysis demonstrates the integration of multi-omics datasets with clinical information to discover biomarkers and biological mechanisms in heterogeneous datasets of patient cancer samples.

Integrating multi-omics data to study IDH-mutant glioma

Lastly, we compared glioma samples based on the mutation status of isocitrate dehydrogenase 1 (IDH1), a well-established molecular marker of glioma that indicates lower-risk disease41. We integrated DNA methylation, transcriptomics, and proteomics datasets from TCGA and CPTAC by modelling positive and negative directional associations between the three data types (Fig. 5A). DNA methylation of gene promoters is a repressive epigenetic mechanism that often correlates with reduced gene expression. We can therefore obtain more accurate multi-omics maps by inversely associating methylation with gene expression. First, we analysed differential transcript and protein expression and DNA promoter methylation in IDH-mutant GBMs relative to IDH-wildtype GBMs and found hundreds of significant genes (Fig. 5B). However, only a few genes (32) were significant across all three datasets, and even fewer were consistently up-regulated or down-regulated.

Fig. 5: Integrating transcriptomic, proteomic, and DNA methylation profiles of IDH-mutant gliomas. A We compared transcript and protein expression and promoter DNA methylation of IDH-mutant and IDH-wildtype gliomas. We prioritised mRNA (R) and protein (P) expression levels that directly associated with each other and inversely associated with promoter DNA methylation (M) using the constraints vector (CV) [M = +1, R = −1, P = −1]. At least six IDH-mutant and 90 IDH-wildtype samples were included, depending on data type. B Venn diagrams of significant genes found separately in three input datasets (false discovery rate (FDR) < 0.1, Mann-Whitney U-tests). Downregulated genes showed reduced mRNA and protein expression and increased promoter methylation, while upregulated genes showed decreased promoter methylation. C Scatter plot of merged P-values from directional analysis (DPM, Y-axis) and non-directional analysis (Brown, X-axis). Prioritised genes with consistent multi-omics directions are shown on the diagonal or closely below it (blue), while directionally penalised genes are further below the diagonal (red). Unadjusted P-values are shown. D Heatmap of significantly penalised or prioritised top genes (Brown, FDR < 0.001). Prioritised genes were often characterised by high promoter methylation and reduced mRNA and protein expression, while penalised genes often showed high promoter methylation and increased expression. Known cancer genes are listed and coloured as directionally penalised or prioritised. E Venn diagram of enriched pathways from the directional and non-directional analyses (ActivePathways, family-wise error rate (FWER) < 0.05). F Enrichment map of pathways and processes in IDH-mutant glioblastoma. The network shows pathways as nodes that are connected by edges if the corresponding pathways share many genes. Major groups of directionally prioritised or penalised pathways are grouped on the right. G Dot plot of significant genes involved in the gliogenesis process. This process was only detected in the directional analysis because several related genes showed significant and directionally consistent changes. Unadjusted P-values from Mann-Whitney U-tests are shown. Carets show known cancer genes. H Validation of the multi-omics analysis of IDH-mutant gliomas in an independent dataset. Functional themes from the discovery dataset (TCGA, CPTAC) and validation dataset (GLASS48, Oh et al.49) were compared. Known cancer genes were retrieved from the COSMIC Cancer Gene Census53 (panels D, G).

To study the molecular makeup of IDH-mutant gliomas in greater detail, we analysed the multi-omics dataset directionally (Fig. 5A). We prioritised genes whose promoter methylation was inversely associated with their transcript and protein levels, which in turn were directly associated with each other. DPM analysis revealed 2023 significant genes (P < 0.05; Fig. 5C, Supplementary Data 9). In addition, 267 genes were penalised due to directional conflicts compared to the reference non-directional analysis (Brown, P < 0.05).
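The distinction between prioritised and penalised genes used throughout these analyses can be sketched for the three-dataset CV [M = +1, R = −1, P = −1]. This minimal sketch makes these assumptions:

- Independence is assumed (c = 1, k' = 2k), so Fisher's method stands in for the Brown reference analysis.
- The gene directions and P-values are hypothetical.
- A gene is called penalised when it is significant only without directional information.
- The function names are illustrative.

```python
import math

CV = (+1, -1, -1)  # hypothetical order: promoter methylation (M), mRNA (R), protein (P)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-squared probability for an even number of degrees of freedom."""
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def merged_p(pvals, directions=None, cv=None) -> float:
    """Fisher-type merging (independence assumed); directional as in Eq. (1) if directions are given."""
    if directions is None:  # non-directional reference (Fisher standing in for Brown)
        x = -2.0 * sum(math.log(p) for p in pvals)
    else:
        x = 2.0 * abs(sum(math.log(p) * o * e for p, o, e in zip(pvals, directions, cv)))
    return chi2_sf_even_df(x, 2 * len(pvals))


def classify(pvals, directions, cv=CV, alpha=0.05) -> str:
    """Label a gene as prioritised, penalised, or not significant."""
    if merged_p(pvals, directions, cv) < alpha:
        return "prioritised"
    if merged_p(pvals) < alpha:  # significant only when directions are ignored
        return "penalised"
    return "not significant"


# Hypothetical gene 1: methylation up, mRNA and protein down (consistent with the CV).
print(classify([0.01, 0.02, 0.03], [+1, -1, -1]))  # prioritised
# Hypothetical gene 2: methylation up together with mRNA and protein up (conflicts with the CV).
print(classify([0.01, 0.02, 0.03], [+1, +1, +1]))  # penalised
```

Because the absolute directional sum can never exceed the non-directional sum, the directional P-value is never smaller than the reference P-value. Directional constraints can therefore only remove significance, never create it.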
Directionally prioritised genes were often driven by elevated promoter methylation and reduced transcript and protein expression, consistent with the hypermethylator phenotype of IDH-mutant gliomas42. In contrast, the genes penalised by DPM often showed elevated promoter methylation combined with upregulation at the transcript or protein level (Fig. 5D). This pattern is potentially due to additional epigenetic regulation that is not measured in our data. We found 98 known cancer-associated genes using DPM (FDR < 0.05), of which 26 (27%) were consistently regulated across the three datasets.

Pathway enrichment analysis of directionally prioritised genes revealed 72 pathways and processes (FWER < 0.05, ActivePathways), while 33 pathways from the non-directional reference analysis were penalised by DPM (Fig. 5E, Supplementary Data 10 and 11). DPM penalised biological processes and pathways that appear less relevant to glioma biology. For example, the GO process muscle organ development was found in the non-directional analysis but penalised by DPM due to directional conflicts in 80 of 195 genes (Fig. 5F).

Fibroblast growth factor receptor (FGFR) signalling pathways were also penalised in the directional analysis (Supplementary Fig. 5). One example is the GO process negative regulation of fibroblast growth factor receptor signalling pathway, which included ten genes in the non-directional analysis. Three of these genes, FGF2, WNT5A, and SULF1, were penalised due to directional conflicts, with increased promoter methylation coupled with higher gene expression. FGFR signalling regulates tumor progression in gliomas43,44, and oncogenic alterations of FGFR genes have been found in IDH-wildtype gliomas, such as FGFR-TACC fusions in GBM45 and structural variants of FGFR1 in pediatric gliomas46. However, our analysis focused on IDH-mutant gliomas and indicated inconsistent regulation of FGFR-related genes.

Encouragingly, some processes such as gliogenesis were found only in the directional analysis, because several related genes showed significant and directionally consistent changes in IDH-mutant gliomas (FWER = 0.0207) (Fig. 5G). For example, OLIG2 was upregulated in IDH-mutant gliomas at the mRNA and protein level. OLIG2 encodes a core neurodevelopmental transcription factor that controls a stem-like tumor-propagating cell state in GBM47.

Finally, we validated our analysis of IDH-mutant gliomas in an independent set of cancer samples. We integrated promoter methylation and gene and protein expression datasets from the GLASS project48 and the proteogenomics dataset by Oh et al.49 (Fig. 5H, Supplementary Data 12–14). Directional analysis revealed 170 significant pathways in the validation dataset (FWER < 0.05, Supplementary Fig. 6). Major functional themes such as cell adhesion, cell motility, hypoxia, apoptosis, and cell proliferation were found in both datasets. The validation dataset revealed additional processes related to the immune system, MAPK signalling, and others, while a few cell differentiation and growth factor signalling pathways were found only in the discovery dataset. This pathway-level validation in an independent set of glioma samples lends confidence to our method and demonstrates data integration across diverse clinical multi-omics datasets.
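All three case studies relied on the third workflow step: ranked hypergeometric testing of the merged gene list in ActivePathways. The sketch below shows the general idea of a ranked hypergeometric test and is not the ActivePathways implementation. Genes are ranked by merged P-value. For a given pathway, a hypergeometric enrichment P-value is computed for each top fraction of the ranked list that ends at a pathway gene, and the smallest P-value is reported together with the corresponding list length. Several parts are omitted: multiple-testing correction (FWER or FDR), gene-set size filters, and the per-dataset contribution evidence. The function names, gene names, and the gene universe are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def hypergeom_sf(overlap: int, n_universe: int, n_pathway: int, n_drawn: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(population n_universe, n_pathway successes, n_drawn draws)."""
    upper = min(n_pathway, n_drawn)
    tail = sum(math.comb(n_pathway, x) * math.comb(n_universe - n_pathway, n_drawn - x)
               for x in range(overlap, upper + 1))
    return tail / math.comb(n_universe, n_drawn)


def ranked_hypergeometric(ranked_genes: Sequence[str], pathway: Iterable[str],
                          n_universe: int) -> Tuple[float, int]:
    """Smallest enrichment P-value over top fractions of a ranked gene list.

    ranked_genes: genes sorted by merged P-value (most significant first).
    Returns (best P-value, length of the top fraction where it was reached).
    """
    members = set(pathway)
    best_p, best_len, overlap = 1.0, 0, 0
    for i, gene in enumerate(ranked_genes, start=1):
        if gene in members:  # P-values can only improve at positions of pathway genes
            overlap += 1
            p = hypergeom_sf(overlap, n_universe, len(members), i)
            if p < best_p:
                best_p, best_len = p, i
    return best_p, best_len


# Toy example: 20 genes in total, a 3-gene pathway, hypothetical ranking.
ranked = ["g1", "g2", "x1", "g3", "x2"]
print(ranked_hypergeometric(ranked, {"g1", "g2", "g3"}, n_universe=20))  # best at length 4, P = 1/285
```

Scanning top fractions of the ranked list avoids fixing a single significance cut-off for the merged gene list. The trade-off is that the minimum over many list lengths needs to be interpreted with this scan in mind.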
