Multioviz: an interactive platform for in silico perturbation and interrogation of gene regulatory networks | BMC Bioinformatics

Demonstration of Multioviz on real data

To demonstrate the utility of Multioviz, we apply the software to real genetic data from a heterogeneous stock of mice collected by the Wellcome Trust Centre of Human Genetics (http://mtweb.cs.ucl.ac.uk/mus/www/mouse/index.shtml) [16]. The genotypes from this study were downloaded directly using the BGLR R package [20]. The study contains \(N =\) 1,814 heterogeneous stock mice from 85 families (all descending from eight inbred progenitor strains) and 131 quantitative traits classified into six broad categories: behavior, diabetes, asthma, immunology, haematology, and biochemistry. Phenotypic measurements for these mice are freely available to download online (details at http://mtweb.cs.ucl.ac.uk/mus/www/mouse/HS/index.shtml). In this study, we focus on modeling the percentage of CD8+ cells in these mice as our \(\textbf{y}\) vector. For preprocessing, we corrected this trait for sex, age, body weight, season, and year [16]. The \(\textbf{X}\) matrix that we input into Multioviz contains single nucleotide polymorphisms (SNPs) as variables, each encoded as \(\{0, 1, 2\}\) copies of a reference allele at each locus. For mice with missing genotypes, we imputed values using the mean genotype of that SNP within the corresponding family. Only polymorphic SNPs with minor allele frequency above 5% were kept for the analyses, leaving a total of \(J =\) 10,227 SNPs that were available for all mice. Lastly, to create the biological annotation file \(\textbf{M}\), we used the Mouse Genome Informatics database (http://www.informatics.jax.org) [21] to map SNPs to the closest neighboring gene(s). Unannotated SNPs located within the same genomic region were labeled as being within the “intergenic region” between two genes.
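The genotype preprocessing described above consists of family-mean imputation followed by a minor allele frequency filter. It can be sketched in Python with NumPy. This is a minimal illustration, not the authors' pipeline, and it rests on several assumptions:

- The function names are hypothetical.
- Missing genotypes are stored as NaN in an \(N \times J\) array.
- SNPs still missing after imputation (for example, when an entire family lacks a call) are dropped, because the retained SNPs had to be available for all mice.

```python
import numpy as np


def impute_by_family_mean(X, families):
    """Fill missing genotypes (NaN) with the mean genotype of that SNP within the
    mouse's own family. X is an (N, J) array of {0, 1, 2} reference-allele counts."""
    X = np.array(X, dtype=float)          # work on a float copy; NaN marks missing
    families = np.asarray(families)
    for fam in np.unique(families):
        rows = families == fam
        block = X[rows]                   # boolean indexing returns a copy
        missing = np.isnan(block)
        counts = (~missing).sum(axis=0)   # observed genotypes per SNP in this family
        sums = np.nansum(block, axis=0)
        # Family mean per SNP; stays NaN when the whole family is missing that SNP.
        fam_means = np.divide(sums, counts, out=np.full(sums.shape, np.nan),
                              where=counts > 0)
        block[missing] = fam_means[np.nonzero(missing)[1]]
        X[rows] = block
    return X


def maf_filter(X, threshold=0.05):
    """Keep SNPs that are complete for every mouse and whose minor allele
    frequency exceeds `threshold`; monomorphic SNPs (MAF = 0) are dropped too."""
    complete = ~np.isnan(X).any(axis=0)
    p = X[:, complete].mean(axis=0) / 2.0  # reference-allele frequency per SNP
    maf = np.minimum(p, 1.0 - p)
    keep = np.zeros(X.shape[1], dtype=bool)
    keep[np.flatnonzero(complete)[maf > threshold]] = True
    return X[:, keep], keep
```

The boolean `keep` mask is returned alongside the filtered matrix so that SNP identifiers can be carried through to the annotation step.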
Altogether, a total of \(G =\) 2,616 annotations were analyzed.

We input these files into Multioviz, assuming that significant SNPs and genes would produce PIPs greater than or equal to 0.5, a cutoff known in Bayesian statistics as the median probability model threshold [19]. Under this threshold, the GRN produced by the software contained 15 associated SNP variables and 19 enriched genes (Fig. 4). Notably, both the SNP CEL-17_31069801 and the gene hlb156 on chromosome 17 were significant (PIPs = 1). As corroborating evidence, the genomic region where these molecular variables reside has been reported to contain highly significant SNPs that contribute to non-additive variation in CD8+ T-cells [16]. To investigate this region further, we perturbed the GRN in Multioviz by deleting CEL-17_31069801. We then observed that CEL-17_31214920, which also maps to the hlb156 gene, emerged as important (PIP = 1). The perturbation also enriched two new gene-level variables, both associated with CD8+ T-cell differentiation: Anapc1 (PIP = 0.726) and Pard3 (PIP = 0.998). Anapc1 functions in the metaphase-to-anaphase transition of the cell cycle and has been associated with poor prognosis in T-cell acute lymphoblastic leukemia [22]. Pard3 directs polarized cell growth and asymmetric cell division [23]. Asymmetric division of T-cells has been identified as a potential means by which effector and memory T cells are differentiated during immune responses [24]. Overall, this example shows that Multioviz's perturbation framework can help users generate new, testable hypotheses in silico. These results point to a focused set of molecular variables for investigating the mechanisms underlying the percentage of CD8+ T cells in heterogeneous stock mice.

Fig. 5 Comparison of gene regulatory network outputs from Multioviz, OpenXGR, and vissE.cloud during a perturbation analysis.
To compare these platforms, we used the same CD8+ cell percentage phenotype from the Wellcome Trust Centre of Human Genetics mouse dataset as in Fig. 4 [21]. To ensure compatibility between platforms, we first preprocessed the data inputs by removing intergenic regions and converting mouse gene names to human gene nomenclature where applicable. We then performed the perturbation analysis.

a Perturbation analysis using Multioviz. The top panel shows the GRN inferred by Multioviz. The bottom panel shows the perturbed GRN, generated by removing the most significant gene, FMN2, and clicking “Rerun” in the platform. The total runtime for this analysis was approximately 10 min.

b Similar perturbation analysis using OpenXGR [9]. The platform’s “Subnetwork Analyzer for Genes” (SAG) requires a list of genes with associated p-values. To obtain these, we fit a univariate linear regression for each SNP and determined the significant genes using the minSNP approach [25, 26]. We then removed intergenic regions and converted mouse gene names to human gene nomenclature where possible. In OpenXGR, nodes represent genes in the inferred GRN, with darker colors indicating more significant genes. OpenXGR lacks in silico perturbation functionality, so we replicated the Multioviz pipeline manually. We removed the most significant gene, DNAH8 (\(P= 3.25\times 10^{-60}\)), from the dataset and reran the OpenXGR pipeline to obtain the new GRN. The runtime for this analysis was approximately 30 min.

c Similar perturbation analysis using vissE.cloud [10]. The vissE.cloud platform uses “Gene Set Enrichment Analysis” (GSEA) [27], which requires a list of genes paired with summary statistics. To obtain these, we again used the minSNP approach, removed intergenic regions, and, where applicable, converted mouse gene names to human gene nomenclature.
The gene set network output by vissE.cloud does not directly show which genes are most significant. We therefore removed DNAH8 again, as in OpenXGR, which produced the perturbed gene set network shown. The total runtime was approximately 40 min.

Comparing platforms during a perturbation analysis

To comprehensively assess Multioviz’s performance during an in silico perturbation analysis, we compared it with two comparable platforms, OpenXGR [9] and vissE.cloud [10] (Table 1). Both leverage Gene Set Enrichment Analysis (GSEA) [27] to visualize significant SNPs and genes that belong to subnetworks and enriched pathways. For this comparison, we again used the heterogeneous stock mice dataset from the Wellcome Trust Centre of Human Genetics [21]. All three platforms aim to infer GRNs from high-dimensional multi-omics data, but they differ in several respects, predominantly in data preprocessing, ease of use, and interface functionality.

Like Multioviz, both OpenXGR and vissE.cloud take precomputed statistics for the molecular level of interest (e.g., genes). However, neither OpenXGR nor vissE.cloud accommodates information about intergenic regions. Despite the potentially significant regulatory influence of these features, their statistics must be removed before either platform can proceed [28]. OpenXGR and vissE.cloud do provide statistical confidence scores for molecular variables. However, these scores are presented in separate lists and graphs rather than integrated into the output networks, which makes interactive variable selection less user-friendly. In OpenXGR, only the gene table with functional descriptions and statistical significance scores is interactive; the GRN itself is not. This can be limiting when users aim to interpret the GRNs. vissE.cloud offers various visuals for multi-scale analyses, including GRNs and gene set enrichment.
However, the functional connection between these scales is unclear, making it challenging to identify gene set affiliations and to discern genes within specific subnetworks. Perhaps the most noticeable functional limitation of OpenXGR and vissE.cloud is that they do not support integrating a wide range of statistical models, a key capability that the Multioviz R package offers method developers. Further, Multioviz generates GRNs that incorporate genes and SNPs simultaneously, mirroring biological networks. OpenXGR and vissE.cloud, by contrast, construct GRNs at a single molecular level only (e.g., a SNP GRN or a gene GRN, but not both). Currently, OpenXGR is restricted to human genomes, with plans to add compatibility with mouse data in future iterations of the platform [9]. Lastly, neither OpenXGR nor vissE.cloud supports direct in silico perturbation analysis.

Given these restrictions of OpenXGR [9] and vissE.cloud [10], we had to add several human-in-the-loop steps to their workflows to compare their performance with Multioviz. The GSEA implementation in vissE.cloud requires that all \(G\) genes be input as a ranked list (in ascending order), as z-score statistics, or as p-values \((P_1, \ldots, P_G)\). OpenXGR, by contrast, accepts only p-values as input. Thus, to re-analyze the same CD8+ cell percentage phenotype, we used the minSNP procedure [25, 26]. We fit a univariate linear model for each SNP individually and assigned each gene the lowest SNP-level p-value within that gene’s region. This produced a list of genes with p-values, from which we could determine a set of statistically significant genes. To ensure compatibility between platforms, we then filtered out any genomic features labeled as “intergenic” regions.
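The minSNP procedure and the intergenic filtering described above can be sketched as follows. This is an illustrative Python sketch, not the code used for the comparison, and it makes the following assumptions:

- It uses NumPy and SciPy's `scipy.stats.linregress`.
- The SNP-to-gene mapping `gene_snps` is a hypothetical structure.
- Intergenic features are assumed to carry labels starting with "intergenic", which is also a hypothetical convention.

Because the raw minimum is taken without correcting for the number of SNPs per gene, genes spanning many SNPs will tend to receive smaller p-values.

```python
import numpy as np
from scipy.stats import linregress


def snp_pvalues(X, y):
    """Fit y = a + b * x_j separately for every SNP j and return the two-sided
    p-value for the hypothesis that the slope b is zero."""
    return np.array([linregress(X[:, j], y).pvalue for j in range(X.shape[1])])


def minsnp_gene_pvalues(snp_p, gene_snps):
    """minSNP: give each gene the smallest p-value among the SNPs mapped to it.

    gene_snps (hypothetical structure) maps a feature label to the column
    indices of its SNPs. Labels starting with "intergenic" (hypothetical naming
    convention) are skipped, mirroring the filtering that OpenXGR and
    vissE.cloud required.
    """
    return {
        gene: float(np.min(snp_p[idx]))
        for gene, idx in gene_snps.items()
        if not gene.startswith("intergenic")
    }
```

Sorting the resulting dictionary by p-value gives the kind of ranked gene list that the GSEA-based tools accept.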
Next, where applicable, we used the Mouse Genome Informatics (MGI) database [29] to convert mouse gene names to their corresponding human gene names to ensure compatibility with OpenXGR. With these paired gene and statistical inputs prepared, we proceeded with the perturbation analyses for all three platforms.

To implement a perturbation analysis in OpenXGR and vissE.cloud, we carried out the following steps. First, we ran each platform with the paired gene and statistical inputs derived from the full mouse dataset. Second, we manually removed a statistically significant feature. Third, we reran each platform without that feature to imitate an in silico knock-out (Fig. 5). Notably, for this particular dataset, neither competing platform could generate GRNs using its default settings. Consequently, we had to manually adjust each platform’s hyper-parameters to investigate relationships between genes and pathways connected to CD8+ cell percentage. In OpenXGR, this meant setting the “functional interaction” parameter to its lowest value, “medium confidence”. For vissE.cloud, we set the overlap threshold for gene set similarity to its minimum value of 0.1. Multioviz, by contrast, simplifies setting the threshold for meaningful functional interactions by incorporating a toggle directly into its interface. In Fig. 5, we display Multioviz with a selected edge threshold of 0.1 to ensure a fair comparison with the other platforms.

Each platform generates a slightly different type of visual GRN. Multioviz provides an interactive GRN of SNPs and genes with their associated significance scores, enabling users to explore and interact with the network dynamically (Fig. 5a). Conversely, OpenXGR outputs a static image of a gene-level GRN, with a scrollable table of gene names and associated statistical measures below it (Fig. 5b).
The vissE.cloud interface offers a wider range of genomic analyses, but it does not directly link gene-level statistics to gene set enrichment results. Instead, clusters of gene names and their associated statistics are displayed in a separate “Gene Stat” plot (Fig. 5c), while networks of connected pathways from GSEA are displayed in a different panel. Performing in silico perturbations likewise produces images that differ in detail across the three platforms. OpenXGR and vissE.cloud lack perturbation functionality, so we had to add human-in-the-loop steps to their workflows. As a result, our in silico analysis took approximately 30 min for OpenXGR and approximately 40 min for vissE.cloud, compared with only 10 min for the entire Multioviz workflow. Overall, these comparisons highlight the promise of Multioviz for accelerating key steps in in silico perturbation workflows. Its interface requires less input preprocessing, offers more flexible functionality for real-time investigation, and needs less end-to-end runtime for analysis.
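The three-step knock-out had to be carried out by hand for OpenXGR and vissE.cloud: run, remove a significant feature, then rerun. Combined with the median probability model threshold (PIP ≥ 0.5) used earlier, this loop can be sketched as follows. This is a hypothetical Python illustration, not Multioviz's implementation, which is distributed as an R package. Here `fit` stands in for any model that returns one PIP per column of \(\textbf{X}\), and all function and variable names are assumptions.

```python
from typing import Callable, Dict, Sequence, Set

import numpy as np


def selected(names: Sequence[str], pips: np.ndarray, threshold: float = 0.5) -> Set[str]:
    """Median probability model: keep every variable whose PIP is >= threshold."""
    return {name for name, pip in zip(names, pips) if pip >= threshold}


def knockout(
    X: np.ndarray,
    names: Sequence[str],
    target: str,
    fit: Callable[[np.ndarray], np.ndarray],  # hypothetical: one PIP per column
    threshold: float = 0.5,
) -> Dict[str, Set[str]]:
    """In silico knock-out: fit, delete the `target` column, refit, and compare."""
    names = list(names)
    # Step 1: run the model on the full data.
    before = selected(names, fit(X), threshold)

    # Step 2: remove the chosen feature from both the data and the labels.
    j = names.index(target)
    X_ko = np.delete(X, j, axis=1)
    names_ko = names[:j] + names[j + 1:]

    # Step 3: rerun the same model on the reduced data.
    after = selected(names_ko, fit(X_ko), threshold)
    return {
        "gained": after - before,           # newly selected after the knock-out
        "lost": before - after - {target},  # dropped out, excluding the target itself
    }
```

Reporting gained and lost variables mirrors how the emergence of CEL-17_31214920, Anapc1, and Pard3 was read off the perturbed GRN in the mouse example.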
