NeuroimaGene: an R package for assessing the neurological correlates of genetically regulated gene expression | BMC Bioinformatics

Installation

NeuroimaGene is an open-source R package that can be installed from the Comprehensive R Archive Network (CRAN) using R's built-in “install.packages()” function (https://doi.org/10.32614/CRAN.package.neuroimaGene). The source repository for the package is hosted on GitHub (https://github.com/xbledsoe/NeuroimaGene_R). NeuroimaGene requires no formal external dependencies other than R (≥ 3.5.0). On installation it imports six well-documented packages that it needs to function (data.table, ggplot2, DBI, stringr, ggseg, and RSQLite). The R package, hosted on CRAN and GitHub, is a set of functions and supporting files designed to interrogate the NeuroimaGene database stored on Zenodo. The first time the user runs the neuroimaGene() command, they are prompted to download the database from its permanent location on Zenodo (https://doi.org/10.5281/zenodo.10994978). The database may be expanded as new data are compiled. A minimally sufficient subset of the database is bundled with the package and can be used to run the example scripts in the official documentation.

Usage

NeuroimaGene is designed to be accessible to users with minimal coding experience. A comprehensive PDF manual and a vignette with worked examples are available on CRAN. The package lets users interrogate the local context in which target genes are neurologically relevant and identify the primary brain regions affected by the GReX of these genes.

Querying associations between GReX and NIDPs

The primary NeuroimaGene query, ‘neuroimaGene()’, returns a set of tissue-specific associations between the GReX of user-defined genes and NIDPs of interest. Its one required input is a vector of either HUGO gene names or ENSEMBL gene IDs. In addition, the user has three options for restricting the NIDPs queried by the neuroimaGene command.
The ‘modality’ parameter allows the user to restrict the query to NIDPs derived from T1 structural MRI, diffusion tensor imaging, or functional MRI. The ‘atlas’ parameter allows the user to further restrict the query to named cortical atlases, such as the Desikan-Killiany (DK) or Destrieux atlases, when using T1 modalities. Within diffusion tensor imaging, there are two ‘atlas’ options that reflect different algorithms for inferring biology from the diffusion data: probabilistic tractography and tract-based spatial statistics. A full list of modalities and atlases is available in the help vignette and in the GitHub README. To use all modalities, or all atlases within a modality, the user must set the corresponding parameter to either NA or ‘all’. As a third option, the user can supply a vector of predetermined NIDPs. These NIDP names must exactly match those used in NeuroimaGene. When the nomenclature of an outside imaging study differs from the NeuroimaGene NIDP names, matching NIDPs must be identified manually. The ‘listNIDPs()’ function returns all NIDP names for a given modality and atlas, using the same parameters as the neuroimaGene() function.

Lastly, the user can choose a multiple testing correction procedure for the statistical significance threshold. Each imaging modality or atlas contains a different number of NIDPs. The Bonferroni correction (‘BF’) treats each of these NIDPs as independent, even though data analyses demonstrate that this assumption is not accurate [7]. This highly conservative threshold yields high-confidence associations but is likely to generate many false negatives. Users may nevertheless select it with the mtc = ‘BF’ parameter. Because brain measures from the same modality and atlas are correlated, we set the default multiple testing correction to the less stringent Benjamini-Hochberg (‘BH’) false discovery rate.
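The restriction and correction options described above amount to a filter followed by a correction over whatever remains in scope. The Python sketch below illustrates that logic on a toy table. It is not the package's R implementation. The row fields, the functions restrict() and query(), and the atlas labels are hypothetical. None stands in for R's NA, the mtc codes mirror the ‘BF’, ‘BH’, and ‘nom’ options named in the text, and BH is implemented as the standard step-up procedure.

```python
# Illustrative sketch only: not the NeuroimaGene R implementation.
# Row fields, function names, and atlas labels are hypothetical.


def matches(value, wanted):
    # None (standing in for R's NA) or "all" means "no restriction".
    return wanted is None or wanted == "all" or value == wanted


def restrict(rows, modality=None, atlas=None):
    # Keep only associations from the requested modality and atlas.
    return [r for r in rows
            if matches(r["modality"], modality) and matches(r["atlas"], atlas)]


def bonferroni(pvalues, alpha=0.05):
    # Compare each of the m in-scope p-values with alpha / m.
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    # Step-up procedure: find the largest rank k with p_(k) <= k * alpha / m,
    # then reject the k smallest p-values.
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def query(rows, modality=None, atlas=None, mtc="BH", alpha=0.05):
    # Correct only over the associations left in scope after restriction.
    scoped = restrict(rows, modality, atlas)
    pvalues = [r["p"] for r in scoped]
    if mtc == "BF":
        keep = bonferroni(pvalues, alpha)
    elif mtc == "BH":
        keep = benjamini_hochberg(pvalues, alpha)
    elif mtc == "nom":
        keep = [p < alpha for p in pvalues]  # uncorrected threshold
    else:
        raise ValueError(f"unknown mtc option: {mtc!r}")
    return [r for r, significant in zip(scoped, keep) if significant]


rows = [
    {"gene": "G1", "modality": "T1", "atlas": "DK", "p": 0.001},
    {"gene": "G2", "modality": "T1", "atlas": "DK", "p": 0.012},
    {"gene": "G3", "modality": "T1", "atlas": "DK", "p": 0.039},
    {"gene": "G4", "modality": "T1", "atlas": "Destrieux", "p": 0.004},
    {"gene": "G5", "modality": "dMRI", "atlas": "TBSS", "p": 0.020},
]

print([r["gene"] for r in query(rows, "T1", "DK", mtc="BF")])   # ['G1', 'G2']
print([r["gene"] for r in query(rows, "T1", "DK", mtc="BH")])   # ['G1', 'G2', 'G3']
print([r["gene"] for r in query(rows, None, "all", mtc="BF")])  # ['G1', 'G4']
```

In this toy example, restricting to T1/DK leaves three tests. Bonferroni therefore compares each p-value with 0.05/3 ≈ 0.0167 and keeps G1 and G2, whereas BH keeps all three. Widening the scope to all five rows lowers the Bonferroni cutoff to 0.01, so G2 (p = 0.012) drops out. This mirrors the scope-dependent, study-wide threshold the text describes.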
Users can also access all nominally significant results (p < 0.05, uncorrected) by setting the multiple testing correction parameter to ‘nom’.

The multiple testing correction parameter represents a study-wide threshold and is therefore dynamic: it depends on the modality and atlas parameters provided in the initial neuroimaGene query. If a user provides an atlas, the correction is calculated over all tissues and genes for the NIDPs in that atlas. If the user provides a modality, such as ‘T1’ or ‘dMRI’, but sets the atlas to NA or ‘all’, the correction is calculated over all associations involving NIDPs from that modality. If the user sets both the modality and the atlas to NA or ‘all’, the correction is applied to the entire data set of all 19 tissue models, 3537 NIDPs, and 22,436 genes. This scheme applies to both the Bonferroni (BF) and Benjamini-Hochberg (BH) corrections, with an alpha of 0.05 in each case. Users who provide their own vector of NIDPs receive results at the nominal threshold and are alerted that they must perform multiple testing correction themselves. When run, the neuroimaGene command returns a data.table object describing tissue-specific associations between gene expression and NIDPs. Each column of this object is described in the package documentation.

Performance benchmarking

The neuroimaGene() function is the package's primary function, associating user-defined gene sets with NIDPs from the UKB. We benchmark its runtime and memory allocation within R (Supplementary Figs. 1–2, Additional file 1). We perform 100 iterations of the neuroimaGene command on gene sets of multiple lengths, across all 3 multiple testing thresholds and 5 different atlas/modality parameters.
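The benchmarking design can be sketched as a grid over gene-set size, correction option, and atlas/modality setting, with each configuration timed over repeated calls. The Python harness below is only an illustration, because the published benchmarks were run in R. stand_in_query() is a hypothetical placeholder for the function under test, and the scope labels are hypothetical. Python's tracemalloc peak traced memory is also a different metric from the memory allocation reported for R.

```python
# Illustrative benchmarking harness in Python; the published benchmarks were
# run in R. stand_in_query() and the scope labels are hypothetical.
import itertools
import statistics
import time
import tracemalloc


def benchmark(fn, *args, iterations=100):
    """Call fn(*args) repeatedly; return (mean seconds, peak traced bytes)."""
    runtimes = []
    tracemalloc.start()
    try:
        for _ in range(iterations):
            start = time.perf_counter()
            fn(*args)
            runtimes.append(time.perf_counter() - start)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return statistics.mean(runtimes), peak_bytes


def stand_in_query(genes, mtc, scope):
    # Hypothetical placeholder for the query being benchmarked.
    return [(gene, mtc, scope) for gene in genes]


def run_grid(query, gene_set_sizes, mtc_options, scopes, iterations=100):
    """Benchmark every (gene-set size, mtc option, scope) combination."""
    results = {}
    for n, mtc, scope in itertools.product(gene_set_sizes, mtc_options, scopes):
        genes = [f"GENE{i}" for i in range(n)]  # hypothetical gene labels
        results[(n, mtc, scope)] = benchmark(query, genes, mtc, scope,
                                             iterations=iterations)
    return results


if __name__ == "__main__":
    grid = run_grid(stand_in_query, [10, 150], ["BF", "BH", "nom"],
                    ["T1/DK", "all"], iterations=100)
    for config, (mean_s, peak) in sorted(grid.items()):
        print(config, f"{mean_s * 1e3:.3f} ms", f"{peak} B")
```

Replacing stand_in_query with a real query function, and the two scope labels with the five atlas/modality settings actually tested, would reproduce the shape of the grid described above. The absolute numbers would not be comparable to the R results.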
The most computationally demanding analysis was a neuroimaGene query of 150 genes at the nominal threshold. Across all 5 tested atlas and modality parameters, the maximal mean runtime was under 7 s. Memory allocation scaled linearly with the number of genes assessed and reached a maximum of 2 MB in our benchmarking analyses. The NeuroimaGene package relies on an external SQL database (1.9 GB), for which automatic download permission is requested immediately upon package installation.

Visualization

Once a query has returned a results table, there are several options for visualizing the findings. NeuroimaGene implements customized functions based on the ggplot2 and ggseg packages to generate visual output [11, 12]. The plot_gns() function takes the results table as input and returns a ggplot bar chart showing the number of NIDPs (y-axis) associated with each gene (x-axis). By default, the plot displays only the top 15 genes, ranked by top effect size. The maximum number of genes displayed can be changed with the maxGns parameter.

Complementary to plot_gns() is the plot_nidps() function. This command takes the neuroimaGene results table as input and returns a ggplot dot plot showing the aggregate effect size magnitude of all significantly associated query genes on each NIDP in the results table. Effect sizes are aggregated by simple arithmetic mean. The maxNIDPs parameter can be set to show more NIDPs than the default 30. By default, the function displays the normalized effect size magnitude, with the direction of effect indicated by point shape. Setting mag = FALSE instead displays the normalized effect size as a signed (directional) value rather than as a magnitude.

The plot_gnNIDP() function takes the results table as input and returns a ggplot heatmap object. The x-axis represents the queried genes, and the y-axis shows the associated NIDPs.
Each cell represents an association between the GReX of an x-axis gene and a y-axis NIDP, colored according to the number of tissue contexts in which that association was statistically significant. The default gene and NIDP counts can be adjusted using the previously described parameters.

Lastly, the neuro_vis() command uses the ggseg and fsbrain packages to generate visual representations of cortical and subcortical GReX associations [12, 13]. This function takes a NeuroimaGene results data table as input. Notably, it can display only the DK, DKT, Destrieux, and subcortical segmentation atlases, and the appropriate atlas must be specified with the atlas parameter. The function returns a two-dimensional multi-panel plot with NIDPs colored according to the aggregate effect size of all associated genes in the results table. The high, mid, and low-range colors default to blue, white, and red, and can be customized via function parameters.
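The plotting functions summarize the results table in a few simple ways. The Python sketch below computes those summaries from a toy list of significant associations:

- the number of distinct NIDPs per gene (as in plot_gns())
- a top-N gene ranking
- the arithmetic mean effect per NIDP (as in plot_nidps())
- the number of tissues per gene-NIDP pair (as in plot_gnNIDP())

It is not the package's code. The field names and the gene and tissue labels are hypothetical. Ranking genes by their largest absolute effect is our reading of "top effect size". Averaging over every gene-tissue row, and applying the same mean to neuro_vis() coloring, are assumptions. The normalization applied before plotting is not reproduced.

```python
# Illustrative sketch only: not the NeuroimaGene R implementation.
# Each row is one significant gene-tissue-NIDP association; field names,
# gene labels, and tissue labels are hypothetical.
from collections import defaultdict
from statistics import mean

rows = [
    {"gene": "G1", "tissue": "Brain_Cortex", "nidp": "a", "effect": 0.30},
    {"gene": "G1", "tissue": "Brain_Hippocampus", "nidp": "a", "effect": 0.20},
    {"gene": "G1", "tissue": "Brain_Cortex", "nidp": "b", "effect": -0.10},
    {"gene": "G2", "tissue": "Brain_Cortex", "nidp": "a", "effect": -0.40},
    {"gene": "G2", "tissue": "Brain_Cortex", "nidp": "c", "effect": 0.05},
]


def nidps_per_gene(rows):
    # Bar heights: number of distinct NIDPs associated with each gene.
    nidps = defaultdict(set)
    for r in rows:
        nidps[r["gene"]].add(r["nidp"])
    return {gene: len(s) for gene, s in nidps.items()}


def top_genes(rows, max_genes=15):
    # Rank genes by their largest absolute effect size (assumed reading of
    # "top effect size") and keep at most max_genes of them.
    best = defaultdict(float)
    for r in rows:
        best[r["gene"]] = max(best[r["gene"]], abs(r["effect"]))
    return sorted(best, key=best.get, reverse=True)[:max_genes]


def mean_effect_per_nidp(rows):
    # Arithmetic mean of signed effects per NIDP; abs() of the result gives
    # the magnitude and its sign gives the direction of effect.
    effects = defaultdict(list)
    for r in rows:
        effects[r["nidp"]].append(r["effect"])
    return {nidp: mean(v) for nidp, v in effects.items()}


def tissue_counts(rows):
    # Heatmap fill: number of distinct tissues per (gene, NIDP) pair.
    tissues = defaultdict(set)
    for r in rows:
        tissues[(r["gene"], r["nidp"])].add(r["tissue"])
    return {pair: len(s) for pair, s in tissues.items()}


print(nidps_per_gene(rows))          # {'G1': 2, 'G2': 2}
print(top_genes(rows, max_genes=1))  # ['G2']
print(tissue_counts(rows))
```

Here NIDP "a" has a mean effect of (0.30 + 0.20 − 0.40)/3 ≈ 0.033. This shows how opposing effects from different genes or tissues partly cancel when aggregated by a simple mean.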
