Vitessce: integrative visualization of multimodal and spatially resolved single-cell data

Using Vitessce

Vitessce is available in different forms to make it useful for multiple audiences. Researchers can use the R and Python packages to explore local or remote datasets during data analysis. The Vitessce website and online configuration editor are designed for sharing visualizations with collaborators and debugging. Software developers can incorporate Vitessce into other tools and write plugins by using it as a JavaScript package.

Vitessce in Python environments and Jupyter Notebooks

Vitessce can be used as a Python package in scripts and Jupyter notebooks. Visualizations can be configured and rendered directly into notebooks using an interactive widget that is compatible with multiple notebook environments including JupyterLab, Jupyter Notebook (classic) and Google Colab. The implementation of the widget is based on AnyWidget (https://anywidget.dev). Installation instructions and API documentation can be found at https://python-docs.vitessce.io. A set of tutorial notebooks is available at https://github.com/vitessce/vitessce-python-tutorial.

Vitessce in R environments, RMarkdown, RStudio and Shiny apps

Vitessce can be used as an R package in scripts, RMarkdown documents and Shiny apps. Visualizations configured in R can be rendered using an interactive widget in the RStudio Viewer pane, RMarkdown documents, pkgdown websites and Shiny apps. The widget is implemented as an R htmlwidget. Installation instructions and API documentation can be found at https://r-docs.vitessce.io.

Vitessce in JavaScript and web environments

Vitessce is implemented in JavaScript as a React component with corresponding APIs for configuration and registering plugins (described below). The JavaScript package can be used in websites, other React components or other JavaScript packages.
Installation instructions and API documentation can be found at http://vitessce.io.

Vitessce as a website and online configuration editor

To quickly configure a Vitessce visualization from a web browser, we provide a web application to write and edit Vitessce configurations using JavaScript Object Notation (JSON) and JavaScript syntax. This method of configuring a Vitessce instance does not require Python, R or JavaScript package installation. This resource can be found at http://vitessce.io.

Python, R and JavaScript configuration APIs

Vitessce configurations are defined using a JSON representation that specifies the view layout and points to local or remote data files via URLs. In addition to the declarative JSON representation, we have developed Python, R and JavaScript APIs to enable users to define configurations programmatically. These configuration APIs support definition of datasets, files, views, the view layout and view coordinations within the native object-oriented paradigms of each language.

JavaScript plugin API

The Vitessce JavaScript package contains functions for defining plugin view types, coordination types, data types and file types. Once plugins have been defined, they must be registered by providing a name that will be used to refer to the implementation in Vitessce configurations. Plugin view types must be implemented as React components. We provide examples and tutorials for using the plugin API on the Vitessce documentation website.

Multimodal configuration

Vitessce supports arbitrary multimodal datasets by adopting the observation-by-feature matrix conventions used by data analysis packages in the single-cell ecosystem including MultiAssayExperiment, Seurat and MuData. In this model, observations are the entities being measured, such as cells, molecules, spots, beads or nuclei. Features are the characteristics being measured about those entities, such as genes, chromatin accessibility peaks or surface proteins.
Feature values are the quantities being measured, such as expression levels, counts or intensities. Identifiers for types of observation, feature and feature value can be defined in the Vitessce configuration for both data and views. Vitessce then matches views to data accounting for observation type, feature type and/or value type identifiers. For most views and data types, all three properties are used, but a subset or a superset may be used depending on which properties are relevant to a particular view. For example, the heatmap considers all three properties when loading data because the visualization contains features, observations and values. A particular heatmap can be uniquely identified by these properties. In contrast, the feature list view is uniquely identified by only feature type and therefore observation type and value type are not considered when loading data for the view.

Data organization

To separate data loading from rendering and support multimodal experiments, Vitessce views load data corresponding to datasets and data types, such as arrays of spatial coordinates per observation, dimensionality reduction coordinates per observation, images and observation-by-feature matrices. Views may load data corresponding to one or more datasets and one or more data types. These data types may be aligned on certain axes (for example, to support shared observation or feature sets) or not (for example, to support comparison of multiple datasets).

Data types are loaded independently such that their data may be contained in the same file or split across independent files, allowing multiple file formats to be used to load each dataset. Vitessce defines multiple file types that correspond to data type–file format pairs. If a file format supports multiple data types, a joint file type may be defined to simplify the configuration (that is, allowing URLs and file options to be defined once for a file while being exposed as multiple data types internally).
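The idea of a joint file type can be sketched as a one-to-many expansion: a single file entry, with one URL and one set of options, is turned into one internal entry per data type. The Python sketch below is illustrative only and is not the Vitessce implementation (which is written in JavaScript). The registry `JOINT_FILE_TYPES`, the function `expand_joint_file`, the placeholder URL and the exact file type strings and option keys are assumptions chosen to resemble the naming pattern described here.

```python
# Hypothetical registry: the per-data-type file types that a joint file type
# expands to. The strings are assumptions modelled on the "data type + file
# format" naming pattern, not an authoritative list.
JOINT_FILE_TYPES = {
    "anndata.zarr": {
        "obsFeatureMatrix": "obsFeatureMatrix.anndata.zarr",
        "obsEmbedding": "obsEmbedding.anndata.zarr",
        "obsLocations": "obsLocations.anndata.zarr",
    },
}


def expand_joint_file(file_def: dict) -> list:
    """Expand one joint file definition into one entry per data type."""
    mapping = JOINT_FILE_TYPES.get(file_def["fileType"])
    if mapping is None:
        # Not a joint file type: pass the definition through unchanged.
        return [file_def]
    expanded = []
    for data_type, options in file_def.get("options", {}).items():
        if data_type not in mapping:
            raise ValueError(f"Unsupported data type: {data_type}")
        # A data type may appear once (dict) or several times (list of dicts).
        option_list = options if isinstance(options, list) else [options]
        for opts in option_list:
            expanded.append({
                "fileType": mapping[data_type],
                "url": file_def["url"],  # the URL is defined once and reused
                "options": opts,
            })
    return expanded


joint = {
    "fileType": "anndata.zarr",
    "url": "https://example.com/dataset.zarr",  # placeholder URL
    "options": {
        "obsFeatureMatrix": {"path": "X"},
        "obsEmbedding": [
            {"path": "obsm/X_umap", "embeddingType": "UMAP"},
            {"path": "obsm/X_pca", "embeddingType": "PCA"},
        ],
    },
}

expanded = expand_joint_file(joint)
for entry in expanded:
    print(entry["fileType"], entry["options"]["path"])
```

In this sketch the view code never sees the joint entry; it only looks up the expanded per-data-type entries, which is the separation the text describes.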
For example, AnnData objects, which may contain multiple observation-by-feature matrices, dimensionality reductions and spatial coordinates, can be configured using a higher-level joint file type that is specific to AnnData. However, these specifics are abstracted from the implementations of views that simply perform lookups for data corresponding to individual data types.

On-demand loading of data subsets

To scale to large datasets, data loading is deferred in multiple cases: multiscale images, multiscale bitmask-based segmentations, genome-mapped data and per-feature subsets of observation-by-feature matrices. The loading of highly multiplexed multiscale (that is, pyramidal) image files is implemented using Viv. As described by Manz et al. (ref. 11), Viv loads images as data tiles corresponding to the current viewport zoom level (that is, resolution) and target position (that is, X/Y), as well as selections for channel (C), temporal (T) and Z axes. In Vitessce, this approach applies not only to primary images but also to image bitmask files for cell and organelle segmentations, which can also be stored in multiresolution formats.

Scalability to large observation-by-feature matrices in Vitessce takes advantage of a similar data tiling approach. These matrices can be stored in multiple Zarr-based file formats, in particular when using AnnData (refs. 19, 32), MuData (ref. 21) and SpatialData (ref. 9). Zarr supports tiled (‘chunked’) and compressed multidimensional arrays that can be served as directories of static files termed ‘stores’. Benchmarking conducted by Moore et al. (ref. 24) demonstrates that accessing Zarr data is at least as fast as accessing the same data stored in hierarchical data format version 5 (HDF5) and TIFF, and in high-latency cloud storage scenarios, Zarr outperforms HDF5 by an order of magnitude.

The Zarr chunk strategy can be configured to optimize for particular use cases.
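As a rough illustration of why chunk shape matters, the Python sketch below counts how many chunks of a two-dimensional chunked array intersect a given selection. It does not use the Zarr library and ignores compression and network latency; the array shape, the two chunk layouts and the function name `chunks_touched` are hypothetical values chosen for illustration.

```python
def chunks_touched(shape, chunks, row_slice, col_slice):
    """Count the chunks of a 2D chunked array that intersect a selection.

    shape:  (n_observations, n_features) of the full matrix.
    chunks: (observations per chunk, features per chunk).
    row_slice, col_slice: half-open (start, stop) index ranges.
    """
    def span(start, stop, size):
        # Number of chunks between the first and last index along one axis.
        return (stop - 1) // size - start // size + 1

    for (start, stop), n in zip((row_slice, col_slice), shape):
        if not 0 <= start < stop <= n:
            raise ValueError("selection out of bounds")
    return span(*row_slice, chunks[0]) * span(*col_slice, chunks[1])


shape = (100_000, 20_000)  # hypothetical matrix: cells x genes

# Two hypothetical layouts with the same number of values per chunk.
tall = (100_000, 10)  # many observations, few features per chunk
wide = (10, 20_000)   # few observations, many features per chunk

one_gene = ((0, shape[0]), (1234, 1235))  # one feature, all observations
one_cell = ((42, 43), (0, shape[1]))      # one observation, all features

for name, layout in [("tall", tall), ("wide", wide)]:
    print(
        name,
        "one gene:", chunks_touched(shape, layout, *one_gene),
        "one cell:", chunks_touched(shape, layout, *one_cell),
    )
```

With these hypothetical numbers, the tall layout needs 1 chunk to read one gene across all cells but 2,000 chunks to read all genes for one cell, whereas the wide layout needs 10,000 chunks and 1 chunk, respectively.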
In Vitessce, performance is optimized when many observations and few features are stored in each chunk, enabling use cases such as quickly loading the expression values for a particular gene (that is, one feature) across all cells (that is, all observations). The trade-off is that the same chunk strategy might result in poor performance under a different use case such as loading expression values for all genes for only one cell. We additionally compare the full sizes of Zarr and CSV-based AnnData objects in Supplementary Fig. 7.

Genome browser tracks visualized using HiGlass also load data tiles on demand based on the current browser viewport and zoom level, as described by Kerpedjiev et al. (ref. 12). In Vitessce, we extend this mechanism to support genomic data from Zarr stores (in addition to the existing file formats supported by HiGlass) to eliminate the dependency on a specialized HiGlass Server. This extension is implemented as a HiGlass plugin data fetcher.

Coordinated multiple views

Vitessce adopts the coordinated multiple views technique from the field of information visualization to enable comparison tasks, such as overview and detail, focus and context, difference views and small multiples (refs. 42, 43).

Coordination model

The coordination model proposed by Boukhelifa et al. (ref. 29) is used to link subsets of views on visualization, interaction and data properties. In this model, views are not directly linked to one another, but instead to named property values referred to as coordination scopes. The properties that views can be linked on are referred to as coordination types. As a result, views are coordinated when they are linked to the same coordination scope for a given coordination type.

JavaScript implementation

Vitessce is implemented as a configurable React component in JavaScript. Below the root React component, visualization and control views are also implemented as React components.
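The coordination model described above, in which views link to named coordination scopes rather than to each other, can be sketched independently of React. The following Python sketch is a language-agnostic illustration and not the Vitessce implementation; the view names, scope names, values and helper functions (`get_value`, `set_value`, `coordinated_views`) are all hypothetical.

```python
# Coordination type -> coordination scope name -> current value.
# Type names and values are illustrative assumptions.
coordination_space = {
    "embeddingZoom": {"A": 2.0, "B": 5.0},
    "featureSelection": {"A": ["GeneX"]},
}

# Each view maps the coordination types it uses to a scope name.
views = {
    "umap": {"embeddingZoom": "A", "featureSelection": "A"},
    "tsne": {"embeddingZoom": "B", "featureSelection": "A"},
    "heatmap": {"featureSelection": "A"},
}


def get_value(view, coordination_type):
    """Read the value of the scope that a view is linked to."""
    scope = views[view][coordination_type]
    return coordination_space[coordination_type][scope]


def set_value(view, coordination_type, value):
    """Write to the view's scope; the view never references another view."""
    scope = views[view][coordination_type]
    coordination_space[coordination_type][scope] = value


def coordinated_views(coordination_type, scope):
    """List the views linked to the same scope for a coordination type."""
    return sorted(
        name for name, scopes in views.items()
        if scopes.get(coordination_type) == scope
    )


# Selecting a feature in the heatmap is visible to every view on scope "A".
set_value("heatmap", "featureSelection", ["GeneY"])
print(get_value("umap", "featureSelection"))

# Zooming the UMAP does not affect the t-SNE view, which uses scope "B".
set_value("umap", "embeddingZoom", 3.5)
print(get_value("tsne", "embeddingZoom"))
```

Because each view holds only scope names, relinking two views amounts to changing a scope name in the configuration rather than changing view code.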
Views for visualization may use any web technologies including WebGL, SVG, CSS and the HTML canvas element. The main Vitessce component is distributed in a JavaScript package that also exports an object-oriented configuration API and plugin APIs for defining custom views as React components, file types as JavaScript classes and coordination types.

Implementation of coordinated multiple views

We have implemented the aforementioned coordination model in JavaScript and incorporated it into the Vitessce configuration schema using a JSON representation. As we anticipate that this may be useful for general implementation of coordinated multiple view visualizations using JavaScript, we provide the implementation as a standalone JavaScript package that can be used in conjunction with the plugin API.

Implementation of views

Vitessce configurations define a set of views that contain interactive visualizations. View implementations are independent of one another and use custom React hook functions to access values from the coordination model and load data. The views currently available include a scatterplot, spatial plot, heatmap, cell set size bar chart, gene expression histogram, gene expression per cell set violin plots, cell set manager, gene selection, image layer controller and genomic profile per cell set genome browser. Views for visualization are implemented using custom JavaScript code and existing JavaScript libraries for statistical and geospatial data visualization. A list of the major open-source software packages used and a list of views currently available are provided in Supplementary Tables 1 and 2, respectively.

Implementation of custom deck.gl layers

The implementations of several views including the heatmap, scatterplot and spatial views leverage the extensibility of the deck.gl API for WebGL-based data visualization.
Deck.gl exposes not only high-level JavaScript APIs (‘layers’) for rendering points, lines, polygons and text, but also abstractions for developing custom layers with associated custom WebGL shader programs. The heatmap is implemented using a custom layer that aggregates neighboring values in an observation-by-feature matrix on the graphics processing unit (GPU) when multiple values correspond to only a single pixel on the screen. This eliminates the aliasing and Moiré patterns that would otherwise occur at low zoom levels of a large matrix in the heatmap view, while preserving smooth zoom and pan interactions. The custom heatmap layer also contains logic that restricts pan interactions to the matrix area and determines how to display axis ticks on the basis of zoom level and text length. The spatial view is implemented using multiple custom layers, including one that renders image bitmasks by extending Viv layers. The spatial and scatterplot views both use a custom layer that efficiently maps feature values to colors on the GPU using quantitative colormap functions written in WebGL.

Implementation of data loading

Data are loaded using HTTP from static files on local or remote web servers. Files corresponding to one or more datasets are specified via an array of URLs and file types in the configuration. Certain file types accept options that specify details of the internal file organization to enable lookups (for example, to load particular arrays within Zarr stores by specifying their relative paths). Each file type is loaded by a corresponding JavaScript class that defines a required load function and optional functions to load data subsets (where file formats allow). Data loading classes may perform validation, in particular when file formats such as JSON are used, which may vary widely in their contents. A list of supported file types is provided in Supplementary Table 3.

Processing of data for use cases

Data for use cases shown in Fig. 2 were processed using Python scripts, Jupyter notebooks and Snakemake pipelines (ref. 44) (Extended Data Figs. 1, 3 and 6). Vitessce visualizations were configured using the Vitessce Python package. Configurations were exported to JSON files and uploaded to the GitHub Gist service to enable referencing them by URL. Files obtained from the HuBMAP Data Portal were processed by automated pipelines developed within the HuBMAP Infrastructure and Engagement Collaboratory.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
