Whole brain alignment of spatial transcriptomics between humans and mice with BrainAlign

This research did not involve any biological experiments, human participants, or animal subjects, and thus did not require ethical approval. We focused on integrating mouse17 and human19 whole-brain ST datasets. Both datasets include gene expression matrices and spatial 3D coordinates of spots (Fig. 1a), with the mouse data at a higher resolution than the human data: the number of mouse spots is almost tenfold that of human spots.

Data description

Mouse data
The processed mouse gene expression data and spot three-dimensional coordinates were obtained17. The left hemispheres of three male mouse brains were cut into 10-μm sections, and 75 coronal sections covering the entire AP axis of one hemisphere were hybridized onto ST arrays. Each imaged brain hemisphere section was aligned to the Allen mouse brain reference atlas (AMBA; www.brain-map.org), and the positions of the ST spots in the tissue were determined using a computational framework designed to generate reference maps60. We acquired 64 anatomical brain regions for this mouse dataset and also generated a set of 15 broad regions for visualization.

Human data
Human gene expression data and spot three-dimensional coordinates were obtained from the AHBA19 following the preprocessing procedures in ref. 3; spot locations were mapped from histology data into MR space using Inkscape and BioImage Suite19. To obtain the anatomical region reference for each spot, we used the hierarchical ontology from the AHBA, accessed through the Allen Institute's API. We aggregated and pruned the neuroanatomical hierarchy as in ref. 3, resulting in 88 human brain regions. We also generated a set of 16 broad regions as in ref. 3 for visualization and annotation.

Hippocampus data
We downloaded the hippocampus data profiled by Stereo-seq from the Brain Science Data Center at the Chinese Academy of Sciences and selected the slices T315 (4,540 spots), T447 (28,566 spots), and T36 (28,499 spots) from the mouse, marmoset, and macaque datasets, respectively46. We cropped the slice 'Puck_200115_08' in the mouse hippocampus dataset54, sequenced by Slide-seqV2, to 21,919 spots.

Data preprocessing
We downloaded the processed mouse gene expression data directly. The complete dataset contains 15,326 unique genes across 34,053 spots after quality control. We pre-processed the human gene expression data using the abagen package in Python (https://abagen.readthedocs.io/en/stable/)19,20,61 as in ref. 3, obtaining a gene-by-spot expression matrix with 15,627 genes and 3,682 spots across all donors. Both mouse and human expression arrays were log-transformed. For PCA, Beauchamp-FCN, Harmony, and Scanorama, we selected 6,000 highly variable one-to-one homologous genes; PCA was applied to the datasets of the two species separately after selecting these HVGs. Before integration, we used sc.pp.normalize_total() and sc.pp.log1p() from SCANPY to normalize the data and sc.pp.combat() to correct batch effects. These baseline methods performed poorly without this correction.
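A minimal sketch of this normalization and batch-correction pipeline is given below; the AnnData construction and the "species" key are illustrative placeholders, not the study's actual inputs.

```python
import numpy as np
import anndata as ad
import scanpy as sc

# Hypothetical concatenated mouse + human counts restricted to one-to-one homologs.
counts = np.random.poisson(1.0, size=(500, 8000)).astype(np.float32)
adata = ad.AnnData(counts)
adata.obs["species"] = ["mouse"] * 250 + ["human"] * 250

sc.pp.normalize_total(adata)                          # library-size normalization
sc.pp.log1p(adata)                                    # log(1 + x) transform
sc.pp.highly_variable_genes(adata, n_top_genes=6000)  # HVGs for the baseline methods
adata = adata[:, adata.var.highly_variable].copy()
sc.pp.combat(adata, key="species")                    # batch-effect correction
```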
Gene selection
We selected highly variable genes (HVGs) and differentially expressed genes (DEGs) for the downstream procedures. We used the SCANPY62 built-in function highly_variable_genes() to identify the top 2000 genes with the highest dispersion as HVGs. We computed DEGs, separately for the mouse and human datasets, over brain regions and over clusters generated by the Leiden algorithm28 with resolution = 1, using a Student's t-test performed through the function rank_genes_groups() from SCANPY62. We merged the DEGs for brain regions and clusters. The genes used as spot features were shared between species: we first took the top 70 DEGs for each region and retained those with one-to-one homologs in the other species, then took the union of the resulting two gene sets as input, denoted \({{{{\mathcal{G}}}}}_{m}\) and \({{{{\mathcal{G}}}}}_{h}\) for mouse and human, respectively. We combined the HVGs and DEGs from the mouse and human data to decide the node genes used for constructing the GNN. We denote the homologous genes of \({{{{\mathcal{G}}}}}_{m}\) and \({{{{\mathcal{G}}}}}_{h}\) as \({{{{\mathcal{G}}}}}_{m}^{(hom)}\) and \({{{{\mathcal{G}}}}}_{h}^{(hom)}\), respectively. Finally, we used \({{{{\mathcal{G}}}}}_{m}\cup {{{{\mathcal{G}}}}}_{h}^{(hom)}\) and \({{{{\mathcal{G}}}}}_{h}\cup {{{{\mathcal{G}}}}}_{m}^{(hom)}\) as the node genes for the two species, where ∪ denotes set union. The dimensions of the initial node features of the human and mouse spots are 3,554 and 3,311, respectively, corresponding to the numbers of considered genes. In the hippocampus experiments, we selected 6,244 and 7,605 genes for mice and macaques in the same-platform experiment, and 6,256 and 6,180 genes for mice and macaques in the cross-platform experiment.

Gene orthology
Following ref. 10, we downloaded multi-to-multi gene homology (ortholog) information, derived from the Ensembl Compara pipeline63, from the BioMart web server. We used the mouse as the anchor species when downloading the homology mapping file.

BrainAlign

Construction of the cross-species heterogeneous graph
Following ref. 10, after selecting HVGs and DEGs, we extended them with homologous mappings across species to form the gene sets used to build the heterogeneous graph. We built the "spot-spot" edges in the HGNN via k-nearest neighbors (KNN) measured by the spatial distance between spot 3D coordinates, connecting each spot to its top k = 5 nearest neighbors. Gene-gene edges were produced directly from the multi-to-multi gene homology mappings, and spot-gene edges were determined by whether the corresponding gene expression value was nonzero.
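The sketch below illustrates this edge construction under stated assumptions; the inputs (coords, expr, homology_pairs) are illustrative placeholders rather than BrainAlign's actual data structures.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def spot_spot_edges(coords, k=5):
    """Connect each spot to its k spatially nearest neighbors (3D coordinates)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)  # +1: each spot matches itself
    _, idx = nn.kneighbors(coords)
    src = np.repeat(np.arange(coords.shape[0]), k)
    dst = idx[:, 1:].ravel()                               # drop the self-match column
    return np.stack([src, dst])

def spot_gene_edges(expr):
    """Link a spot to a gene whenever its expression value is nonzero."""
    spots, genes = np.nonzero(expr)                        # expr: spots x genes
    return np.stack([spots, genes])

def gene_gene_edges(homology_pairs):
    """Gene-gene edges taken directly from the multi-to-multi homology table."""
    return np.asarray(homology_pairs).T                    # rows: (gene_a, gene_b)
```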
Node feature initialization
For stable convergence and preservation of the brain region or spot clusters, we used the first-layer output of a trained CAME model10 as the initial spot and gene features of BrainAlign. In CAME, the initial gene embedding is obtained by aggregating the embeddings of neighboring spots. Notably, the first-layer output of CAME induces "region" or "cluster" label information for only one species, so the self-supervised training setting is not affected. In the whole-brain mouse and human datasets, only mouse region labels are used; in the mouse and macaque hippocampus datasets, no regional labels are used, and only cluster labels from pre-clustering on one-to-one homologous genes are used to initialize the embeddings. Furthermore, we averaged the initial features of homologous genes to inject the homology information into the gene embeddings.

Self-supervised mechanisms in the graph neural network
Instead of using predefined meta-paths, BrainAlign, as in ref. 23, first globally learns relation-based embeddings for all nodes from their neighbor properties under each relation type and then combines them with an attentive fusion module. Self-supervised learning is achieved by automatically generating positive and negative spots and optimizing the model through contrastive learning. A multi-hop contrast is then used to optimize the regional structure information by exploiting the strong correlation between nodes and their neighbor-graphs, and multiple relationships are taken into account through multi-hop message passing.
Notations
Formally, an attributed heterogeneous network is defined as \(G=(V,E,X,{{{\mathcal{T}}}},{{{\mathcal{R}}}})\), which consists of a set of nodes V, a set of edges E, the node feature matrix X, the set of node types \({{{\mathcal{T}}}}\), and the set of edge types \({{{\mathcal{R}}}}\). For heterogeneous networks, \(| {{{\mathcal{T}}}}| > 1\) and/or \(| {{{\mathcal{R}}}}| > 1\) is required. For a node v ∈ V, the p-hop neighborhood \({{{{\mathcal{N}}}}}_{v}^{p}\) is the set of vertices reachable from v after p hops; for a specific relation r, the p-hop relation-based neighborhood of v is defined as \({{{{\mathcal{N}}}}}_{v}^{p}(r)\). Here, one-hop means two nodes are directly connected, while p-hop means two nodes are connected through p one-hop relations. For example, "spot 1-gene 1" and "gene 1-spot 2" are one-hop relations, and the two-hop relation "spot 1-spot 2" is induced from these two one-hop relations as "spot 1-gene 1-spot 2". Formally, in a graph G, a p-hop relation-based neighbor-graph of node v is defined as \({{{{\mathcal{G}}}}}_{v}^{{\pi }_{p}}\), where πp = r1r2…rp denotes a path \({t}_{1}\mathop{\to }\limits^{{r}_{1}}{t}_{2}\mathop{\to }\limits^{{r}_{2}}\cdots \mathop{\to }\limits^{{r}_{p}}{t}_{p+1}\) with a sequence of p edge types \(\{{r}_{1},{r}_{2},\ldots,{r}_{p}\}\) and node types \(\{{t}_{1},{t}_{2},\ldots,{t}_{p+1}\}\). Node and edge types may repeat along the path. The p-hop relation set of nodes of type t is defined as \({R}_{t}^{p}\); e.g., \({R}_{t}^{1}\) denotes the set of all possible one-hop relations of nodes of type t.
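As a small illustration of how a two-hop relation is induced from two one-hop relations (e.g., "spot 1-gene 1-spot 2"), the sparse adjacency composition below computes two-hop spot-spot reachability; the matrices are random placeholders.

```python
import scipy.sparse as sp

# One-hop relations as sparse biadjacency matrices (illustrative sizes/densities).
A_spot_gene = sp.random(1000, 500, density=0.01, format="csr")  # spots x genes
A_gene_spot = A_spot_gene.T.tocsr()                             # genes x spots

# Two-hop relation spot -> gene -> spot: entry (i, j) is nonzero iff spot i
# reaches spot j through at least one shared gene.
A_spot_spot = (A_spot_gene @ A_gene_spot).tocsr()
A_spot_spot.data[:] = 1.0  # keep reachability only, not path counts
```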

Encoding layer
BrainAlign learns the embedding of a node from its relevant one-hop relation-based neighbor-graphs by employing a relation-based encoder to encode the multi-typed nodes and edges of the spot-gene graph. For node it of type t, we generate the relation rt-based embedding of the k-th layer by averaging (MEAN) the representations of its relevant one-hop neighbors as follows:$${{{{\bf{u}}}}}_{{r}_{t}}^{k}({i}_{t})=\delta \left({{{{\boldsymbol{\Theta }}}}}_{{r}_{t}}^{k}\cdot {{{\rm{MEAN}}}}\left(\left\{{{{{\bf{h}}}}}_{j}^{k-1}| \, j\in {{{{\mathcal{G}}}}}_{{i}_{t}}^{{\pi }_{1,{r}_{t}}}\right\}\right)\right),$$
(1)
where \({{{{\bf{u}}}}}_{{r}_{t}}^{k}({i}_{t})\) denotes the one-hop, relation \({r}_{t}\)-based embedding of node it, δ denotes the ReLU activation function, \({{{{\boldsymbol{\Theta }}}}}_{{r}_{t}}^{k}\) is a matrix to be learned for the relation rt in the k-th layer, \({{{{\bf{h}}}}}_{j}^{k-1}\) is the representation of neighbor j in the (k − 1)-th layer, and \({{{{\mathcal{G}}}}}_{{i}_{t}}^{{\pi }_{1,{r}_{t}}}\) denotes the one-hop neighbor-graph of node it based on relation rt.
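A minimal PyTorch sketch of the relation-based mean aggregation in Eq. (1); the module and tensor names are illustrative, not BrainAlign's actual implementation.

```python
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    """Eq. (1): mean-aggregate one-hop neighbors under one relation, then project."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)  # Theta_{r_t}^k

    def forward(self, h_prev, neighbor_ids):
        # h_prev: (num_source_nodes, in_dim), representations from layer k-1
        # neighbor_ids: list of LongTensors, the relation-r_t neighbors of each target node
        pooled = torch.stack([h_prev[ids].mean(dim=0) for ids in neighbor_ids])
        return torch.relu(self.theta(pooled))  # u_{r_t}^k(i_t) for every target node
```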
For all nodes of type t, the learnable parameters are shared. For all nodes in Vt, the coefficient of each relation rt in \({R}_{t}^{1}\) is formulated as:$${e}_{{r}_{t}}^{k}=\frac{1}{\left\vert {V}_{t}\right\vert }{\sum}_{{i}_{t}\in {V}_{t}}\tanh \left({\left({{{{\bf{q}}}}}_{t}^{k}\right)}^{\top }\cdot \left({{{{\bf{W}}}}}_{t}^{k}\cdot {{{{\bf{u}}}}}_{{r}_{t}}^{k}({i}_{t})+{{{{\bf{b}}}}}_{t}^{k}\right)\right),$$
(2)
where \({{{{\bf{W}}}}}_{t}^{k}\) is a linear transformation matrix, \({{{{\bf{q}}}}}_{t}^{k}\) is a learnable attention vector, and \({{{{\bf{b}}}}}_{t}^{k}\) is a learnable bias vector in the k-th layer. Then, to integrate these relation-based embeddings fairly across nodes of type t, the normalized coefficient \({\beta }_{{r}_{t}}^{k}\) is obtained by applying a softmax function to \(\{{e}_{{r}_{t}}^{k}| {r}_{t}\in {R}_{t}^{1}\}\):$${\beta }_{{r}_{t}}^{k}={{{\mathrm{softmax}}}}\,\left({e}_{{r}_{t}}^{k}\right)=\frac{\exp \left({e}_{{r}_{t}}^{k}\right)}{{\sum }_{{r}_{t}\in {R}_{t}^{1}}\exp \left({e}_{{r}_{t}}^{k}\right)}.$$
(3)
Finally, the representation of node it ∈ Vt in the k-th layer is produced by the linear combination:$${{{{\bf{h}}}}}_{{i}_{t}}^{k}={\sum}_{{r}_{t}\in {R}_{t}^{1}}{\beta }_{{r}_{t}}^{k}\cdot {{{{\bf{u}}}}}_{{r}_{t}}^{k}({i}_{t}).$$
(4)
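A sketch of the attentive fusion in Eqs. (2)-(4), assuming the per-relation embeddings of all type-t nodes are stacked into one tensor; names are illustrative.

```python
import torch
import torch.nn as nn

class RelationAttention(nn.Module):
    """Eqs. (2)-(4): fuse relation-based embeddings with shared attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim)             # W_t^k and b_t^k
        self.q = nn.Parameter(torch.randn(dim))  # attention vector q_t^k

    def forward(self, u):
        # u: (num_relations, num_nodes, dim) relation-based embeddings u_{r_t}^k
        e = torch.tanh(self.W(u) @ self.q)       # tanh(q^T (W u + b)), per node
        e = e.mean(dim=1)                        # Eq. (2): average over all nodes V_t
        beta = torch.softmax(e, dim=0)           # Eq. (3): normalized coefficients
        return (beta[:, None, None] * u).sum(0)  # Eq. (4): weighted combination h^k
```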

Multi-hop message passing
BrainAlign employs a multi-hop contrast optimization to enhance the regional structure information by leveraging the strong correlations between nodes and their neighbors. Since learning a node embedding by aggregating only the features of its one-hop neighbors is insufficient to capture diverse graph structures, we stack multiple encoding layers to derive multi-hop message passing. We then concatenate the learned embeddings with the original attributes to obtain the final embedding of each node via a node-type-specific linear projection:$${{{{\bf{z}}}}}_{{i}_{t}}={\hat{{{{\bf{W}}}}}}_{t}\cdot \left[{{{{\bf{h}}}}}_{{i}_{t}}^{k}\parallel {{{{\bf{x}}}}}_{{i}_{t}}\right],$$
(5)
where \({\hat{{{{\bf{W}}}}}}_{t}\) is a matrix to be learned for all nodes of type t, \({{{{\bf{h}}}}}_{{i}_{t}}^{k}\) and \({{{{\bf{x}}}}}_{{i}_{t}}\) are the representation and attributes of node it of type t, respectively, and [ ⋅ ∥ ⋅ ] denotes concatenation.
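A sketch of Eq. (5), with illustrative dimensions (the attribute size 3311 matches the mouse spot features mentioned above; all names are placeholders).

```python
import torch
import torch.nn as nn

hidden_dim, attr_dim, out_dim = 128, 3311, 128          # illustrative sizes
W_hat = nn.Linear(hidden_dim + attr_dim, out_dim, bias=False)  # \hat{W}_t

def final_embedding(h_k, x):
    # h_k: (num_nodes, hidden_dim) final-layer representations
    # x:   (num_nodes, attr_dim) original node attributes
    return W_hat(torch.cat([h_k, x], dim=1))            # z_{i_t}, Eq. (5)
```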
Contrastive loss
In BrainAlign, self-supervised learning is approached via contrastive learning: positive and negative spots are generated automatically, and the model is optimized with a contrastive objective. Negative spots for the one-hop and two-hop contrasts are generated by shuffling the learned node representations, while positive spots for the p-hop (p = 1, 2) contrast are the p-hop neighbor-graph summary vectors. The contrastive loss in BrainAlign is the sum of an intrinsic contrast loss and a multi-hop contrast loss. For the intrinsic contrast, a discriminator computes a probability score for each intrinsic pair, i.e., a learned node embedding and the corresponding original attributes, with a standard binary cross-entropy objective. The multi-hop loss for each node is a weighted sum of the one-hop and two-hop contrasts. The total loss \({{{\mathcal{L}}}}\) is the sum of the losses for the spot and gene nodes, i.e., \({{{{\mathcal{L}}}}}_{spot}\) and \({{{{\mathcal{L}}}}}_{gene}\).
Intrinsic contrast
We introduced the intrinsic contrast to capture the mutual dependence between the learned node embeddings and the original attributes. We used a discriminator \({{{\mathcal{D}}}}\) to compute a probability score for each intrinsic pair and formulated the objective as a standard binary cross-entropy:$${{{{\mathcal{L}}}}}_{t}^{S}=-\frac{1}{\left\vert {V}_{t}\right\vert }\sum\limits_{{i}_{t}\in {V}_{t}}\left(\log {{{\mathcal{D}}}}\left({{{{\bf{z}}}}}_{{i}_{t}},\,{{{{\bf{x}}}}}_{{i}_{t}}\right)+\log \left(1-{{{\mathcal{D}}}}\left({\tilde{{{{\bf{z}}}}}}_{{i}_{t}},{{{{\bf{x}}}}}_{{i}_{t}}\right)\right)\right).$$
(6)
The discriminator \({{{\mathcal{D}}}}\) is a bilinear function:$${{{\mathcal{D}}}}\left({{{{\bf{z}}}}}_{{i}_{t}},{{{{\bf{x}}}}}_{{i}_{t}}\right)=\sigma \left({{{{\bf{z}}}}}_{{i}_{t}}^{\top }\cdot {{{{\boldsymbol{\Omega }}}}}_{t}^{d}\cdot {{{{\bf{x}}}}}_{{i}_{t}}\right),$$
(7)
where σ is the logistic sigmoid nonlinearity and \({{{{\boldsymbol{\Omega }}}}}_{t}^{d}\) is a trainable matrix; \({\tilde{{{{\bf{z}}}}}}_{{i}_{t}}\) in Eq. (6) is the corrupted (negative) embedding of node it defined in the next subsection.
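A sketch of the bilinear discriminator and intrinsic loss of Eqs. (6)-(7); z_neg stands for the row-shuffled negative embeddings introduced in the next subsection, and all names are illustrative.

```python
import torch
import torch.nn as nn

class IntrinsicContrast(nn.Module):
    """Eqs. (6)-(7): bilinear discriminator with a binary cross-entropy contrast."""
    def __init__(self, emb_dim, attr_dim):
        super().__init__()
        self.omega = nn.Parameter(torch.randn(emb_dim, attr_dim))  # Omega_t^d

    def score(self, z, x):
        # Eq. (7): sigma(z^T Omega x), evaluated row-wise for all nodes
        return torch.sigmoid(((z @ self.omega) * x).sum(dim=1))

    def forward(self, z, z_neg, x):
        eps = 1e-8  # numerical guard for the logarithms
        pos = torch.log(self.score(z, x) + eps)          # true (z, x) pairs
        neg = torch.log(1 - self.score(z_neg, x) + eps)  # corrupted pairs
        return -(pos + neg).mean()                       # Eq. (6)
```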

Multi-hop contrast
For the learned representations \({{{{\bf{Z}}}}}_{t}=\{{{{{\bf{z}}}}}_{{i}_{t}}| {i}_{t}\in {V}_{t}\}\) of type-t nodes, we first corrupted them to obtain negative spots by shuffling them in a row-wise manner, denoted \({\tilde{{{{\bf{Z}}}}}}_{t}=\{{\tilde{{{{\bf{z}}}}}}_{{i}_{t}}| {i}_{t}\in {V}_{t}[idx,\,]\}\), where idx is the shuffled index and \({\tilde{{{{\bf{z}}}}}}_{{i}_{t}}\) is the negative spot embedding of node it. Then, we extracted the one-hop and two-hop relation-based neighbor-graphs of each node and obtained neighbor-graph-level summary vectors with a readout function \({{{\mathcal{A}}}}:{{\mathbb{R}}}^{{N}^{{\prime} }\times d}\to {{\mathbb{R}}}^{d}\), where \({N}^{{\prime} }\) is the neighbor-graph size and d is the dimension of the latent representations. For each node, the sets of one-hop and two-hop neighbor-graph vectors are used as positive spots. Specifically, we use \({{{{\rm{S}}}}}_{{i}_{t}}^{1}=\{{{{{\rm{s}}}}}_{{i}_{t}}^{{r}_{t}}| {r}_{t}\in {R}_{t}^{1}\}\) and \({{{{\rm{S}}}}}_{{i}_{t}}^{2}=\{{{{{\rm{s}}}}}_{{i}_{t}}^{{r}_{t}^{2}}| {r}_{t}^{2}\in {R}_{t}^{2}\}\) to denote the sets of one-hop and two-hop neighbor-graph vectors of node it, respectively, where \({r}_{t}^{2}\) is a two-hop relation for a node of type t. We then used the margin triplet loss for the one-hop contrastive learning of node it:$${\ell }_{1}^{{i}_{t}}={\sum}_{{r}_{t}\in {R}_{t}^{1}}\left(\max \left(\sigma \left({{{{\bf{z}}}}}_{{i}_{t}}^{\top }\cdot {{{{\bf{s}}}}}_{{i}_{t}}^{{r}_{t}}\right)-\sigma \left({\tilde{{{{\bf{z}}}}}}_{{i}_{t}}^{\top }\cdot {{{{\bf{s}}}}}_{{i}_{t}}^{{r}_{t}}\right)+\epsilon,0\right)\right),$$
(8)
and the two-hop contrastive learning of node it:$${\ell }_{2}^{{i}_{t}}={\sum}_{{r}_{t}^{2}\in {R}_{t}^{2}}\left(\max \left(\sigma \left({{{{{\bf{z}}}}}}_{{i}_{t}}^{\top }\cdot {{{{\bf{s}}}}}_{{i}_{t}}^{{r}_{t}^{2}}\right)-\sigma \left({{\tilde{{{{\bf{z}}}}}}_{{i}_{t}}^{\top }}\cdot {{{{\bf{s}}}}}_{{i}_{t}}^{{r}_{t}^{2}}\right)+\epsilon,0\right)\right),$$
(9)
where σ is the logistic sigmoid nonlinearity and ϵ is the margin value. For all nodes of type t, we summarize the multi-hop contrast loss as follows:$${{{{\mathcal{L}}}}}_{t}^{C}=\frac{1}{\left\vert {V}_{t}\right\vert }{\sum}_{{i}_{t}\in {V}_{t}}\left(\lambda {\ell }_{1}^{{i}_{t}}+\left(1-\lambda \right){\ell }_{2}^{{i}_{t}}\right),$$
(10)
where λ is a tunable coefficient balancing the one-hop and two-hop terms.
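A sketch of the multi-hop contrast of Eqs. (8)-(10) as written above; the summary vectors and shuffled negatives are placeholders for the readout outputs described in the text.

```python
import torch

def hop_contrast(z, z_neg, summaries, margin=0.8):
    # summaries: list over relations of (num_nodes, dim) summary vectors s
    loss = 0.0
    for s in summaries:
        pos = torch.sigmoid((z * s).sum(dim=1))      # sigma(z^T s)
        neg = torch.sigmoid((z_neg * s).sum(dim=1))  # sigma(z~^T s)
        # margin triplet term per Eqs. (8)-(9): max(pos - neg + eps, 0)
        loss = loss + torch.clamp(pos - neg + margin, min=0)
    return loss                                      # (num_nodes,) per-node losses

def multi_hop_loss(z, z_neg, one_hop, two_hop, lam=0.5, margin=0.8):
    l1 = hop_contrast(z, z_neg, one_hop, margin)     # Eq. (8)
    l2 = hop_contrast(z, z_neg, two_hop, margin)     # Eq. (9)
    return (lam * l1 + (1 - lam) * l2).mean()        # Eq. (10), averaged over V_t
```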
Finally, we combined the losses of all modules to obtain the final objective \({{{\mathcal{L}}}}\) over all node types (spot and gene in this work):$${{{\mathcal{L}}}}={\sum}_{t\in {{{\mathcal{T}}}}}\left({{{{\mathcal{L}}}}}_{t}^{C}+{{{{\mathcal{L}}}}}_{t}^{S}\right),$$
(11)
where \({{{\mathcal{T}}}}\) comprises two types of node, i.e., gene and spot.
The training settings of BrainAlign
In the experiments, the embedding dimension was set to 128. We used the AdamW optimizer64 with an initial learning rate of 0.02, and the learning rate of each parameter group decayed by a factor of 0.5 every ten epochs. We adopted early stopping with a patience of 30 epochs, and the maximum number of iterations was set to 150. We used 2-hop message passing to learn the node embeddings, with ϵ set to 0.8 and λ set to 0.5. The dropout rate in the fully connected layers was set to 0.7, and the leaky rectified linear unit was used as the activation function65.
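A configuration sketch matching these settings (AdamW, step decay of 0.5 every ten epochs, early stopping with patience 30, at most 150 epochs); the model and loss below are stand-ins, not BrainAlign's actual training code.

```python
import torch

def compute_total_loss(model):
    # Stand-in for the Eq. (11) objective summed over spot and gene nodes.
    return (model.weight ** 2).mean()

model = torch.nn.Linear(128, 128)  # stand-in for the BrainAlign network
optimizer = torch.optim.AdamW(model.parameters(), lr=0.02)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

best, patience, wait = float("inf"), 30, 0
for epoch in range(150):                      # at most 150 iterations
    loss = compute_total_loss(model)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                          # decay lr by 0.5 every 10 epochs
    if loss.item() < best - 1e-6:
        best, wait = loss.item(), 0
    else:
        wait += 1
        if wait >= patience:                  # early stopping, patience = 30
            break
```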
Analysis methods

Clustering spot and gene embeddings
We used the Leiden algorithm implemented in the SCANPY package62. For spot embedding clustering, we set the resolution to 9 for proper cluster fineness; for gene embedding clustering, we set the resolution to 1.5.

Naming clusters with brain region names
We referred to ref. 17 to name the spot clusters in the embedding space. Each cluster name combines mouse and human region names or acronyms. For example, '0-TH-TH' represents the first cluster of spots mainly from the mouse and human thalamus. Specifically, for each species, if the ratio of a region's spot count within a cluster to that species' total spot count in the cluster exceeds P = 0.3, the region name or acronym is appended to the cluster name. If the proportions of all regions' spots in a cluster are below P, 'mixed(region name or acronym with the maximum count)' is appended instead, e.g., '1-PIR-mixed(CI)'.

Identification of homologous brain regions
The procedure for identifying homologous brain regions is as follows: (1) cluster the spot embeddings using the Leiden algorithm28 with resolution equal to 9; (2) identify highly correlated spot pairs, i.e., spots from different species that are grouped into the same cluster; (3) align clusters and brain regions: when the spot proportion of a region ranks in the top three within a cluster, the cluster represents that region. Homologous brain region pairs are then identified by mapping the highly correlated cluster pairs to region pairs. For example, if the proportion of spots labeled "thalamus" ranks in the top three within cluster '0-TH-TH' for both species, this cluster is regarded as representing the thalamus in both mice and humans, and the homologous pair "thalamus-thalamus" is identified.

Identifying differentially expressed genes
We used the Wilcoxon test implemented in the SCANPY package62 to identify differentially expressed genes for each spatial domain with an FDR threshold of 1% (Benjamini-Hochberg adjustment).

Generating abstracted graphs
We employed the PAGA algorithm39 implemented in the SCANPY package62 to depict spatial trajectories. The PAGA graphs were visualized with the function scanpy.pl.paga(), with the point coordinates reset to distinguish the species.

Gene set enrichment analysis
We used the function gseapy.enrichr implemented in the GSEApy package66 to identify enriched GO terms for a given gene list. For mice and humans, we selected eight biologically important gene sets related to the brain atlas, cell types, cell markers, biological process, cellular component, molecular function, and gene atlas for GSEA.

Cross-species overlap ratio of GO terms
For each pair of regions or clusters from mouse and human, we selected the DEGs and then conducted gene set enrichment on the DEG list. The enriched GO terms were filtered by adjusted P-value < 0.05 (Wilcoxon test, Bonferroni correction). For the i-th mouse GO term set \({{{{\rm{S}}}}}_{i}^{m}\) and the j-th human GO term set \({{{{\rm{S}}}}}_{j}^{h}\), the overlap ratio of GO terms was computed by the following formula:$${{{{\rm{OR}}}}}_{ij}=\frac{1}{2}\left(\frac{| {{{{\rm{S}}}}}_{i}^{m}\cap {{{{\rm{S}}}}}_{j}^{h}| }{| {{{{\rm{S}}}}}_{i}^{m}| }+\frac{| {{{{\rm{S}}}}}_{i}^{m}\cap {{{{\rm{S}}}}}_{j}^{h}| }{| {{{{\rm{S}}}}}_{j}^{h}| }\right).$$We then normalized the overlap ratio matrix by applying row and column z-scores.

Seurat alignment score
We adopted the Seurat alignment score (SAS) to evaluate the alignment of spots or genes in the embedded space, that is, the mixing of embeddings between species. SAS ranges from 0 to 1, with higher values indicating better mixing. We calculated SAS as described in ref. 25 using the Python implementation in ref. 67. SAS is defined as:$${{{\rm{SAS}}}}=1-\frac{\bar{x}-\frac{K}{N}}{K-\frac{K}{N}},$$where \(\bar{x}\) is the average number of spots from the same species among the K-nearest neighbors (the datasets were first subsampled to the number of spots in the smallest dataset), and N is the number of species. We set K to 1% of the number of subsampled spots.
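The two scores above can be computed as in the following sketch; the inputs (GO term sets, embeddings, species labels) are illustrative placeholders.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def overlap_ratio(terms_mouse, terms_human):
    """OR_ij for one mouse GO-term set and one human GO-term set (Python sets)."""
    inter = len(terms_mouse & terms_human)
    return 0.5 * (inter / len(terms_mouse) + inter / len(terms_human))

def seurat_alignment_score(embeddings, species_labels, k):
    """SAS on subsampled embedded spots; species_labels is a NumPy array."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)
    # x_bar: average count of same-species spots among each spot's K neighbors
    same = (species_labels[idx[:, 1:]] == species_labels[:, None]).sum(axis=1)
    x_bar = same.mean()
    n_species = len(np.unique(species_labels))
    return 1 - (x_bar - k / n_species) / (k - k / n_species)
```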
Statistics and reproducibility
No statistical method was used to predetermine the sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment. More information is provided in the Reporting Summary file.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.