SANTO: a coarse-to-fine alignment and stitching method for spatial omics

SANTOIn general, each spatial omics dataset contains N cells/spots, and the spatial coordinates and omics feature expression profiles are denoted as \({{{{\mathscr{S}}}}}{{{{\mathscr{\in }}}}}{{\mathbb{R}}}^{N\times D}\) and \({{{{\mathscr{G}}}}}{{{{\mathscr{\in }}}}}{{\mathbb{R}}}^{N\times F}\), respectively, where D represents the dimension of spatial coordinates that equals to 2 (x and y axis) or 3 (x, y and z axis) and F denotes the number of omics features shared among all of cells/spots, e.g., number of genes in spatial transcriptomics data. In the alignment and stitching tasks, there are a source slice X and a target slice Y. X contains \({{{{{\mathscr{S}}}}}}_{X}\) and \({{{{{\mathscr{G}}}}}}_{X}\) with \({N}_{X}\) cells/spots and \({F}_{X}\) omics features in \({{{{{\mathscr{G}}}}}}_{X}\). Similarly, Y contains \({{{{{\mathscr{S}}}}}}_{Y}\) and \({{{{{\mathscr{G}}}}}}_{Y}\) with \({N}_{Y}\) cells/spots and \({F}_{Y}\) omics features in \({{{{{\mathscr{G}}}}}}_{Y}\). The goal of SANTO is to align/stitch the spatial coordinates between slice X and slice Y properly.Coarse alignmentSinces X and Y have non-identical omics features, the common features F is the intersection between \({F}_{X}\) and \({F}_{Y}\), then \({{{{{\mathscr{G}}}}}}_{X}\) and \({{{{{\mathscr{G}}}}}}_{Y}\) are extracted as the shape of \({N}_{X}\times F\) and \({N}_{Y}\times F\). Next, two omics feature expression profiles \({{{{{\mathscr{G}}}}}}_{X}\) and \({{{{{\mathscr{G}}}}}}_{Y}\) are used to calculate the Pearson Correlation Coefficient (PCC) between each pair of the i-th and the j-th cells/spots \({X}_{i}\) and \({Y}_{j}\) from slices X and Y:$${PCC}\left({{{{{\mathscr{G}}}}}}_{{X}_{i}},\,{{{{{\mathscr{G}}}}}}_{{Y}_{j}}\right)=\frac{\sum \left({{{{{\mathscr{G}}}}}}_{{X}_{i}}-\bar{{{{{{\mathscr{G}}}}}}_{X}}\right)\left({{{{{\mathscr{G}}}}}}_{{Y}_{j}}-\bar{{{{{{\mathscr{G}}}}}}_{Y}}\right)}{\sqrt{\sum {\left({{{{{\mathscr{G}}}}}}_{{X}_{i}}-\bar{{{{{{\mathscr{G}}}}}}_{X}}\right)}^{2}\sum {\left({{{{{\mathscr{G}}}}}}_{{Y}_{j}}-\bar{{{{{{\mathscr{G}}}}}}_{Y}}\right)}^{2}}},$$
(1)
where \(\bar{{{{{{\mathscr{G}}}}}}_{X}}\) and \(\bar{{{{{{\mathscr{G}}}}}}_{Y}}\) represent the mean values of \({{{{{\mathscr{G}}}}}}_{X}\) and \({{{{{\mathscr{G}}}}}}_{Y}\) among omics features. With output PCC matrix with the shape of \({N}_{X}\times \, {N}_{Y}\) between X and Y, each cell/spot \({X}_{i}\) can be paired with a specific cell/spot \({Y}_{j}\) if \({PCC}({{{{{\mathscr{G}}}}}}_{{X}_{i}},\,{{{{{\mathscr{G}}}}}}_{{Y}_{j}})\) has the highest PCC score among PCC scores between \({X}_{i}\) and all of cells/spots in Y. This strategy is under the consideration that most of pairs are spatially proximal, which can dominate the proper transformation even if few pairs are not spatially proximal.Alternatively, for the stitching task, the overlap region between X and Y needs to be identified by filtering all pairs inside the overlap region. Thus, we firstly sort all PCC values of pairs with the descending order. To identify the pairs inside the overlap region, we formulate this problem as change point detection. We want to find a dramatic change in value as the change point (an index of pair), which can distinguish the most difference between left and right points. And we will keep the left points because the highly correlated pairs are more likely to be concentrated in the overlap region. For each pair along the indices, we calculate the cost for each pair x by \({cost}({x}_{{left}})+{cost}({x}_{{right}})\), where \({x}_{{left}}\) and \({x}_{{right}}\) represent all left and right pairs of the current pair x along the indices. This cost quantifies the integrated variances of left and right pairs for each index x along the indices. For the set of pairs \({x}_{I}\) (i.e., \({x}_{{left}}\) or \({x}_{{right}}\)), \({cost}\left({x}_{I}\right)\) is to quantify the variance of PCCs among a set of pairs \({x}_{I}\), which is defined as:$${cost}\left({x}_{I}\right)={\sum}_{t\in I}{{||p}({x}_{t})-{median}\left({p({x}_{I})}\right){||}}_{1},$$
(2)
where the t-th pair \({x}_{t}\) in \({x}_{I}\) has the PCC value as \(p({x}_{t})\), and \({median}\left({p}({x}_{I})\right)\) is the median value of PCCs from all pairs in \({x}_{I}\). After calculating the cost of all pairs along the indices, we can finally find the change point \({x}^{{\prime} }\) with the minimum cost, which is the best partition along the indices to distinguish that the pairs of cells/spots are inside or outside the overlap region. Next, we delete the right pairs and keep the left pairs of the best partition \({x}^{{\prime} }\). Then, for each i-th cell/spot \({X}_{i}\) in X and its paired cell/spot \({Y}_{{i}_{{pair}}}\) in Y, they can be used to calculate the covariance matrix H between X and Y given by:$$H=\sum \left({{{{{\mathscr{S}}}}}}_{{X}_{i}}-\bar{{{{{{\mathscr{S}}}}}}_{X}}\right)\left({{{{{\mathscr{S}}}}}}_{{Y}_{{i}_{{pair}}}}-\bar{{{{{{\mathscr{S}}}}}}_{Y}}\right),$$
(3)
where \(\bar{{{{{{\mathscr{S}}}}}}_{X}}\) and \(\bar{{{{{{\mathscr{S}}}}}}_{Y}}\) denote the spatial centroids of X and Y, respectively. Next, singular value decomposition (SVD) is used to decompose \(H=U\Sigma {V}^{T}\) where U and V are unitary matrices and Σ is a diagonal matrix with non-negative real numbers on the diagonal. After the SVD factorization, we can optimize the proper rotation and translation in closed-form by:$$T=-R\bar{{{{{{\mathscr{S}}}}}}_{X}}+\bar{{{{{{\mathscr{S}}}}}}_{Y}}.$$
(5)
Finally, the spatial coordinates of source slices \({{{{{\mathscr{S}}}}}}_{X}\) can be updated as \({{{{{\mathscr{S}}}}}}_{X}^{{\prime} }={R{{{{\mathscr{S}}}}}}_{X}+T\), which is the result of the coarse alignment.Fine alignmentFirstly, for both X and Y, we construct k-NN graphs for cells/spots in \({{{{\mathscr{S}}}}}\) and \({{{{\mathscr{G}}}}}\) separately by Euclidean distances of spatial coordinates and omics feature expressions, respectively, where k is a hyperparameter. Then, we use dynamic graph CNN (DGCNN) to learn permutation-invariant embeddings of each cell’s/spot’s spatial feature \({{{{\mathscr{S}}}}}\) and omics feature \({{{{\mathscr{G}}}}}\) in X and Y by the graphs we construct22. There are two embedding networks via DGCNN focusing on spatial feature and omics feature (\({{{{\mathscr{S}}}}}\) and \({{{{\mathscr{G}}}}}\)) separately with different weights. On the other hand, since embedding networks focusing on each feature type (spatial feature or omics feature) in X and Y try to learn the same knowledge of corresponding feature type, the parameters of embedding networks for each feature type between X and Y are shared23. The initial input for the embedding networks is each constructed graph for both \({{{{\mathscr{S}}}}}\) and \({{{{\mathscr{G}}}}}\) in X and Y. Through the forward mechanism of the embedding network, \({X}_{i}^{l}\) represents the embedding of the i-th cell/spot \({X}_{i}\) in the l-th layer in the graph, with L layers in total. \({h}_{\theta }^{l}\) is the convolution layer in the l-th layer with weight θ, differing by the feature type (spatial or omics features). f denotes the channel-wise aggregation function max and the final embedding \({X}_{i}\) in \({{{{\mathscr{S}}}}}\) or \({{{{\mathscr{G}}}}}\) is:$${X}_{i}={\parallel }_{l=1}^{L}\, f\left(\left\{{h}_{\theta }^{l}\left({X}_{i}^{l-1},\,{X}_{j}^{l-1}\right),\forall j\in {{{{{\mathscr{N}}}}}}_{i}\right\}\right),$$
(6)
where || denotes the concatenation of outputs from L layers, \({{{{{\mathscr{N}}}}}}_{i}\) denotes the neighbors of vertex \({X}_{i}\) in the graph and j is the index of all of neighbors in \({{{{{\mathscr{N}}}}}}_{i}\). Through the forward pass each time, we recompute the input k-NN graph before the embedding network. This dynamic graph update learns how to construct the graph in each layer dynamically rather than a fixed graph through the training process.After obtaining the separate embedding \({{{{{\mathscr{F}}}}}}_{{{{{\mathscr{S}}}}}}\) and \({{{{{\mathscr{F}}}}}}_{{{{{\mathscr{G}}}}}}\) of \({{{{\mathscr{S}}}}}\) and \({{{{\mathscr{G}}}}}\) for X and Y, respectively, we have two final embeddings \({{{{{\mathscr{F}}}}}}_{X}\) and \({{{{{\mathscr{F}}}}}}_{Y}\) of X and Y that each embedding is the concatenation of embedded spatial features and omics features (\({{{{{\mathscr{F}}}}}}_{{{{{\mathscr{S}}}}}}\) and \({{{{{\mathscr{F}}}}}}_{{{{{\mathscr{G}}}}}}\)) in the feature-wise from the corresponding slice. Next, we generate a probabilistic mapping from X to Y by:$$m\left(X,Y\right)={softmax}\left({{{{{\mathscr{F}}}}}}_{Y}{{{{{\mathscr{F}}}}}}_{X}^{T}\right).$$
(7)
The shape \(m\left(X,Y\right)\) is \({N}_{X}\times {N}_{Y}\) and for each cell/spot in X with index i, we have a soft pointer \(m\left({X}_{i},\, Y\right)\) from \({X}_{i}\) into all cells/spots of Y in m(X,Y). With soft mapping between X to Y, \({{{{{\mathscr{S}}}}}}_{X}^{{\prime} }\) can be guided to transform to final coordinates \({{{{{\mathscr{S}}}}}}_{X}^{{\prime} {\prime} }\) by:$${{{{{\mathscr{S}}}}}}_{X}^{{\prime} {\prime} }={{{{{\mathscr{S}}}}}}_{Y}^{T}m\left(X,Y\right),$$
(8)
where cells/spots with the same index in \({{{{{\mathscr{S}}}}}}_{X}^{{\prime} }\) and \({{{{{\mathscr{S}}}}}}_{X}^{{\prime} {\prime} }\) are paired. Thus, we can optimize the final rotation and translation of the fine alignment from \({{{{{\mathscr{S}}}}}}_{X}^{{\prime} }\) to \({{{{{\mathscr{S}}}}}}_{X}^{{\prime} {\prime} }\) by SVD as the previous demonstration (Eqs. 3, 4 and 5) and the spatial coordinates of the source slice X can be finally aligned/stitched with the target slice Y by rotations and translations from the coarse and fine alignment.Since the fine alignment is an unsupervised method, we design the loss by co-considering spatial coordinates and omics feature expressions of cells/spots, under the assumption that proximal cells/spots between X and Y tend to be similar at spatial and omics feature level. We use a hyperparameter α to regulate the weight of loss between \({{{{\mathscr{S}}}}}\) and \({{{{\mathscr{G}}}}}\). The loss is defined as:$${Loss}=\frac{1}{{N}_{X}{N}_{Y}}\left({\sum }_{i}^{{N}_{X}}{softmin}\cdot {\sum }_{j}^{{N}_{Y}}\left(\alpha \left(1-{PCC}\left({X}_{i},{Y}_{j}\right)\right)+\left(1-\alpha \right){Dis}\left({X}_{i},{Y}_{j}\right)\right)\right)$$
(9)
where \({PCC}\left({X}_{i},{Y}_{j}\right)\) and \({Dis}({X}_{i},{Y}_{j})\) denote the PCC values of omics feature expressions and Euclidean distance of spatial coordinates between the i-th cell/spot \({X}_{i}\) and the j-th cell/spot \({Y}_{j}\). The \({softmin}\) is:$${softmin}=\frac{{e}^{-{Dis}\left({X}_{i},{Y}_{j}\right)/\tau }}{{\sum }_{j}^{{N}_{Y}}{e}^{-{Dis}\left({X}_{i},Y\right)/\tau }},$$
(10)
which obtains the attention weights between each \({X}_{i}\) to \({Y}_{j}\). Ï„ represents the temperature, which is used to regulate sensitivity of the softmin function. Through the definition of the softmin function, spatially closer pairs of cells/spots between X and Y would have larger attention weights. softmin is used to consider the mapping between the i-th cells/spots \({X}_{i}\) in X to a region of cells/spots in Y (one-to-many soft mapping) instead of one-to-one hard mapping.BenchmarkingMeasurementsWe use three metrics to benchmark SANTO with other methods: PCC, CI and AAS. For PCC, we calculate the Pearson correlation coefficient of each pair of cell/spot \({X}_{i}\) in X with index i and its spatially closet cell/spot \({Y}_{j}\) with index j after alignment, and take the average through all cells/spots in X:$${PCC}=\frac{1}{{N}_{X}}{\sum }_{{N}_{X}}^{i}{PCC}\left({X}_{i},{Y}_{j}\right).$$
(11)
To evaluate the performance in the cell-type scale, we calculate CI between X and Y with their original cell-type annotation. For each pair of cell/spot \({X}_{i}\) in X with index i and its closet cell/spot \({Y}_{j}\) with index j, if they have the same cell type, \(I({X}_{i},{Y}_{j})\) is 1; otherwise, \(I({X}_{i},{Y}_{j})\) is 0. CI takes the average of I through all cells/spots in X, which is:$${CI}=\frac{1}{{N}_{X}}\mathop{\sum }_{{N}_{X}}^{i}I\left({X}_{i},\,{Y}_{j}\right),\,{and\; I}\left({X}_{i},\,{Y}_{j}\right)=\left\{\begin{array}{c}1,\,{C}_{{X}_{i}}={C}_{{Y}_{j}}\\ 0,\,{C}_{{X}_{i}}\ne {C}_{{Y}_{j}}\end{array}\right.,$$
(12)
where \({C}_{{X}_{i}}\) and \({C}_{{Y}_{j}}\) denote the cell type annotations of cells/spots \({X}_{i}\) and \({Y}_{j}\).Since in the tasks with simulated data, we manually rotate the source slice, the performance of the predicted rotation could be evaluated based on the ground-truth rotation. AAS calculates the gap between predicted and ground-truth rotation as:$${AAS}=360^{\circ} -\left|{{Angle}}_{{pred}}-{{Angle}}_{{gt}}\right|,$$
(13)
where a higher AAS corresponds to a better prediction.Compared methodsWe use PASTE (v1.4.0) for comparing the alignment results with its default hyperparameters (‘alpha’ = 0.1). In the comparison with PASTE2 under the stitching task, we use the default ‘alpha’ as 0.1 and set the ‘s’ as 0.3. Since SLAT and STAligner focus on one-to-one alignment, we use output one-to-one matching between two slices to calculate rotation and translation by SVD optimization via Eqs. 3, 4 and 5. For SLAT, we use the default ‘k_cutoff’ as 10 in the function ‘Cal_Spatial_Net’. For STAligner, we use the default ‘knn_neigh’ as 50 in the function ‘train_STAligner’. The benchmarking is conducted on a workstation with a 2 Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50 GHz (30,720 KB cache size; 24 cores in total) and 528 GB of memory. The GPUs are two Nvidia Quadro M6000 24 GB (48 GB in total). The operating system used is Ubuntu 18.04.RobustnessIn the evaluation of robustness, we select 3 different values of hyperparameters for learning rate, epoch, k and alpha (learning rate = 0.01, 0.05 and 0.1, epoch = 50, 100 and 200, k = 5, 10 and 15, alpha = 0.1, 0.3 and 0.5).Downstream analysis for breast cancerBy getting scFFPE-seq data as the reference, we deconvolve the Visium data from breast cancer by RCTD with its default parameters27,44. After alignment between Xenium and Visium slices, we also deconvolve the Visium data by the Xenium data. And for each spot in the Visium data, we take the spatial coordinates of each spot as the center and circle with the radius as 30 to include cells from the Xenium data. If no Xenium cells are inside the circle, we choose the 3-NN cells in the Xenium data. Then we can annotate the cell type of each Visium spot by the dominating cell type among the chosen cells from the Xenium data. To explore the cell-cell communication in the tumor microenvironment, we use COMMOT (v0.0.3) by choosing the reference dataset as CellChat and the species as human28,45.To predict the expression levels of the uncharacterized genes of the Xenium data outside the overlap region, we firstly pair each Xenium cell in the overlap area with its spatially closet Visium spot. Then, according to all of pairs between Xenium and Visium in the overlap region, we fit all gene expressions of Xenium cells with uncharacterized target gene’s expressions of the paired Visium spots by Poisson regression (sklearn.linear_model.PoissonRegressor)46. After fitting the model, we input all gene expressions of Xenium cells outside the overlap region to predict the expressions of the uncharacterized gene.3D-to-3D spatiotemporal alignment and cross-modality alignmentIn the 3D-to-3D spatiotemporal alignment, we choose 8 mouse embryonic slices from two time points E14.5 and E15.5, and align 4 slices in the same time point firstly to generate a 3D-level point cloud of cells. Because the extreme large-scale memory consuming with these millions of cells during spatiotemporal alignment, we bin these slices by 5 × 5 down-sampling.In the cross-modality alignment, the omics features from the two slices are gene expression profile and gene score matrix from transcriptome and epigenome, respectively. Thus, we use Harmony (max_iterations = 10) to unify two omics feature expressions with their PCA results (n_comp = 50) to the same hidden space for alignment further.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles