ST-GEARS: Advancing 3D downstream research through accurate spatial information recovery

FGW OT descriptionFused Gromov Wasserstein (FGW) Optimal Transport (OT) is the modeling of spot-wise or cell-wise similarity between two sections, with the purpose of solving optimal mappings between the spots or cells, with mappings also called ‘anchors’. By FGW OT, the optimal group of mappings enables highest gene expression similarity between mapped spots, at the same time keeping similar positions relative to their located sections.The required input of FGW OT includes genes expression, spot or cell locations before registration, and constraint values which assigns different weight to the optimization on different spots or cells. For gene expression, we introduce \({{\bf{A}}}\in {R}^{{n}_{A},m}\) for section A, to describe normalized count of unique molecular identifiers (UMIs) of different genes of each cell or spot, thereinto nA denotes number of spots in slice A, and m denotes number of genes that are captured in both sections. Similarly, we describe gene expression on section B as \({{\bf{B}}}\in {R}^{{n}_{B},m}\), with genes arranged in the same order as in A. For spot or cell locations, we introduce \({{{\bf{X}}}}_{{{\bf{A}}}}\in {R}^{{n}_{A},2}\) to describe spots locations of section A, with the 1st column storing horizontal coordinates and the 2nd storing vertical coordinates. Similarly, we have \({{{\bf{X}}}}_{{{\bf{B}}}}\in {R}^{{n}_{B},2}\) to describe spots locations in section B. Spots are arranged in the same order in gene expression and location matrices. Constraint values are discussed in section of Distributive Constraints.FGW OT solves:$${{\boldsymbol{\pi }}}= {argmi}{n}_{{{\boldsymbol{\pi }}}\in \varPi \left(a,b\right)}{{\langle }}(1-\alpha ){{{\bf{M}}}}_{{{\bf{AB}}}}^{{{\bf{2}}}}+\alpha {L}^{2}({{{\bf{C}}}}_{{{\bf{A}}}},{{{\bf{C}}}}_{{{\bf{B}}}})\otimes {{\boldsymbol{\pi }}},{{\boldsymbol{\pi }}}{{\rangle }}\\= {argmi}{n}_{{{\boldsymbol{\pi }}}\in \varPi (a,b)}((1-\alpha ){{\langle }}{{{\bf{M}}}}_{{{\bf{AB}}}}^{{{\bf{2}}}},\pi {{\rangle }}+\alpha {{\langle }}{L}^{2}({{{\bf{C}}}}_{{{\bf{A}}}},{{{\bf{C}}}}_{{{\bf{B}}}})\otimes {{\boldsymbol{\pi }}},{{\boldsymbol{\pi }}}{{\rangle }})$$$$s.t.{\sum}_{j}{{{\boldsymbol{\pi }}}}_{{{\bf{i}}},{{\bf{j}}}}={{{\bf{W}}}}_{{{\bf{i}}}}^{({{\bf{A}}})},{\sum}_{i}{{{\boldsymbol{\pi }}}}_{{{\bf{i}}},{{\bf{j}}}}={{{\bf{W}}}}_{{{\bf{j}}}}^{({{\bf{B}}})}$$
(1)
Thereinto, \({{{\bf{M}}}}_{{{\bf{AB}}}}\in {R}^{{n}_{A},{n}_{B}}\) describes the similarity of each pair of spots respectively on section A and B, formulated as \({{{\bf{M}}}}_{{{\bf{i}}},{{\bf{j}}}}^{({{\bf{AB}}})}={KL}({A}_{i,:},{B}_{j,:})\). Be noted that \({{{\rm{M}}}}_{{{\rm{i}}},{{\rm{j}}}}^{({{\rm{AB}}})}\) still indicates spot-wise similarity MAB, with section code AB being moved to superscript and added parenthesis for clarity, since subscript location are taken by spot index i, j. KL denotes Kullback-Leibler (KL) divergence43. \({{{\bf{C}}}}_{{{\bf{A}}}}\in {R}^{{n}_{A},{n}_{A}}\) describes spot-wise distance within section A, with \({{{\bf{C}}}}_{{{\bf{i}}}{{,}}{{\bf{j}}}}^{({{\bf{A}}})}={dis}({{{\bf{X}}}}_{{{\bf{i}}}{{,}}{{:}}}^{({{\bf{A}}})}{{,}}{{{\bf{X}}}}_{{{\bf{j}}},{{:}}}^{({{\bf{A}}})})\), and dis denoting Euclidean distance measure. Be noted that \({{{\rm{X}}}}_{{{\rm{i}}},:}^{({{\rm{A}}})}\) and \({{{\rm{X}}}}_{{{\rm{j}}},:}^{({{\rm{A}}})}\) still indicate spot locations XA, with section code A being moved to superscript and added parenthesis for clarity, since subscript location are taken by spot index i and j. \({{{\rm{C}}}}_{{{\rm{i}}},{{\rm{j}}}}^{({{\rm{A}}})}\) refers to spot-wise distance CA for the same reason. Similarly, \({{{\bf{C}}}}_{{{\bf{B}}}}\in {R}^{{n}_{B},{n}_{B}}\) describes spot-wise distance of section B. \({{\bf{L}}}\in {R}^{{n}_{A},{n}_{B},{n}_{A},{n}_{B}}\) defines the difference between all spot pair distance respectively on section A and B, with \({{{\rm{L}}}}_{{{\rm{i}}},{{\rm{j}}},{{\rm{k}}},{{\rm{l}}}}=|{{{\rm{C}}}}_{{{\rm{i}}},{{\rm{k}}}}^{({{\rm{A}}})}-{{{\rm{C}}}}_{{{\rm{j}}},{{\rm{l}}}}^{({{\rm{B}}})}|\). ⊗ denotes Kronecker product of two matrices; 〈,〉 denotes matrix multiplication.Adjacency matrix \({{\mathbf{\pi }}}\in {R}^{({n}_{A},{n}_{B})}\) to be optimized stores strength of anchors between spots from the two sections, with row index representing spots on section A, and column index representing spots on section B. Sum of elements of Ï€ is 1. With \({{\langle }}{{{\bf{M}}}}_{{{\bf{AB}}}}^{{{\bf{2}}}}{{,}}{{\mathbf{\pi }}}{{\rangle }}\), the similarity of mapped spots are measured. With \({{\langle }}{{{\bf{L}}}}^{{{\bf{2}}}}({{{\bf{C}}}}_{{{\bf{A}}}}{{,}}{{{\bf{C}}}}_{{{\bf{B}}}}){{\times}}{{\mathbf{\pi }}}{{,}}{{\mathbf{\pi }}}{{\rangle }}\), similarity between distance of spot pairs on section A, with its anchored spot pairs on section B, is measured. \(\langle {{{\bf{L}}}}^{{{\bf{2}}}}({{{\bf{C}}}}_{{{\bf{A}}}}{l{{,}}}{{{\bf{C}}}}_{{{\bf{B}}}})\otimes {{\mathbf{\pi }}}{{,}}{{\mathbf{\pi }}}\rangle\) describes similarity between spatial structures under the anchors’ connection. α ∈ [0,1] denotes regularization factor, which specifies the relative importance of structure similarity compared to expression similarity. WA and WB are constraint values that are introduced in section of Distributive Constraints.With the formulation above, FGW OT solves optimal anchors between the spots, or cells, which enables maximum weighted combination of gene expression similarity and position similarity of mapped spots or cells.Distributive constraintsAs adopted by constraint values in FGW OT, we introduce Distributive Constraints, to assign different emphasis to spots or cells in the optimization. Distributive Constraints utilizes cell type component information to differentiate the emphasis: if an annotation or cluster express high similarity across sections, its corresponding spots or cells will be placed relatively high sum of probability, and vice versa. With higher sum of probability, more anchors and anchors with higher strength are generated, while less anchors are produced on spots with lower sum of probability. This operation leads registration to rely more on expression-consistent regions of sections, hence largely enhancing both accuracy of anchors and precision of following rigid and elastic registration.The required inputs of Distributive Constraints include \({{{\bf{G}}}}_{{{\bf{A}}}}\in {R}^{{n}_{A}}\) and \({{{\bf{G}}}}_{{{\bf{B}}}}\in {R}^{{n}_{B}}\), which store the grouping information such as annotation type or cluster of each spot in section A and B. We then summarize the repeated annotations or clusters from GA and GB, and put the unique values in \({{\bf{g}}}\in {R}^{{n}_{{group}}}\). ngroup is the number of unique annotation type or clusters. Then implemented in ST-GEARS, for each annotation type or cluster gi, we calculate the average gene expression across spots:$$\begin{array}{c}{{\bf{av}}}{{{\bf{g}}}}_{{{\bf{A}}}}=\frac{1}{{{|}}{{{\bf{I}}}}_{{{\bf{A}}}}{{|}}}{{{\bf{1}}}}_{{{{\bf{n}}}}_{{{\bf{A}}}}}{{{\bf{A}}}}_{{{\bf{i}}}\in {{{\bf{I}}}}_{{{\bf{A}}}},:}\\ {{\bf{av}}}{{{\bf{g}}}}_{{{\bf{B}}}}=\frac{1}{{{|}}{{{\bf{I}}}}_{{{\bf{B}}}}{{|}}}{{{\bf{1}}}}_{{{{\boldsymbol{n}}}}_{{{\boldsymbol{B}}}}}{{{\boldsymbol{B}}}}_{{{\boldsymbol{i}}}\in {{{\boldsymbol{I}}}}_{{{\boldsymbol{B}}}},:}\end{array}$$where$$\begin{array}{c}{{{\bf{I}}}}_{{{\bf{A}}}}=\left\{{i}^{{\prime} }\in \{1,2,…,{n}_{A}\}|{{{\bf{G}}}}_{{{{\bf{i}}}}^{{\prime} }}^{({{\bf{A}}})}={{{\bf{g}}}}_{{{\bf{i}}}}\right\}\\ {{{\bf{I}}}}_{{{\bf{B}}}}=\left\{{i}^{{\prime} }\in \{1,2,…,{n}_{B}\}|{{{\bf{G}}}}_{{{{\bf{i}}}}^{{{{\prime} }}}}^{({{\bf{B}}})}={{{\bf{g}}}}_{{{\bf{i}}}}\right\}\end{array}$$Be noted that \({{{\bf{G}}}}_{{{{\bf{i}}}}^{{\prime} }}^{({{\bf{A}}})}\) and \({{{\bf{G}}}}_{{{{\bf{i}}}}^{{\prime} }}^{({{\bf{B}}})}\) still indicate grouping information GA and GB, with section code A and B being moved to superscript and added parenthesis for clarity, since subscript location are taken by spot index i′ and j′. And \({{\bf{1}}}_{{{{\bf{n}}}}_{{{\bf{A}}}}}\) and \({{\bf{1}}}_{{{{\bf{n}}}}_{{{\bf{B}}}}}\) are both row vectors of ones.With average gene expression of each annotation type or cluster, with the form of distribution, we measure its difference across sections by KL divergence. Then the calculated distance is mapped by logistic kernel, to further emphasize differences between relatively consistent annotations or clusters.$${dis}={KL}({av}{g}_{A},{av}{g}_{B})$$\({di}{s}_{{map}}={f}_{{logistic}}({dis})\), where \({f}_{{logistic}}\left(x\right)=\frac{1}{1+{e}^{-x}}-0.5\). Putting scaler value dis of each annotation or cluster together, we have a vector \({{\bf{DI}}}{{{\bf{S}}}}_{{{\bf{map}}}}\in {R}^{{n}_{{celltype}}}\). Finally, we transform the distance to similarity, map the similarity result back to each spot:$$\begin{array}{c}{{\rm{sim}}}=-1\times {{\rm{DI}}}{{{\rm{S}}}}_{{{\rm{map}}}}\\ {{{\rm{W}}}\_{{\rm{raw}}}}_{\{{{\rm{i}}}|{{{\rm{C}}}}_{{{\rm{i}}}}^{({{\rm{A}}})}={{{\rm{c}}}}_{{{\rm{i}}}}\}}^{({{\rm{A}}})}={{\rm{si}}}{{{\rm{m}}}}_{{{\rm{i}}}}\\ {{{\rm{W}}}\_{{\rm{raw}}}}_{\{{{\rm{i}}}|{{{\rm{C}}}}_{{{\rm{i}}}}^{({{\rm{B}}})}={{{\rm{c}}}}_{{{\rm{i}}}}\}}^{({{\rm{B}}})}={{\rm{si}}}{{{\rm{m}}}}_{{{\rm{i}}}}\end{array}$$We further apply normalization on the result:$$\begin{array}{c}{{{\bf{W}}}}_{{{\bf{A}}}}=\frac{1}{\varSigma {{\bf{W}}}{{\_}}{{\bf{ra}}}{{{\bf{w}}}}^{({{\bf{A}}})}}({{\bf{W}}}{{\_}}{{\bf{ra}}}{{{\bf{w}}}}^{({{\bf{A}}})}-\min ({{\bf{W}}}{{\_}}{{\bf{ra}}}{{{\bf{w}}}}^{({{\bf{A}}})})\times {{{\bf{1}}}}_{{{{\bf{n}}}}_{{{\bf{A}}}}})\\ {{{\bf{W}}}}_{{{\bf{B}}}}=\frac{1}{\Sigma {{\bf{W}}}{{\_}}{{\bf{ra}}}{{{\bf{w}}}}^{({{\bf{B}}})}}({{\bf{W}}}{{\_}}{{\bf{ra}}}{{{\bf{w}}}}^{({{\bf{B}}})}-\min ({{\bf{W}}}{{\_}}{{\bf{ra}}}{{{\bf{w}}}}^{({{\bf{B}}})})\times {{{\bf{1}}}}_{{{{\bf{n}}}}_{{{\bf{B}}}}})\end{array}$$WA and WB are constraints values applied in (1). Since the values are computed based on similarity measure using cell composition information, weight of FGW OT is automatically redistributed, with higher emphasis on more consistent regions across sections, and less emphasis on less consistent area. Enhanced anchor accuracy hence registration accuracy is then achieved.Self-adaptive regularizationIn FGW OT formulation, a regularization factor is included to specify the relative importance of structural similarity compared to expression similarity during optimization. ST-GEARS includes a self-adaptive regularization method that determines the factor value, that induces highest overall accuracy of anchors despite of varying situations. Situations include but are not limited to section distances, spot sizes, extent of distortions, and data quality such as level of diffusion.By practice, our method respectively adopts factors on multiple scales including 0.8, 0.4, 0.2, 0.1, 0.05, 0.025, 0.013, and 0.006. The candidate values vary exponentially, for ST-GEARS to find the optimal term regardless of scale differences between expression and structural term in (1). The accuracy of each set of optimized anchors by every regularization factor was evaluated, by measuring weighted percentage \({\sum}_{{{{\bf{G}}}}_{{{\bf{i}}}}^{{{(}}{{\bf{A}}}{{)}}}{{=}}{{{\bf{G}}}}_{{{\bf{j}}}}^{{{(}}{{\bf{B}}}{{)}}}}{{{\boldsymbol{\pi }}}}_{{{\bf{i}}}{{,}}{{\bf{j}}}}\) of anchors that join spots with same annotation types or clusters. Be noted that \({{{\rm{G}}}}_{{{\rm{i}}}}^{({{\rm{A}}})}\) and \({{{\rm{G}}}}_{{{\rm{j}}}}^{({{\rm{B}}})}\) still indicate grouping information GA and GB, respectively, with section code A and B being moved to superscript and added parenthesis for clarity, since subscript location are taken by spot index i and j. The regularization factor value that achieves highest accuracy is then adopted by our method.Elastic field inferenceFinding spots with highest probabilityAfter rigid registration, elastic fields are inferred based on the anchors with the highest probability for each spot or cell. For elastic field to be applied on each section, it is calculated using its anchors with closest sections, as well as spatial coordinates of sections after rigid registration. Along cross-sectioning order, each section in the middle has two closest sections, respectively on its anterior and posterior sides. Exceptionally, if a section is on anterior or posterior end, it has only one closest section.Specifically for a section in the middle with N spots, we calculate \({{{\bf{I}}}}_{{{\bf{pre}}}}\epsilon {Z}^{N}\) and \({{{\bf{I}}}}_{{{\bf{next}}}}\epsilon {Z}^{N}\) which stores the mapped spots on anterior and posterior neighbor section for each of its spots. The calculation takes as input adjacency matrix Ï€pre, which stores anchors with the anterior neighbor section output by FGW OT, and Ï€next storing anchors with posterior section.$$\begin{array}{c}{{{\bf{I}}}}_{{{\bf{pre}}}}^{{{\bf{n}}}}={argma}{x}_{i\epsilon \left\{0,\ldots,{N}_{{pre}}-1\right\}}{{{\boldsymbol{\pi }}}}_{{{:}}{{,}}{{\bf{n}}}}^{\left({{\bf{pre}}}\right)}\\ {{{\bf{I}}}}_{{{\bf{next}}}}^{{{\bf{n}}}}={argma}{x}_{j\epsilon \{0,…,{N}_{{next}}-1\}}{{{\boldsymbol{\pi }}}}_{{{\bf{n}}}{{,}}{{:}}}^{({{\bf{next}}})}\end{array}$$Be noted that \({{{\rm{\pi }}}}_{:,{{\rm{n}}}}^{\left({{\rm{pre}}}\right)}\) and\(\,{{{\rm{\pi }}}}_{{{\rm{n}}},:}^{({{\rm{next}}})}\) still indicate adjacency matrix Ï€pre and Ï€next, with direction code pre and next being moved to superscript and added parenthesis for clarity, since subscript location are taken by spot index n.Notably, not every spot in a selected section has its own anchored spot, due to multiple strategies including distributive constraint and anchors filtration, hence their corresponding element in Ipre and Inext are null. For section located on posterior end, only Inext is applicable; and for section located on anterior end, only \({{{\bf{I}}}}_{{{\rm{pre}}}}^{{{\rm{n}}}}\) is applicable.Elastic field establishmentAfter specifying spots with highest probability, ST-GEARS calculates location displacements between the spots, then establishes elastic fields for each section. An elastic field is a 2D displacement distribution, describing how displacement values are distributed across different locations. And it is established to enable ST-GEARS to benefit from further denoising functions to reduce elastic operation outliers and improve elastic effect consistency across regions.For each section located in the middle, 4 elastic fields are generated. Two of those represent the section’s horizontal and vertical displacement distribution compared to anterior neighbor section, denoted as 2D matrix F(x_pre) and F(y_pre), while the other two represent its horizontal and vertical displacement distribution compared to posterior neighbor, denoted as F(x_next) and F(y_next). To initialize F(x_pre), F(y_pre), F(x_next) and F(y_next) for the section, the shape of the matrix is first decided. Its height denoted by Height and width denoted by Width are calculated by gridding the spot locations using a fixed step. Height and Width are shared across the 4 matrices:$$\begin{array}{c}Height={{\lceil }}({ma}{x}_{i\epsilon \{0,…,N\}}{{{\bf{X}}}}_{{{\bf{i}}}{{,}}{{\bf{0}}}}-{mi}{n}_{i\epsilon \{0,…,N\}}{{{\bf{X}}}}_{{{\bf{i}}}{{,}}{{\bf{0}}}})/{psize}{{\rceil }}\\ {Width}={{\lceil }}({ma}{x}_{i\epsilon \{0,…,N\}}{{{\bf{X}}}}_{{{\bf{i}}}{{,}}{{\bf{1}}}}-{mi}{n}_{i\epsilon \{0,…,N\}}{{{\bf{X}}}}_{{{\bf{i}}}{{,}}{{\bf{1}}}})/{psize}{{\rceil }}\end{array}$$For its input, \({{\bf{X}}}\in {R}^{N,2}\) denotes spots location of current section after rigid registration. For a single section, we prepare \({{{\bf{X}}}}^{{{(}}{{\bf{pre}}}{{)}}}\epsilon {R}^{{N\_pre},2}\) and \({{{\bf{X}}}}^{{{(}}{{\bf{next}}}{{)}}}\epsilon {R}^{{N\_next},2}\) as spots location of its anterior and posterior section after rigid alignment, respectively. psize represents average distance between closest spot or cell centers, and it is to be input by users. The matrix has no filled values to this step.To fill in the fields, we first transform spot locations into the coordinate system of field. With \({{\bf{X}}}\_{{\bf{shifted}}}\,\epsilon {R}^{N,2}\) and \({{\bf{X}}}\_{{\bf{pixel}}}\,\epsilon {R}^{N,2}\):$$\begin{array}{c}{{{\bf{X}}}{{\_}}{{\bf{shifted}}}}_{{{\bf{i}}},:}={{{\bf{X}}}}_{{{\bf{i}}},:}-{[{mi}{n}_{i\epsilon \{0,…,N\}}{{{\bf{X}}}}_{{{\bf{i}}},{{\bf{0}}}},{mi}{n}_{i\epsilon \{0,…,N\}}{{{\bf{X}}}}_{{{\bf{i}}},{{\bf{1}}}}]}^{T}\\ {{{\bf{X}}}{{\_}}{{\bf{pixel}}}}_{{{\bf{i}}},{{\bf{j}}}}={{\lceil }}{{{\bf{X}}}{{\_}}{{\bf{shifted}}}}_{{{\bf{i}}},{{\bf{j}}}}/{psize}{{\rceil }}\end{array}$$We then calculate location displacements between each of its spots and their anchored spots with highest probability, on both anterior and posterior neighbors. With \({{\bf{X}}}\_{{\bf{corres}}}\,\epsilon {R}^{N,2}\) and \({{\bf{X}}}\_{{\bf{delta}}}\epsilon {R}^{N,2}\):$$\begin{array}{c}{{{\rm{X}}}\_{{\rm{corres}}}}_{{{\rm{n}}},:}^{({{\rm{pre}}})}={{{\rm{X}}}}_{{{{\rm{I}}}}_{{{\rm{pre}}}}^{{{\rm{n}}}},:}^{({{\rm{pre}}})}\\ {{{\rm{X}}}\_{{\rm{corres}}}}_{{{\rm{n}}},:}^{({{\rm{next}}})}={{{\rm{X}}}}_{{{{\rm{I}}}}_{{{\rm{next}}}}^{{{\rm{n}}}},:}^{({{\rm{next}}})}\\ \begin{array}{c}{{{\rm{X}}}\_{{\rm{delta}}}}_{{{\rm{n}}},:}^{\left({{\rm{pre}}}\right)}={{{\rm{X}}}\_{{\rm{corres}}}}_{{{\rm{n}}},:}^{({{\rm{pre}}})}{-{{\rm{X}}}}_{{{\rm{n}}},:}\\ {{{\rm{X}}}\_{{\rm{delta}}}}_{{{\rm{n}}},:}^{\left({{\rm{next}}}\right)}={{{\rm{X}}}\_{{\rm{corres}}}}_{{{\rm{n}}},:}^{({{\rm{next}}})}{-{{\rm{X}}}}_{{{\rm{n}}},:}\end{array}\end{array}$$With the spot locations in field coordinates and the displacement values above, we fill in corresponding elements of the elastic field:$$\begin{array}{c}{{{\rm{F}}}}^{({{\rm{x}}}\_{{\rm{pre}}})}[{{\rm{X}}}\_{{\rm{pixel}}}]={{{\rm{X}}}\_{{\rm{delta}}}}_{:,0}^{({{\rm{pre}}})}\\ {{{\rm{F}}}}^{({{\rm{y}}}\_{{\rm{pre}}})}[{{\rm{X}}}\_{{\rm{pixel}}}]={{{\rm{X}}}\_{{\rm{delta}}}}_{:,1}^{({{\rm{pre}}})}\\ {{{\rm{F}}}}^{({{\rm{x}}}\_{{\rm{next}}})}[{{\rm{X}}}\_{{\rm{pixel}}}]={{{\rm{X}}}\_{{\rm{delta}}}}_{:,0}^{({{\rm{next}}})}\end{array}$$$${{{\rm{F}}}}^{({{\rm{y}}}\_{{\rm{next}}})}[{{\rm{X}}}\_{{\rm{pixel}}}]={{{\rm{X}}}\_{{\rm{delta}}}}_{:,1}^{({{\rm{next}}})}$$
(2)
By the end of Eqs. (2), 4 elastic fields for each section in the middle is established. However, some elements in the matrix are still empty, because of absence of spots or cells located in the grid of location. To address this problem, 2d nearest interpolation method44 was adopted, which fills in every empty element, with the displacement value of its neighboring elements:$$\begin{array}{c}\begin{array}{c}{{{\bf{F}}}}^{({{\bf{x}}}{{\_}}{{\bf{pre}}})}={f}_{{interp}{{\_}}{grid}}({{\bf{X}}}{{\_}}{{\bf{pixel}}},{{{\bf{X}}}{{\_}}{{\bf{delta}}}}_{:,{{\bf{0}}}}^{\left({{\bf{pre}}}\right)},{{\bf{mes}}}{{{\bf{h}}}}_{{{\bf{trans}}}})\\ {{{\bf{F}}}}^{({{\bf{y}}}{{\_}}{{\bf{pre}}})}={f}_{{interp}{{\_}}{grid}}({{\bf{X}}}{{\_}}{{\bf{pixel}}},{{{\bf{X}}}{{\_}}{{\bf{delta}}}}_{:,{{\bf{1}}}}^{\left({{\bf{pre}}}\right)},{{\bf{mes}}}{{{\bf{h}}}}_{{{\bf{trans}}}})\end{array}\\ {{{\bf{F}}}}^{({{\bf{x}}}{{\_}}{{\bf{next}}})}={f}_{{interp}{{\_}}{grid}}({{\bf{X}}}{{\_}}{{\bf{pixel}}},{{{\bf{X}}}{{\_}}{{\bf{delta}}}}_{:,{{\bf{0}}}}^{({{\bf{next}}})},{{\bf{mes}}}{{{\bf{h}}}}_{{{\bf{trans}}}})\\ {{{\bf{F}}}}^{({{\bf{y}}}{{\_}}{{\bf{next}}})}={f}_{{interp}{{\_}}{grid}}({{\bf{X}}}{{\_}}{{\bf{pixel}}},{{{\bf{X}}}{{\_}}{{\bf{delta}}}}_{:,{{\bf{1}}}}^{({{\bf{next}}})},{{\bf{mes}}}{{{\bf{h}}}}_{{{\bf{trans}}}})\end{array}$$thereinto \({{\bf{mes}}}{{{\bf{h}}}}_{{{\bf{trans}}}}\epsilon {N}^{{n}_{{grids}}\times 2}\) denotes grid coordinates of the designed field, with \({n}_{{grids}}={Height}\times {Width}\). And finterp_grid denotes the nearest interpolation method.For section located on posterior end, only F(x_next) and F(y_next) are applicable; and for section located on anterior end, only F(x_pre) and F(y_pre) are applicable.2D Gaussian denoisingAs caused by exerted force, the displacement or elastic field is expected to have static or smoothly changing values across different locations45,46,47. ST-GEARS makes use of this property, to smoothen the field and to reduce errors in the field caused by any upper stream process, such as raw data noises and inaccuracy in anchor computation. Gaussian filtering48,49 is adopted to implement the denoising, similarly to image denoising processes50,51. Denoised elastic fields are then generated.It calculates weighted average across the neighboring region of each element to replace its value:$$\begin{array}{c}\begin{array}{c}{{{\bf{F}}}}^{({{\bf{x}}}\_{{\bf{pre}}})}={f}_{{gaussian}{{\_}}{filter}}({{{\bf{F}}}}^{({{\bf{x}}}\_{{\bf{pre}}})})\\ {{{\bf{F}}}}^{({{\bf{y}}}\_{{\bf{pre}}})}={f}_{{gaussian}{{\_}}{filter}}({{{\bf{F}}}}^{({{\bf{y}}}\_{{\bf{pre}}})})\end{array}\\ {{{\bf{F}}}}^{({{\bf{x}}}\_{{\bf{next}}})}={f}_{{gaussian}{{\_}}{filter}}({{{\bf{F}}}}^{({{\bf{x}}}\_{{\bf{next}}})})\\ {{{\bf{F}}}}^{({{\bf{y}}}\_{{\bf{next}}})}={f}_{{gaussian}{{\_}}{filter}}({{{\bf{F}}}}^{({{\bf{y}}}\_{{\bf{next}}})})\end{array}$$where fgaussian_filter denotes the method of Gaussian filtering.Bi-sectional fields applicationBi-sectional fields application planWith elastic fields generated and denoised, ST-GEARS uses the fields as a guidance to correct distortion for each section. Through querying the elastic fields with spatial location of each spot, the displacement to be implemented is returned. For a section in the middle, its elastic fields calculated with both anterior and posterior neighbor sections are queried, and guidance provided by both anterior and posterior sections are applied on the rigid aligned result, called ‘Bi-sectional Fields Application’. After the application, the distortion of the section is corrected, and the elastic registration result is generated.Specifically, the denoised elastic fields are first queried, returning the displacement to be implemented:$$\begin{array}{c}\begin{array}{c}X{{{\_}}{delta}}_{i,0}^{({pre}{{\_}}{final})}={F}_{{X}_{i,0}^{({pixel})}}^{(x{{\_}}{pre})}\\ {X{{\_}}{delta}}_{i,1}^{({pre}{{\_}}{final})}={F}_{{X}_{i,1}^{({pixel})}}^{(y{{\_}}{pre})}\end{array}\\ {X{{\_}}{delta}}_{i,0}^{({next}{{\_}}{final})}={F}_{{X}_{i,0}^{({pixel})}}^{(x{{\_}}{next})}\\ {X{{\_}}{delta}}_{i,1}^{({next}{{\_}}{final})}={F}_{{X}_{i,1}^{({pixel})}}^{(y{{\_}}{next})}\end{array}$$Next, average displacement returned by both anterior and posterior sections are applied on the rigid registration result, leading to final elastic registration result \({{\bf{X}}}\_{{\bf{final}}}\in {R}^{N,2}\!\!:\)$${{\rm{X}}}\_{{\rm{final}}}={{\rm{X}}}+\frac{1}{2}{{{\rm{X}}}\_{{\rm{delta}}}}^{({{\rm{pre}}}\_{{\rm{final}}})}+\frac{1}{2}{{{\rm{X}}}\_{{\rm{delta}}}}^{({{\rm{next}}}\_{{\rm{final}}})}$$For section located on posterior end,$${{\rm{X}}}\_{{\rm{final}}}={{\rm{X}}}+{{{\rm{X}}}\_{{\rm{delta}}}}^{({{\rm{pre}}}\_{{\rm{final}}})}$$For section located on anterior end,$${{\rm{X}}}\_{{\rm{final}}}={{\rm{X}}}+{{{\rm{X}}}\_{{\rm{delta}}}}^{({{\rm{next}}}\_{{\rm{final}}})}$$The validity of this plan is proved in the section: Proof of validity of Bi-sectional Fields Application.Proof of validity of Bi-sectional fields applicationBi-sectional Fields Application accurately recovers the spatial profile before distortion, by averaging and applying displacement value guided by both anterior and posterior neighbor section. The effect is approved mathematically as following:Take section A, B, and C as an example of a sequence of sections, with XA, XB and XC denoting their spots’ spatial information after rigid alignment, and XA_insitu, XB_insitu and XC_insitu denoting their in vivo spatial information. The distortion occurred to the slices during experiments are denoted as XA_dis, XB_dis and XC_dis.According to Bi-sectional Fields Application, the corrected spatial information is:$${{{\bf{X}}}}_{{{\bf{B}}}{{\_}}{{\bf{cor}}}}= {{{\bf{X}}}}_{{{\bf{B}}}}+\frac{1}{2}\left({{{\bf{X}}}}_{{{\bf{A}}}}-{{{\bf{X}}}}_{{{\bf{B}}}}\right)+\frac{1}{2}\left({{{\bf{X}}}}_{{{\bf{c}}}}-{{{\bf{X}}}}_{{{\bf{B}}}}\right)\\= \frac{1}{2}({{{\bf{X}}}}_{{{\bf{A}}}}+{{{\bf{X}}}}_{{{\bf{c}}}})$$Thereinto,$${{{\bf{X}}}}_{{{\bf{A}}}}={{{\bf{X}}}}_{{{\bf{A}}}{{\_}}{{\bf{insitu}}}}+{{{\bf{X}}}}_{{{\bf{A}}}{{\_}}{{\bf{dis}}}}$$$${{{\bf{X}}}}_{{{\bf{C}}}}={{{\bf{X}}}}_{{{\bf{C}}}{{\_}}{{\bf{insitu}}}}+{{{\bf{X}}}}_{{{\bf{C}}}{{\_}}{{\bf{dis}}}}$$Hence,$${{{\bf{X}}}}_{{{\bf{B}}}{{\_}}{{\bf{cor}}}}=\frac{1}{2}{{{\bf{X}}}}_{{{\bf{A}}}{{\_}}{{\bf{insitu}}}}+\frac{1}{2}{{{\bf{X}}}}_{{{\bf{C}}}{{\_}}{{\bf{insitu}}}}+\frac{1}{2}({{{\bf{X}}}}_{{{\bf{A}}}{{\_}}{{\bf{dis}}}}+{{{\bf{X}}}}_{{{\bf{C}}}{{\_}}{{\bf{dis}}}})$$
(3)
Based on the in vivo morphological consistency across sections, spatial information of section B can be approximated by an average of information of A and C, written as$${{{\bf{X}}}}_{{{\bf{B}}}{{\_}}{{\bf{insitu}}}}=\frac{1}{2}({{{\bf{X}}}}_{{{\bf{A}}}{{\_}}{{\bf{insitu}}}}+{{{\bf{X}}}}_{{{\bf{C}}}{{\_}}{{\bf{insitu}}}})$$
(4)
Given that XA_dis and XC_dis can be seen as independent and identically distributed sets of variables,$${{{\bf{X}}}}_{{{\bf{A}}}{{\_}}{{\bf{dis}}}}+{{{\bf{X}}}}_{{{\bf{C}}}{{\_}}{{\bf{dis}}}}=N({{{\boldsymbol{\mu }}}}_{{{\bf{ABC}}}},{{{\boldsymbol{\Sigma }}}}_{{{\bf{ABC}}}})$$
(5)
where μABC is the universal mean, and ΣABC is the variance of the 2d displacement information.Inserting the terms (4) and (5) back to Eq. (3) gives$${{{\bf{X}}}}_{{{\bf{B}}}{{\_}}{{\bf{cor}}}}= {{{\bf{X}}}}_{{{\bf{B}}}{{\_}}{{\bf{insitu}}}}+\,\frac{1}{2}N({{{\boldsymbol{\mu }}}}_{{{\bf{ABC}}}},{{{\boldsymbol{\Sigma }}}}_{{{\bf{ABC}}}})\\= {{{\bf{X}}}}_{{{{\bf{B}}}}_{{{\bf{insitu}}}}}+o\left({{{\bf{X}}}}_{{{{\bf{B}}}}_{{{\bf{insitu}}}}}\right)\\ \to {{{\bf{X}}}}_{{{\bf{B}}}{{\_}}{{\bf{insitu}}}}$$indicating the proximity of corrected spatial information to in vivo spatial information.Evaluation metrixWe evaluated the accuracy of anchors by index of Mapping Accuracy, and measured the reconstruction effect by MSSIM and SI-STD-DI, in both elastic effect study and overall methodology comparison.Mapping accuracyDesigned and adopted by PASTE27, Mapping Accuracy calculates the weighted percentage of anchors joining spots with same annotation.$${{\rm{Mapping}}}\; {{\rm{Accuracy}}}={\sum}_{i,j,{{\bf{l}}}({{\bf{i}}}){{=}}{{\bf{l}}}({{\bf{j}}})}{{{\boldsymbol{\pi }}}}_{{{\bf{ij}}}}$$MSSIM indexMSSIM measures the accuracy of registration, based on the assumption that in some sectioning positions, tissue morphology remains almost consistent across slices. The method quantifies the accuracy, by measuring the similarity of annotation type distribution of such section pairs.To implement the quantification, first, structurally consistent section pairs are selected among all sections arranged in sequence.Next, on each section from the pair, transformation from individual spots to a complete image is implemented, by gridding the rectangular area that surrounds the tissue, and assigning each grid of a value that represents the annotation type which occurs most frequently in the grid. The resulted image describes the annotation type distribution of the section.Finally, similarity between each pair of images is measured, by index of MSSIM52. The method generates a window with fixed size, slides the window simultaneously on both images, and compares the two framed parts by windows on their intensity, contrast, and structures. Among those, the intensity difference is measured by difference of average pixel values, the contrast difference is measured by comparing variance of the two sets of framed pixel values, and the structure difference is measured by comparing their covariances. A Structural Similarity of Images (SSIM) index is calculated for each position of the window using \({SSIM}(X,Y)=\frac{(2{\mu }_{x}{\mu }_{y})(2{\sigma }_{{xy}}+{c}_{2})}{({\mu }_{x}^{2}+{\mu }_{y}^{2}+{c}_{1})({\sigma }_{x}^{2}+{\sigma }_{y}^{2}+{c}_{2})}\), where μx and μy denote average pixel values of the frames, σx and σy denote variances of the frames, and σxy denotes covariances of the two frames. c1 and c2 are constants to avoid 0 value of the divisor. Averaging the SSIM value across all windows gives the final MSSIM result of the two sections.SI-STD-DISI-STD-DI measures smoothness of area changing across sections along a fixed axis, by calculating the standard deviation of area changes on each pair of adjacent sections and scale the result by dividing it by average area.$$ {{\rm{SI}}}-{{\rm{STD}}}-{{\rm{DI}}}={STD}(\{{s}_{i}-{s}_{i-1}:i\in [1,2,…,I-1]\})/\\ \left.|{mean}(\{{s}_{i}-{s}_{i-1}:i\in [1,2,…,I-1]\})|\right)$$Software and codeData analysisAll software used to analyze data in this study are open-sourced Python packages, including anndata = 0.9.2, numpy = 1.22.4, pandas = 1.4.3, scipy = 1.10.1, matplotlib = 3.5.2, k3d = 2.15.3.Statistics and reproducibilityNo statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The Investigators were not blinded to allocation during experiments and outcome assessment.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles