Deciphering cell–cell communication at single-cell resolution for spatial transcriptomics with subgraph-based graph attention network

Preprocessing of datasetsSeveral preprocessing steps were performed for each dataset. Quality control of the scRNA-seq data was performed using Scanpy. Cells were filtered based on several criteria to ensure the quality of the downstream analysis. Specifically, cells with a mitochondrial gene expression percentage exceeding 20% were excluded, as a high mitochondrial content can indicate stressed or apoptotic cells. However, it’s important to note that the mitochondrial gene expression percentage threshold can be adjusted based on specific circumstances. Cells with an abnormally high total count or an excessively large number of expressed genes were also removed. The specific thresholds for total counts and the number of genes were carefully chosen based on the characteristics of each dataset to retain only the highest quality cells for further analysis. Additionally, Scanpy’s filter_cells and filter_genes functions were utilized to further purify the dataset by removing low-quality cells and genes. For both scRNA-seq and ST datasets, normalization of the expression matrix is essential. We employed the widely-used normalize_total function from the Scanpy package to perform this normalization. The normalization can be mathematically represented as:$${{{{\rm{X}}}}}_{{ij}}^{{\prime} }=\frac{{{{{\bf{X}}}}}_{{ij}}}{{\sum }_{j}{{{{\bf{X}}}}}_{{ij}}}*T,$$
(1)
where \({{{{\bf{X}}}}}_{{ij}}\) is the raw count of gene i in cell j, \({{{{\bf{X}}}}}_{{ij}}\hbox{`}\) is the normalized expression value, and T is the target sum of counts after normalization, ensuring consistency across all cells.During the integration of scRNA-seq and ST data, the selection of training genes is paramount. To enhance the discriminative power of ST data, we carefully curate a set of marker genes that are uniformly distributed across different cell types in the scRNA-seq data as training genes. Specifically, we employ the rank_genes_groups functionality in Scanpy to identify the most highly expressed genes within specific cell types. Following this, we create a non-redundant gene set and intersect it with the gene list in the ST dataset. Genes that record a zero count in either dataset are excluded from our training gene set, thus guaranteeing that only relevant and expressed genes are included. This procedure results in a refined gene set comprising ‘n’ genes, which will serve as the foundation for model training. For model validation, we adopt a leave-one-out validation strategy. Specifically, we sequentially designate each gene in the refined gene set as the test gene, while the remaining ‘n-1’ genes function as training genes. This training process is iterated ‘n’ times, omitting a different gene each time, to ensure that each gene receives an independent prediction evaluation.DeepTalk algorithmThe DeepTalk model has two components: integration of the scRNA-seq and ST datasets (DeepTalk-Integration) and prediction of the spatial CCCs (DeepTalk-CCC). The former focuses on determining the cell-type composition of the single-cell or spot-based ST datasets, whereas the latter is designed to predict the spatially mediated CCCs influenced by the L–R pairs in the spatial context.Integration of the scRNA-seq and ST datasetsAn attention-based GNN was used to integrate the scRNA-seq and ST datasets (Supplementary Fig. 20a). This network was tasked with generating matching descriptors, denoted as \({{{{\bf{f}}}}}_{i}\in {{\mathbb{R}}}^{D}\), where ‘D’ represents the dimensionality of the feature space, determining the length or number of components in the feature vector. These descriptors were created through feature communication among the initial features, which encompassed raw or minimally processed data such as transcriptomic information and cell positions. Initially, we developed a point encoder combining the transcriptomic data and cell positions for each cell, denoted as \(i\). By employing a multilayer perceptron (MLP), the cell positions were embedded into a high-dimensional vector, represented as:$${{{{\bf{x}}}}}_{i}^{(0)}=\,{{{{\bf{d}}}}}_{i}+{{\mbox{MLP}}}_{{{\rm{enc}}}}\left({{{{\bf{p}}}}}_{i}\right),$$
(2)
where \({{{{\bf{d}}}}}_{i}\) refers to the gene expression data obtained from the transcriptome and \({{{{\bf{p}}}}}_{i}\) represents the positional information of the cells. This point encoder allows the GNN to simultaneously leverage the \({{{{\bf{d}}}}}_{i}\) and \({{{{\bf{p}}}}}_{i}\) during subsequent inference stages.Subsequently, we created a graph integrating two omics with nodes representing the cells from both transcriptomics. Self-edges connected each cell \(i\) to all other cells within the same omics, whereas cross-edges linked cell \(i\) to all other cells with different omics. To propagate information effectively, message-passing equations were employed, enabling the diffusion of information along the self- and cross-edges. This approach resulted in a multi-GNN where each node started with a high-dimensional state. At each layer, the updated representations were computed by simultaneously aggregating messages from all edges of all the nodes within the graph.In the proposed framework, the intermediate representation of element \(i\) in the scRNA-seq A at layer \(l\) is denoted by \({{{{\bf{x}}}}}_{{Ai}}^{(l)}\). The message \({{{{\bf{m}}}}}_{\varepsilon \to i}\) represents the aggregation of information from all cells j, such that {\(j:(i,j)\in \varepsilon\)}, where \(\varepsilon \in \{{\varepsilon }_{{self}},\,{\varepsilon }_{{cross}}\}\). The residual message-passing update for all \(i\) in the scRNA-seq A can be expressed as follows:$${{{{\rm{x}}}}}_{{Ai}}^{(l+1)}={{{{\rm{x}}}}}_{{Ai}}^{(l)}+{{{\rm{MLP}}}}\left(\left[{{{{\rm{x}}}}}_{{Ai}}^{(l)}\big|\big|{{{{\rm{m}}}}}_{{\varepsilon} {\to} {i}}\right]\right)$$
(3)
where the concatenation operator \([\cdot{\mbox{||}}\cdot]\) is used for concatenation. A comparable update can be simultaneously applied to all the points in omics B. To create a hierarchical structure, a predetermined number of layers L with distinct parameters are linked together and the information is alternately aggregated along the self- and cross-edges; starting from \(l\,=\,1\), \(\varepsilon=\,{\varepsilon }_{{self}}\) when \(l\) is an odd number and \(\varepsilon={\varepsilon }_{{cross}}\) when \(l\) is an even number. This approach enabled iterative aggregation along different edges, thereby facilitating effective representation learning.The aggregation and computation of the message \({{{{\bf{m}}}}}_{\varepsilon \to i}\) were performed by an attention mechanism. The self- and cross-edges used self- and cross-attention, respectively. As in database retrieval, the query \({{{{\bf{q}}}}}_{i}\) was used to retrieve values \({{{{\bf{v}}}}}_{j}\) for specific elements based on their attributes, represented as keys \({{{{\bf{k}}}}}_{j}\). Thereafter, the message was computed by taking the weighted average of the retrieved values as follows:$${{{{\bf{m}}}}}_{\varepsilon \to i}={\sum }_{j:(i,j){\in }_{\varepsilon }}{a}_{{ij}}{{{{\bf{v}}}}}_{j},$$
(4)
The attention weight \({a}_{{ij}}\) is calculated as the softmax of the key-query similarities and is represented as \({a}_{{ij}}\,={\mbox{Softmax}}({{{{\bf{q}}}}}_{i}^{T}{{{{\bf{k}}}}}_{j})\). The key, query, and value are obtained by applying linear projections to the deep features of the GNN. Considering that the query point i belongs to the scRNA-seq dataset Q and all source points reside in the ST dataset S, this relationship can be expressed as \(\left(Q,S\right)\in {\{A,B\}}^{2}\).$${{{{\bf{q}}}}}_{i}={W}_{1}{{{{\bf{x}}}}}_{{Qi}}^{(l)}+{{{{\bf{b}}}}}_{{{{\bf{1}}}}}$$
(5)
$${{{{\bf{k}}}}_{j} \atop {{{\bf{v}}}}_{j}}={{W}_{2} \atop {W}_{3}}{{{\bf{x}}}}_{{Sj}}^{(l)}+{{{{\bf{b}}}}_{2} \atop {{{\bf{b}}}}_{3}},$$
(6)
The projection parameters are specific to each layer \(l\); these parameters are learned and shared across all points in both datasets. Multi-head attention was employed to enhance the expressiveness of the model, enabling the representation of both geometric transformations and assignments. The resulting matching descriptors were obtained via linear projections as follows:$${{{{\bf{f}}}}}_{i}^{A}=W\cdot {{{{\bf{x}}}}}_{{Ai}}^{\left(l\right)}+{{{\bf{b}}}},\; {{\forall}} _{{{l}}}\in A,$$
(7)
They were similarly obtained for points in B.Creating individual representations for every potential match in a matrix of size ncells × nspots is impractical. Therefore, an alternative approach was adopted by representing the pairwise scores as a similarity matrix \(M\in {{\mathbb{R}}}^{n{cells}\times {n}{spots}}\) capturing the similarity of the matching descriptors.$${M}_{i,j}= < {{{{\bf{f}}}}}_{i}^{A},{{{{\bf{f}}}}}_{j}^{B} > ,{\forall }\left(i,j\right)\in A\times B,$$
(8)
where\(\, < \cdot,\cdot > \) is the inner product. The matching descriptors are not normalized, and their magnitude may vary on a per-feature basis throughout the training process to reflect the confidence level of the predictions. To derive the mapping matrix, the following objective function was minimized with respect to M:$$\phi \left(\widetilde M\right)={\sum }_{k}^{{n}_{{genes}}}{\cos }_{{{{\rm{sim}}}}}(({{{{\bf{M}}}}}^{{{{\boldsymbol{T}}}}}{{{\bf{A}}}}{ * }_{,k}),{{{\bf{B}}}}{ * }_{,k})+{\sum }_{k}^{{n}_{{spots}}}{\cos }_{{{{\rm{sim}}}}}(({{{{{\bf{M}}}}}^{{{{\boldsymbol{T}}}}}{{{\bf{A}}}}}_{{{{\boldsymbol{j}}}}{{{\boldsymbol{,}}}}}{{{\boldsymbol{ * }}}}),{{{{\bf{B}}}}}_{j{{{\boldsymbol{,}}}}} * ),$$
(9)
where \({\cos }_{{{\rm{sim}}}}\) is the cosine similarity function and * indicates matrix slicing.Spatial CCC predictionDuring pretraining, we employed a random-walk strategy to generate pretraining subgraphs \({G}_{C}\) for each node in the graph. This process involved masking and predicting nodes within the random walks, effectively capturing the graph’s overall connectivity patterns (Supplementary Fig. 20b). For each node v in the graph during pretraining (and for each node-pair during fine-tuning), we generated individual subgraphs denoted as \({g}_{c}\in {G}_{c}\). Each subgraph \({g}_{c}\) was encoded as a set of nodes represented by \({g}_{c}\,=\,({v}_{1},\,{v}_{2},\; {\ldots },\; {v}_{\left|{V}_{c}\right|})\), where \(|{V}_{c}|\) denotes the number of nodes in \({g}_{c}\).We assigned a low-dimensional vector representation to each node vi in the subgraph. This representation was obtained by mapping the attributes (such as gene expression data or cellular phenotypes) and the structure-based embedding of vi using the function \({{f}}_{{{\rm{attr}}}}{\mbox{(.)}}\). The resulting stacked vector was denoted as \({{{{\bf{h}}}}}_{i}={W}_{e}{f}_{{{\rm{attr}}}}\left({{{{\bf{v}}}}}_{i}\right)\), where \({W}_{e}\) is a learnable embedding matrix. Collectively, the node embeddings within the subgraph \({g}_{c}\) were represented as \({H}_{c}\,=\,({{{{\bf{h}}}}}_{1},{{{{\bf{h}}}}}_{2},\ldots,{{{{\bf{h}}}}}_{|{V}_{c}|})\). This flexible representation approach allowed us to incorporate both node and relation attributes into low-dimensional embeddings. Alternatively, these embeddings could also be initialized using the output embeddings from other global feature generation methods that capture the multi-relational graph structure. Specifically, for the initialization of node features in our study, we merged pretrained node representation vectors from node2vec with gene expression data acquired from cells.For a subgraph \({g}_{c}\), where \({V}_{c}\) represents the set of nodes and \({H}_{c}\in {{\mathbb{R}}}^{d\times |{V}_{c}|}\) denotes their corresponding global input embeddings, the main objective of contextual learning is to transform these global embeddings to reflect the most representative roles of the nodes within the structure of \({{{{\rm{g}}}}}_{{{{\rm{c}}}}}\). This transformation was achieved via a series of layers, with the model having the flexibility to incorporate multiple layers to account for higher-order relationships within the graph. To capture higher-order relational dependencies between nodes—including indirect and multi-step interactions—we introduced a semantic association matrix, denoted as \(\bar{A}\), which acts as an asymmetric weight matrix. This asymmetry originated from the different influences that the two cells may have on each other within a subgraph. The weights of the matrix \({\bar{A}}^{k}\) were iteratively learned in each translation layer \(k\) by considering the connectivity between nodes through the local context subgraph \({g}_{c}\) and larger global graph G.In the k + 1 translation layer, the semantic association matrix \({\bar{A}}^{k}\in {{\mathbb{R}}}^{|{V}_{c}|\times |{V}_{c}|}\) was updated via the transformation operation, which involves performing message passing across all nodes within the subgraph \({{{{\rm{g}}}}}_{{{{\rm{c}}}}}\) and updating the node embeddings \({H}_{c}^{k} \,=({{{{\mathbf{h}}}}}_{1}^{k},{{{{\mathbf{h}}}}}_{2}^{k},\ldots,{{{{\mathbf{h}}}}}_{|{V}_{c}|}^{k})\) to \({H}_{c}^{k+1}\). The update process ensures that the embeddings capture the evolving representations of the nodes based on the contextual information derived from the message passing and relationship updates in the subgraph. Specifically, the updated node embeddings \({H}_{c}^{k+1}\) are computed as follows:$${H}_{c}^{K+1}=f_{{{{\rm{NN}}}}}\left(\right.{W}_{s}{H}^{k}_{{c}}{\bar{A}}^{k}+{H}_{c}^{K},$$
(10)
Here, \({f}_{{\mbox{NN}}}\) represents a non-linear function, and \({W}_{s}\in {{\mathbb{R}}}^{d\times d}\) is a learnable transformation matrix.The non-linear function and the transformation matrix were used to compute the corresponding entry \({\bar{A}}_{{ij}}^{k}\) in the semantic association matrix. To retain contextual embeddings from the previous step, we incorporate a residual connection. This ensures that global relations are preserved by passing the original global embeddings through the layers. For the two nodes \({{{{\bf{v}}}}}_{i}\) and \({{{{\bf{v}}}}}_{j}\) within the subgraph \({g}_{c}\), the calculation of \({\bar{A}}_{{ij}}^{k}\) utilizes a multihead attention mechanism with \({N}_{h}\) heads, allowing us to capture the relational dependencies within different subspaces. For each head, \({\bar{A}}_{{ij}}^{k}\) was computed as follows:$${A}_{i,j}^{-k}=\frac{\exp \left(\left({W}_{1}{h}_{i}^{k}\right)\left({W}_{2}{h}_{j}^{k}\right)\right)}{{\sum }_{t=1}^{{{|}}{V}_{c}{{{\rm{|}}}}}\exp \left({\left({W}_{1}{h}_{i}^{k}\right)}^{T}\left({W}_{2}{h}_{t}^{k}\right)\right)},$$
(11)
where the transformation matrices \({W}_{1}\) and \({W}_{2}\) are learnable parameters. By applying multiple translation layers, multiple embeddings were generated for each node within the subgraph. Considering the various embeddings in downstream tasks, the node embeddings learned from different layers \({\{{{{{\bf{h}}}}}_{i}^{k}\}}_{k=1,\ldots,K}\) were embedded into the contextual embedding \({\widetilde{{{{\bf{h}}}}}}_{i}\) for each node. This aggregation was performed as follows:$${\widetilde{{{{\bf{h}}}}}}_{i}={{{{\bf{h}}}}}_{i}^{1}\oplus {{{{\bf{h}}}}}_{i}^{2} \oplus \cdot \cdot \cdot \oplus {{{{\bf{h}}}}}_{i}^{K},$$
(12)
After obtaining the embedding vectors \({\{{\widetilde{{{{\bf{h}}}}}}_{i}\}}_{i=1{\mbox{,}}2{\mbox{,…,}}{|}{V}_{c}|}\) for the nodes within \({g}_{c}\), these embeddings can be used as inputs for prediction tasks. During pretraining, a linear projection function was applied to the embeddings to predict the probabilities of the masked nodes. In the fine-tuning step, we utilized a single-layer feed-forward network with a softmax activation function for binary link prediction, facilitating predictions regarding the presence or absence of links between nodes.Pretraining in the proposed model involves training a self-supervised node-prediction task. For each node in G, a node \({g}_{c}\) with a diameter (the maximum shortest distance between any pair of nodes) was created using the aforementioned generation methods. Subsequently, a single node within the subgraph was randomly masked for prediction without altering the graph structure. Therefore, pretraining was accomplished by maximizing the probability of correctly predicting the masked node \({{{{\bf{v}}}}}_{m}\) based on the given context \({g}_{c}\). The probability was computed in the following form:$$\theta={{{{\rm{argmax}}}}}_{\theta }{\varPi }_{{g}_{c}{\in }_{{G}_{C}}}{\varPi }_{{{{{\rm{v}}}}}_{m}{\in }_{{g}_{c}}}p\left({{{{\bf{v}}}}}_{m} | {g}_{c},\theta \right),$$
(13)
where \(\theta\) represents the set of model parameters.To fine-tune the model further, we focused on a contextualized link prediction task. Multiple fine-grained contexts were generated for each node pair considered for link prediction. During this stage, the model was trained to maximize the probability of observing a positive edge (\({e}_{p}\)) given its corresponding context (\({g}_{{cp}}\)). Simultaneously, the model learned to assign low probabilities to the negatively sampled edges (\({e}_{n}\)) and their associated contexts (\({g}_{{cn}}\)). The overall objective was constructed by summing over two subsets of training data: positive edges (\({D}_{p}\)) and negative edges (\({D}_{n}\)). By optimizing this objective, the model improved its ability to accurately predict the positive and negative edges.$$L={{{{\boldsymbol{\Sigma }}}}}_{({e}_{p,}{g}_{{cp}}){\in }_{{D}_{p}}}\log \left({{{\rm{P}}}}\left({{{{\rm{e}}}}}_{p} | {g}_{{cp}},\theta \right)\right)+{{{{\boldsymbol{\Sigma }}}}}_{({e}_{n,}{g}_{{cn}}){\in }_{{D}_{n}}}\log \left(1-{{{\rm{P}}}}\left({e}_{n} | {g}_{{cn}},\theta \right)\right),$$
(14)
The probability of an edge between two nodes, denoted by \(e=({{{{\bf{v}}}}}_{i},\,{{{{\bf{v}}}}}_{j})\), was calculated using the similarity score \({\mbox{S}}({{{{\bf{v}}}}}_{i},\,{{{{\bf{v}}}}}_{j})\), which can be mathematically expressed as \({{\mbox{S}}}({{{{\bf{v}}}}}_{i},\,{{{{\bf{v}}}}}_{j})\,=\sigma ({\widetilde{{{{\bf{h}}}}}}_{i}^{T}\cdot {\widetilde{{{{\bf{h}}}}}}_{j})\), where\(\,{\widetilde{{{{\bf{h}}}}}}_{i}\) and \({\widetilde{{{{\bf{h}}}}}}_{j}\) are embeddings of \({{{{\bf{v}}}}}_{i}\) and \({{{{\bf{v}}}}}_{j}\), respectively. \(\sigma (\cdot)\) represents sigmoid function. The probability of an edge between two nodes was thereby calculated based on the similarity of their embeddings.Definition of cell type for the ST datasetTo analyze the single-cell ST dataset, the cell type with the highest coefficient was assigned to each individual cell type. For the 10X Visium fluorescence dataset used in this study, squidpy.im.segment () was used to segment the tissue image. For the ST dataset, the maximum cell number (\({N}_{{cell}}\)) was defined for each spot, and \({N}_{{cell}}\) was set to 20 based on a recent review79. To determine the optimal combination (ω) of cells for each spot, the following function was used:$${{{{\boldsymbol{\omega }}}}}_{i}\left(i\in 1,2,\ldots,k\right)=\left\{\begin{array}{cc}{[{N}_{{cell}}{\beta }_{i}]+1} & {\left(\right.\{{N}_{{cell}}{\beta }_{i}\} \ge 0.5} \\ {[{N}_{{cell}}{\beta }_{i}]} \hfill & {\left\{{N}_{{cell}}{\beta }_{i}\right\} < 0.5}\end{array},\right.$$
(15)
where \(\left[{N}_{{\mbox{cell}}}{\beta }_{i}\right]\) and \(\{{N}_{{\mbox{cell}}}{\beta }_{i}\}\) represent the integer and fractional parts of \({N}_{{cell}}{\beta }_{i}\), respectively. Thereafter, a subset of cells (\(n={\sum }_{i=1}^{k}{{{{\boldsymbol{\omega }}}}}_{i}\)) was randomly selected from the total cell population (S) and the merged expression profile (\({{{\boldsymbol{\epsilon }}}}\)) of the cell was compared with the ground truth using the following function:$${{\arg }}\min \left\{n < {N}_{{{\rm{cell}}}}\right\}{\sum }_{i=1}^{n}\left({Y}_{i}-{\sum }_{j=1}^{m}{{{\boldsymbol{\epsilon }}}}_{i}^{\; j}\right)^{2},$$
(16)
To assign the coordinates \(({x}^{\prime}\,,\,{y}^{\prime})\) to each sampled cell, we introduced a probabilistic distribution based on the ratio (R) of the same cell type in neighboring spots, allowing us to determine the probability of locating a cell in a specific region within a spot. The distribution was calculated using the following equation:$${x}^{\prime}={x}_{0}+\frac{{{ad}}_{\min }\cos \left(\frac{\theta \pi }{180}\right)}{2},$$
(17)
$${y}^{\prime}={y}_{0}+{{ad}}_{\min }\sin \left(\frac{\theta \pi }{180}\right)/2,$$
(18)
where \({d}_{\min }\) represents the minimum spatial distance to the closest neighbor spot and \(\alpha \in ({{\mathrm{0,1}}}]\) and \(\theta \in (0,{\mbox{}}360]\) represent the weight for dmin and angle toward the spot center (\({x}_{0}\), \({y}_{0}\)), respectively. Notably, \(\theta\) is determined by the following probability equation:$$\widetilde{P}\left(\theta \right)=\frac{{R}_{q}+1}{{\sum }_{i}^{Q}({R}_{i}+1)},\theta \in \left(90q-90,90q\right],$$
(19)
where \(q\) is the \({{{\rm{qth}}}}\) neighbor spot among Q spots. Practically, Q was set to 4, dividing the space around the spot into four quadrants and filtering the nearest neighbor in each quadrant. After determining \(\theta\), the corresponding neighbor spot (\({x}^{\prime}\,,\,{y}^{\prime}\)) was selected to calculate the probabilistic distribution of α using the following equation:$$\widehat{P}\left(\alpha\right)=\left\{\begin{array}{cc}\frac{{R}_{x0,\, y0}+1}{{R}_{x0,\, y0}+{R}_{{x}^{\prime},\, {y}^{\prime}}+2},& \alpha \in \left(0,0.5\right] \\ \frac{{R}_{x0,\, y0}+1}{{R}_{x0,\, y0}+{R}_{{x}^{\prime},\, {y}^{\prime}}+2},\, & \alpha \in \left(0.5,1\right]\end{array}\right.,$$
(20)
where \({R}_{{x}_{0},{y}_{0}}\) and \({R}_{{x}^{\prime},{y}^{\prime}}\) represent the ratios of the given cell type in each spot to its neighboring spot, respectively. These optimal cellular combinations were integrated for all spots to reconstruct the single-cell resolution ST dataset for the spot-based ST dataset.Definition of the CCC scoreTo generate the cell–cell distance matrix D, the spatial coordinates of individual cells were used to calculate their Euclidean distances. However, to focus on nearby secretion and paracrine signaling within a specific range, we only considered the cells that were 200 μm apart79. Subsequently, the K-nearest neighbors (KNN) algorithm was applied to select the K closest cells from the distance matrix D, aiding the construction of a cell graph network by establishing connections between the selected cells. The receptor was used as the query node to ensure the biological relevance of the identified CCCs. A random walk algorithm was employed to filter and score the downstream-activated transcription factors (TFs). Thus, the TFs activated in response to a queried receptor were identified; consequently, we considered only cells with activated TFs as receptor cells. This approach provides a more accurate representation of the intercellular information transfer and communication as it reflects the actual cellular response to signaling events.To ascertain the co-expression of a specific ligand-receptor pair between the sender cells (of type A) expressing a given ligand and the receiver cells (of type B) expressing the corresponding receptor in the cell graph network, we computed the number of cell–cell pairs (\({C}_{{A}_{i},{B}_{j}}^{0}\)) exhibiting this ligand-receptor interaction. This involved identifying the direct neighboring nodes (1-hop away) of the sender cells expressing ligand i and the receiver cells expressing receptor j. For each ligand-receptor interaction between cell types A and B, there may be distinct cell–cell pair counts.A permutation test was employed to gauge the significance of these observed counts. This entailed randomly reassigning cell labels and recomputing the ligand-receptor interaction counts. This procedure was iterated (Z) times, generating a background distribution \(C\,=\,{C}_{{A}_{i},{B}_{j}}^{1},\,{C}_{{A}_{i},{B}_{j}}^{2},\,{\ldots,\,C}_{{A}_{i},{B}_{j}}^{Z}\). The P-value was then determined by juxtaposing the observed cell–cell pair counts for the specified ligand-receptor interactions against this background distribution.Mathematically, the P-value was computed as follows:$${P}_{{A}_{i}{B}_{j}}=\frac{{{{\rm{card}}}}\left\{x\in C | x\ge {C}_{{A}_{i,}{B}_{j}}^{0}\right\}}{Z},$$
(21)
P-values less than 0.05 were considered statistically significant and were used to calculate the CCC score of the ligand-receptor interaction from senders to receivers. This score was computed as \({S}_{{A}_{i},{B}_{j}}\,=\,\sqrt{{L}_{{A}_{i}}\times {R}_{{B}_{j}}}\), where \({L}_{{A}_{i}}\) is the gene expression of the ligand \(L\) in cell \(i\) of cell type A and \({R}_{{B}_{j}}\) is the gene expression of the receptor \(R\) in cell \(j\) of cell type B.Benchmark metricsBenchmark metrics for integration methodsFive metrics were used to evaluate the integration methods, one of the metrics being the Pearson correlation coefficient (PCC), which is calculated using the following equation:$${{{\rm{PCC}}}}=\frac{E[\left({\widetilde{{{{\rm{x}}}}}}_{i}-{\widetilde{u}}_{i}\right)\left({{{{\bf{x}}}}}_{i}{{{\boldsymbol{-}}}}{u}_{i}\right)]}{{\widetilde{\sigma }}_{i}{\sigma }_{i}},$$
(22)
where \({{{{\bf{x}}}}}_{i}\) and \({\widetilde{{{{\bf{x}}}}}}_{i}\) represent the spatial expression vectors of the i-th gene in the ground truth and the predicted results, respectively. Similarly, \({u}_{i}\) and \({\widetilde{u}}_{i}\) correspond to the average expression value of the i-th gene in the ground truth and the predicted result, respectively, and \({\sigma }_{i}\) and \({\widetilde{\sigma }}_{i}\) represent the standard deviation of the spatial expression of the i-th gene in the ground truth and the predicted result, respectively. While evaluating a specific gene, a higher PCC value indicates a higher prediction accuracy for that gene. The PCC value measures the degree of linear association between the ground truth and predicted results for a particular gene.The evaluation of the integration methods also used the structural similarity index (SSIM). To prepare the data for SSIM calculation, the expression matrix was scaled by adjusting the expression values of each gene to lie in the range of 0–1 as follows:$${{{{\rm{x}}}}}_{{ij}}^{{\prime} }=\frac{{{{{\rm{x}}}}}_{{ij}}}{\max (\{{{{{\rm{x}}}}}_{i{\it{1}}},\ldots,{{{{\rm{x}}}}}_{{iM}}\})},$$
(23)
where \({{{{\rm{x}}}}}_{{{{\rm{ij}}}}}\) denotes the expression of the i-th gene in the j-th spot, and M is the total number of spots. Normalizing the expression values facilitated a consistent and comparable evaluation of integration methods using the SSIM metric. After scaling the gene expression values between 0 and 1, the SSIM value for each gene was calculated using the following equation:$${{{\rm{SSIM}}}}=\frac{\left({2\widetilde{u}}_{i}{u}_{i}+{C}_{1}^{2}\right)\left(2{{\mathrm{cov}}}\left({{{{\bf{x}}}}}_{{{{\boldsymbol{i}}}}}^{\prime},{\widetilde{{{{\bf{x}}}}}}_{i}\right)+{C}_{2}^{2}\right)}{\left({\widetilde{u}}_{{{{\rm{i}}}}}^{2}+{u}_{i}^{2}+{C}_{1}^{2}\right)\left({\widetilde{\sigma }}_{1}^{2}+{\sigma }_{{{{\rm{i}}}}}^{2}+{C}_{2}^{2}\right)},$$
(24)
To calculate the SSIM value for each gene, we utilized the same definitions of \({u}_{i}\), \({\widetilde{u}}_{i}\),\(\,{\sigma }_{i}\) and \({\widetilde{\sigma }}_{i}\) as in the calculation of the PCC value, but for the scaled gene expression. Additionally, C1 and C2 were introduced as constant values and set to 0.01 and 0.03, respectively. The term \({{\mathrm{cov}}}\left({{{{\bf{x}}}}}_{i},{\mbox{}}{\widetilde{{{{\bf{x}}}}}}_{i}\right)\) represents the covariance between the expression vectors of the i-th gene in the ground truth (\({{{{\bf{x}}}}}_{i}\hbox{`}\)) and the predicted result (\({\widetilde{{{{\bf{x}}}}}}_{i}\hbox{`}\)). Similar to the PCC value, a higher SSIM value indicates a higher level of prediction accuracy for a given gene.The z-score for the spatial expression of each gene was calculated for all spots. The root mean square error (RMSE) was computed as follows:$${{\rm{RMSE}}}=\sqrt{{\frac{1}{M}}{\sum }_{{{\rm{j}}}=1}^{{{\rm{M}}}}({\widetilde{{{\bf{z}}}}}_{{ij}}{{\boldsymbol{-}}}{{{{\bf{z}}}}_{{ij}}})^{2}},$$
(25)
where \({{{{\bf{z}}}}}_{{ij}}\) and \({\widetilde{{{{\bf{z}}}}}}_{{ij}}\) are the z-scores of the spatial expression of the i-th gene in the j-th spot in the ground truth and predicted results, respectively. For a given gene, a lower RMSE value indicates a higher level of prediction accuracy.The Jensen–Shannon divergence (JSD) uses the relative information entropy, particularly the Kullback–Leibler divergence, to quantify the difference between the two distributions. To calculate the spatial distribution probability of each gene, the following steps were performed:$${{{{\bf{P}}}}}_{{ij}}=\frac{{{{{\bf{x}}}}}_{{ij}}}{{\sum }_{j=1}^{M}{{{{\bf{x}}}}}_{{ij}}},$$
(26)
To calculate the spatial distribution probability of each gene, we assign \({{{{\bf{x}}}}}_{{ij}}\) as the expression value of the i-th gene in the j-th spot, where M is the total number of spots and \({{{{\bf{P}}}}}_{{ij}}\) is the distribution probability of the i-th gene in the j-th spot. After calculating the spatial distribution probability, the JSD value for each gene was evaluated using the following equations:$${{{\rm{JSD}}}}=\frac{1}{2}{{{\rm{KL}}}}\left({\widetilde{{{{\bf{P}}}}}}_{i}\left|\frac{{\widetilde{{{{\bf{P}}}}}}_{i}+{{{{\bf{P}}}}}_{{{{\boldsymbol{i}}}}}}{2}\right.\right)+\frac{1}{2}{{{\rm{KL}}}}\left({{{{\bf{P}}}}}_{{{{\boldsymbol{i}}}}}\left|\frac{{\widetilde{{{{\bf{P}}}}}}_{i}+{{{{\bf{P}}}}}_{{{{\boldsymbol{i}}}}}}{2}\right.\right),$$
(27)
$${{{\rm{KL}}}}\left({{{{\bf{a}}}}}_{i}{{{\boldsymbol{||}}}}{{{{\bf{b}}}}}_{i}{{{\boldsymbol{=}}}}{\sum }_{j=0}^{M}{{{\boldsymbol{(}}}}{{{{\boldsymbol{a}}}}}_{{ij}}\times \log \frac{{{{{\bf{a}}}}}_{{{{\boldsymbol{ij}}}}}}{{{{{\boldsymbol{b}}}}}_{{{{\boldsymbol{ij}}}}}}\right),$$
(28)
where \({{{{\bf{P}}}}}_{{{{\bf{i}}}}}\) and \({\widetilde{{{{\bf{P}}}}}}_{i}\) represent the spatial distribution probability vectors of the i-th gene in the ground truth and predicted result, respectively; \({\mbox{KL}}({{{{\bf{a}}}}}_{i}{{{\boldsymbol{||}}}}{{{{\bf{b}}}}}_{i})\) denotes the Kullback–Leibler divergence between the two probability distributions \({{{{\bf{a}}}}}_{i}\) and \({{{{\bf{b}}}}}_{i}\); \({{{{\bf{a}}}}}_{{ij}}\) and \({{{{\bf{b}}}}}_{{ij}}\) represent the predicted and real probabilities of the i-th gene in the j-th spot, respectively. For a given gene, a lower JSD value indicates a higher level of prediction accuracy.To evaluate the relative accuracy of the integration methods for each dataset, an accuracy score (AS) was defined by combining the PCC, SSIM, RMSE, and JSD metrics. For a given dataset, the average PCC, SSIM, RMSE, and JSD values were calculated for all the genes predicted by each integration method. Subsequently, the PCC and SSIM values of the integration methods were sorted in ascending order to obtain \({{{\rm{RANK}}}}_{{{\rm{PCC}}}}\) and \({{{\rm{RANK}}}}_{{{\rm{SSIM}}}}\), respectively. The integration method with the highest PCC/SSIM value had \({{{\rm{RANK}}}}_{{{\rm{PCC}}}/{{\rm{SSIM}}}}\) equal to N, whereas the method with the lowest PCC/SSIM value had the \({{{\rm{RANK}}}}_{{{\rm{PCC}}}/{{\rm{SSIM}}}}\) value of 1. Similarly, the RMSE and JSD values of the integration methods were sorted in the descending order to obtain \({{{\rm{RANK}}}}_{{{\rm{RMSE}}}}\) and \({{{\rm{RANK}}}}_{{{\rm{JSD}}}}\), respectively. The integration method with the highest RMSE/JSD value had \({{{\rm{RANK}}}}_{{{\rm{RMSE}}}/{{\rm{JSD}}}}\) = 1, whereas the method with the lowest RMSE/JSD value had \({{{\rm{RANK}}}}_{{{\rm{RMSE}}}/{{\rm{JSD}}}}\) = N. Finally, the average values of \({{{\rm{RANK}}}}_{{{\rm{PCC}}}}\), \({{{\rm{RANK}}}}_{{{\rm{SSIM}}}}\), \({{{\rm{RANK}}}}_{{{\rm{RMSE}}}}\), and \({{{\rm{RANK}}}}_{{{\rm{JSD}}}}\) were determined to obtain the AS value for each integration method as follows:$${{{\rm{AS}}}}=\frac{1}{4}\left({{{{\rm{RANK}}}}}_{{{{\rm{PCC}}}}}+{{{{\rm{RANK}}}}}_{{{{\rm{SSIM}}}}}+{{{{\rm{RANK}}}}}_{{{{\rm{RMSE}}}}}+{{{{\rm{RANK}}}}}_{{{{\rm{JSD}}}}}\right),$$
(29)
The method with the highest AS value exhibited the best performance among the integration methods.Benchmark metrics for the CCC prediction methodThe Wasserstein distance concept was introduced as a metric to assess the spatial communication tendency in a specific ligand-receptor (L-R) pair. Here, L and R represent the gene expression distributions of the ligand and receptor, respectively. For brevity, we refer to the actual Wasserstein distances between these distributions as \({d}_{{{\rm{real}}}}\). To establish a comparative baseline, we constructed random gene expression distributions, \({L}_{r}\) and \({R}_{r}\), by permuting the coordinates of each data point in L and R. By repeatedly permuting (1000 times in our case) and calculating the Wasserstein distance between \({L}_{r}\) and \({R}_{r}\), denoted as dsimulation, we obtained a set of dsimulation values. Subsequently, the spatial communication tendency was quantified by computing the ratio of \({d}_{{{\rm{real}}}}\) to the mean of the dsimulation set, referred to as \({d}_{{{\rm{ratio}}}}\). This ratio serves as a measure of the spatial communication tendency specific to the L-R pair under consideration.$${d}_{{{{\rm{ratio}}}}}=\frac{{d}_{{{{\rm{real}}}}}}{{\sum }_{{{{\rm{i}}}}=1}^{n}{d}_{s{{{\rm{imulation}}}_{i}}}/{n}^{{\prime} }}$$
(30)
By increasing the number of permutations (n), we constructed a null distribution of \({d}_{{{\rm{real}}}}\) using the dsimulation set. This null distribution was then utilized in a one-sided permutation test to derive a P-value, indicating the significance of the observed spatial communication tendency. Additionally, left- and right-sided P-values were calculated to distinguish between short- and long-range communications. To quantify the consistency between expected and observed spatial distance tendencies, we employed the DES metric, where a higher value signifies better consistency69. Based on their \({d}_{{{\rm{ratio}}}}\) and P-values, short- and long-range communications were ranked to form expected communication lists, \({L}_{s}\) and \({L}_{l}\), respectively. Subsequently, we extracted communications from the CCC tool’s results and created observed communication lists, S, for each cell type pair. These lists were denoted as \({S}_{n}\) and \({S}_{f}\), for nearby and distant cell type pairs, respectively. To compute the DES for a particular cell type pair, we considered weighted P-value proportions (\({P}_{{{\rm{match}}}}\) and \({P}_{{{\rm{unmatch}}}}\)) while iterating through the expected communication list. The presence or absence of communication in the observed list determined the addition or subtraction of the corresponding weights, respectively. This approach allowed us to assess the consistency between expected and observed communications for a given cell type pair. A similar methodology was applied to compute the DES for distant cell type pairs. The \({P}_{{{\rm{match}}}}\) and \({P}_{{{\rm{unmatch}}}}\) values for the j-th interaction in \({L}_{s}\) are defined as follows:$${P}_{{{{\rm{match}}}}}\left({S}_{n},j\right)={\sum }_{{{{lr}}_{j}\in {S}_{n}} \atop {j\le i}}\frac{1-{{{{\rm{Pvalue}}}}}_{{{{\rm{j}}}}}}{{\Sigma }_{{{lr}}_{j}\in s_n}(1-{{{{\rm{Pvalue}}}}}_{{{{\rm{j}}}}})},$$
(31)
where n represents the total number of matched interactions between \({S}_{n}\) and \({L}_{s}\). The DES represents the maximum deviation of (\({P}_{{{\rm{match}}}}-{P}_{{{\rm{unmatch}}}}\)) from 0, providing a quantitative measure of consistency between expected and observed spatial interaction tendencies.Comparison with other methodsTo compare the predictive performance of DeepTalk with that of other methods for predicting the spatial distribution of undetected transcripts, we used a dataset comprising 45 paired ST and scRNA-seq datasets curated by Li et al.35. These datasets were generated using various techniques, including FISH, osmFISH, seqFISH, MERFISH, STARmap, ISS, EXseq, BaristaSeq, ST, 10X Visium, Slide-seq, Seq-scope, and HDST. STARmap and seqFISH+ST datasets were employed to assess the accuracy of DeepTalk and other methods for cell type decomposition. For the single-cell ST dataset, the cells were separated into distinct groups based on fixed spatial distances and combined to create simulated spots, resulting in a reference dataset. The performances of Tangram, Cell2location, SpatialDWLS, RCTD, Stereoscope, DestVI, and SPOTlight in predicting cell-type compositions within each spot were evaluated by comparing them with true cell-type compositions, using metrics such as PCC, SSIM, RMSE, JSD, and AS. By utilizing a benchmark dataset comprising the MERFISH, 10X Visium, and ST datasets, we compared the performance of DeepTalk with that of other methods for inferring CCC, including CellCall, CellChat, CellChatV2, CellPhoneDB, CellPhoneDBV3, ICELLNET, iTALK, SingleCellSignalR, Giotto, stLearn, Connectome, NicheNet, COMMOT. All methods were evaluated using their default parameters. For the comparison of NICHES and Scrabin, two methods for inferring CCC at the single-cell resolution, we utilize the same ground-truth ligand-receptor pairs obtained from OmniPath80 for this analysis.Visualize the CCC patterns using UMAPTo visualize the CCC patterns using UMAP, we predicted CCC events mediated by various L-R pairs at single-cell resolution. Each predicted event was assigned a quantitative score reflecting the communication strength, resulting in a matrix where rows represent distinct CCC events, and columns correspond to unique L-R pairs. For dimensionality reduction and visualization, we employed Scanpy, a robust tool for single-cell analysis. Initially, we scaled the data using sc.pp.scale() to normalize the feature values. This was followed by principal component analysis (PCA) using sc.tl.pca(), which helped reduce the dimensionality of the dataset while preserving its main structure. Next, we constructed a neighborhood graph using sc.pp.neighbors(), which identifies cells that are close to each other in the high-dimensional space. This step is crucial for subsequent manifold learning techniques. Finally, we used sc.tl.umap() to craft the UMAP visualizations, thereby enabling the depiction of intricate CCC patterns within a two-dimensional framework.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles