Enhancing spatial domain detection in spatial transcriptomics with EnSDD

Overview of EnSDD with the architecture analyses

An overview of the EnSDD method is depicted in Fig. 1. EnSDD takes a comprehensive approach to SDD by integrating multiple advanced SDD methods. The input data for EnSDD include gene expression profiles, spatial coordinates of spots, and paired morphology images (Fig. 1A). The core of EnSDD’s approach is the fusion of eight cutting-edge SDD methods: BayesSpace9, DR-SC10, GraphST11, STAGATE12, SpaGCN13, stLearn14, SiGra15, and spaVAE16 (Fig. 1B). EnSDD first executes these eight methods to obtain multiple clustering results. Subsequently, a binary similarity matrix is constructed for each clustering result, indicating which pairs of spots fall within the same cluster. Leveraging these matrices, EnSDD employs an optimization model to derive a consensus similarity matrix for spots. This model simultaneously determines the final spatial similarity matrix between spots and the adaptive weights assigned to each base method. Notably, these adaptive weights can be used as a metric to quantify the performance of the base methods when applied to a specific dataset. Finally, EnSDD applies the Louvain algorithm with adaptive resolution to identify spatial domains based on the consensus similarity matrix of spots.

Fig. 1: Workflow of EnSDD. A Input for EnSDD includes gene expression and spatial coordinates of spots, along with paired morphology images. B Core component of EnSDD for spatial domain detection. EnSDD first executes eight state-of-the-art spatial domain detection methods to obtain multiple clustering results. Then, multiple binary similarity matrices of spots are constructed based on the clustering labels. EnSDD employs an optimization model to integrate these similarity matrices, which results in the consensus similarity matrix S and the weights assigned to each base method ωm.
The consensus similarity matrix is then passed to the Louvain algorithm with adaptive resolution to obtain the final clustering labels of spots. C EnSDD enables the recognition of spatial domains across diverse tissue organizations. Several approaches are integrated into the EnSDD framework for determining spatially variable genes and inferring cell type enrichment patterns in different spatial domains, further exploring the spatial heterogeneity of tissues. Additionally, the learned weights are significantly positively correlated with the performance of base methods, suggesting that they can be used as a metric to quantify that performance. D The EnSDD R package (left) with well-documented tutorials is provided, and an interactive platform (right) is constructed for easier implementation and visualization.

While the primary goal of EnSDD is to detect spatial domains, it also advances the exploration of tissue spatial heterogeneity by integrating various existing methods (Fig. 1C). To comprehend how tissue regions form and function, it is crucial to investigate the spatially variable genes (SVGs) that are enriched in specific spatial domains, as these link molecular functions with regional phenotypes. EnSDD offers two approaches for identifying SVGs: one involves detecting domain-specific differentially expressed genes (DEGs), while the other identifies genes with expression patterns correlated to spatial locations. In addition, EnSDD incorporates cell type composition inference into its framework, providing a more detailed and accurate depiction of tissue complexity and heterogeneity. For easier implementation and visualization, EnSDD provides an R package with well-documented tutorials and a user-friendly interactive platform (Fig. 1D).

Application to DLPFC 10x Visium data

To assess the spatial clustering performance of EnSDD comprehensively, we first apply it to the DLPFC dataset comprising 12 human DLPFC sections.
These sections have been manually annotated by Maynard et al.20, providing ground truth labels for DLPFC layers and white matter (WM) based on morphological features and gene markers. We compare the clustering accuracy of EnSDD with the eight base spatial clustering methods in terms of the adjusted Rand Index (ARI)21 and purity score (Pur)22. Each base method is applied with its default parameter settings. EnSDD is then applied to obtain ensemble clustering results. In this application, we set the number of domains to match the manual annotation of each tissue slice.

The results demonstrate that EnSDD achieves top performance across most DLPFC slices. For instance, in slice 151674 (Fig. 2A), EnSDD accurately delineates layer borders and achieves the highest clustering accuracy (ARI = 0.637, Pur = 0.753). The visual comparison reveals that some base methods, particularly DR-SC, fail to recover distinct layer clusters, with clusters mixed among different layers. While certain methods like SpaGCN, spaVAE, and stLearn exhibit partial success in identifying specific layers, they also struggle with accuracy. STAGATE generates layers that more closely match the annotation, but the layer thickness is incorrect, particularly in the WM layer (Fig. 2C). In contrast, EnSDD consistently outperforms individual SDD methods across various domains, aligning better with the known annotation (Fig. 2A). Additionally, EnSDD assigns higher weights to more accurate methods, with a strong positive correlation between weights and ARI values (Fig. 2B), demonstrating its ability to effectively integrate and optimize base models.

Fig. 2: EnSDD clustering enhances tissue structure identification in human DLPFC tissue. A Manually annotated layer structure of slice 151508 (left) and spatial domains detected by EnSDD (right). B Dot plot illustrating the correlation between the weights assigned to base methods by EnSDD and the ARI scores of corresponding base SDD methods on slice 151674.
The Pearson correlation coefficient (R) between the learned weights and ARI scores of base methods, along with the corresponding P value (from a one-sided t test), is provided. C Spatial domains detected by BayesSpace, DR-SC, GraphST, SiGra, SpaGCN, spaVAE, STAGATE, and stLearn. D Manual annotation (left), spatial domains detected by EnSDD (middle), and histology image (right) for slice 151669. E UMAP plots illustrating subregions 1 and 4 identified by EnSDD in slice 151669. F Boxplot of Local Getis and Ord’s Gi values for six DEGs (P value ≤ 0.05 and adjusted P value ≤ 0.1) identified by Wilcoxon test on the gene expression between subregions 1 and 4. G Boxplots of ARI (left) and purity (right) scores of the nine methods applied to all DLPFC slices except for slice 151669.

However, a discrepancy is observed between EnSDD results and manual annotation in slice 151669 (ARI = 0.273, Pur = 0.501). EnSDD tends to separate Layer 3 into subregions 1 and 4 (Fig. 2D, left and middle), a distinction supported by morphological differences within Layer 3 (Fig. 2D, right). Other methods like SiGra, SpaGCN, spaVAE, and stLearn, which leverage histology information, also divide Layer 3 into two regions (Supplementary Fig. S1A). This suggests the presence of multiple subregions within Layer 3. Gene expression analysis further supports this, revealing distinct expression patterns between subregions 1 and 4 (Fig. 2E). Additionally, differential expression analysis identifies 129 DEGs between subregions 1 and 4, which are enriched for functions related to location maintenance, regulation of lipid localization, and the metabolic processes of triglycerides, neutral lipids, and acylglycerols (Supplementary Fig. S1B and C). Notably, several DEGs have reported roles in neural functions and thyroid carcinomas23,24, underscoring their potential significance. Local spatial autocorrelation analysis (Fig. 
2F) further supports spatial segregation, demonstrating EnSDD’s ability to detect finer spatial variations within layers.

Considering the overall performance across all DLPFC slices except for slice 151669, EnSDD achieves the highest median ARI (0.555) and purity score (0.704), outperforming STAGATE (ARI = 0.552, Pur = 0.678) and spaVAE (ARI = 0.553, Pur = 0.672) (Fig. 2G and Supplementary Fig. S2). STAGATE, however, shows significant variability across slices, as reflected in the wide range of ARI scores, whereas EnSDD exhibits more consistent performance. Although spaVAE’s ARI values are comparable to those of EnSDD, it has a lower purity score. These findings suggest EnSDD’s robustness and superior performance in detecting spatial domains within the DLPFC dataset. Furthermore, we observe a strong positive correlation between the weights assigned to the base SDD methods and their ARI values across different slices (Supplementary Figs. S3–S5). Considering all slices collectively, the strong positive correlation between ARI scores and assigned weights (R = 0.618, P value = 2.01e−11) (Supplementary Fig. S5D) underscores the effectiveness of EnSDD’s dynamic weighting mechanism as a reliable metric for evaluating base methods in the absence of ground truth.

Application to mouse brain data

To assess the effectiveness of EnSDD in identifying spatial domains within tissues with intricate structures, we apply it to mouse brain datasets characterized by complex spatial arrangements. Given the absence of a definitive gold standard annotation for these datasets, we compare EnSDD with base SDD methods by generating domain segmentations with varying numbers of clusters. For a fair comparison, we cluster the mouse coronal slice into 10, 16, and 17 clusters; the sagittal anterior slice into 10, 13, and 15 clusters; and the sagittal posterior slice into 10, 17, and 20 clusters. The Allen Brain Atlas serves as a silver standard for reference (Fig. 3A, B)25.

Fig. 
3: Spatial domain detection on the mouse brain tissues. A The corresponding anatomical Allen Mouse Brain Atlas of mouse brain tissue coronal regions. B The corresponding anatomical Allen Mouse Brain Atlas of mouse brain tissue sagittal regions. The brown dotted box marks mouse brain cortical layers, the purple dotted box marks mouse hippocampus areas, and the black dotted box marks mouse cerebellar cortex. C–E The domain detection of EnSDD for mouse brain coronal slice (C), sagittal anterior slice (D), and sagittal posterior slice (E) with different clustering resolutions. F The visualization of the marker genes corresponding to Layer 2/3, Layer 5, Layer 6a and Layer 6b regions in the sagittal anterior slice of mouse brain. Cux2 and Lamp5 are marker genes of Layer 2/3; Pde1a is a marker gene of Layer 5; Nptx1 is a marker gene of Layer 6a; Nxph3 is a marker gene of Layer 6b. G The visualization of the marker genes corresponding to the CA1 pyramidal layer, visual areas (VIS), dentate gyrus (DG) and Purkinje layer in the sagittal posterior slice of mouse brain. Lmo3 is a marker gene of the CA1 pyramidal layer; Satb2 is a marker gene of VIS; C1ql2 is a marker gene of DG; Pcp2 and Car8 are marker genes of the Purkinje layer.

We first assess EnSDD’s performance using the mouse brain coronal slice, comparing the identified spatial domains with the anatomical reference from the Allen Brain Atlas (Fig. 3A). This section primarily includes key regions such as the cerebral cortex (CTX), hippocampus (HPF), cerebellum (CB), thalamus (TH), and brainstem (BS). Within the CTX, the isocortex displays a consistent six-layered structure, while the HPF features two prominent structures: the pyramidal layer of cornu ammonis (CA) and the granule cell layer of the dentate gyrus (DG). EnSDD successfully detects the CA and DG sections of the HPF with 10 clusters (Fig. 3C). As the clustering number increases, EnSDD identifies new spatial domains while maintaining the integrity of existing ones (Supplementary Fig. 
S6). For instance, domain 6, initially identified with 10 clusters, subdivides into domains 8 and 14 when using 16 clusters, delineating two layers within the isocortex of the CTX. Notably, the spatial domain structure remains consistent as the number of clusters increases from 16 to 17, highlighting EnSDD’s stability in handling complex tissue structures. In contrast, base clustering methods such as BayesSpace, spaVAE, and stLearn struggle to consistently recognize the DG sections across various clustering settings. Although methods like DR-SC, GraphST, SiGra, SpaGCN, and STAGATE maintain regional continuity, they fail to accurately capture boundary layers within the isocortex (Supplementary Fig. S7). EnSDD, however, effectively delineates these boundaries (Fig. 3C).

Next, we apply EnSDD to the sagittal anterior tissue of the mouse brain, where the somatosensory areas (SS) within the CTX exhibit a discernible six-layered structure (Fig. 3B). With a clustering number of 10, EnSDD initially identifies SS as spatial domains 3, 7, and 9. Increasing the number of clusters to 13 and 15 allows EnSDD to further differentiate specific layers, including Layer 2/3, Layer 5, Layer 6a, and Layer 6b (Fig. 3D). Additionally, domain 6 (with 10 clusters) is subdivided into domains 7 and 8 (with 13 clusters), corresponding to the nucleus accumbens (ACB) and caudoputamen (CP). These findings align with the high expression of known domain-specific marker genes (Cux2, Lamp5, Pde1a, Nptx1, and Nxph3) (Fig. 3F)26. However, base methods like BayesSpace and stLearn exhibit limitations: BayesSpace overly subdivides Layer 2/3, while stLearn produces excessively smooth cluster boundaries, diminishing its ability to capture subtle cortical structures. BayesSpace and SpaGCN also generate disjointed spatial domains at certain resolutions (13/15 clusters) (Supplementary Fig. S9).
In contrast, methods like SiGra, spaVAE, and EnSDD (with 15 clusters) clearly delineate the six-layered structure within the CTX.

In the mouse sagittal posterior dataset, EnSDD successfully delineates the boundaries of HPF, CTX and midbrain (MB) with 10 clusters. As the number of clusters increases, spatial domain 8 (with 10 clusters) is subdivided into domains 10 and 13 (with 17 clusters), where domain 10 corresponds to visual areas (VIS) in the sagittal posterior slice (Fig. 3B, E and Supplementary Fig. S10). EnSDD also identifies the CA1 and DG areas in HPF with 17 and 20 clusters. The expression patterns of several specific marker genes further validate these cluster partitions (Fig. 3G). However, EnSDD tends to include more spots in the Purkinje cell layer rather than defining it as a single spot layer within the cerebellar cortex (Fig. 3E), which might be due to the limitations of current methods in distinguishing Purkinje cells. Among the base methods, BayesSpace and stLearn fail to generate contiguous regions across different clustering settings. DR-SC and SiGra subdivide the dentate gyrus area into more subgroups at higher cluster numbers, while SpaGCN fails to accurately depict the VIS boundary with 10 and 17 clusters (Supplementary Fig. S11).

Overall, the spatial visualizations obtained from EnSDD and the eight base SDD methods confirm the effectiveness of EnSDD in decoding complex spatial domains in mouse brain samples. EnSDD consistently demonstrates superior performance in accurately delineating key brain structures, particularly in regions with intricate spatial arrangements.

Application to breast cancer tissue data

Intratumoral heterogeneity in cancer poses significant challenges for effective treatment and is associated with poor survival outcomes27. Spatial transcriptomics provides valuable insights into tumor complexity and the interactions between tumor and immune responses28.
To demonstrate the generalizability of EnSDD to cancer tissues, we apply it to a human breast cancer dataset. The data are manually annotated by a pathologist based on the H&E image and the spatial expression of reported breast cancer marker genes29. The H&E image is segmented into four main morphotypes: ductal carcinoma in situ/lobular carcinoma in situ (DCIS/LCIS), healthy tissue (Healthy), invasive ductal carcinoma (IDC), and tumor surrounding regions with low features of malignancy (Tumor edge). These four main morphotypes are further divided into 20 subregions (Fig. 4A, right). Unlike the well-defined boundaries in DLPFC and mouse brain tissues, human breast cancer tissues are highly heterogeneous with complex microenvironments. Although manual annotations are provided, due to the heterogeneity of tumor tissue, we cluster the data into several different numbers of clusters (10, 16, 17, 20, and 25) and compare the spatial domains identified by the various SDD methods at these clustering resolutions.

Fig. 4: Spatial domain detection on the human breast cancer tissue. A H&E image and manual annotation of human breast cancer tissue. DCIS: ductal carcinoma in situ; LCIS: lobular carcinoma in situ; IDC: invasive ductal carcinoma. B Spatial domain detection of EnSDD (cluster = 10/16/17/20/25). C Sankey diagram comparing the 10/16/17/20/25 clusters generated by EnSDD for human breast cancer tissue. The widths of the lines linking both sets of clusters correspond to the number of spots they have in common. Due to the randomness inherent in the Louvain algorithm, if the intersection of spots between the source and target domains is less than 5, the target domains are filtered out in that layer of the Sankey diagram. D Visualization of domains 13, 15, and 17 (cluster = 25). E Expression of marker genes of domains 13, 15, and 17. F Visualization of deconvolution result. A spatial scatter pie chart displays cell-type compositions predicted by RCTD, and each scatter represents a spot in SRT data.
Abbreviations: CAFs (cancer-associated fibroblasts), DCs (dendritic cells), PVL (perivascular-like cells). G Comparisons of cell type proportions in domains 13, 15, and 17. The boxplot represents the distribution of cell type proportions in each region. For each cell type, a two-sided Wilcoxon Rank Sum test is used to test the difference.

Visually, EnSDD effectively distinguishes DCIS/LCIS, healthy tissue, IDC, and tumor edge regions with cluster number 10 (Fig. 4A, B). Specifically, domain 1 encompasses tumor edges 1, 2, 3, 4, 5, and 6, while domain 6 includes DCIS/LCIS 1, 2, and 5. Similarly, domains 2, 4, 5, and 7 correspond to IDC 4, 2, 8, and 5. As the cluster number increases, for example, from 10 to 16 clusters, domain 1 is further subdivided into domains 1 and 6, corresponding to tumor edges 6 and 2, respectively. EnSDD reveals increasingly intricate details of intratumoral heterogeneity as the cluster number grows, as shown by the identification of specific domains 13, 15, and 17 with 25 clusters, which correspond to IDC_5 in the manual annotation (Fig. 4B, C). Compared to base SDD methods, EnSDD identifies regions with greater regional continuity and reduced noise upon visual inspection (Fig. 4B). Notably, EnSDD identifies additional sub-clusters within tumor regions, whereas other methods tend to subdivide the Healthy_1 region into sub-clusters, despite all methods generating the same number of clusters (Fig. 4C and Supplementary Fig. S12). For a quantitative assessment, we compute the ARI and purity scores based on 20-cluster results compared to manual annotation. EnSDD achieves the highest ARI of 0.667 and a purity score of 0.668, surpassing other methods (Fig. 4B and Supplementary Fig. S12D).

To delve deeper into the intratumoral heterogeneity of human breast cancer, we focus on EnSDD domains 13, 15, and 17 in the IDC_5 region with 25 clusters (Fig. 4D).
Differential expression analysis using the Wilcoxon test identifies 311 DEGs in subregion 13, 192 DEGs in subregion 15, and 270 DEGs in subregion 17 (P value ≤ 0.05; adjusted P value ≤ 0.1), compared to other regions. Among the top DEGs with smaller adjusted P values in different subregions, we observe that the DEG expression patterns are consistent with the boundaries of subregions 13, 15, and 17 (Fig. 4E). Utilizing Local Getis and Ord’s Gi, the following significant local spatial autocorrelation (SA) patterns are observed: CSTA, FAM234B, and HEBP1 exhibit significant positive local SA in subregion 13; CLDN3, GPX4, and MGP show significant positive local SA in subregion 15; GRB14, MAOB, and TEX14 display significant negative local SA in subregion 17 (Supplementary Fig. S13). Notably, these DEGs are linked to IDC progression: HEBP1 (adjusted P value = 1.64e−46) has been reported to be associated with the development of ERα+ breast tumors30; FAM234B (adjusted P value = 1.89e−43) is linked to poor prognosis in patients with Luminal breast cancer31; CSTA (adjusted P value = 5.71e−38) is involved in regulating the progression of ductal carcinoma in situ to invasive breast cancer32. In addition, gene set enrichment analysis (GSEA) reveals that the DEGs are most enriched in biological functions associated with immune activation and response, highlighting the complexity of the tumor-immune microenvironment in invasive ductal carcinoma of human breast cancer (Supplementary Fig. S14)28,33.

To further explore the tumor-immune microenvironment of IDC, we employ RCTD34 to predict the cell type composition of spots in subregions 13, 15, and 17 of the IDC_5 region using scRNA-seq human breast tissue as a reference (Fig. 4F)28. The predicted cell type proportion distributions reveal significant differences among subregions 13, 15, and 17 for cancer epithelial cells, macrophages, and dendritic cells (DCs) (Fig. 4G).
To be specific, the proportion of cancer epithelial cells is higher in subregion 13 (central tumor) than in subregions 15 and 17 (surrounding the central tumor), while the proportions of macrophages and DCs in subregions 15 and 17 are significantly higher than those in subregion 13. The elevated proportion of macrophages and DCs in the peritumoral area suggests a potential role in cancer cell migration and invasion35,36. Additionally, CD4+ T cells, crucial for regulating immune responses and influencing breast cancer survival outcomes37,38, show slight differences in abundance between subregions 13 and 15 (Fig. 4G). The increased CD4+ T cell presence at the tumor periphery may be associated with a more favorable prognosis due to its role in anti-tumor immunity.

In summary, our findings demonstrate that EnSDD can identify finer regions with distinct biological functions, uncovering intratumoral heterogeneity within visually homogeneous tumor regions and providing insights into the tumor-immune microenvironment.

Applications to prostate and ovarian cancer tissue data

The edge effect in cancer, characterized by increased diversity at the boundaries between tumor and normal tissues (known as ecotones), plays a crucial role in tumor progression and treatment response39. Spatial transcriptomics allows for detailed mapping of these cancer ecotones, offering insights into the peritumoral microenvironment and aiding in the development of innovative treatment strategies. To demonstrate the effectiveness of EnSDD in delineating cancer ecotones, we apply the method to human prostate and ovarian cancer datasets. These datasets are initially categorized by pathologists into tumor and stroma regions40, necessitating the use of both EnSDD and base SDD methods to identify spatial domains at various cluster resolutions.
For a fair comparison, we cluster human prostate cancer tissue into 3, 5, and 8 clusters, and human ovarian cancer tissue into 2, 5, and 8 clusters.

We first evaluate EnSDD’s performance on human prostate cancer tissue by comparing its identified spatial domains with manual annotation (Fig. 5A). EnSDD accurately maps tumor regions across different cluster numbers. As the number of clusters increases, EnSDD refines domain 3 (initially with 3 clusters) into domains 4 and 5 (now with 5 clusters), where domain 4 corresponds to stroma and partially atrophic changes, and domain 5 aligns with the stroma region identified in the manual annotation. When increasing to 8 clusters, domain 2 is further divided into domains 3 and 5 (Fig. 5B and Supplementary Fig. S15). With 8 clusters, EnSDD effectively distinguishes between the tumor and surrounding regions: spatial domains 1, 2, 4, 5, and 7 correspond to the tumor, and domain 3 represents the tumor’s edge. In contrast, base SDD methods, including GraphST, SiGra, SpaGCN, and STAGATE, fail to maintain contiguous and stable domains with increasing clusters, while spaVAE suffers from over-smoothing (Supplementary Fig. S16).

Fig. 5: Spatial domain detection on the human prostate and ovarian cancer tissues. A H&E image and manual annotation of human prostate cancer tissue. B Spatial domain detection of EnSDD (cluster = 3/5/8). C Expression of marker genes of tumor and tumor edge regions. D Visualization of tumor and tumor edge regions with 8 clusters (left) and deconvolution results (right). The spatial scatter pie chart on the right shows cell-type composition predicted by RCTD, with each scatter representing a spot in the SRT data. Abbreviations: BE (basal epithelial cells), HE (high-grade epithelial cells), CE (claudin-expressing epithelial cells), LE (luminal epithelial cells), MNP (mononuclear phagocytes). E Comparisons of cell type proportions in the tumor and tumor edge regions.
The boxplot represents the distribution of cell type proportion in each region. For each cell type, a two-sided Wilcoxon Rank Sum test is used to test the difference. F H&E image and manual annotation of human ovarian cancer tissue. H Spatial domain detection of EnSDD (cluster = 2/5/8). G Expression of marker genes of domains 1, 3, and 6. I Visualization of domains 1, 3, and 6 with 8 clusters (left) and deconvolution results (right). The spatial scatter pie chart on the right displays cell-type composition predicted by RCTD, with each scatter representing a spot in the SRT data. Abbreviations: CAFs (cancer-associated fibroblasts), DCs (dendritic cells), EOC (epithelial ovarian cancer), ILC (innate lymphoid cells), pDC (plasmacytoid dendritic cells). J Comparisons of cell type proportion in domains 1, 3 and 6. The boxplot represents the distribution of cell type proportion in each region. For each cell type, a two-sided Wilcoxon Rank Sum test is used to test the difference.

To further explore differences in gene expression and cell type distribution between tumor and surrounding regions, we analyze spots from domains 1, 2, 4, 5, and 7 (within the tumor) and from domain 3 (the tumor edge), as identified by EnSDD with 8 clusters (Fig. 5D, left). The Wilcoxon test is used to identify domain-specific SVGs and characterize transcriptional differences between these regions. Subsequently, RCTD is applied to predict cell type abundance for each spot and investigate variations in cell type distributions between the tumor and its edge. Gene expression and cell type distribution analyses reveal significant differences between tumors and surrounding regions. Differential gene expression analysis using the Wilcoxon test identifies a total of 4170 DEGs (P value ≤ 0.05; adjusted P value ≤ 0.1). The top DEGs with smaller adjusted P values show expression patterns that align with the boundaries of the tumor and tumor edges (Fig. 5C).
Local spatial autocorrelation analysis using Local Getis and Ord’s Gi reveals significant local SA for KLK3, ATG9B, and NPRL2 in the tumor region, and FLNA, TP63, and TAGLN in the tumor edge region (Supplementary Fig. S17). Notably, GSEA shows that the top DEGs, such as KLK3, ATG9B, and TP63, are strongly associated with human prostate cancer development41,42. Cell-type distribution analysis using RCTD shows significant differences between the tumor and tumor edge regions, particularly for LE-KLK3 (luminal epithelial cells marked by the tumor gene KLK3), BE (basal cells), MNP (mononuclear phagocytes), and B cells (Fig. 5D, right). Specifically, LE-KLK3 cells are more prevalent in the tumor region, indicating the complex cellular structure and glandular changes within the tumor. Conversely, BE cells and immune cells, including MNP and B cells, are more prevalent at the tumor edge, suggesting a complex immune microenvironment and potential mechanisms for tumor-immune escape (Fig. 5E)43.

EnSDD also effectively delineates comprehensive cancer ecotones in human ovarian tumor tissue (Fig. 5F), distinguishing between tumor and stroma regions with 5 and 8 clusters. With increased clustering number, domain 2 (with 2 clusters) refines into domains 2, 3, and 4 (with 5 clusters), and domains 1 and 2 (with 5 clusters) are further split into domains 1, 6 and 2, 5, 7 (with 8 clusters) (Fig. 5H and Supplementary Fig. S18). Specifically, EnSDD identifies domains 1, 3, and 6 (with 8 clusters), where domain 1 corresponds to the stroma region, domain 3 to the tumor region, and domain 6 to the interface between the tumor and stroma, aligning with manual annotation.
In comparison, base SDD methods like GraphST and SpaGCN struggle with consistent domain identification as the number of clusters increases, while methods such as SiGra, DR-SC, and BayesSpace tend to overly subdivide the stroma region, and spaVAE suffers from over-smoothing, resulting in poorly aligned domain boundaries (Supplementary Fig. S19).

We further explore gene expression and cell type distribution among tumor, stroma, and ecotone regions by analyzing spots from domain 1 (stroma), domain 3 (tumor), and domain 6 (ecotone) identified by EnSDD with 8 clusters. The Wilcoxon test identifies 6815 DEGs in domain 1, 8014 in domain 3, and 1489 in domain 6 (P value ≤ 0.05; adjusted P value ≤ 0.1). The expression patterns of top DEGs with smaller adjusted P values align with the boundaries of these regions (Fig. 5G). Local spatial autocorrelation analysis using Local Getis and Ord’s Gi reveals significant spatial patterns (Supplementary Fig. S20). Notably, marker genes of subregion 6, such as SFRP4, IGKN, and THBS1, are strongly linked to Wnt signaling and immune function, and their expression changes may promote human ovarian tumor growth and invasiveness44,45,46. The predicted cell type proportions reveal significant differences among domains 1 (stroma), 3 (tumor), and 6 (ecotone) for epithelial ovarian cancer (EOC) cells, epithelial cells, mesothelial cells, and cancer-associated fibroblasts (CAFs) (Fig. 5I, right). EOC cells are more abundant in the tumor region (domain 3), consistent with manual annotation, while epithelial cells, mesothelial cells, and CAFs are more prevalent in the ecotone (domain 6), reflecting tumor-induced changes in the microenvironment and providing insights into tumor-immune interactions and their implications for treatment development47 (Fig. 
5J).

In summary, our results demonstrate that EnSDD can accurately identify detailed regions within the tumor microenvironment, revealing distinct cancer ecotones and providing new insights into potential treatment strategies.
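
The ensemble step described in the overview (one binary similarity matrix per base method, combined into a weighted consensus matrix S) can be illustrated with a minimal Python sketch. This is not EnSDD's optimization model, which learns S and the weights ωm jointly. The function names and the simple agreement-based weighting heuristic below are hypothetical and serve only to make the data structures concrete.

```python
import numpy as np

def co_membership(labels):
    """Binary similarity matrix: entry (i, j) is 1 if spots i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(label_sets, weights=None):
    """Weighted average of the binary matrices (uniform weights if none are given)."""
    mats = np.stack([co_membership(l) for l in label_sets])
    if weights is None:
        weights = np.full(len(label_sets), 1.0 / len(label_sets))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the method axis: sum_m weights[m] * mats[m]
    return np.tensordot(weights, mats, axes=1)

def agreement_weights(label_sets):
    """Hypothetical heuristic (an assumption, not the paper's model):
    methods closer to the unweighted consensus receive larger weights."""
    mats = np.stack([co_membership(l) for l in label_sets])
    mean = mats.mean(axis=0)
    dist = np.array([np.linalg.norm(m - mean) for m in mats])
    score = 1.0 / (dist + 1e-12)
    return score / score.sum()

# Three toy "base methods" clustering four spots.
labels_a = [0, 0, 1, 1]
labels_b = [1, 1, 0, 0]   # same partition as labels_a, different label names
labels_c = [0, 1, 1, 1]   # a dissenting partition

S = weighted_consensus([labels_a, labels_b, labels_c])
w = agreement_weights([labels_a, labels_b, labels_c])
print(np.round(S, 3))
print(np.round(w, 3))
```

Because the matrices record co-membership rather than label values, the relabeled partition `labels_b` contributes exactly the same matrix as `labels_a`, which is why clusterings from different methods can be averaged without first matching their labels. A graph clustering algorithm such as Louvain would then be run on S.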

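The two accuracy measures used throughout the comparisons, ARI and purity, can be computed from a contingency table of annotated versus predicted labels. The sketch below uses the standard textbook definitions, which are assumed (not confirmed by the text) to match the cited formulations. The helper names and the toy labels are illustrative only.

```python
from math import comb
import numpy as np

def contingency(true_labels, pred_labels):
    """Count table: rows are annotated classes, columns are predicted clusters."""
    _, t = np.unique(np.asarray(true_labels), return_inverse=True)
    _, p = np.unique(np.asarray(pred_labels), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=int)
    np.add.at(table, (t, p), 1)
    return table

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index from pair counts, corrected for chance agreement."""
    table = contingency(true_labels, pred_labels)
    n = int(table.sum())
    if n < 2:
        return 1.0
    sum_cells = sum(comb(int(x), 2) for x in table.ravel())
    sum_rows = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

def purity(true_labels, pred_labels):
    """Fraction of spots belonging to the majority annotated class of their cluster."""
    table = contingency(true_labels, pred_labels)
    return table.max(axis=0).sum() / table.sum()

# Toy example: six spots, two annotated regions, one spot misassigned.
truth = ["L1", "L1", "L1", "WM", "WM", "WM"]
pred = [0, 0, 1, 1, 1, 1]
ari = adjusted_rand_index(truth, pred)
pur = purity(truth, pred)
print(round(ari, 3), round(pur, 3))
```

Both scores depend only on the contingency table, so they are invariant to how the predicted clusters are numbered. Purity rewards over-segmentation (many small clusters are trivially pure), which is one reason to report it alongside ARI rather than alone.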