METI: deep profiling of tumor ecosystems by integrating cell morphology and spatial transcriptomics

Overview of METI’s workflow

METI analyzes the TME in a systematic, step-by-step manner, following the progression from normal to premalignant to malignant cells while also examining the lymphocytes within each tissue section. METI takes standard ST data as input: a spot-by-gene matrix of gene expression, an H&E image of the corresponding tissue section, and X, Y coordinates that map each spot onto the image. The goal of METI is the precise identification of the cell types within the TME and their respective states. Each module in METI is tailored to a particular cell type, enabling focused analysis that leverages domain-specific knowledge (Fig. 1). In the first module, METI identifies normal and premalignant cells, such as goblet cells in the stomach17,18,19. In module 2, METI identifies tumor cell-enriched regions and characterizes their cell-state heterogeneity. Module 3 focuses on spatial mapping of T cells, including CD4+ and CD8+ T cells, and of T cell states such as regulatory T cells (Treg) and exhausted T cells (Tex). Beyond T cells, module 4 identifies other immune cells, including neutrophils, B cells, plasma cells, and macrophages. In the last module, METI focuses on CAFs, a subset of activated stromal cells that play a crucial role in cancer progression and therapy resistance20,21,22. This module maps CAFs and their subtypes, including myCAFs, iCAFs, and apCAFs23,24,25,26,27. METI’s outputs include cell type-specific segmentation results, gene expression data, integrated segmentation and gene expression results that offer a holistic view of the tissue sample, and 3D density plots for visualizing cell density. We have demonstrated that METI achieves more accurate cell type identification than the existing methods we compared it with. METI also remains robust when one modality is of low quality, because high-quality data from the other modality can compensate.

Fig. 1: Workflow of METI. METI takes 10x Visium spatial transcriptomics (ST) data as input: a spot-by-gene matrix of gene expression, the hematoxylin and eosin (H&E) image, and XY coordinates that map each spot onto the image. Across five distinct modules, METI provides cell type identification, nuclei segmentation, and 3D cell density plots. Module 1 maps normal and premalignant cells by integrating gene expression (GE) data and H&E images. Module 2 identifies cancer cell domains and characterizes their heterogeneity. Module 3 is dedicated to T cell mapping and phenotyping. Module 4 provides in-depth analysis of other immune cells. Module 5 analyzes cancer-associated fibroblasts (CAFs).

Mapping normal and premalignant cells

The first module of METI dissects normal and premalignant cells within the epithelial compartment. Here, we use goblet cells as an example because they have a distinctive morphology in H&E-stained images: they are shaped like wine goblets, with pale, almost white vesicles at the top and oval nuclei at the base. Goblet cells are commonly found in the respiratory, digestive, and reproductive tracts, including the small intestine, colon, and bronchi, where they play a critical role in maintaining tissue homeostasis.
In the context of disease, the abnormal presence of goblet cells in the upper gastrointestinal tract, such as the stomach, is a key characteristic of a precancerous condition known as intestinal metaplasia17,19,28,29,30.

To showcase the capabilities of METI’s module 1, we applied it to identify goblet cells in a human stomach adenocarcinoma (STAD) sample, G1, annotated by our gastrointestinal pathologists (Fig. 2a). METI first combines canonical goblet cell markers reported in previous publications31,32,33,34,35 into a meta gene signature, including MS4A10, MGAM, CYP4F2, XPNPEP2, SLC5A9, SLC13A2, SLC28A1, MEP1A, ABCG2, and ACE2 (Supplementary Table 2). This meta gene represents the overall expression level of the goblet cell molecular signature across the whole section. As shown in Fig. 2b, the meta gene was then used to annotate goblet cell-enriched regions with TESLA14, a machine learning model. Because cell type annotation in TESLA relies mainly on gene expression, it can produce false negatives where marker genes show regional variation or high noise. For instance, TESLA fails to identify goblet cells in region 4 (Fig. 2c). Further examination showed that this false negative was due to the low overall unique molecular identifier (UMI) counts captured in region 4 (Fig. 2d). To address the limitations of low-quality gene expression data, METI simultaneously performed goblet cell identification on the H&E image. METI employed a K-means-based segmentation method to detect different morphological components, such as background, nuclei, fiber, gland, and necrosis. Next, by filtering these components on color, shape, and size (see Methods), METI accurately detected individual goblet cells by their morphological signature, i.e., their round, hollow centers (Fig. 2e). This morphological analysis enabled the identification of goblet cells in region 4 (Fig. 2f), an area overlooked by the transcriptomic data alone. Conversely, goblet cells in regions 1 and 3 were not detected by image analysis because the tissue there was fragmented, with discontinuous and fragile shapes. By integrating the gene expression and image analysis results, METI identified all four goblet cell-enriched regions (Fig. 2g). This integrative approach overcomes the limitations posed by low UMI counts and provides a more accurate characterization of goblet cells within the analyzed sample. A detailed examination of goblet cell detection using gene expression data and imaging, presented in the Supplementary Information, shows that accurate detection requires integrating the two modalities and cannot be achieved by popular spatial clustering methods alone.

Fig. 2: Mapping premalignant cells and cancer cell domain. a Pathology annotation of goblet cell-enriched regions in STAD G1. b Pixel-level goblet meta gene expression plot. c Spot annotation indicating regions of high goblet cell gene expression on the H&E image. d Total UMI counts for individual spots. e Identification of four distinct goblet-enriched regions (left), accompanied by zoomed-in views of goblet regions of the H&E image and segmentation results for regions 2 and 4. f Spot annotation using segmentation results. g METI combined result integrating gene expression and segmentation. h Pathology annotation highlighting tumor cell-enriched spots of STAD G2 (left), pixel-level depiction of regions with high EPCAM expression (middle), and EPCAM+ region annotation on the H&E image (right). i Pixel-level gene expression plots for the tumor subtype markers MKI67, MSLN, SOX9, and CLDN18. j Overlay of regions expressing tumor-related genes and SOX9-positive regions. k Nuclei segmentation (left) and 3D cell density plots (right).

Identification of cancer cell domains and heterogeneity

Most solid tumors originate from epithelial cells (carcinomas), including gastric, lung, bladder, breast, prostate, and colon cancers, whereas others, such as sarcomas and melanomas, arise from other tissue types. Regardless of their cell of origin, understanding the molecular features and cellular heterogeneity of malignant cells is crucial for unraveling the mechanisms underlying tumor growth, invasion, metastasis, and therapeutic response. METI’s second module therefore focuses on malignant cells. It starts by identifying cancer cells with author-curated cancer cell markers such as cytokeratins (CK), EPCAM, and trefoil factors. As depicted in Fig. 2h, METI identified all tumor regions in STAD sample G2, in strong agreement with the annotations of our experienced pathologists. Next, METI incorporates additional markers to characterize cancer cell states and heterogeneity: proliferation markers such as MKI67 to map proliferative cancer cells, stemness-related markers such as SOX9 to identify stem-like cancer cells in STAD, and therapeutic targets such as CLDN18 and MSLN to further characterize tumor subtypes36,37,38,39. These marker genes show distinct expression patterns within the tumor region of sample G2 (Fig. 2i) and can be used to distinguish cancer cell states. For example, METI identifies the SOX9+ tumor region (Fig. 2j) and can also illustrate the co-localization or mutual exclusivity of different cancer cell states (Supplementary Fig. S1). This module is flexible and customizable, allowing users to input their genes of interest for tumor state identification.
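The meta gene used in modules 1 and 2 condenses a user-supplied marker list into a single score per spot. METI and TESLA define their exact construction in their own Methods. The Python sketch below instead assumes one common recipe: library-size normalization to 10,000 counts, log1p, per-gene z-scoring across spots, and averaging over the signature genes. The function names and the quantile cutoff used to call enriched spots are hypothetical.

```python
import numpy as np


def meta_gene_score(counts, genes, signature, target_sum=1e4):
    """Hypothetical meta gene: mean z-scored log-normalized expression of signature genes.

    counts: (n_spots, n_genes) raw UMI matrix; genes: list of column names.
    Signature genes missing from `genes` are skipped.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # empty spots stay at zero instead of dividing by zero
    lognorm = np.log1p(counts / totals * target_sum)
    idx = [genes.index(g) for g in signature if g in genes]
    if not idx:
        raise ValueError("none of the signature genes are in the matrix")
    sub = lognorm[:, idx]
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0  # genes that are constant across spots contribute 0
    z = (sub - sub.mean(axis=0)) / sd
    return z.mean(axis=1)  # one score per spot


def call_enriched(scores, quantile=0.9):
    """Flag spots whose score is at or above the given quantile (hypothetical cutoff)."""
    return scores >= np.quantile(scores, quantile)


genes = ["EPCAM", "KRT8", "CD3D"]
counts = np.array([[50, 40, 0], [1, 0, 30], [45, 35, 2], [0, 1, 25]])
scores = meta_gene_score(counts, genes, ["EPCAM", "KRT8"])
print(call_enriched(scores, quantile=0.5))  # spots 0 and 2 carry the epithelial signal
```

Genes absent from the matrix are skipped, so the same function accepts any user-curated list. Examples include the goblet cell markers listed earlier and the EPCAM, MKI67, or SOX9 sets used in module 2.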
Users can also employ genes associated with critical pathways such as KRAS and EGFR signaling, or with processes such as hypoxia, to explore cancer cell states and spatial heterogeneity across diverse cancer types.

Quantifying the spatial distribution and density of cells within tissues is crucial for many applications, particularly in pathology and oncology. While gene expression provides a molecular lens, the associated H&E images can be leveraged to measure spatial cell distribution and density. Following the same process as in module 1, METI next performed tumor cell nuclei segmentation and then generated 3D tumor cell density plots (Fig. 2k). These plots depict the spatial distribution, density, and pattern of cancer cells, and the same function can be applied to any other cell type of interest.

T cell mapping and phenotyping

Module 3 in METI characterizes T cells and their states within the TME. First, specific T cell markers, including CD3D and CD3E, are used to map T cell-enriched regions. Within the identified T cell regions, we then distinguish T cell states. Adding lineage markers such as CD4, CD8A, and CD8B40 separates CD4+ from CD8+ T cells. Their states can then be resolved further. CD4+ Tregs are identified by markers such as FOXP3 and IL2RA. CD8+ Tex cells are identified by known immune checkpoint genes (e.g., PD-1, TIM-3, LAG-3, CTLA-4, and TIGIT) and Tex-related transcription factors (e.g., TOX)40. This module can also overlay two or more T cell states within defined cancer cell regions on the same tissue section, making their spatial relationships visible.
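The hierarchical logic described for module 3 reduces to Boolean operations on per-spot calls: a T cell state is called only inside the T cell-enriched region, and two states are then overlaid. The minimal sketch below assumes per-spot marker scores and an arbitrary threshold. METI itself works on TESLA’s pixel-level enhanced expression, and its thresholds are not reproduced here. All function names are hypothetical.

```python
import numpy as np


def gate(parent_mask, child_scores, threshold):
    """Call a child state only inside the parent region (e.g., CD8+ within T cell-enriched spots)."""
    return np.asarray(parent_mask, dtype=bool) & (np.asarray(child_scores) >= threshold)


def overlay(mask_a, mask_b):
    """Encode two calls per spot: 0 = neither, 1 = A only, 2 = B only, 3 = both (co-localized)."""
    a = np.asarray(mask_a, dtype=bool).astype(int)
    b = np.asarray(mask_b, dtype=bool).astype(int)
    return a + 2 * b


def fraction_within(mask, region):
    """Fraction of `region` spots that are also positive in `mask` (0 if the region is empty)."""
    region = np.asarray(region, dtype=bool)
    hits = (np.asarray(mask, dtype=bool) & region).sum()
    return float(hits / max(region.sum(), 1))


t_cell = np.array([1, 1, 1, 0, 0], dtype=bool)  # T cell-enriched spots (CD3D/CD3E)
cd8 = np.array([0.9, 0.2, 0.8, 0.95, 0.1])      # hypothetical CD8A/CD8B scores
cd8_mask = gate(t_cell, cd8, threshold=0.5)     # spot 3 is excluded: outside the T cell region
treg = np.array([0, 1, 1, 0, 0], dtype=bool)    # hypothetical FOXP3/IL2RA calls
print(overlay(cd8_mask, treg))                  # [1 2 3 0 0]
```

`fraction_within(state_mask, tumor_mask)` gives a simple per-sample summary of how much of the tumor region a given state occupies.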
Given that the level and spatial distribution of infiltrating T cells are critical factors influencing tumor immune phenotypes and immunotherapy responses, METI’s 3D module creates cell density plots for the entire image to depict the spatial distribution of T cells within the TME.

To showcase the capability of this module, we applied METI to a STAD sample, G3, and a lung adenocarcinoma (LUAD) sample, L1. The pathology annotations for both samples are presented in Supplementary Fig. S2. METI identified regions with elevated T cell gene expression (Fig. 3a). Next, to delineate regions enriched in CD8+ T cells, we restricted the analysis to T cell-enriched regions. The CD8+ T cell-enriched regions are shown in Fig. 3b, and the CD4+ Treg and CD8+ Tex states are shown in Fig. 3c, d. Mapping distinct T cell states helps elucidate their spatial landscape and relationships, as well as potential cellular interactions, within the analyzed STAD and LUAD samples, and can inform new hypotheses.

Fig. 3: T cell mapping and phenotyping. a Pixel-level visualization of T cell marker gene expression in STAD G3 and LUAD L1 (left), accompanied by annotation indicating regions of T cell marker gene expression on the H&E image. b Pixel-level representation of CD8+ T cell marker gene expression (left), with annotation of CD8+ T cell marker gene-expressing regions on the H&E image (right). c Pixel-level representation of CD4+ Treg marker gene expression. d Pixel-level depiction of CD8+ Tex marker gene expression. e Overlay showing the intersection of the tumor+ region and the CD4+ Treg-positive region. f Overlay showing the overlap between the tumor+ region and the CD8+ Tex-positive region. g 3D cell density plots for STAD G3 and LUAD L1. h Overlay showing the spatial relationship between CD4+ Treg- and CD8+ Tex-positive regions.

Because the locations of CD4+ Tregs and CD8+ Tex cells relative to cancer cells affect tumor immune phenotypes41,42 and immunotherapy responses, we overlaid the cancer cell regions with the regions enriched in CD4+ Tregs and in CD8+ Tex cells, respectively (Fig. 3e, f). The overlays show distinct enrichment patterns across samples. In sample G3, CD4+ Tregs are slightly less abundant than CD8+ Tex cells, whereas in sample L1, CD8+ Tex cells are less abundant than CD4+ Tregs (Fig. 3e, f). This highlights variability in T cell states between the two tumor samples. To illustrate cell distribution across the whole image, METI provides 3D cell density plots (Fig. 3g) based on the density of nuclei segmented from the H&E image. In the STAD sample, a region in the upper left shows higher cell density, whereas the LUAD sample shows relatively homogeneous cell density throughout. We further overlaid the CD4+ Treg and CD8+ Tex signals to study their spatial co-localization (Fig. 3h). CD4+ Tregs and CD8+ Tex cells tend to co-localize at the bottom left of the LUAD sample, while the rightmost part of the LUAD sample shows only CD4+ Treg enrichment. This pattern indicates heterogeneity in the spatial distribution and composition of T cells. The co-localization analysis helps clarify the coexistence and potential interplay between these two T cell states. METI can also assist researchers in studying other T cell types, such as naïve T cells, memory T cells, and follicular helper T cells, and their transcriptional states. Users can customize the module to plot specific T cell types and states of interest.
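METI’s 3D density plots are derived from nuclei segmented on the H&E image. One way to turn segmented nuclei into such a surface is a kernel density estimate over the nuclei centroids, evaluated on a regular grid. The sketch below makes two assumptions of its own: a Gaussian kernel and a hand-picked bandwidth `sigma`. The function is hypothetical, not METI’s implementation.

```python
import numpy as np


def density_surface(centroids, width, height, step=10.0, sigma=25.0):
    """Gaussian kernel density of nuclei centroids on a regular grid.

    centroids: (n, 2) array of (x, y) pixel coordinates from nuclei segmentation.
    Returns grid x values, grid y values, and a (ny, nx) density in cells per pixel^2.
    """
    pts = np.asarray(centroids, dtype=float).reshape(-1, 2)
    xs = np.arange(0.0, width, step)
    ys = np.arange(0.0, height, step)
    gx, gy = np.meshgrid(xs, ys)    # both (ny, nx)
    dx = gx[..., None] - pts[:, 0]  # (ny, nx, n_cells)
    dy = gy[..., None] - pts[:, 1]
    kernel = np.exp(-(dx**2 + dy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return xs, ys, kernel.sum(axis=-1)


# A dense cluster of 20 nuclei near (50, 50) and a sparse pair near (150, 150).
cells = [(50, 50)] * 20 + [(150, 150)] * 2
xs, ys, z = density_surface(cells, width=200, height=200)
iy, ix = np.unravel_index(z.argmax(), z.shape)
print(xs[ix], ys[iy])  # the peak sits on the dense cluster
```

To draw the 3D plot, pass `np.meshgrid(xs, ys)` and `z` to Matplotlib’s `plot_surface` on a 3D axis. The brute-force evaluation costs grid size times cell count. For whole slides, binning the centroids into a 2D histogram and then applying Gaussian smoothing is a cheaper, close approximation.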
Because users choose which T cell types and states to plot, METI lets researchers explore T cell state composition and distribution within the TME.

In-depth analysis of other immune cells

METI’s module 4 detects immune cell types other than T cells, including neutrophils, macrophages, B cells, and plasma cells, which are critical components of the TME. METI uses validated gene signatures to identify specific immune cell types and states40,43,44,45. We applied this module to detect neutrophils in two bladder cancer samples, B1 and B2, whose H&E images are shown in Fig. 4a, c. The neutrophil-enriched regions in these two sections were verified by our experienced pathologists and served as ground truth for evaluation (Supplementary Fig. S3). As shown in Fig. 4b, d, METI identified regions with elevated expression of neutrophil marker genes in both sections. METI then annotated the neutrophil-enriched regions directly on the H&E images, isolating regions that express the neutrophil marker genes and providing magnified views (Fig. 4e). In these zoomed-in regions, image analysis readily distinguished neutrophils by their characteristic multi-lobed nuclei. In Fig. 4f, four pathology-verified neutrophil regions were circled and then segmented for neutrophil detection. The results agreed well with the gene expression-based annotation. We also generated 3D cell density plots (Fig. 4g) to illustrate the spatial cell distribution in the vicinity of neutrophils within the TME.

Fig. 4: In-depth analysis of other immune cells. a H&E image of bladder cancer sample BLCA-B1. b Pixel-level visualization of neutrophil marker gene expression in BLCA-B1. c H&E image of bladder cancer sample BLCA-B2. d Pixel-level visualization of neutrophil marker gene expression in BLCA-B2. e Annotation indicating regions of high neutrophil gene expression on the H&E image for BLCA-B1 and BLCA-B2; zoomed-in display of three neutrophil-enriched regions of BLCA-B1 and BLCA-B2, and four yellow-circled regions where neutrophils are visually present. f Zoomed-in view of the four yellow-circled regions in (e) and corresponding segmentation results. g 3D cell density plots for BLCA-B1 and BLCA-B2. h Pixel-level visualization of B cell marker gene expression in STAD G4 (left), accompanied by annotation indicating regions of B cell marker gene expression on the H&E image (right). i Pixel-level visualization of plasma cell marker gene expression in STAD G4 (left), accompanied by annotation indicating regions of plasma cell marker gene expression on the H&E image (right). j 3D cell density plots for STAD G4. k Pixel-level visualization of macrophage marker gene expression in STAD G4 (left), accompanied by annotation indicating regions of macrophage marker gene expression on the H&E image (right). l Zoomed-in view of macrophage regions of the H&E image and segmentation.

We further demonstrate this module by mapping B cells and plasma cells in a STAD sample (Fig. 4h, i). The 3D cell density plot (Fig. 4j) aligns well with the lymphoid aggregates in sample STAD G4 annotated by our pathologists (Supplementary Fig. S2a). Similarly, macrophages were correctly mapped in the LUAD sample according to the pathology annotation (Fig. 4k and Supplementary Fig. S2b). Within the regions with high macrophage marker gene expression, a randomly selected region was segmented, revealing a cluster of macrophages (Fig. 4l).
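Several of the steps above rely on segmenting the H&E image. Module 1 uses K-means-based segmentation followed by color, shape, and size filtering (see Methods), and Fig. 4l shows a segmented macrophage region. The sketch below illustrates the general idea and is not METI’s code. It works in three steps:

- Cluster RGB pixels with plain k-means, using k-means++ seeding.
- Assume the darkest cluster corresponds to hematoxylin-stained nuclei.
- Keep 4-connected objects whose area falls in a given range and whose bounding-box fill ratio exceeds a cutoff. The fill ratio is only a crude stand-in for a shape filter.

All thresholds and names are hypothetical.

```python
import numpy as np
from collections import deque


def _assign(x, centers):
    # index of the nearest center (squared Euclidean distance in RGB) for every pixel
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)


def kmeans_pixels(image, k=4, n_iter=50, seed=0):
    """Cluster the RGB pixels of an (H, W, 3) image into k color groups (Lloyd's k-means)."""
    h, w, _ = image.shape
    x = image.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):  # k-means++ seeding: favor pixels far from existing centers
        d2 = ((x[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=-1).min(axis=1)
        if d2.sum() == 0:  # fewer distinct colors than k
            centers.append(centers[-1])
            continue
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = _assign(x, centers)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return _assign(x, centers).reshape(h, w), centers


def connected_components(mask):
    """Label 4-connected foreground components by breadth-first search (0 = background)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:
                n += 1
                labels[i, j] = n
                queue = deque([(i, j)])
                while queue:
                    r, c = queue.popleft()
                    for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                        if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and labels[rr, cc] == 0:
                            labels[rr, cc] = n
                            queue.append((rr, cc))
    return labels, n


def keep_objects(labels, n, min_area, max_area, min_fill=0.0):
    """Keep components within an area range whose area / bounding-box area >= min_fill."""
    kept = []
    for lab in range(1, n + 1):
        rows, cols = np.nonzero(labels == lab)
        box = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
        if min_area <= rows.size <= max_area and rows.size / box >= min_fill:
            kept.append(lab)
    return kept


img = np.full((30, 30, 3), (220, 120, 160), dtype=np.uint8)  # pink stroma
img[:, 25:] = (240, 240, 240)                                # white background strip
img[5:9, 5:9] = (60, 30, 90)                                 # nucleus-sized dark object
img[20, 20] = (60, 30, 90)                                   # one-pixel dark speck (noise)
labels, centers = kmeans_pixels(img, k=3)
nuclei = labels == centers.sum(axis=1).argmin()  # assume the darkest cluster is nuclei
comp, n = connected_components(nuclei)
print(n, keep_objects(comp, n, min_area=5, max_area=200, min_fill=0.5))
```

The size filter discards the one-pixel speck and keeps the nucleus-sized object. A goblet cell filter would instead target pale, round, hollow components, with thresholds tuned to the image resolution.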
In addition to the immune cell types described above, module 4 lets users investigate other immune cell populations of interest using their own curated gene signatures.

Analysis of cancer-associated fibroblasts (CAFs)

In module 5, METI analyzes stromal cell components, including CAFs and their subtypes, within the TME. CAFs are known for their exceptional heterogeneity, both phenotypically and functionally46,47,48. They are categorized as activated fibroblasts and are an essential component of the TME, with both tumor-promoting and tumor-restraining activities49,50,51. Different CAF subtypes, such as myofibroblastic CAFs (myCAFs), inflammatory CAFs (iCAFs), and antigen-presenting CAFs (apCAFs), have been identified and described23,24,25,26,27.

We applied module 5 to gastric sample G2, which our pathologists annotated as containing abundant tumor stroma (Fig. 5a). METI first segmented CAFs and generated a fibroblast density plot (Fig. 5b). The fibroblast-enriched region annotated by METI had lower UMI counts (Fig. 5c), consistent with the notion that cancer cells tend to have higher UMI counts than other cell types. METI then mapped CAFs within the sample using the CAF meta gene (Fig. 5d) and annotated them directly on the H&E image, in high agreement with the pathology annotation. Within the annotated CAF regions, METI further characterized CAF subtypes, including myCAFs, iCAFs, and apCAFs (Fig. 5e–g)23,24,25,26,27. To characterize the spatial co-localization of CAF subtypes, we overlaid the three CAF populations with the total CAF-positive regions (Fig. 5h). This overlay helps reveal the spatial heterogeneity of CAFs within the TME.
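One way to express the overlay of CAF subtypes within the CAF-positive region (Fig. 5h) is as a composite label per spot. This sketch assumes the subtype masks are Boolean per-spot calls, for example thresholded meta gene scores. The function name and label strings are hypothetical.

```python
import numpy as np
from collections import Counter


def composite_labels(caf_mask, subtype_masks):
    """Name the combination of CAF subtypes called at each spot, inside the CAF region only.

    caf_mask: per-spot Boolean CAF calls; subtype_masks: dict of subtype name -> per-spot calls.
    Subtype calls outside the CAF region are ignored.
    """
    caf = np.asarray(caf_mask, dtype=bool)
    subs = {name: np.asarray(m, dtype=bool) for name, m in subtype_masks.items()}
    labels = []
    for i in range(caf.size):
        if not caf[i]:
            labels.append("non-CAF")
            continue
        hits = [name for name in sorted(subs) if subs[name][i]]
        labels.append("+".join(hits) if hits else "CAF (no subtype)")
    return labels


caf = [1, 1, 1, 1, 0]
subtypes = {"myCAF": [1, 1, 0, 0, 1], "iCAF": [0, 1, 0, 0, 0], "apCAF": [0, 0, 1, 0, 0]}
labels = composite_labels(caf, subtypes)
print(Counter(labels))  # composition of the CAF compartment, including mixed spots
```

Restricting subtype calls to the CAF region mirrors the text’s "within the annotated CAF regions". The last spot’s myCAF call is ignored because that spot lies outside the CAF region.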
METI can also co-map CAFs, cancer cells, and any other immune cell subsets of interest to provide additional insight into the interactions among them. The module remains adaptable, enabling users to explore other CAF-related subregions based on their specific interests.

Fig. 5: Analysis of cancer associated fibroblasts. a Pathology annotation of fibroblast-enriched spots in STAD G2. b Fibroblast segmentation result (left), accompanied by 3D fibroblast density plots (right). c Total UMI counts for individual spots. d Pixel-level meta gene expression plot for CAF (left), with annotations highlighting regions of elevated CAF gene expression on the corresponding H&E image (right). e Pixel-level meta gene expression plot for myCAF (left), with annotations indicating regions of elevated myCAF gene expression on the H&E image (right). f Pixel-level meta gene expression plot for iCAF (left), with annotations denoting regions of elevated iCAF gene expression on the H&E image (right). g Pixel-level meta gene expression plot for apCAF (left), with annotations indicating regions of elevated apCAF gene expression on the H&E image (right). h Overlay of regions of high gene expression for myCAF, iCAF, apCAF, and general CAF.

Quantitative comparison with existing tools

We first compared METI with two spatial clustering methods, SpaGCN13 and BayesSpace12, for goblet cell annotation in a human STAD dataset. The pathologist’s annotations served as the benchmark and marked the locations of goblet cells on the H&E image at the spot level (Supplementary Fig. S4a, b). METI identified these spots with high accuracy (ACC = 0.778; Supplementary Fig. S4c).

SpaGCN and BayesSpace are unsupervised, so their cluster identities are initially unknown. For a fair comparison, we therefore used the following criterion: a cluster is considered a goblet cluster if more than 10% of its spots are annotated goblet cell spots. This allowed us to categorize clusters into goblet and non-goblet groups for a binary comparison and to calculate accuracy for both BayesSpace and SpaGCN. The BayesSpace clustering results are illustrated in Supplementary Fig. S4d. BayesSpace achieved its highest accuracy, 0.704, at n = 15 clusters, lower than METI. River plots (Supplementary Fig. S4e) further revealed that BayesSpace could not isolate a single cluster composed exclusively of goblet cells, regardless of the number of clusters chosen. SpaGCN encountered similar difficulties (Supplementary Fig. S4f), with its highest accuracy, 0.734, at n = 5 clusters. These findings highlight METI’s advantage over existing clustering methods for cell type annotation. Both METI and the spatial clustering methods are reference-free. However, METI provides accurate cell type annotations by incorporating domain knowledge into its framework, whereas the spatial clustering methods primarily capture overall tissue structure rather than cell type enrichment.

We also used gastric single-cell RNA sequencing data as a reference to evaluate the performance of METI, RCTD, and CytoSpace in identifying goblet cells, CD4+ T cells, and CD8+ T cells in two human gastric cancer spatial transcriptomics (ST) datasets (Supplementary Fig. S5). For the first dataset, RCTD showed limited accuracy (0.525; Supplementary Fig. S5a) and notably struggled to locate goblet cells. In contrast, CytoSpace achieved a markedly higher accuracy of 0.751 (Supplementary Fig. S5b), closely matching the METI results (Supplementary Fig. S5c).

The second dataset lacks pathologist annotations, so its accuracy cannot be quantified. For this dataset, we focused on comparing how well RCTD and CytoSpace identified CD4+ T cells and CD8+ T cells. RCTD showed limited ability to identify CD4+ and CD8+ T cells (Supplementary Fig. S5d, g), whereas CytoSpace outperformed RCTD (Supplementary Fig. S5e, h). To further validate these findings, we visually assessed marker gene expression (Supplementary Fig. S5f, i). This assessment further supports the presence of these cell types at the locations identified by METI and CytoSpace.

METI’s robustness with low-quality images

We illustrate METI’s robustness to low-quality images using another human STAD dataset. As shown in Supplementary Fig. S6a, blurring caused by loss of camera focus obscures cell boundaries. As a result, our pathologists could annotate lymphocyte regions only with rough boundaries (Supplementary Fig. S6b). In this dataset, we applied METI for B cell annotation. B cells, like other lymphocytes, are distinguishable in H&E images by their small, dark-purple nuclei, so METI first identifies lymphocytes on the H&E image. As illustrated in Supplementary Fig. S6c, the image blur led to numerous, sparsely distributed false-positive lymphocyte detections, particularly around the tissue border. Moreover, H&E staining does not adequately distinguish B cells from other lymphocytes because of their similar morphology. METI therefore next identifies B cells using specific marker genes, including MS4A1 and CD19. Supplementary Fig. S6d shows the B cell distribution identified by METI from gene expression alone.
Given that B cells are a specific subtype of lymphocytes, METI further refines the detection by overlaying B cell data with the lymphocyte regions identified from image analysis, as displayed in Supplementary Fig. S6e. The results are then translated into spot-level data, as depicted in Supplementary Fig. S6f. This case demonstrates METI’s capability to merge gene expression and image analysis in a knowledge-aware manner, ensuring robustness in cell-type annotation even when image quality is low.
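Module 1 and the B cell example combine the two modalities differently:

- Goblet cells: all four regions were recovered only when the gene expression and image results were combined, which behaves like a union.
- B cells: detection was refined by keeping only lymphocyte regions supported by both modalities, an intersection.

The sketch below applies both rules to pixel-level masks and then translates the result into spot-level calls. The spot radius, the `min_frac` cutoff, and all function names are assumptions for illustration, not METI parameters.

```python
import numpy as np


def integrate_masks(ge_mask, img_mask, mode="intersection"):
    """Combine pixel-level calls from gene expression (GE) and H&E image analysis."""
    ge = np.asarray(ge_mask, dtype=bool)
    img = np.asarray(img_mask, dtype=bool)
    if mode == "union":         # keep regions supported by either modality
        return ge | img
    if mode == "intersection":  # keep only regions supported by both modalities
        return ge & img
    raise ValueError(f"unknown mode: {mode!r}")


def to_spots(mask, spot_xy, radius, min_frac=0.1):
    """Call a spot positive if at least `min_frac` of the pixels within `radius` of its center are positive."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    calls = []
    for x, y in spot_xy:
        inside = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
        frac = mask[inside].mean() if inside.any() else 0.0
        calls.append(bool(frac >= min_frac))
    return np.array(calls)


ge = np.zeros((20, 20), dtype=bool)
ge[:, :10] = True   # hypothetical B cell calls from MS4A1/CD19 expression (left half)
img = np.zeros((20, 20), dtype=bool)
img[:10, :] = True  # hypothetical lymphocyte calls from the H&E image (top half)
spots = [(5, 5), (15, 5), (5, 15)]  # (x, y) spot centers in pixels
print(to_spots(integrate_masks(ge, img, "intersection"), spots, radius=3, min_frac=0.5))
```

With the intersection rule, only the spot supported by both modalities is called. The union rule calls all three spots.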
