Integrative analysis of pan-cancer single-cell data reveals a tumor ecosystem subtype predicting immunotherapy response

ICI scRNA-Seq cohortsWe downloaded 5 ICI scRNA-Seq cohorts from the TISCH2 portal (http://tisch.comp-genomics.org/)11. The standard scRNA-Seq workflow were already completed by TISCH2 portal including quality control, normalization, unsupervised clustering and cell type annotation. Detailed information of the 5 cohorts is shown in Fig. 1a. Ideally, it is better that all of the samples in ICI scRNA-Seq cohorts were responders (R) and non-responders (NR). However, the Jerby SKCM cohort had treatment naive patients which likely include both potential responders and non-responsers. IntegrateData function in Seurat R package was performed to integrate 2 discovery cohorts.Fig. 1: Identifcation and validation of a ecotype associated with better immunotherapy response.a Summary of ICI cohorts with scRNA-seq data. b UMAP of 5 ICI datasets profiled in this work by scRNA-seq. Cell state abundance patterns in the discovery (c) and validation (d) cohorts, with cell states organized into 3 ecotypes and tumor samples (columns) ordered by the most abundant ecotype per sample. NR non-responders, R responders, TN treatment naive patients.Pan-cancer bulk-Seq datasetsThe processed RNA-Seq data of 32 tumor types were downloaded from the USCS XENA portal (https://xena.ucsc.edu/) as TPM units. For those with processed data, these data were acquired directly for subsequent analysis. In total, 11,001 samples were included in analysis.Pan-cancer scRNA-Seq datasetsTo obtain immunotherapy-responsive ecotype signature (IRE.Sig), we collected 34 processed scRNA-Seq datasets and annotations from the TISCH2 portal (http://tisch.comp-genomics.org/). The process of quality control, normalization, unsupervised clustering and cell type annotation were already completed by TISCH2 portal.ICI bulk-Seq cohorts12 ICI bulk-Seq cohorts were acquired for validating the predictive value of IRE.Sig, including 6 SKCM cohorts (Van 201512, Abril-Rodriguez 202013, Puch 202114, Hugo 201615, Gide 201916, Auslander 201817), 2 urothelial carcinoma (UC) cohorts (Synder 201718, Mariathasan 201819), 1 renal cell carcinoma (RCC) cohort (Braun 202020), 1 non-small cell lung cancer (NSCLC) cohort (Wang 202221), 1 gastric cancer (GC) cohort (Kim 201822), 1 glioblastoma multiform (GBM) cohort (Zhao 201923). The details of each cohorts are summarized in Supplementary Table 5. Patients in these cohorts were categorized into two groups based on their response status. Specifically, complete response (CR) and partial response (PR) were designated as responders (R), or stable disease (SD) and progressive disease (PD) were labeled as non-responders (NR).CRISPR screening dataA total of 5 CRISPR screens from publicly available datasets were used in this study24,25,26,27. The experimental design in these screens were shown in Fig. 5a which introduced immunotherapy into tumor/immune co-culture system. Details of these 5 CRISPR screens are given in Supplementary Table 7. We obtained these data from the supplementary information of corresponding publication. Several studies provided processed data, while other studies provided raw data. For those with processed data, these data were acquired directly for subsequent analysis; for those with raw data, MAGeCK pipeline (v0.5.9) was run with the default parameters to estimate the abundance of single guide RNAs (sgRNAs) targeting the corresponding genes. For those screens using mouse model, the R package biomaRt was utilized to map mouse genes to human genes. To quantify the effect of gene knockout on immunotherapy, we calculated the log-fold changes of sgRNAs counts between paired screens (Supplementary Table 8). Then z-score normalization was conducted for the log-fold changes to remove the batch effect among different CRISPR datasets. After that, genes were ranked by average z-score across 5 datasets (Fig. 5b). Ultimately, ICI enhancer genes were defined as those with log-fold change >0 and adj.P < 0.05 which represent the knockout of these genes were associated with ICI resistance. ICI suppressor genes were defined as those with log-fold change <0 and adj.P < 0.05 which represent the knockout of these genes were associated with ICI responsiveness.Ecotype discoveryEcotype discovery was performed with EcoTyper framework which is developed by Luca et al. to identify and characterize cell states and ecosystem subtypes from scRNA-Seq data or bulk RNA-Seq data10. EcoTyper implements a community detection algorithm that can discover robust collaborative network, termed ecosystem subtype or ecotype, in tissue samples. The discovery and recovery of cell states and ecotypes using EcoTyper framework need scRNA-Seq cohorts with the same cell types. Therefore, we choose the 5 cell types common to all 5 ICI scRNA-Seq cohorts, including B cells, CD4 T cells, CD8 T cells, macrophage, NK cells. Since the Sade et al. has largest sample number (n = 25) and the Caushi et al. Has highest cell number (n = 42,535), they were selected as discovery cohorts. To identify ecotype associated with better immunotherapy response, we picked out the immunotherapy responsive samples (n = 9) from discovery cohort. And then we applied EcoTyper scRNA-seq discovery framework with default parameters to identify cell states and ecotypes for these samples (n = 9). In total, we discovered 17 cell states while 3 of them were excluded from subsequent analysis due to not being assigned to any ecotypes. The remaining 14 cell states were assigned to 3 ecotypes (Supplementary Fig. 1).Ecotype recoveryEcoTyper leverages the characteristics of Non-negative Matrix Factorization (NMF) to apply the discovered cell states and ecotypes to external cohorts10. The recovery of cell states and ecotypes in scRNA-Seq cohorts and bulk RNA-Seq cohorts adhered to the guidance provided in EcoTyper documentation (https://github.com/digitalcytometry/ecotyper), specifically the sections “Tutorial 2: Recovery of Cell States and Ecotypes in User-Provided scRNA-seq Data” and “Tutorial 1: Recovery of Cell States and Ecotypes in User-Provided Bulk Data”. After recovering the 3 ecotypes in TCGA pan-cancer samples, the patients of each cancer type were divided into 3 groups: ecotype1, ecotype2 and ecotype3 by EcoTyper.Pathway analysis of the immunotherapy-responsive ecotype (ecotype2)To investigate the enriched signaling pathways of the immunotherapy-responsive ecotype (ecotype2), we downloaded 27 functional gene expression signatures, which represent cellular phenotypes and signaling pathways, from a previous study28. The 27 signature scores across TCGA pan-cancer samples (n = 11,001) were computed by ssGSEA method using python29. tSNE projections was used to test intrinsic differences in 32 various cancers (Fig. 2a). To remove tissue-type-specific effects, we utilized median-score normalization for 27 signature scores of all samples. List of genes used in 27 functional gene expression signatures were shown in Supplementary Table 1.Fig. 2: Description of the immunotherapy response ecotype.a tSNE analysis of the samples across TCGA tumors. The dots represent individual tumor samples, and the colors represent the datasets (32 cancer types) from TCGA. b Bar graphs depicting recovery of the 3 ecotypes in TCGA datasets. c Enrichment analysis of 29 functional gene expression signatures across 11,001. TCGA carcinomas. d Enrichment analysis of several tumor associated pathways between tumor tissues with ecotype2 and other ecotype (ecotype1,3) across 32 cancer types in TCGA cohorts, NES normalized enrichment score in the GSEA algorithm, FDR false discovery rates.Gene set variation analysis (GSVA) was leveraged to compute the enrichment of multiple tumor associated pathways in ecotype2. The R package GSVA v1.46.0 was used to conduct GSVA29. The gene sets used in GSVA were listed in Supplementary Table 2.Gene ontology (GO) enrichment analysis was performed to identify whether known biological processes are enriched in IRE.Sig. The clusterProfiler v4.9.3R package was employed to perform this analysis and visualization30.Gene Set Enrichment Analysis(GSEA) was applied to identify enriched biological function of B cell state6.Construction of machine learning model for predicting ICI outcomesTo assess the predictive capability of IRE.Sig for response to ICI based immunotherapies, 12 ICI RNA-Seq cohorts were systemically collected. First, 7 ICI cohorts were integrated into a merged dataset (n = 741), including Abril-Rodriguez SKCM (n = 57), Kim GC (n = 45), Van SKCM (n = 36), Synder UC (n = 25), Mariathasan UC (n = 348), Puch SKCM (n = 49), Braun RCC (n = 181). The other five ICI RNA-Seq cohorts were consolidated as an independent testing set (n = 177). We employed ComBat algorithm to remove batch effects31. Subsequently, we randomly split the merged dataset into training (n = 593, 80%) and validation sets (n = 148, 20%). The genes in IRE.Sig was used to construct a prediction model in the training set. We integrated 10 machine learning algorithms: supervised principal components (SuperPC), random forest (RSF), gradient boosting machine (GBM), CoxBoost, elastic network (Enet), survival support vector machine (Survival-SVM), Stepwise Cox (StepCox), least absolute shrinkage and selection operator (LASSO), partial least squares regression for Cox (plsRcox) and ridge regression32. Among them, CoxBoost, Stepwise Cox, RSF and LASSO possess the capability of dimensionality reduction and variable screening. Consequently, they were combined with other algorithms, resulting in a total of 66 machine-learning algorithm combinations. To ensure the reliability of the algorithm combination, a rigorous validation procedure was followed, which involved performing 10 fold cross validation five times and utilizing validation set to compare their performance. Further, we employed testing set to evaluate the predictive ability. Utilizing the average C-index derived from validation set and testing set, we ultimately picked the best model.Comparing IRE.Sig with other predictive gene signaturesTo evaluate the predictive power of IRE.Sig, we compared IRE.Sig with 6 previously published ICI response signatures (TIS.Sig33, C-ECM-up.Sig34, GEP.Sig35, NLRP3.Sig36, PD − L1.Sig37, IFNG.Sig35). All the codes and algorithmsutilized for the 6 signatures were obtained from relevant published studies, such as average gene expression for TIS.Sig, ssGSEA for C-ECM-up.Sig and so on. Detailed information of these signatures is provided in Supplementary Table 6.Potential drug search against the CMAP databaseTo identify promising drugs that enhance immunotherapies, we employed Connectivity Map (CMap) database (https://clue.io)38. The CMap database is a tool to match query signature with drug signatures in the CMap database. The search of query signature within CMap database was conducted as a “rapid search” in the querying segment. And in the search results, only drugs that have been approved or have successfully completed phase I and II clinical trials were selected.Analysis of pembrolizumab plus eribulinWe downloaded gene expression and clinical data of the NCT03051659 trial from the supplementary information of the study39. First, we recovered our cell states and ecotypes in this bulk RNA-Seq dataset. Each sample consisted of various cell states with different proportions. Then, we examined the relationships among 17 cell states and overall survival time using Spearman correlation test. Subsequently, the marker genes of switched memory B cells were obtained from literature40,41. The scRNA-Seq discovery cohort of Yost et al. was used to present the expression of these marker genes among different B cell states. Finally, all samples in NCT03051659 trial were assigned to B cell state6 high and low groups according to the median value of the abundance of B cell state6.We used independent validation cohort from Liu et al.42. In this trial, patients with breast cancer received Camrelizumab plus Eribulin therapy. Because pembrolizumab and camrelizumab are all immune checkpoint PD-1 inhibition, we used this cohort as independent validation cohort. Data acquisition and analysis are the same as above.Spatial transcriptomicsFor Fig. 6D, F, two human tumor sections with tertiary lymphoid structures (TLS) profiled by 10x Visium were analyzed43. We used TransferData function of Seurat R package to transfer scRNA-seq cell types in Fig. 1b to spatial scRNA-seq. Then, we applied EcoTyper to recover our cell states in the Visium array. Specifically, we set the highest abundant cell state in each spatially-barcoded spot to 1, while the rest to 0. To annotate tertiary lymphoid structures in the sections, the location information of TLS was obtained from original research43. Finally, to construct the plots in Fig. 6e, g, we computed the mean Euclidean distance for each B cell state spot from the spot to the nearest three spots of tertiary lymphoid structures.Ethics approvalSince the sequenced data generated from TCGA and GEO were publicly available, additional ethics committee approval was not necessary.Statistical analysisAll statistical analyses were performed using R v4.3.2 (https://www.r-project.org). The Kaplan–Meier curves and the log-rank test were utilized to assess the disparities in survival rates between the two groups. For correlation analysis, the Pearson correlation coefficient was conducted for data exhibiting a normal distribution, whereas the Spearman correlation coefficient was applied to data with a non-normal distribution. To analyze the differences between two groups of data, unpaired Student’s t test was used for normally distributed variables and the Mann–Whitney U-test was performed for non-normally distributed variables. For comparisons involving more than two groups, we utilized the one-way analysis of variance (ANOVA) as the parametric method and the Kruskal–Wallis’s test as the non-parametric method. The figure legends provide the statistical details, including the statistical test employed for each dataset. Unless otherwise mentioned, a p-value of 0.05 was deemed statistically significant.

Hot Topics

Related Articles