Dissecting transcriptome signals of anti-PD-1 response in lung adenocarcinoma

Study overviewThe overall study design is summarized in Fig. 1. Our patient cohort, a subset of LUAD immunotherapy cohort in the European Nucleotide Archive under accession number EGAS0000100646113, consists of 85 LUAD patients who received PD-(L)1 mAb monotherapy. We had 23 complete or partial responders (CR or PR), 15 stable disease (SD) patients, and 47 progressive disease (PD) patients (Supplementary Table S1). SD patients were regarded as non-responders (NRs) in this study. Molecular features for predicting ICB response were derived from exome sequencing data (TMB and mutations) and transcriptome sequencing data (cell type abundance and gene set activities). Patients were divided into the PD-L1 positive and negative groups mainly because TMB turned out to be a good classifier for the PD-L1 negative patients. For the PD-L1 positive group, we built transcriptome-based prediction models that are essentially an ensemble of 100 XGBoost (XGB) machines where each machine was trained with a pseudo-randomly selected subset (80%) of patients. Lastly, we evaluated the performance of our predictors and deduced features that provided insights into understanding molecular mechanisms and developing biomarkers for immunotherapy of LUAD. Patient characteristics are summarized in the Supplementary Figure S1.Fig. 1Overview of predicting response to anti-PD(L)1 in 85 lung adenocarcinoma patients.Feature space for predicting the ICB responseTumor mutation burden (TMB) was estimated from exome sequencing data for each patient. Transcriptome data was used as input to the machine learning model in two ways – gene expression values themselves or inferred properties such cell type composition and pathway activities (Fig. 2a). Cell type composition was deduced by MCP-counter for 8 immune cell types (CD3+ T cells, CD8+ T cells, cytotoxic lymphocytes, NK cells, B lymphocytes, monocytic lineage cells, myeloid dendritic cells, and neutrophils) as well as endothelial cells and fibroblasts. ESTIMATE was used to calculate the stromal and immune cell portions. Patient-specific pathway activity was obtained by ssGSEA for a number of gene sets in MSigDB (v7.2). The ‘HCP’ gene sets were defined to include 50 hallmark gene sets (H) and 2922 curated canonical pathways (CP). The ‘emerging’ gene sets, defined to include 3368 gene sets from chemical and genetic perturbations (CGP), 189 oncogenic signature gene sets, and 5219 immunologic signature gene sets, represent the collection of differentially expressed genes in various experimental environments. Although being poorly characterized and annotated, they can serve as a gold mine to identify novel targets or relationships.Fig. 2Ensemble machine learning model for analyzing transcriptome data. (a) Three types of input variables. Number of input variables for each type is indicated in the parenthesis. (b) Pseudo-random patient selection of training and testing data in 8:2. The responder: non-responder ratio of each XGB machine was fixed to be the same as that of total patients. Test scores were evaluated for each XGB machine and the median value for each patient was used as the final prediction score. (c) Heat map of test scores. Each row and column represent XGB machine and patient, respectively. Gray rectangles indicate the training data for each XGB machine. Note that the test scores show considerable variation across 100 XGB machines, manifesting that each XGB machine reflects the heterogeneous nature of patient subgroups adequately.Computational model for predicting patient response to ICBsIn building the transcriptome-based prediction models with thousands of genes or gene sets, we adopted the XGBoost algorithm for machine learning because of its speed, accuracy, and support for automatic feature selection and feature importance analysis. Considering the complexity and heterogeneity of tumors, patient number of 85 is far short to build robust and accurate prediction models for ICB response. To alleviate the instability of a single XGB machine and to increase the prediction accuracy, we built an ensemble of 100 XGBoost machines (Fig. 2b). For each XGB machine, patients were randomly divided into 80% training and 20% test sets, keeping the responder and non-responder ratio the same as the total patients. Input variables were prefiltered according to the single-variable AUC score to feed manageable number of informative features into the XGB training process. Once the training step was finished for all XGB machines, we calculated the response score of each patient by amassing XGB machines that did not use the patient of interest in the training process and taking the median score of those XGB machines as the final response score. The heatmap of test scores showed that our ensemble model of XGB machines reflected properly the heterogeneous and diverse characteristics of patient subgroups used in the training process (Fig. 2c).Patient subgroup analysis by PD-L1 expressionWe examined the performance of our composite model for diverse subsets of patients (PD-L1 positives vs. PD-L1 negatives, smokers vs. nonsmokers, and TMB-high vs. TMB-low patients), using various features that included TMB, gene expression, cell type composition, and gene set activities of known expression signatures as well as thousands of gene sets in the MSigDB collection (Supplementary Table 2). The best performance was obtained when patients were divided into the PD-L1 positive (n = 65) and negative groups (n = 20). The responder ratio was slightly larger in the PD-L1 positive subgroup, consistent with the previous knowledge14 (Fig. 3a). Other classifications by smoking status or TMB-high/low groups did not show satisfactory performance.Fig. 3Patient characteristics of TMB and immune cells. (a) Response rate in the PD-L1 positive and negative patients. (b) Tumor mutation burden (TMB) comparison of responders (R) and non-responders (NR) in the PD-L1 positive and negative patients. (c–d) Receiver Operating Characteristic (ROC) curve for response prediction of TMB in the Samsung Medical Center (SMC) (c) and MSKCC (d) cohorts. Numbers for PD-L1 positive and negative patients indicate the Area Under Curve (AUC) values. (e–f) Abundance of immune cell types inferred from RNA-seq data using MCP-counter in the PD-L1 positive (e) and negative (f) patients.TMB was higher for the responder group in the PD-L1 negative patients (p = 0.072), whereas it showed no difference between the responder and nonresponder (NR) groups in the PD-L1 positive patients (Fig. 3b). In our predictive modeling, TMB turned out to be the best classifier for the PD-L1 negative patients with the area under the ROC curve (AUC) = 0.8, whereas TMB showed marginal predictive power (AUC = 0.62) for the PD-L1 positive patients (Fig. 3c). This observation was validated using an independent cohort of the MSKCC LUAD patients (n = 57) (Fig. 3d). Thus, we concluded that TMB is a decent classifier of ICB response for the PD-L1 negative LUAD patients.We also examined the abundance of immune cell types, inferred from transcriptome data by MCP-counter, for the PD-L1 positive and negative patients. T cells, CD8 T cells, cytotoxic lymphocytes, and B lineage cells were significantly more abundant in the PD-L1 positive patients (Fig. 3e), whereas their amount were rather similar in the PD-L1 negative patients (Fig. 3f). Other immune cell types showed a similar but weaker trend (Supplementary Fig. 2). This strongly suggested that the transcriptome-based model, from which the immune signatures were derived, would be successful only for the PD-L1 positive patients. In fact, we achieved the best performance of AUC = 0.93 for the PD-L1 positive patients whose details are given in the following sections.Transcriptome-based ensemble prediction models for the PD-L1 positive patientsIn an effort to deduce key features (i.e. genes, gene sets or pathways) of ICB response, we tested the performance of our ensemble prediction models for diverse subset of transcriptome signatures. Among the known gene signatures, only the IMPRES15 signature showed a better predictive power (AUC = 0.76) than TMB (AUC = 0.62), and all others (TIDE12, IPRES9, and F-TBRS11) showed poor performance similar to random prediction (Fig. 4a). This implies that LUAD is much different, in the immune perspective at least, from other tumor types such as melanoma or urothelial cancer where the known signatures were derived previously.Fig. 4ROC curves and important features from transcriptome-based models. a-c) ROC curves for individual known signatures (a), gene expression (b), and cell type abundance and gene sets (c). In the gene expression, Top numbers indicate the number of features used after prefiltering. (d–h) Importance scores from our ensemble XGBoost machines using individual gene expression (d), cell type abundance (e), known signatures from literature (f), emerging gene sets (g), and all transcriptome-derived variables (i.e. e + f + g) (h).We then explored the performance of the transcriptome-based XGB ensemble models in the two prediction modes of gene expression and signatures. In the gene expression mode where the expression values from RNA-seq were used as input features, the ensemble model showed variable performance of 0.85 < AUC < 0.93 depending on the prefiltering stringency (Fig. 4b). We obtained the best AUC = 0.93 when three genes for each XGB machine were used after prefiltering step, where 36 genes in total were used in 100 XGB machines. This test demonstrated that our ensemble approach was accurate as well as robust and that the prefiltering step enhanced the performance substantially.Next, we tested various features derived from the transcriptome data where the number of input features after prefiltering was fixed to 5 after trying a few alternatives (Fig. 4c). Cell type abundance achieved the lowest AUC = 0.78 likely due to the limited number of cell types and accuracy in deducing their abundance from the transcriptome data. Predictive models using 49 known ICB-related signatures from the previous studies or the HCP gene sets yielded the AUC = 0.87. Using ~ 8400 ‘emerging’ gene sets from the MSigDB (designated as MSigDB_emerging) gave AUC = 0.92. The best performance (AUC = 0.93) was obtained when we used all gene sets (i.e. the cell type abundance and the known ICB-related signatures added to the emerging gene sets).One of the major advantages of XGB machine is its automatic feature selection and feature importance estimation. Each XGB machine reported 4–5 important features and their contribution scores. The importance scores were highly variable across 100 XGB machines, implying that our ensemble approach sampled the heterogeneous characteristics of different patients groups adequately (Supplementary Fig. 3). In order to deduce key features of predicting the ICB response, the cumulative importance scores from 100 XGB machines were shown in the bar plots (Fig. 4d–g). Among the gene expression input features, IFNG and CD8B were the top 2 genes followed by KCNJ11, CD8A, CXCL9, SLC29A2, GZMA, and so on (Fig. 4d). The cell type abundance features showed the importance score in the order of CD8 T cells, T cells, Cytotoxic lymphocytes, NK cells, B lineage, and monocytic lineage (Fig. 4e). Among the known ICB-related signatures, CD8 T effector and ‘T cell inflamed GEP’ signatures were the top 2 features, followed by tumor microenvironment and immune checkpoint signatures (Fig. 4f). All these signatures were closely related to the immune activity including T cells and tumor microenvironments. Among the novel ‘emerging’ gene sets, we obtained 10 important features, most of which were poorly characterized (Fig. 4g). Dominant contribution came from the ‘SHIN B cell lymphoma cluster 1’ that consisted of 13 genes including IFNG, CXCL9, IL10, and CCL3. When we analyzed the full sets, the ‘SHIN B cell lymphoma cluster 1’ was still the dominant contributor followed by the ‘T cell inflamed GEP’ and ‘CD8 T effector’ signatures from the known signatures and ‘CD8 T cells’ from the cell type abundance (Fig. 4h). All others except the immune checkpoint gene set were from the MSigDB_emerging gene sets.Prediction outcome for the PD-L1 positive patientsThe prediction score from the full set predictor (AUC = 0.93) is shown as the waterfall plot for the PD-L1 positive patients (Fig. 5a). Our ensemble model predicted all 19 responders correctly, achieving the perfect sensitivity in predicting responders. All patients predicted to be non-responders were correct except nine case (32 out of 41). In the clinical point-of-view, our prediction model has extremely desirable characteristics – rescuing all responders and filtering out majority of non-responders successfully. Of note, we suggest an alternative treatment method for unexpected nonresponders (see below).Fig. 5Prediction outcome from the transcriptome-based model using all gene sets. a) Waterfall plot of the response score. The responder/non-responder cutoff was selected to maximize the prediction accuracy. (b) Clinical information and prediction scores from four previously known methods (TIDE, IPRES, IMPRES, and F-TBRS). c) Gene set activities of each patient in the order of the importance scores on the left bar graph. (d) Expression level of important genes obtained from the expression-based analysis. Blue color in the importance score indicate the features showing negative correlation with typical immune features. (e–g) Comparison of true positives (TP), false positives (FP indicated with asterisks), and true negatives (TN). CTLA4 expression (e), Th-17 related gene set activity (f), and PRKCQ expression (g) are shown in violin plots.We also examined the performance of previously known markers for each patient (Fig. 5b). PD-L1 expression was significantly higher (p = 0.0002) in the responder group (Supplementary Fig. 4a) and TMB was slightly higher in the responder group with no statistical significance (Fig. 3b). Progression-free survival (PFS) was better in the responder group as well (Supplementary Fig. 4b). Driver mutations, sex, smoking showed no difference between the responder and non-responder groups (Supplementary Fig. 1). Other expression-based signatures (TIDE, IPRES, IMPRES, and F-TBRS) showed virtually no predictive power for our patients.The heatmap of individual activity scores (Fig. 5c) or gene expression (Fig. 5d) of important features showed that those features could be classified into two groups. Most features showing elevated activity or expression in the responder group were closely related to T cell activation and their pairwise correlations were extremely high. Of note, we observed three features that showed the opposite trend—‘GSE15659 CD45RA negative CD4 T cell vs. Resting Treg DN’ activity and KCNJ11 and DDAH1 expression. Function of these features are poorly characterized, but they might play roles in negative regulation of immune activity, thus impeding successful treatment of ICBs.Importantly, we noticed that several patients with high immune activities according to mRNA expression data were falsely predicted to respond. We systematically searched for differential factors between the expected responders (true positives) and unexpected nonresponders (false positives). Differential analysis of gene expression and gene set activities yielded 27 gene sets p < 0.01 and 51 genes at p < 0.005 (Supplementary Fig. 5), most of which were difficult to interpret their biological meanings. Alternatively, we examined gene expression of immune checkpoint genes known to play roles in NSCLC16. Among 7 receptors and 14 ligands, only the CTLA4 expression showed a noticeable difference (p = 0.054) (Fig. 5e). This elevated CTLA4 expression might be the cause of anti-PD-1 treatment failure, which suggested that combination treatment of anti-PD-1 and anti-CTLA4 could be successful for those patients. Another factor that showed a similar trend with the CTLA4 expression was the gene set activity of ‘GO:2000318_Positive regulation of T-helper 17 type immune response’, which included PRKCQ (protein kinase C theta) with significant difference among our patient subgroups (Fig. 5f,g).Validation of prediction models using independent datasetsPerformance of machine learning models usually needs to be validated with independent datasets due to the overfitting possibility. We identified an NSCLC study of 16 patients who received nivolumab (anti-PD-1) treatment at the Yonsei Medical School (Seoul, Korea) and whose transcriptome sequencing data were available in public (GSE126044). Our prediction model showed an excellent performance of AUC = 0.96 when the emerging or all gene sets were used (Supplementary Fig. 6). Our ensemble prediction models with gene expression, cell type abundance, or known gene signatures were equally good for this data set. However, only the TIDE score among the known prediction signatures showed a decent performance (AUC = 0.84).Another relevant datasets are from two randomized clinical trials for atezolizumab (anti-PD-L1) as a second-line treatment of NSCLC17. The OAK and POPLAR datasets contained RNA-seq data from 241 and 55 nonsquamous tumor samples, respectively. Our ensemble models yielded a decent but limited performance with the best AUC = 0.68 and 0.74 in two datasets (Supplementary Table 3 & Supplementary Fig. 7). We reasoned that the cause of limited performance could be the inclusion of PD-L1 negative patients. Since the OAK dataset provided PD-L1 expression level as well, we examined the PD-L1 + subgroup (n = 56), which did not improve the performance. We further defined PD-L1 + patients into strong plosive (n = 21) and weak positive (n = 35) subgroups. Our models achieved the best performance in the PD-L1 strong positive patient subgroup with AUC = 0.87. Of note, the performance with gene set activities was better and more robust than gene expression. Reasons for the poor performance of PD-L1 weak positive patients are not clear, but the immune activities were clearly suppressed in this subgroup in spite of PD-L1 expression albeit low.Overall, our validation test with independent datasets confirmed that transcriptome-based ensemble models work best for the PD-L1 positive patients and that the geneset-based predictors are more robust than expression-based models. It also showed that the prediction accuracy of our predictive models was much larger in the Yonsei NSCLC cohort than in the OAK/POPLAR cohorts. The worse accuracy in the OAK/POPLAR datasets can be ascribed to the difference in ethnicity and ICBs (nivolumab vs atezolizumab; anti-PD-1 vs. anti-PD-L1).

Hot Topics

Related Articles