Transcriptomic classification of diffuse large B-cell lymphoma identifies a high-risk activated B-cell-like subpopulation with targetable MYC dysregulation

Ethics StatementOur research methodology complies with all relevant ethical regulations, including FDA approval for clinical trial protocols and patient consent, and Mayo Clinic IRB approval for the MER observational study. Animal studies were approved by the internal BMS Institutional Animal Care and Use Committee. Tumor size/burden was not used as an inclusion/exclusion criterion for the cohorts analyzed in this work.DatasetsThe Discovery cohort of 1208 patients was a combination of the ROBUST clinical trial screening population (NCT02285062, n = 1016)5 which included all DLBCLs irrespective of treatment or COO, and a set of commercially sourced newly diagnosed patient samples (n = 192). The replication cohorts included MER (Molecular Epidemiology Resource), an observational epidemiology cohort study of prospectively enrolled newly diagnosed DLBCL patients at Mayo Clinic (Rochester, MN) and the University of Iowa (Iowa City, IA) (n = 343)22 as part of Lymphoma Genome Project (LGP). Patient samples in these studies were collected with informed consent for research use and were approved by the Institutional Review Board at Mayo Clinic (for MER) and each study site’s Institutional Review Board (for ROBUST) in accordance with the Declaration of Helsinki. Other replication cohorts were obtained from public sources include REMoDL-B; the REMoDL-B clinical trial cohort (n = 928, n = 469 on R-CHOP arm;23 the GOYA clinical trial (n = 271, R-CHOP); and the Reddy cohort (n = 442, R-CHOP). All cohorts were unblinded with respect to treatment during analysis in order to assess differential effects by treatment arm.SubLymE was trained entirely on DLBCL NOS patients without HGBL. Samples from the MER cohort generally had high tumor cellularity by pathology review, with 83% of samples having >50% purity, and 57% having >70% purity. Unless otherwise specified, analysis of clinical outcome was conducted only in R-CHOP-treated patients, or in the MER dataset, R-CHOP-like-treated patients, which included a minority of patients treated with MR-CHOP (9% of the RCHOP-like treated cohort), R-EPOCH (7%), Other immunochemotherapy (3%), ER-CHOP (1%), RAD-RCHOP (1%), and RCHOP/Zevalin (1%). For ROBUST, clinical outcome was available only for ABC patients. Demographics and clinical characteristics of the discovery and replication cohorts are summarized in Table 1.RNAseq and DNA sequencingROBUST, MER, and the Commercial samples were sequenced at Expression Analysis, Inc (Durham, HC, USA) according to standard protocols. The Allprep DNA/RNA FFPE kit was used to simultaneously purify genomic DNA and total RNA from formalin-fixed, paraffin embedded (FFPE) tissue sections. RNAseq libraries (75PE, 50M) were constructed using Illumina TruSeq RNA Access method.Whole exome sequencing (WES) libraries (200x for tumor, 100x for germline control) were created for MER and Commercial cohorts using the Agilent SureSelect Human All Exon v6. Whole genome (WGS) libraries (60X for tumor, 30X for germline) were prepared for ROBUST samples using the Swift Accel-NGS 2S Plus DNA library kit (#21024 or 21096, Swift) with modifications to the Ampure Bead cleanup steps in the procedure.Sequencing data QC and processingRNAseq samples met minimum quality thresholds in terms of library size, duplication rate, and alignment. Cohort-level QC was performed by removing samples failing multiple outlier detection metrics. After an initial round of unsupervised clustering, a small, highly distinct cluster of borderline quality samples were removed for further analysis.Sequencing data were processed through an internal cloud-based platform which runs the Sentieon implementation of the GATK best practices, using BWA-mem for alignment and the Sentieon implementation of Mutect2 (tnhaplotyper). Variants were annotated with SnpEff using the dbnsfp database. For WGS, data copy number aberrations were called using Battenberg54 and structural variants were called by Manta. For WES data copy number aberrations were called using Sclust55. Structural variants were not called for the WES data. RNA-seq data was aligned with STAR aligner and quantified with salmon.RNAseq data normalizationThe MER, ROBUST, GOYA, and Reddy cohorts were reference normalized to a subset of the Discovery data, referred to as the commercial samples, which was fixed as a reference population. To do so, a sample-wise scaling was applied to TPM RNAseq data using the mean of five DLBCL-specific housekeeping genes used in the Nanostring Lymph2Cx assay (ISY1, R3HDM1, TRIM56, UBXN4, and WDR55). After sample-level scaling, each gene was standardized to the reference population by subtracting the reference mean and dividing by the reference standard deviation. Ultimately, the reference fixes all genes to have a mean of 0 and a variance of 1, while all other datasets were transformed to be a gene-wise Z-scoring with respect to the reference population.Because the REMoDL-B dataset was Illumina BeadArray and not RNAseq data, we applied a self-standardization approach that used the housekeeping scaling step as described above, followed by a gene-wise scaling that explicitly set each gene to have mean 0 and variance 1. This self-standardization approach is suitable for large, representative patient cohorts as applied herein, but could yield unexpected results for a small or non-representative cohort.The reference normalization approach puts all of the data in a unified numerical space with comparable expression levels and allows for portable models that can be trained in one dataset and applied directly to any other cohort without the need for re-parameterization. It also allows for the normalization of even a single sample, with no requirement for a representative batch, and furthermore, normalized data is never affected by the introduction of new samples. Existing classifiers such as the Reddy COO classifier20 and TME26 classifier14 were adapted to the normalized gene expression space by re-weighting decision thresholds.In practice, no significant batch effects by dataset were observed in the normalized combined datasets of all cohorts (Supplementary Fig. 23). We also validated that the normalization approach left relevant biological signals intact by comparing gene expression classifiers/signatures applied to the normalized data against orthogonal, non-RNAseq data. These included comparing the Reddy COO classification against the Hans IHC-based method, the double-hit signature classifier12 against FISH calls, and the cell type abundance GSVA scores against cell type marker density from IHC and MIBI. All features derived from the normalized RNAseq data were highly concordant with their corresponding non-RNAseq features.Statistical AnalysisDifferentially expressed significance was calculated by the t-test or Wilcoxon rank-sum test where appropriate. Multiple hypothesis correction was performed with the Benjamini-Hochburg false discovery rate method where noted. All statistical tests were two-tailed. Gene signature scores were calculated using the “ssgsea” method from the GSVA package. Correlation values are calculated as Spearman’s rho. General tests of categorical association are chi-squared (or Fisher where appropriate). The log-rank test is used to compute p-values for survival data. All analysis was performed in R version 4.2.1. Various statistical tests from the stats v3.5.3 R (54) CRAN package were used to check significance of the association of the subpopulations to different variables. Fisher’s exact test for binary data (mutations/CNVs), t-test for continuous variables (GE pathway scores), and global log-rank test for outcome (EFS/OS). Boxplot figures represent the lower, middle, and upper quartiles as the lower box edge, center line, and upper box edge respectively, with whiskers extending to up to 1.5 times the interquartile range. Error bars represent 95% confidence intervals. Statistical analysis was not stratified by (self-reported) sex, but post-hoc analysis did not find significant associations between our findings and sex.Cell type signaturesCell type specific signatures were generated from the LM22 matrix which describes 22 functionally defined leukocyte types56. This signature matrix was augmented and tailored to DLBCL by adding an additional cell type representing malignant DLBCL B cells, and was trained on purified cell populations57. Benchmarking deconvolution results using the augmented signature matrix identified high correlation between the abundance of CD20+ cells as measured by IHC. The addition of the DLBCL-specific cell type also significantly reduced the estimated abundance of the unclassifiable “Other” cell type population, which previously accounted for up to 40% of the estimated abundance.DLBCL classifiersThe Reddy method20 was implemented to call COO from RNAseq data and was shown to have excellent concordance with the gold standard Nanostring method.The DHIT/DZsig method11,12 was adapted to the normalized RNAseq space by using published gene-level DHITsig coefficients to compute a linear DHITsig score. A threshold for this score was selected in the MER data as the highest threshold that captured all cases known to be double-hit by FISH, which yielded a classifier matching the expected prevalence of DHITsig + cases. This signature was further extended to represent the DZsig classification by adding a third “indeterminate” group with approximately 10% prevalence between the positive and negative groups.For Ecotyper and Bcell state, we applied the online tool at https://ecotyper.stanford.edu/lymphoma/.For LME, we applied BostonGene’s package on Github (https://github.com/BostonGene/LME).For the HMRN classification, we used a total of 115 genetic features which occurred in at least 1% of patients. These features defined binary variables denoting 104 mutations, 4 amplifications (BCL2, MALT1, REL, XPO1), and 7 markers (TP53, TNFRSF14, TNFAIP3, RB1, PRDM1, CDKN2A, CD58) indicating presence of either a homozygous deletion or mutation. We did not have data on HIST1H2BD_S mutation and DDX3X marker.We applied the LymphGen method using the v1.0 tool available online at https://llmpp.nih.gov/lymphgen/index.php. We followed the documentation including copy number and mutation data where possible, but had a small number of missing input features. We omitted one CNV feature (LOC107986596), and two mutation features (TMEM121 and LOC107986596), out of 104 total features in each. We included synonymous, truncating, and MYD88 L265P annotations for the mutation data. We did not include arm-level copy number data.To apply a version of the LymphProg classifier to our normalized data, we applied a weighted sum of classifier genes according to the coefficients provided in the literature, and applied the provided score threshold of −0.521, which yielded class prevalences in line with expectations.The MHG classifier was examined in the REMoDL-B dataset, for which the MHG classifications were publicly available. Although differential signatures of MHG versus other COO subtypes were identified in the literature, we could not find an implementation of the classifier itself to apply to the MER data.Unsupervised clusteringThe clustering input data consisted of normalized RNAseq gene expression features plus feature scores derived from the gene expression data, agnostic to clinical and outcome data. Expression features were restricted to genes in the top quartile of mean expression and variance in TPM space. The derived features consisted of GSVA signature scores19 including the MSigDB Hallmark and C1 pathways, as well as cell type signatures57. While the derived features did share some information content with the raw gene expression features, most features were derived using raw features not selected for clustering. Some presence of redundant signal across the input data types is expected, and indeed relied upon by the latent variable model used.To better understand the unique signal carried by each input matrix, we calculated the sample-pairwise correlation matrix in each of the four individual feature spaces. We then calculated the Correlation Matrix Distance (CMD) between the correlation matrix in raw gene expression space and the other three correlation matrices, in order to assess whether the correlation structure among patients was repeated across the data types, or if patients showed different patterns of similarity in different feature spaces. This was compared against a null distribution of CMD values when using an equivalent number of raw gene expression features, rather than the engineered features. We found in each case that the derived features showed significantly greater CMD than selected features, indicating that the derived features allow different views of patient similarity not easily captured in raw gene expression space. In short, the engineered signature features provide distinct views of the patients rather than simply representing redundant signal already captured in the gene expression features.The iClusterPlus method18 was applied to the subset data for multiple choices of K from 2 to 12. This procedure was repeated 200 times, with cluster assignments recorded in each case. The 200 runs were then summarized using a sample-pairwise co-clustering frequency matrix, which was computed as the number of times two samples were assigned to the same cluster, divided by the number of times two samples appeared in the same run. This sample-pairwise matrix was then clustered using hierarchical clustering using the Ward method and 1 minus the co-cluster frequency as the distance metric, in order to obtain one final clustering per choice of K. One value of K was selected by evaluating cluster stability measures including the silhouette metric and co-cluster frequency distribution, with 8 being the being the best-fitting number of clusters (Supplementary Fig. 24).SubLymE classifierA generalized linear model (GLM) classifier was trained on the Discovery data using the consensus cluster labels (with A8 samples removed) as the gold standard. Several choices of the elastic net mixing parameter alpha were tested, with the goal of maximizing predictive performance and minimizing model complexity. The regularization parameter lambda was optimized using cross-fold validation and was selected as the minimum value that yielded a misclassification rate within one standard error of the minimum32. Cross-validation results indicated good performance of the classifier training methodology, with 93% accuracy on the training cohort, as well as 81-98% sensitivity/positive predictive value within each cluster individually. Since the training data was normalized to a reference population, the classifier is directly applicable to other datasets normalized to this space, with no need to re-train parameters or thresholds. The SubLymE classifier may be applied to any FFPE RNAseq sample normalized in the same way and will produce a class label for each case (i.e., no case will be unclassified). Although a class label will be produced for any input data, application to poor quality data will produce poor quality results. When applying SubLymE to the previously omitted A8 samples in the Discovery data, most were classified as A4, which was the least reproducible cluster in terms of pathway dysregulation and prevalence.MHC I/II expressionDifferential surface expression of MHC I and II molecules on DLBCL cell lines treated with DMSO or lenalidomide (1 μM) for 3 days was assessed via flow cytometry using anti-human HLA-ABC and anti-human HLA-DR/DP/DQ antibodies (clone G46-2.6 and clone Tu39, respectively, BD Biosciences). No commonly misidentified cell lines were used.Antigen-specific CD8 T-cell cytotoxicityDLBCL cell lines selected for the antigen-specific CD8 T cell cytotoxicity were obtained from American Type Culture Collection (ATCC), or Leibniz Institute-DSMZ (Braunschweig, Germany), and maintained in RPMI medium [RPMI-1640 with 10% fetal bovine serum (FBS), supplemented with 2 mmol/L L-glutamine, 1% penicillin/streptomycin and 1 mmol/L sodium pyruvate]. HLA-A2 surface expression was accessed via flow cytometry using antibodies (clone BB7.2, BD Biosciences). HLA-A2 positive, EBV-specific primary human CD8 T cells were obtained from Charles River (donor 213, CMV negative). HLA-A*02:01 EBV BMLF1 peptide was obtained from MBL International. To conduct the cytotoxicity assay, target DLBCL cell lines were labeled with 400 nM of CFSE (Thermo Fisher Scientific) and treated with either DMSO or lenalidomide (1 µM) for 3 days. On the day of co-culture, target cells were labeled with or without 1 ng/ml of EBV peptides for 30 min. EBV-specific effector CD8 T cells were thawed and counted. Co-culture of CD8 T cells with target cells was set up in 96 well round bottom plates at an effector:target ratio of 1:1 overnight. On the following day, supernatants from co-culture samples were collected for Granzyme B secretion via ELISA (Biolegend), while cells were stained with Annexin V and TO-PRO-3 (Thermo Fisher Scientific) and analyzed by flow cytometry. CFSE+ target cells were gated on, followed by gating on Annexin V-, TO-PRO-3- double negative cells as viable target cells. Absolute cell count was obtained using CountBright Absolute Counting Beads (Thermo Fisher Scientific).Multiplex immunofluorescence in mouse tissueMouse spleen samples were fixed in formalin 4% for 72 h and embedded into paraffin blocks. Sequential 4-µm-thick sections were placed on Super Frost slides for standard Hematoxylin–Eosin, IHC and IF staining. Chromogen staining was performed for immunohistochemistry validation of specificity of individual antibodies. Antibodies against B220 (dilution 1:50, PTPRC/1783R, NBP-54578, Novus Biologics), CD8 (dilution 1:200, D4W2Z, 98941S, Cell Signaling Technology), CD4 (dilution 1:50, D7D2Z, 27520, Cell Signaling Technology), KI67 (dilution 1:100, SP6, ab231172, Abcam) were validated as a chromogen and monoplex immunofluorescent staining. IHC and mIF assays were performed on using a VENTANA DISCOVERY ULTRA automated staining instrument, using VENTANA reagents, according to the manufacturer’s instructions. Slides were de-paraffinized using EZ Prep solution (950–102, Roche) and epitope retrieval was accomplished with CC1 solution (950–224, Roche). Slides were incubated with rabbit 1°Abs, diluted in antibody diluent (760-219, Roche), followed by secondary anti Rabbit-HQ (OC 07017812001, Ventana) for chromogen staining or Rabbit-Polymer (non-diluted, ARR1001KT, Ventana) for immunofluorescent staining. For chromogen detection, slides were developed using the Ventana Discovery ChromoMap DAB kit (760-159, Ventana) according to the manufacturer’s instructions. Slides were then counterstained with hematoxylin (790-2208, Ventana), followed by Bluing reagent (760-2037, Ventana). For fluorescent staining and detection, slides were stained with Opal Polaris 7‐Color Automation IHC Kit (NEL871001KT, Akoya Biosciences). We used Opal 480 (1:100), Opal 520 (1:100), Opal 570 (1:100), Opal 620 (1:100), and Opal 690 (dilution 1:50) and Opal 780 (1:25). Fluorescent singleplex staining was performed for each biomarker and compared to the appropriate chromogenic singleplex to assess staining performance. After the staining was completed, the slide was mounted by using Prolong Diamond Antifade Reagent (P36970, Invitrogen). Multispectral images were acquired on the PhenoImager HT system (Vectra Polaris, Akoya Bioscience), analyzed with InForm software (Vectra Polaris, Akoya Bioscience), and the quality report and marker quantification were generated with the phenoptrReports package.shRNA knockdownDoxycycline (Dox)-inducible shRNA constructs were generated by Cellecta (Mountain View, CA, USA) using pRSITEP-U6Tet-(sh)-EF1-TetRep-2A-Puro plasmid. Briefly, 293FT cells were co-transfected with lentiviral packaging plasmid mix (Cellecta, Cat# CPCP-K2A) and pRSITEP-shRNA constructs. Viral particle was collected 48 and 72 hrs after transfection and then concentrated with Lenti-X Concentrator (Takara Bio USA). For infections, cells were incubated overnight with concentrated viral supernatants in the presence of 8 µg/ml polybrene. Cells were then washed to remove polybrene the next day. At 48 h post-infection, cells were selected with puromycin (2 µg/ml) for more 1 week before experiments. For knockdown experiments, cells were seeded at 1×105 cells/ml and induced with 20 ng/ml of Dox or DMSO vehicle control. On day 3 of Dox induction, cells were re-seeded in medium with refreshed Dox or DMSO. For proliferation assay, 15,000 cells were seeded in 96 well U-bottom plate followed by measuring cell viability with CellTiter-Glo (Promega) for 5 consecutive days. The remaining cells were seeded at 5×105 cells/ml and incubated for additional 2 days. Cells were then harvested for Western blot analysis. The shRNA target sequences were: shNT: CAACAAGATGAAGAGCACCAA; shTCF4-13: GAGACTGAACGGCAATCTTTC; shTCF4-14: CACGAAATCTTCGGAGGACAA. shMYC-40: CAGTTGAAACACAAACTTGAA; shMYC-42: CCTGAGACAGATCAGCAACAAWestern blottingCells were lysed with cell lysis buffer (50 mM TrisHCl pH7.4, 250 mM NaCl, 0.5% Triton X100, 10% glycerol) supplemented with Halt protease/phosphatase inhibitors (Thermo Fisher Scientific, 78443). Cell lysates were subjected to sonication to breakdown nuclei and reduce viscosity caused by released genomic DNA. The protein concentration was measured by a Bradford Protein Assay (Bio-Rad). Samples were diluted to equal concentration followed by with NuPAGE LDS sample buffer and 2-Mercaptoethanol (1.25% final concentration) before boiling at 95 °C for 5 min. Whole-cell lysates were resolved on NuPAG 4-12% Bis-Tris Midi Protein Gels (Invitrogen) and transferred onto nitrocellulose membranes, which were then subjected to blocking in Intercept® (TBS) blocking buffer (LI-COR). Proteins of interest were detected by incubation with the primary antibodies listed below at 4 °C overnight. After washing with 1XTBST, the membrane was incubated with either IRDye 800CW goat anti-rabbit IgG or IRDye 680LT goat anti-mouse IgG secondary antibody (1:10,000) at RT for 1 h. After washing with 1XTBST, bends were visualized by Odyssey Imaging System (LI-COR). Antibody information: TCF4 (Proteintech, 22337-1-AP, 1:1,000), MYC (abcam, ab32072, 1:1,000), GAPDH (Cell Signaling Technology, 2118 L, 1:5,000).Capillary Western blot analysesCapillary Western analyses (Supplementary Fig. 16A, B, and F) were performed using the ProteinSimple Jess System (Catalog # 004-650). Protein samples (1 µg/well) were loaded 12-230 kDa separation capillary cartridges (ProteinSimple, SM-W004) and processed according to the manufacture instruction. The antibodies used were identical to those used in the traditional Western blot except for dilution factors. The dilution factors: TCF4 (1: 40), MYC (1:40), and GPADH (1:1,000). Results were analyzed by Compass for SW software (Version 6.2.0).Multiplexed ion beam imaging (MIBI)MIBI is performed by staining tissue with a panel of metal-labeled antibodies and then imaging the tissue using time-of-flight secondary ion mass spectrometry (ToF-SIMS). Tumor biopsy slides were stained using standard protocol at Ionpath Inc. (Menlo Park, CA). Expression of markers were quantified at the single cell level using scaled (arcsinh transformed) summed intensities. Thresholds for each marker were initially determined based on the histogram distribution of intensity values across all field of views (FOVs). Samples used in MIBI analysis were taken from the Commercial cohort.Immunohistochemistry and pathology scoringImmunohistochemistry (IHC) assay was performed using Bond Polymer Refine Detection Kit on Leica Bond slide autostainer (Leica Microsystems Inc., Buffalo Grove, IL). Briefly, formalin-fixed paraffin-embedded (FFPE) tissues were sectioned at 4 micron and deparaffinized on the Bond autostainer. Antigen retrieval was performed with Epitope Retrieval 2 (ER2, pH 9.0) for 20 min at 100 °C. The slides were blocked for endogenous peroxidase activity with Peroxide Block for 5 min at room temperature. Sections were then incubated with the rabbit monoclonal anti-MYC antibody (Abcam, Catalog No. ab32072) at a 1/200 and 1/1000 dilution, respectively, for 15 min at room temperature. Horseradish peroxidase (HRP) labeled Polymer was used at the instrument’s default condition. The antigen–antibody complex was then visualized with hydrogen peroxide substrate and diaminobenzidine tetrahydrochloride (DAB) chromogen. Slides were counterstained with hematoxylin. MYC immunoreactivity was evaluated on entire tumor sections by a pathologist.FISH and copy number analysisFor the MER cohort, translocation events in MYC (8q24.1), BCL2 (18q21.2), and BCL6 (3q27) were determined by FISH break apart probes (Abott Laboratories, Des Plaines, IL, USA) on tissue microarray slides. Imaging analysis was completed by a technologist in the Mayo Clinic department of Laboratory Medicine and Pathology. Detailed methods for FISH analysis are previously published58.Adoptive transfer of eu-myc/hCRBN splenocytes and treatment of recipient miceHumanized CRBN (hCRBN) C57BL/6 mice were bred with eu-myc C57BL/6 mice. Upon development of lymphoma, spleens were harvested and resulting splenocytes were viably frozen. To transplant diseased splenocytes to hCRBN mice, splenocytes were thawed, washed, and viability determined. Viable splenocytes cells were resuspended in RPMI media (GIBCO) and injected 1×105 lymphoma cells by tail vein into recipient hCRBN mice. Lymphoma cells were allowed to engraft for four days, followed by randomization between vehicle and lenalidomide (30 mg/kg) treatment once daily for via PO for four additional days. Spleens were harvested and subjected to multiplex immunofluorescence with indicated antibodies. Animal research followed the ARRIVE guidelines, using 8-10 week old female mice in groups of 10 We followed AAALAC and NIH guidelines for mouse health and body condition assessment.RNA extraction, reverse transcription, and real-time PCRTotal RNA was extracted by RNeasy Mini Kit (Qiagen) and reverse transcribed by iScript™ cDNA Synthesis Kit (Bio-rad). Quantitative real-time PCR (qPCR) was conducted on a VillA 7 System using the Power SYBR Green PCR Master Mix (Applied Biosystems). Gene expression values were calculated by normalization to 18S using the comparative CT method. Primers used in the study are listed in Supplementary Table 14,25,27,28,44,51,52.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles