Single-cell transcriptomes identify patient-tailored therapies for selective co-inhibition of cancer clones

This research complies with all relevant ethical regulations, approved by the institutional review boards, who approved the use of the human samples in the study. The AML patient samples and data were collected and published with signed informed consent in accordance with the Declaration of Helsinki (HUS Ethical Committee Statement 303/13/03/01/2011, latest amendment 7 dated June 15, 2016. Latest HUS study permit HUS/395/2018 dated February 13, 2018). The HGSC patient samples were collected as a part of a larger study cohort, where all patients participating in the study provided written informed consent. The study and the use of all clinical material have been approved by The Ethics Committee of the Hospital District of Southwest Finland (ETMK) under decision number EMTK: 145/1801/2015.Compiling a large-scale phenotypic response data for pre-training a LightGBM modelA comprehensive training dataset of large-scale phenotypic response profiles was created by merging data from three databases: Connectivity Map LINCS 202022, PharmacoDB23, and PubChem53 (Suppl. Figure 3, bottom part). These continuously expanding, publicly available databases allowed us to establish an extensive dataset that provides functional information on both viability and transcriptomic responses to increasing numbers of compounds. Details on the dataset used in the present study are outlined below. The Connectivity Map (CMap) LINCS 2020 is a reference database that houses gene expression response profiles of 12,328 genes measured in 240 cell lines across multiple doses and time points for 39,321 small-molecule compounds. Additionally, LINCS 2020 data includes paired control states for each perturbagen-cell line combination, enabling a comparison of the transcriptional changes before and after each treatment. To supplement our dataset, we leveraged information from PharmacoDB, a database that contains dose-response viability data for 56,149 drugs across 1758 cancer cell lines at multiple doses. For further analysis, we employed 10,303 overlapping compound-cell line pairs, which were common between 24 h transcriptional responses from CMap LINCS 2020 (passing quality control, i.e., qc_pass = 1) and PharmacoDB. For matching compounds between PharmacoDB and CMap LINCS 2020, we used compound identifiers, and for the cell line matching, we used cellosaurus IDs54. To extract structural information of the compounds, we used PubChem and RDKit (rcdk v3.6 and rcdklibs v2.3) to generate molecular fingerprints (ECFP4) from the SMILES representation of each common drug55.The light gradient boosting machine (LightGBM) model was trained on a comprehensive dataset of 3695 compounds tested at 1–35 doses in 167 cell lines. Drug-dose-cell line profiles (including transcriptomic response profiles, ECFP4 molecular fingerprints, and drug doses) were used as the model predictors, while the outcome variable is the inhibition percentage, derived from PharmacoDB dose-response viability data (Suppl. Figure 3). The LightGBM model was trained using Bayesian Optimization, with a repeated cross-validation (three repetitions), and ten-fold inner cross-validation (CV). This ensures a robust and generalizable model for patient applications. More specifically, the LightGBM model matches gene expression signatures (differentially expressed genes between cancer and non-cancer cells) to the transcriptional responses to small molecules tested at different doses from LINCS 2020 to find the compounds that induce opposite transcriptomic changes. In the next step, the model identifies which compounds and doses most effectively inhibit cell growth, by extracting percent inhibition responses for corresponding cell line-drug-dose triads from PharmacoDB. After examining tens of thousands of possible matches, the model provides a prediction of the most promising compounds and the effective dose. We also recommend including at least one dose-fold above and below the predicted dose in the experimental evaluation to delineate the most effective and least toxic drug dosage.Prediction of multi-targeting therapies using scRNA-seq data in AML patient samplesThe experimental-computational prediction approach consists of the following five subsequent steps (Suppl. Figure 3). These steps are described here for the AML case, and modifications to this pipeline in the HGSC case are described under section Tailoring the experimental-computational approach to ovarian tumor patient samples.Step 1: Longitudinal samplingAfter obtaining informed consent, bone marrow aspirates were collected from patients diagnosed with acute myeloid leukemia (AML) at the Helsinki University Hospital (HUS). For this study, a total of 12 longitudinal samples (7 at diagnosis, 2 at relapse stage and 3 at refractory stage) were obtained and stored at the Finnish Hematology Registry and Clinical Biobank (FHRB). The protocols used for this study were reviewed and approved by the institutional review board in compliance with the Declaration of Helsinki56. The below steps 2–5 were repeated for each sample individually to provide a customized set of effective and low-toxic multi-targeting options for each patient individually by considering the intratumoral heterogeneity of cancer cells that is present not only at later stages of the disease or resistance development but already at the diagnostic stage.Step 2: Single-cell data analysisFor the single-cell transcriptomic analysis, we processed the filtered gene-barcode matrix derived from 10X Genomics data using the ScType platform57, with Louvain clustering, as implemented in the Seurat version 4.3.058. To filter out low-quality cells, we removed cells that had either a low or high number of detected genes and also cells that had more than 10% of mitochondrial UMI counts in the AML scRNA-seq data. The quality control (QC) criteria depend on the sample types; for instance, in HGSC organoids, 20% of mitochondrial UMI count cut-off was used27. Such QC cell filtering step is critical to exclude technical noise and thus to avoid biases in the downstream analysis. To normalize the gene expression levels, we utilized the LogNormalize method implemented in Seurat.Step 3: Identification of malignant and normal cellsSingle-cell RNA sequencing profiles were used to identify malignant and normal cell clusters in each sample using three analytical tools, ScType57, CopyKAT31, and SCEVAN59. These tools were specifically selected for their ability to accurately classify and differentiate between malignant and normal cells in the given complex sample, eliminating the requirement for larger cohort samples. We demonstrated that the detection procedure maintains a surprisingly stable performance in most AML samples, even when as large proportion as 75% of cells are removed (Suppl. Fig. 11). This suggests its robust performance on diverse datasets with different proportions of healthy and malignant cells, and relatively stable capability to capture relevant malignant and healthy signatures of mixed cell samples.Step 3a: Cell type annotationWe utilized the ScType web-tool57 that enables fast, precise and fully-automated cell cluster annotation. ScType integrates cell type markers from the two most comprehensive resources for human cell populations and classifies cells based on gene expression changes across clusters. We used ScType to assign a confidence score to each cell type annotation and each cluster, with high scores indicating a high level of confidence in the cell type annotation. Clusters with low scores were labeled as “Unknown” cell types based on the default ScType cutoff (score < number of cells in the cluster divided by 4). In addition, we visually analyzed previously established marker genes for blasts, including CD33, CD34, CD38, PROM1, ENG, CD99 and KIT33, on the UMAP space and calculated the proportion of the blast cells in each patient sample to gain a better understanding of the distribution of leukemic cells. This resulted in a Seurat object that includes cell clusters and their corresponding annotations.Step 3b: Detection of aneuploid cellsTo further classify cell populations as normal or malignant, we developed an ensemble approach that utilizes multiple methods to generate a confident classification. The first method is a marker-based approach, which involves carefully filtered cell markers from CellMarker2.0 database60, and then using these as a custom marker dataset for ScType to identify normal and malignant cells. The second approach uses CopyKAT31, a Bayesian segmentation-based method, with default parameters and known normal cells (T cells in the AML case61) as a baseline to estimate copy number alterations (CNA). The third method is SCEVAN, with the non-cancerous control cells used as input, which employs a Mumford and Shah energy model to distinguish normal and malignant cell states59. The use of CNA estimation-based approaches allows us to classify malignant cells while taking into account overall variability within normal cells. We then constructed a majority vote based on the combined results of these tools to confidently identify both normal and malignant cell clusters. To further validate our approach, we superimposed the ensemble predictions onto the UMAP space and compared them with the cell-type information obtained from ScType. By integrating cell type and normal/malignant annotations from ScType, with ploidy information from CopyKAT and SCEVAN, we identified clusters of cells as either normal or malignant. Our ensemble approach accounts for variability within normal cells and therefore minimizes the risk of misclassification.Step 4: Identification of genetically distinct subclones and visualizing clonal lineagesAfter successfully identifying normal and malignant cell clusters, we used inferCNV62 to infer large-scale copy number variations, such as gains or deletions of whole chromosomes or segments from the scRNA-seq data. The input for the inferCNV analysis included the known non-cancerous cells identified in Step 3, genomic locations, cell type annotations, and the scRNA-seq count matrix data. CNVs were inferred using the Hidden Markov Model (HMM) approach implemented in the 6-state i6 HMM model (https://github.com/broadinstitute/infercnvApp/blob/master/inst/shiny/www/Infercnv-i6-HMM-type.md). In accordance with the inferCVN guidelines in the document “Using 10X data” section (https://github.com/broadinstitute/infercnv/wiki/infercnv-10x), we adjusted the “cutoff” parameter from 1 to 0.1, and subsequently computed the CNV profiles from the scRNA-seq expression counts. To explore the subclonal structures, we used the “subcluster” method on the HMM predicted CNVs.After identifying the genetically distinct subclones, we used Uphyloplot263 to visualize intra-tumoral heterogeneity and clonal evolution using the CNV calls from the inferCNV 6-state HMM “subcluster” method and its “.cellgroupings” file. We note that the resultant evolutionary tree does not follow a molecular clock; rather, the branch length is proportional to the percentage of cells in the subclone, hence providing information about which subclone dominates the tumor mass. Next, two broad-level subclones detected from the evolutionary tree were identified using visual analysis, and along with normal cells, overlaid on a UMAP projection for further analysis. To quantify gene expression differences between the normal cells (identified in Step 3) and the broad-level subclones (identified in Step 4), log-fold change values and determined significance levels via the nonparametric Wilcoxon rank-sum test, applied in Seurat 4.3.0 using the FindMarkers command.Step 5: Predictive modeling of multi-targeting therapiesWhen applied to patient samples, the subclone-specific differentially expressed genes (DEGs) were used as input for the pre-trained LightGBM model to predict single-agent cell inhibition percentages for each compound-dose pair in the particular patient cells. This allows us to take into account both the intratumoral and intertumoral heterogeneity, as captured by the scRNA-seq profiles of the patient samples. Our prediction approach is highly flexible and can be used in two ways: first, by utilizing a predefined set of drug-dose pairs for predictions, or second, by customizing the analysis with additional input of new drug structures (ECFP4 fingerprints) and/or specific doses of interest. The scTherapy tool offers flexibility for the users in performing either monotherapy or combination therapy predictions, depending on the number of cancer cells available from the patient samples for experimental testing. Currently, the predictions are limited to two majority clones, where a specific drug in the drug combination is uniquely predicted for each subclone, when the model is used for combinatorial treatment predictions (please see the ovarian cancer case study below for monotherapy predictions).As any ML model predictions inherently come with some degree of uncertainty, we used conformal prediction (CP) to eliminate low-confidence predictions and improve the prediction accuracy64. CP generates confidence intervals for each prediction by measuring uncertainty based on repeated CV residuals. Predictions with a nonconformity score < 0.8 were excluded, thereby ensuring inclusion of only confident and accurate predictions. In addition, to ensure that our model returns clinically more relevant predictions, we imposed a 1 μM dose maximum when utilizing the pre-defined set of drug-dose pairs. High drug doses, even though potentially increasing cancer cell inhibition, may also inhibit normal cells, hence compromising the selectivity of targeted agents65. By using such a dose restriction, we ensured the selectivity of targeted agents returned by the model and minimized the risk of toxic effects, making our predictions more clinically actionable. We applied this approach to each subclone, hence generating a set of drug-dose-response tuples for the experimental validation.Retrospective testing of the model predictions in single-agent data from AML patientsTo validate the performance of our model, we first used existing data from bulk drug response assays, available for the 12 patient samples from previous studies56. For the single-agent response testing, 20 μl of fresh AML cell (approximately 10,000) suspension in mononuclear cell medium was added per well to pre-drugged plates with 10-fold dilution series of five concentrations, and the whole-well cell viability was measured with CellTiter-Glo (CTG; Promega) in duplicate using established protocols35,56. After 72 h of incubation at 37 °C and 5% CO2, cell viability of each well was measured using the CTG luminescent assay and a PHERAstar FS (BMG Labtech) plate reader. The percentage inhibition was calculated by normalizing the cell viability to negative control wells containing 0.1% dimethyl sulfoxide (DMSO), and positive control wells containing 100 μM cell killing benzethonium chloride (BzCl). Notably, these existing single-agent response data were not used in the model training and were only employed retrospectively to test the accuracy of the model to predict effective monotherapies. Since the whole-well assay is not a cell population-specific assay, we performed this validation using the differentially expressed genes (DEGs) between the malignant cell types and normal cells to generate single-agent predictions for each patient sample. Subsequently, we matched the drugs and doses predicted by the model to the available patient-specific cell viability dose-response data (see Fig. 2b).Prospective testing using whole-well and flow cytometry assays in the AML patient cellsThe patient-specific predicted combinations were first tested on the bone marrow mononuclear cells of each patient in a 4 × 4 dose-response matrix using the bulk CTG viability assay, similarly as before33. The combination synergy in the experimental validations was quantified using ZIP model66, calculated based on the dose region around the predicted effective dose of each compound in the combination.Cell population-specific drug combination effects in primary AML patient samples were assessed by high-throughput flow cytometry assay. The compounds were dissolved in 100% dimethyl sulfoxide and dispensed on conical bottom 384-well plates (Greiner) either as single agents or combinations using an Echo 650 liquid handler (Beckman Colter). Cryopreserved bone marrow mononuclear cells were thawed and suspended in 12.5% HS-5 derived conditioned medium, and 2 − 3 × 104 live cells were seeded with a MultiFlo FX.RAD (BioTek) to 384 well-plates, followed by incubation for 72 h at 37 °C and 5% CO2. To profile the cell subpopulation responses, the cells were stained with BV785 Mouse Anti-Human CD14 (Biolegend, dilution 1:200), VB515 Recombinant Anti-Human CD56 (Miltenyi, dilution 1:400), and following antibodies from BD Biosciences; V500 Mouse Anti-Human CD45 (dilution 1:240), BV650 Mouse Anti-Human CD19 (dilution 1:120), PE-Cy7 Mouse Anti-Human CD3 (dilution 1:150), PE Mouse Anti-Human CD34 (dilution 1:240), BV421 Mouse Anti-Human CD38 (dilution 1:600) and APC Mouse Anti-Human CD117 (1:600), together with APC-Fire 750 Annexin V (Biolegend, dilution 1:80) and DRAQ7 (BD Biosciences, dilution 1:600). The cells were analyzed with an iQue3 flow cytometer (Sartorius). The remaining live cells after drug treatments were gated using Forecyt (Sartorius). Briefly, cell singlets were identified based on FSC-A (forward-scattered area) versus FSC-H ratio, and live cells were identified by excluding annexin V- and DRAQ7-positive cells, followed by identification of leukocytes (CD45 + ). Further characterization was done for NK cells (CD56 + CD3-), leukemic blasts (CD34+ and/or CD117 + ) leukemic stem cells (CD34 + CD38-), monocytes (CD14) and T/B- cells (SSC-A and CD3/19) from the leukocytes.Tailoring the experimental-computational approach to ovarian tumor patient samplesTo differentiate between cancer and non-cancerous cells in ovarian cancer patient scRNA-seq data, we utilized a panel of established marker genes, including PAX8, CA125, MUC16, WFDC2, and EPCAM, collectively referred to as PAX8+ cells; PAX8 is expressed in 80–96% of high-grade serous ovarian cancer (HGSC) tumors (Suppl. Figure 5)67,68. Our initial analysis focused on the HGSC Patient 1 sample, selected due to the availability of both scRNA-seq data and viable cells for experimental validation. Due to the small proportion of PAX8 +  malignant cells detected in the scRNA-seq data, we opted to predict only single-agent therapies as opposed to combination therapies. However, during the validation phase, the PAX8- stromal cells of Patient 1, serving as normal controls, died. This led us to integrate this sample with two other HGSC Patient 4 and 5 samples, which had readily available PAX8- cells (see https://ianevskialeksandr.github.io/figovfig145.png). The integration was achieved using the standard Seurat workflow, and the cell types were assigned using ScType. Both combined PAX8 +  and PAX8- cell populations were visualized using Seurat “FeaturePlots”. We used an average of previously-measured responses of PAX8- cells from patient 4 and 5 samples (serving as combined ovarian-sample normal controls) to 372 compounds overlapping with the LINCS 2020 compounds. We extended these initial analyzes with two additional HGSC patients (Patients 2 and 3) for which both organoids and the stromal cell cultures were available at the cell numbers sufficient for single-drug sensitivity and selectivity testing in PAX8 + and PAX8- cells (Suppl. Table 3).Prospective testing in ovarian tumor organoids and drug response assaysIn contrast to the AML case, where we had enough cancer cells for combinatorial testing of targeting two major subclones, in the HGSC case study, we opted to predict multi-targeting monotherapies, due to the small proportion of patient-derived cancer cells based on the scRNA-seq analyzes (Fig. 3a–c). In such limited cancer cell populations, identifying subclones may not be reliable for combinatorial targeting. Instead, we used all cancer cells from a patient sample as a collective entity, disregarding subclone distinctions and subdivisions, when identifying treatments that inhibit mostly PAX8+ cells. To predict the compounds that specifically target and eliminate cancer PAX8+ cells, while sparing PAX8- cells, we utilized the differentially expressed genes (DEGs) from the comparison between PAX8+ and PAX8- cells in the scRNA-seq data. These DEGs were used as input for the pre-trained LightGBM model. Among the predicted 372 compound responses (that overlapped with drugs tested on PAX8- cells), we selected the top-20 most effective compounds, and removed two with low confidence, hence resulting in 18 predicted agents. Subsequently, we validated the efficacy of these compounds in PAX8+ tumor organoids and compared the results, as shown in Fig. 3 (3 replicates).Ovarian cancer organoids were established and characterized according to established protocols27, and propagated in BME-2 matrix droplets in the sample-specific growth medium. The organoid cultures consisted only of cancer cells as judged by whole-genome sequencing of the organoid cultures, profiling of their copy-number variation, and determination of the tumor cell purity based on the sequencing-based factors in the original study27. Moreover, TP53 mutation analyzes, revealing a Variant Allele Frequency (VAF) of 1.0 in all organoids, confirmed that organoids comprise only cancer cells27.For the organoid drug sensitivity testing, the organoid cultures were trypsinized to obtain the single-cell suspension. The cells were resuspended in the fresh gel, dispensed to 384-well Ultra-Low Attachment microplates (#4588, Corning) at approximately 103 cells per well in 10 µl of the matrix, and covered with 40 µl of growth medium containing 5 µM ROCK inhibitor to facilitate the organoids formation. After 2–6 days, the medium was exchanged to ROCK inhibitor-free growth medium. Drug testing was performed as described above for single-agent AML sample testing, with the following modifications. The tested compounds (10-fold dilution series of four to five concentrations), vehicle (DMSO), or positive control compounds (100 μM benzethonium chloride or 10 μM bortezomib) were transferred to the wells using Echo 550 acoustic dispenser (Labcyte). The organoids were incubated with drugs for a total of 7 days (with fresh medium exchange and drug replenishment on after 4 days) in the humidified incubator at 37 °C and the viability was assessed using CellTiter-Glo 3D Cell Viability Assay (#G9683, Promega) using a SpectraMax Paradigm microplate reader (Molecular Devices) after 5 min of agitation and 25 min of incubation at room temperature, as indicated by the manufacturer.The PAX8-negative cells from the ovarian tumor samples were expanded in sample-optimized media, either RPMI-1640 medium, supplemented with 2 mM glutamine, 1% Pen/Strep and 10% FBS (Gibco), or M199 supplemented with 10% FBS, 1% Pen/Strep, 10 ng/mL EGF, 400 nM hydrocortisone, 870 nM insulin-transferrin-selenium, 0.3% Trace elements B, and 20 mM HEPES. Drug testing was performed as above. The culture was trypsinized, resuspended in fresh medium and seeded at 1000 cells in 25 μl of medium per well in pre-drugged 384-well microplates (#3864, Corning). After 7 days of the drug treatment, the viability was measured using the CellTiter-Glo 2.0 (Promega).Immunofluorescence and immunoblotting in ovarian cancer organoids and cellsCells were fixed with 4% PFA for 30 min, washed 5 times with PBS, and incubated in the blocking buffer (PBS, 0.5% BSA, 20 mM glycine, 0.1% TX-100) for 12 h (for organoids) or 1 h (for cell lines) at room temperature. Immunostaining with anti-PAX8 rabbit polyclonal antibody (dilution 1:100, Proteintech, #10336-1-AP) was performed overnight at 4 °C. After 3 washes with the blocking buffer, 8 h each, the donkey-anti-rabbit Alexa555 secondary conjugates (dilution 1:500, Invitrogen, #A31572) were applied for 1 h together with Hoechst 33342 (10 µg/mL in PBS). The cells were imaged at an Opera Phenix confocal screening microscope (Perkin Elmer), with the 20x or 40x water immersion objectives.For immunoblotting for PAX8 and EpCam, the cells were lysed in RIPA buffer with Pierce protease and phosphatase inhibitors cocktail (Thermofisher) on ice for 30 min. After centrifugation (17000 rcf, 4 °C, 20 min), the samples were loaded to Bis-Tris 4–12% gradient Bolt PAGs and run according to the manufacturer’s manual. After the protein transfer to the nitrocellulose membranes overnight in Towbin transfer buffer, the membranes were stained blocked in 5% non-fat milk in TBS-T and incubated with primary antibodies in 5% non-fat milk in TBS-T overnight at 4 °C. The antibodies were: anti-PAX8 rabbit polyclonal (dilution 1:250, Proteintech, #10336-1-AP); anti-EpCam mouse monoclonal (dilution 1:500, Santa Cruz, #sc-25308); and anti-GAPDH mouse monoclonal (dilution 1:5000, Novus, #NB300-221). After 3 washes with TBS-T, the membranes were incubated with the secondary fluorophore-conjugated antibodies diluted to 1:5000 in 5% non-fat milk in TBS-T (anti-mouse IRDye 680, #926-32220; anti-rabbit IRDye 800CW, #926-32211; anti-mouse IRDye 800CW, #926-32210, Licor), washed in TBS-T 3 times and scanned using LiCOR Odyssey imager.Statistics & ReproducibilityThe statistical tests applied and the significance values are included in the figure legends or results text. We used non-parametric tests in all the statistical comparisons. Patients were recruited as part of two ongoing translational studies for AML and HGSC. The patient samples for the current study were selected based on the availability of input data for modeling (scRNA-seq) and primary cells for experimental validations (single-cell drug assays). The model predictions and experimental testing were done for each patient separately, using scRNA-seq data and single-cell drug assays, respectively. Therefore, the age, sex, gender, race, ethnicity, or other social parameters are not considered as confounding factors. The patient characteristics are reported in Suppl. Tables 1–3. The experimental validations of the model predictions were made after the predictions, and experimental researchers were blinded to the model prediction outcomes. The validation experiments were not randomized, since this is not a case-control study, instead the non-cancerous cells from each patient sample were used as patient-specific control for the particular patient’s cancer cell responses. No data were excluded from the analyzes. All the validation drug assays were replicated either 2 or 3 times, depending on the availability of primary patient cells. All the replicate measurements were successful in the sense that the standard deviations were within an expected range based on previous studies.Single-cell RNA sequencing and data processingSingle-cell gene expression profiles were generated using the 10x Genomics Chromium Single Cell 3’ RNA-seq platform with Next GEM v3.1 Dual Index chemistry. Libraries were prepared using the Chromium Next GEM Single Cell 3’ Gene Expression version 3.1 Dual Index kit. Samples were sequenced on an Illumina NovaSeq 6000 system with read lengths of 28 bp (Read 1), 10 bp (i7 Index), 10 bp (i5 Index), and 90 bp (Read 2). Data processing and analysis were performed using 10x Genomics Cell Ranger v6.0.0 pipelines69. The “cellranger mkfastq” command, utilizing Illumina’s bcl2fastq v2.2.0, was used to generate FASTQ files from raw base calls. The “cellranger count” pipeline performed alignment against the human genome GRCh38.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles