Molecular and functional landscape of malignant serous effusions for precision oncology

Study design and participantsThis study was conducted as a prospective, non-randomized observational clinical study with feasibility as the primary outcome. Treatment decisions during the study were solely based on current clinical guidelines and the decisions made by the treating physician and the patient. Any patient with a metastatic solid malignancy from whom a fluid sample (ascites, pericardial or pleural effusion, washing or any other fluids containing malignant cells) was collected as part of routine diagnostic or therapeutic procedures at the University Hospital Zurich (USZ), Spital Uster or Kantonsspital Winterthur was eligible for the study. Thus, there is no pre-selection bias. Patients were included if they were older than 18 years and provided written informed consent either through the USZ general consent (GC) or a study-specific informed consent form. Patients matching the inclusion criteria were identified by the cytology team at USZ, who performed diagnostics on fluid samples and consented by the treating physician. Over the course of the project (May 2019–January 2023), 261 samples from 184 patients were included in this study. Follow-up data (see Supplementary Data 1) were collected as part of clinical routine during the same time period. The research project was carried out in accordance with the research plan and with principles enunciated in the current version of the Declaration of Helsinki (DoH), the Principles of Good Clinical Practice (GCP), the Swiss Law and Swiss regulatory authority’s requirements as applicable. Ethical approval was granted by the Ethics Committee of Kanton Zurich (CEC Zurich, BASEC-Nr: 2019-01700).Visualization of cohort statisticsCohort statistics in Fig. 1b were visualized using the circlize R package (v.0.4.13)79.Collection of cells from malignant effusions100 ml of fluid was centrifuged at 200 × g for 5 min. The supernatant was discarded, and the pellet was treated with red blood cell (RBC) lysis buffer (BioLegend #420302). This step was repeated If the pellet was still red after RBC lysis. The pellet was then resuspended in growth media (Gibco RPMI1640 + GlutaMax supplemented with 10% human type AB serum (PanBiotech PANP30-2502)) and optionally passed through a 70 μm cell strainer to remove large aggregates. Cell number and viability was determined using a Countess cell counter (Invitrogen). Samples that did not contain at least 2 millions of viable cells were excluded from the study and not further processed.Purification of cell subsets by FACSFor selected validation samples (FB103 and FB215 in Fig. 6; FB133 and FB295 in Fig. 7), cell subsets were purified from cryopreserved cells isolated from MSE by FACS. After thawing, cells were resuspended in a blocking buffer (5% (v/v) FBS in PBS) and passed through a 70 μm cell strainer. Cells were then blocked on ice for 10 min, spun down and resuspended in 100 μl of FACS buffer (1% (w/v) BSA in PBS) and stained using the following antibodies (5 μl of antibody per 5 million cells, see Supplementary Data 2 for details on the antibodies used): CD3 (Alexa Fluor 488) for T-cells, EpCAM (Alexa Fluor 555) for epithelial/adenocarcinoma cells, and CD14 (Alexa Fluor 647) for monocytes/macrophages/dendritic cells. Cells were incubated with antibodies on ice for 30 min. Then, cells were washed twice with 10 ml of cold FACS buffer, and finally resuspended in 0.5−2 ml FACS buffer to reach a concentration of ~5 million cells/ml. 1 μM SyTOX blue was added right before sorting to exclude dead cells. Cells were sorted on a BD FACS Aria Fusion using a 100 μm nozzle. We set a very inclusive gate on forward and side scatter, and excluded dead cells based on SyTOX blue intensity only. Individual cell populations were then sorted out based on single positivity for the corresponding markers.PharmacoscopyThe term PCY refers to short-term ex vivo culture and drug treatment of primary patient samples followed by immunohistochemistry, automated microscopy, and single-cell image analysis26,46,47. The technology thus encompasses both experimental procedures and computational analysis. PCY results in quantification of cellular and morphological sample composition, as well as cell type-resolved drug responses.Short-term culture and ex vivo drug treatment of MSE-derived cellsThe cellular component of the MSE sample was diluted to 0.1–0.2 million cells/ml in growth media (Gibco RPMI1640 + GlutaMax supplemented with 10% human type AB serum (PanBiotech PANP30-2502)), and 50 μl/well were seeded in CellCarrier 384 Ultra, clear-bottom, tissue-culture-treated plates (PerkinElmer). Cells were incubated with 10 μM of compound (see Supplementary Data 4) or matching control (DMSO for small molecules and isotype control for antibodies) for 24 h at 37॰C, 5% CO2. This drug concentration was chosen based on prior experience in hemato-oncology26,46,47. Afterwards, the media was aspirated using a microplate washer (Tecan), and cells were fixed with 20 μl/well of a periodate, lysine, formaldehyde fixative (75 mM lysine [Sigma Aldrich L5626-100G], 2.5 mg/ml sodium periodate [Sigma Aldrich 30323-100 G] and 1.25% (v/v) formalin [37% formaldehyde solution, Sigma Aldrich F8775-500ML] in PBS) for 15 min at room temperature. The fixative was removed, and 70 μl of PBS/well was added. Plates were then stored in the fridge for up to 2 weeks prior to staining.Staining with fluorescence-labeled antibodiesFor staining, PBS was aspirated, and cells were blocked, permeabilized and stained for DNA using 20 μl of PBS supplemented with 5% fetal bovine serum (FBS, Gibco/ThermoFisher 10270106), 4′,6-diamidino-2-phenylindole (DAPI, BioLegend 422801) and 0.1% (v/v) Triton-X100 (Sigma Aldrich T8787) for 30 min at room temperature. Blocking solution was aspirated, and cells were stained with fluorescently labeled antibodies (see Supplementary Data 2) diluted in PBS + 10 mg/ml bovine serum albumin (BSA, Sigma Aldrich A7906) for 1 h at RT or overnight at 4 °C. Antibody-containing solution was then aspirated, and PBS was added on top of the cells. Every plate was first stained with a panel of antibodies only in 8 wells (Supplementary Data 2) containing no drugs. If a sample did not contain >2% malignant cells, it was excluded from analysis and not further processed. Otherwise, the whole plate was stained with an antibody panel tailored to the respective sample’s tumor marker expression (Supplementary Data 2).Automated confocal microscopyAll samples were imaged on an automated spinning-disk confocal microscope (PerkinElmer Opera Phenix), using ×20 magnification and 25 images per well to cover the entire well area. We used five channels with non-overlapping excitation/emission filters to image the following features: Channel 1 (transmission/650–760 nm) for brightfield to capture general cell shape and texture, channel 2 (405 nm/435–480 nm) for DAPI/nuclei, channel 3 (488 nm/500–550 nm) for tumor stain, channel 4 (561 nm/570–630 nm) for a second tumor stain or additional markers, and channel 5 (640 nm/650–760 nm) for immune cells (CD45) channels.Image analysis by CellProfilerRaw images were first analyzed using CellProfiler v.2.2.080. Individual cells were detected based on maximum correlation thresholding of the DAPI signal. The exact parameters of the pipeline were adjusted per sample to account for differences in nucleus intensity, size and the presence of large spheroids. Staining intensities were extracted for the nucleus, and a region of 12 pixels around the nucleus that was used as a proxy for cytoplasmic intensity. For downstream analysis, intensities were log10 transformed and corrected for variation in the local background as described in ref. 81.Filtering of segmentation artifacts and removal of outlier wellsCells with very low DAPI intensities or abnormally high or low nucleus areas likely represent segmentation artifacts and were therefore removed from the analysis by manual gating. In addition, outlier wells (very low or high total cell numbers, or aberrant staining patterns) were removed if the observed patterns could be attributed to pipetting mistakes or the presence of large cell clumps by visual inspection.Training of convolutional neural networksFive different CNNs were trained (Supplementary Fig. 1). All of them are based on a ResNet18 architecture and were trained as previously described47,82. CNN 1 was trained to identify apoptotic cells using brightfield and nuclei (DAPI) images alone. Here, the training dataset was generated from test stains (2 wells without any drug treatment per sample), where we defined apoptotic cells based on the staining intensity of cleaved caspase 3 (clCASP3) and then generated a total of 95,666 2-channel single-cell crops (150 × 150 pixels corresponding to 45 × 45 μm) of apoptotic (n = 44,212) and non-apoptotic (n = 51,454) cells. CNN 2 was trained to recognize different cell types based on a combination of marker intensities and cellular morphology using all available channels. Here, the training data was generated by cropping 150 × 150 pixel images and then manually assigning them to either lymphocyte/granulocyte, macrophage, cancer, or other class. A total of 125,959 images across all samples were curated. CNN 3 classifies tumor cells into single adherent, single non-adherent, spheroid or adherent colony based on all channels and was trained on 36007 manually curated 150 × 150 pixel images. CNN 4 is used to distinguish round and elongated macrophages using the brightfield, DAPI, and 640 nm (CD45) channels, and was trained on a manually curated set of 12,846 150 × 150 pixel images.For all CNNs, 75% of the labeled images were used for training and 25% for evaluation of classification performance.CNN 1–4 were trained using the deep learning toolbox in Matlab R2021a and the adaptive learning rate optimization method ‘ADAM’. Except for CNN 2, equal numbers of images were used per class. Parameters were set as follows for the different CNNs: For CNN 1 and 2, the initial learning rate was set to 0.001 and kept constant. A mini-batch size of 200 and L2 regularization of 0.01 was used, and CNNs were trained for a total of 5 epochs. CNN 4 was trained the same way, except that the L2 regularization was increased to 0.1, the network was trained for 10 epochs, and the learning rate was decreased by a factor of 0.01 after the first 5 epochs. For CNN 3, the initial learning rate was set to 0.0001 and dropped by a factor of 0.01 every 5 epochs. L2 regularization was set to 0.05 and the network was trained for a total of 20 epochs with a mini batch size of 150. To strengthen the generalization of the networks, images were augmented with random rotations and reflections.CNN 5 recognizes lymphocyte/granulocyte morphology using the brightfield, DAPI and 640 nm (CD45) channels and was obtained by transfer learning of an existing network as described in refs. 47,82. For this network, 48 × 48 pixel (14.4 × 14.4 μm) images were used to account for the smaller size of lymphocytes and granulocytes, and 4452 images from this dataset were curated for transfer learning.Classification of cell typesThe trained CNNs were used to classify every single cell in the dataset in a hierarchical manner (Supplementary Fig. 2a). First, cells were classified as apoptotic or not, and cells classified as apoptotic with a confidence of at least 0.6 were removed from the analysis. Next, cells were assigned to one of four main cell types (Lymphocyte/granulocyte, macrophage, malignant or other). If a cell could not be confidently assigned to any of these classes (maximum confidence <0.6), it was classified as ‘unassigned’ instead. Finally, each cell was assigned a morphological subtype (polarized or conventional for lymphocytes/granulocytes, round or elongated for macrophages and single adherent, single non-adherent, spheroid or adherent colony for cancer cells). No confidence cutoff was set for morphological subtypes. Cell compositions per sample that were used in the MOFA analysis described below were calculated as the mean fraction of each cell type in DMSO control wells, with the abundance of the morphological subclasses normalized to the abundance of their corresponding parent cell type.Calculation of drug responsesWe calculated two different cancer-specific drug response readouts, RCN and RCF, relative to control. These are calculated as follows:RCN = 1−(number of cancer cells in drug-containing well/median (number of cancer cells in control wells))RCF = 1−(fraction of cancer cells in drug-containing well/median (fraction of cancer cells in control wells))In both cases, a positive score indicates a wanted outcome, i.e. a reduction in tumor cells. However, the RCN readout only considers what is happening to the tumor cells and thus assigns positive values to any cytotoxic treatment, regardless of its specificity to malignant cells. On the other hand, RCF only assigns a positive score to compounds that result in a reduction of malignant cells which exceeds the reduction in non-malignant cells, and thus prioritizes compounds that specifically act on malignant cells (see also Supplementary Fig. 4a). Statistical significance of the ex vivo response was assessed by a two-sided two-sample t test, comparing each drug to the matching control (DMSO for small molecules and isotype control for antibodies). In general, all small molecules were used at 10 μM, and antibodies at 10 μg/ml. The only exception to this was the drug panel used in the first 15 samples (plate barcode starting with “STP”, see Supplementary Data 2), where we additionally screened select compounds at 1 μM. An overview of all compounds used in this study is provided in Supplementary Data 4.For downstream analysis, the drug responses were averaged across replicate wells and scaled to −1 and 1 per sample. To this end, any negative value was divided by the absolute value of the most negative value, and any positive value was divided by the strongest positive value.Calculation of spheroid dissociation drug responseMorphological drug response was calculated on the cancer cells only, thus “fraction” refers to “the fraction of cells among cancer cells”. To avoid artifacts arising from calculating fractions from very low cell numbers, we only included samples in this analysis that had at least 5% spheroids as a fraction of tumor cells, as well as 2% spheroidal tumor cells as a fraction of all cells. We then defined spheroid dissociation as1−(fraction of cancer cells forming spheroids in drug-containing well/median(fraction of cancer cells forming spheroids in control wells))Thus, a positive value indicates spheroid dissociation, whereas zero indicates no change and negative values indicate increased fractions of cells in spheroids.RNA sequencingRNA extraction for baseline sequencingA pellet of around 1 million cells was collected, resuspended in 350 μl of TriZol reagent (Ambion, sold by ThermoFisher 15596026) and stored at −80 °C until extraction. RNA was extracted using a Zymo DirectZol RNA miniprep kit (Zymo R2052) or a DirectZol-96 (Zymo R2056) kit for larger batches according to the manufacturer’s instructions. RNA concentration and integrity was quantified using a TapeStation with RNA reagents (Agilent Technologies).RNA extraction of drug-treated cellsTo profile transcriptomes of drug-treated cells, around 100,000 cells/well were seeded into a PerkinElmer cell carrier ultra 96-well plate and incubated with different small molecules (10 μM each) for 24 h. To avoid losing non-adherent cells, the media was afterward not completely aspirated (~50 μl left per well), and cells were lysed in 150 μl/well TriZol LS reagent (Ambion, sold by ThermoFisher 10296028). Lysates in TriZol were transferred to a PCR plate and stored at −80 °C until extraction. RNA extraction was performed using a DirectZol-96 (Zymo R2056) kit according to the manufacturer’s instructions.Preparation and sequencing of RNA-seq librariesAll RNA-seq libraries were prepared using a customized version of the DRUG-seq protocol50. Briefly, this protocol consists of reverse transcription (RT) with barcoded poly-T primers combined with second-strand synthesis by template-switching, PCR amplification of the pooled cDNA, and sequencing library construction by tagmentation (Illumina Nextera). We made the following key modifications to the original protocol: (1) we changed the reverse transcription (RT) primer sequences to make them compatible with standard Illumina sequencing primers (see Supplementary Data 11); (2) we start from purified RNA rather than a cell lysate, in our hands this resulted in less genomic DNA contamination of the sequencing libraries and simplified sample storage; (3) we optimized the protocol for ultra-low input samples by using a Zymo IC column to purify cDNA after RT and eluting in only 10 μl of hot (80 °C) water, and by using the Nextera XT kit (Illumina FC-131-1096) for library preparation; (4) we optimized the unique mapping rates of our libraries by performing a stringent size selection with one 0.5× cleanup to remove large fragments followed by two times 0.6× cleanup using KAPA pure beads (KAPA KK8000) resulting in fragment sizes between 500–800 bp. Library size distributions were measured using a TapeStation D5000 HS kit, and library concentration was quantified using a Qubit. Libraries were sequenced at a concentration of 850 pM on a NextSeq2000 system using a mid-output 100 cycle kit (Illumina 20046811), with the following read configuration: 80 bp for read 1 to identify the transcript, 20 bp for read 2 to get the well barcode and unique molecular identifier (UMI), 6 bp index reads. We used ~4 million reads/sample for baseline RNA-seq samples and 1 million reads/well for drug-treated samples.Processing of raw reads for RNA-seqRaw sequencing reads were processed using a customized version of the Dropseq toolbox (https://github.com/broadinstitute/Drop-seq), which is available from GitHub83 (https://github.com/RebekkaWegmann/drugseq_toolbox). Briefly, UMIs and well barcodes are extracted from read 2, and well barcodes are removed if they do not match the set of barcodes used in the experiment (up to 2 mismatches allowed). Reads are then mapped to a reference genome (GRCh38 v93) using STAR v.2.4.2, and the number of UMIs per well barcode and gene are counted. The raw sequencing data are available from GEO under the series accession number GSE240953.Baseline RNA-seq data analysis
QC criteria
Samples were retained if they had >104 UMIs, >5000 genes, and a RIN score of at least 3. Genes were kept if they were protein-coding or long noncoding RNAs, and detected with at least two counts in at least five samples. After applying these filtering criteria, the final dataset contained 17046 genes × 145 samples (131 unique samples), with some samples sequenced across multiple batches to control for batch effects.

Normalization
Raw counts were normalized using the method implemented in the scran R package (v.1.20.1)84. For downstream analysis and visualization, a pseudocount of 1 was added to the normalized counts before they were log2 transformed.

Regressing out covariates for visualization
The log2-transformed normalized expression values were corrected for unwanted covariates (batch, tumor content) using a linear regression approach as implemented in the removeBatchEffect function implemented in the limma R package (v.3.48)85. To preserve biological signals that may be correlated with these covariates, tumor type, and sample type were included in the design parameter.

Visualization of expression profiles using t-SNE
Batch- and tumor content-corrected expression profiles were visualized using t-SNE. The embedding was calculated using the R package Rtsne (v.0.15)86,87,88 with all parameters set to their default values except perplexity = 5 and initial_dims = 10.

Identification of disease-specific genes
To identify genes that were specifically expressed in samples from patients with LUAD, OV, MESO, BRCA, or STAD, a differential expression (DE) analysis was run comparing one tumor type against all others. DE analysis was performed using a negative binomial generalized linear model (GLM) and a quasi-likelihood F-test implemented in edgeR (v.3.34.0)89,90. Batch, biological sex and tumor content were included as covariates in the model. Genes that were highly specific were selected for visualization in Fig. 4b.

Integration of drug responses and RNAseq data
Drug responses (scaled RCF) were integrated with baseline RNA-seq data using a negative binomial GLM and a quasi-likelihood F-test implemented in edgeR (v.3.34.0)89,90. The model included batch and tumor content as unwanted covariates and the scaled drug response as a predictor. Only genes that were detected in at least 70% of samples were included in this analysis.

Calculation of pathway scores
All pathway activation scores were calculated using singscore (v.1.12.0)91 with default parameters. For the baseline RNA-seq data, the batch- and tumor content-corrected expression values were used as input for singscore. For DRUG-seq data, normalized counts were used as input, and pathway scores were then z-scored across conditions per sample. For visualization of GSEA enrichments, only genes contributing to the enrichment (“core enrichment” gene set in clusterProfiler) were used to calculate pathway scores.

Selection of most variable genes for input to MOFA
To select the most informative genes for input to MOFA, a global mean-variance trend was fitted to the covariate-corrected expression values using a loess estimator. Genes with residual variance exceeding the 0.95 quantile of all residual variances were selected for downstream analysis. For input into MOFA, a variance stabilizing transformation (DESeq2 v.1.32.092 varianceStabilizingTransformation) was applied to the selected gene count,s and the transformed values were then z scored across samples.
RNA-seq data analysis for drug-treated samplesFor RNA-seq analysis of the drug-treated samples, QC criteria were adjusted to the lower sequencing depth. Wells were kept if they had more than 104 UMIs and >3000 genes, and genes were kept if they were protein-coding or long noncoding RNAs and detected at a count of 2 in at least 2 wells per sample. Genes that were not detected across all samples were discarded. Applying these criteria, the final dimension of the dataset was 8359 genes × 932 wells, corresponding to 47 drugs measured with 3–4 replicates across five samples.Drug-induced transcriptomic changes were assessed using a negative binomial GLM and a quasi-likelihood F-test implemented in edgeR (v.3.34.0)89,90. We included the SampleID as a covariate in the model and compared each drug to its matching control (DMSO for small compounds, isotype control for antibodies). Only genes detected in at least 70% of wells were included in the DE analysis.Enrichment analysis for drug-target-proximal gene setsDirect drug targets were obtained from the SITCH database51,52 (http://stitch.embl.de/, data downloaded on 24 May 2022). Data was filtered to only include interactions reported for human genes with a combined score >500. Drug-target-proximal genes were defined as any gene that functionally interacts with a direct drug target based on the STRING database. For STRING queries, the STRINGdb R package (v. 2.4.2)53 was used with the string database version 11.5. Interactions were filtered for species – human and a score threshold of 500 was used.To calculate enrichments of DE genes in drug target-proximal gene sets, genes were ordered by the edgeR t-statistic (calculated as t.stat = sign(qlf$table$logFC) * sqrt(qlf$table$F), z = limma::zscoreT(t.stat, df = qlf$df.total), where qlf is the output of edgeR’s glmQLFTest). P values were then calculated using a one-sided hypergeometric test for enrichment of the top and bottom 100 genes among the drug target-proximal genes.Drug-target networks were visualized using the R package igraph (v.1.2.6)93.Comparison of expression profiles to TCGA dataThe results shown in Fig. 3c and Supplementary Fig. 10 are based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga. Gene counts for the different tumor types (LUAD, MESO, OV, BRCA, STAD) were obtained from the GDC data portal. Any sample that was not a primary tumor was excluded, resulting in a total of 2452 samples. Counts were then z-scored per sample, and the genes were subset to the top 5% most variable genes in the MSE cohort. For the tSNE (Fig. 3c) and sample-level clustergram (Supplementary Fig. 10b), the TCGA dataset was combined with the z-scored batch- and tumor content-corrected expression values of the corresponding tumor types in the MSE cohort. To assess the accuracy of MSE tumor-type prediction from the TCGA data, we first reduced the combined gene expression matrix to 30 dimensions using PCA. We then trained a 1-nearest neighbor classifier (knn function from the R package class) on the principal components of the TCGA samples only, and evaluated its performance on the MSE samples. To create the tumor-type level comparisons shown in Supplementary Fig. 10a,c, gene expression was first averaged per tumor type and cohort. After averaging, gene expression values were z-scored per tumor type.Mutational profiling by FoundationOne CDxMutational profiling was performed using the FoundationOne®CDx assay94,95, which includes 324 cancer-related genes for the detection of base exchanges, insertions, deletions, and copy number changes. In addition, specimens are simultaneously profiled for loss of heterozygosity (LOH), tumor mutational burden (TMB) and microsatellite instability (MSI). Detailed information is available athttps://www.foundationmedicine.com/genomic-testing/foundation-one-cdx.DNA was extracted from formalin-fixed paraffin embedded tissue cell blocks of MSE samples with at least 10% tumor content with the Maxwell 16 FFPE Plus LEV DNA Purification Kit (AS1135) according to the recommendations of the manufacturer. The DNA stock concentration was between 50 and 100 ng/µL for further analysis. Samples were assayed by adaptor ligation hybrid capture, performed for all coding exons of 309 cancer-related genes plus select introns from 34 genes. Sequencing was performed using the Illumina HiSeq instrument to a median exon coverage ≥500×, and data were analyzed for all classes of genomic alterations. The computational pipeline used to analyze sequence patterns used Bayesian algorithms to identify base substitution mutations, local assembly to identify short insertions and deletions, comparisons with process-matched normal controls to determine gene amplifications and homozygous deletions and the analysis of chimeric read pairs to identify gene rearrangements and gene fusions94. Using 0.8 to 1.1 Mb of sequenced DNA for each case, the TMB was determined using the number of somatic base substitution or indel alterations per Mb after filtering to remove germline and pathogenic mutations96.In this manuscript, the attribution of the definition of “actionable” to any genomic alteration is based on prior knowledge as reported in ref. 97.Visualization of mutational profilesMutational profiles were visualized using the oncoPrint function in the R package ComplexHeatmap (v.2.8.0)98. Alterations were colored by type and somatic impact (known versus likely, uncertain or unknown) as provided by the FoundationOne CDx report, which is based on the COSMIC database (https://cancer.sanger.ac.uk/cosmic).Calculation of pairwise similarity between mutational profilesPairwise similarity between mutational profiles (Fig. 3d) was calculated using a JC as follows:JCi,j = number of matching alterations between sample i and sample j/number of all alterations in sample i or sample jHere, “alteration” refers to the type of change occurring in a single gene and sample, which can be wild type, substitution, rearrangement, amplification, deletion or a combination of several types. “Wild type” was also considered an alteration for calculating the JC, and only genes with at least four non-wild type samples were included in this calculation.Proteomics analysis of vinorelbine-treated MSE cellsSample preparation for mass spectrometry (MS) analysisTo profile the proteomes of MSE cells following 24 h of Vinorelbine treatment, frozen cells from FB002 (BRCA ascites) were thawed. Directly after thawing, 20 million cells per condition were incubated at a density of 1 million cells/ml either with DMSO or 10 μM vinorelbine for 24 h. DMSO-treated cells were used as control. For the protein extraction following the treatment, cells were split into three replicates for the vinorelbine treatment and four replicates for the DMSO controls. Cells were transferred to 50-ml tubes, centrifuged at 500 × g for 4 min at 4 °C, and washed with 10 ml of cold PBS. After two additional washing steps with cold PBS, the samples were transferred to 1.5-ml tubes and flash-frozen in liquid nitrogen after removal of PBS. Frozen cell pellets were resuspended in 300 µl cold lysis buffer (LB: 1 mM MgCl2, 150 mM KCl, 100 mM HEPES, pH 7.5), supplemented with 1× complete EDTA free protease inhibitor (Roche), transferred to bead-beater tubes and mixed with the same volume of ceramic beads (OMNI International). Subsequently, cells were lysed at 4 °C using a FastPrep‐24TM 5 G bead-beating grinder (MP Biomedicals) at a speed of 5.5 m/sec for 40 seconds. Samples were then transferred to new tubes using a gel-loader tip and spun down at 800 × g for 5 min at 4 °C to remove cell debris. Supernatants were transferred to new pre-cooled tubes, and protein concentrations were determined using the Pierce BCA Protein Assay Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. Protein concentrations were adjusted to 1 µg/µl in LB buffer, and 50 µl of lysate per sample were transferred to PCR tube strips. The strip was incubated at 99 °C for 5 minutes. After incubation, the samples were cooled down to 4 °C and sodium deoxycholate (Sigma Aldrich) was added to a final concentration of 5%. Cysteine residues were reduced with 5 mM tris(2-carboxyethyl)phosphine hydrochloride (Pierce) at 37 °C for 40 minutes with shaking at 600 rpm. The reduced cysteines were alkylated by addition of 40 mM iodoacetamide (Sigma Aldrich) at room temperature for 20 min. Samples were diluted with 4 volumes of 100 mM ammonium bicarbonate (Sigma Aldrich). Proteins were digested with lysyl-endopeptidase (FUJIFILM Wako Pure Chemical Corporation) and modified trypsin (Promega) at a 1:100 enzyme to substrate ratio (wt/wt) at 37 °C on a thermoshaker overnight. Protease digestion was quenched by adding formic acid (FA, Sigma Aldrich) to a final concentration of 4%. Precipitated sodium deoxycholate was removed using the AcroPrep Advance 96-well plate with 0.2 µm wwPTFE membrane (Pall Life Sciences). Samples were desalted using a 96-well C18-Spin plate with 7–70 µg capacity (The Nest Group). Peptides were eluted with 50% acetonitrile and 0.1% FA. Eluates were dried using vacuum centrifugation and resuspended in 25 µl buffer A with iRT peptides (1:30) and transferred to 1.5 ml Eppendorf tubes. Samples were vortexed and sonicated for 10 minutes and centrifuged at maximum speed for 20 minutes at 4 °C. 10 µl of the supernatant was transferred to MS vials. A second centrifugation was carried out at maximum speed for 5 min at 4 °C. The sample acquisition volume is 2 µl.LC-MS/MS data acquisitionPeptide digests were analyzed in randomized order on an Orbitrap Fusion Lumos Tribrid MS (Thermo Fisher Scientific) equipped with a nanoelectrospray ion source and coupled to an Easy-nLC 1200 system (Thermo Fisher). 2 µg of peptides were separated at ambient temperature on a 25 cm × 75 µm i.d. analytical column packed with 2.0 µm C18 beads (Acclaim PepMap C18 from Thermo Fisher) using a linear gradient from 5 to 40% buffer B (B:80% Acetonitrile and 0.1% formic acid; A: 2% Acetonitrile and 0.1% formic acid) over 120 minutes at a flow rate of 300 nl/min. Full MS1 scans were acquired at a resolution of 120,000 between 350 and 1500 m/z. The automatic gain control (AGC) target of 8 × 105 and a maximum injection time of 100 ms were used. 41 variable-width windows (Supplementary Data 13) were utilized to measure fragmented precursor ions. DIA-MS2 spectra were acquired at a resolution of 30,000 an AGC target of 5 × 105, and an injection time of 54 ms. The normalized collision energy was set to 30.Quantification and statistical analysisThe DIA raw files were searched using the default directDIA+ pipeline of Spectronaut (Biognosys AG, version 18.7.240325.55695) against the human UniProt FASTA (downloaded on 17.07.2020), including the sequence of proteinase K. Search criteria included carbamidomethylation of cysteine as a fixed modification, as well as oxidation of methionine and acetylation (protein N-terminus) as variable modifications. Up to two missed cleavages were allowed. The dynamic mass tolerance strategy was applied to calculate the ideal mass tolerances for data extraction, and no correction factor was applied (correction factor = 1). The local (non-linear) regression method was used for iRT calibration using the iRT kit peptides. The mutated decoy method was used to generate label-free decoys. The false discovery rate (FDR) was estimated with the mProphet approach99 and set to 1% on precursor and protein level. Protein inference was performed using the implemented IDPicker algorithm to define protein groups100. Protein quantification was performed on the MS2 level, and the intensities of protein groups were calculated as the mean of the intensities of the top 3 most abundant peptides. Differential protein abundance was assessed using limma-trend85. The analysis was performed on 5458 proteins that were detected across conditions, had an abundance of at least 3000 in all replicates of at least one experimental condition, and did not contain any strong outliers (more than 10-fold difference in abundance within replicates of one condition).RT-qPCR of tumor-type-specific genesPreviously extracted RNA was reverse transcribed using the iScript cDNA synthesis kit (BioRad #1708890) according to the manufacturer’s instructions. qPCR was performed using a custom TaqMan array (ThermoFisher #4413262) containing probes for the following 11 genes of interest: NAPSA (Hs00362192_m1), SFTA2 (Hs01588704_g1), NKX2-1 (Hs00968940_m1), CDH6 (Hs00191832_m1), EMX2 (Hs00244574_m1), PAX8 (Hs00247586_m1), VTN (Hs00940758_g1), CLDN15 (Hs00204982_m1), HEG1 (Hs00393516_m1), GATA3 (Hs00231122_m1), CREB3L4 (Hs00370116_m1). Additionally, the array contained four endogenous controls: 18 s rRNA (Hs99999901_s1), GAPDH (Hs99999905_m1), HPRT (Hs99999909_m1), and GUSB (Hs99999908_m1). Each gene was assayed in triplicate per sample, using 10 ng of cDNA as a template in each reaction.Gene expression (delta cycle threshold, dCT) was quantified by subtracting the mean CT values of the four endogenous controls from the CT values of the genes of interest.Multi-omics factor analysisMOFA was run as implemented in the R package MOFA2 (v.1.2.2) with default parameters. The input into MOFA was 4 data modalities (Supplementary Data 6): 1) sample composition (12 variables × 149 samples, Supplementary Data 3); (2) scaled drug responses (101 variables × 2 readouts × 149 samples, Supplementary Data 5); 3) gene expression of the 10% most variable genes (853 variables × 131 samples, Supplementary Data 7); (4) binarized mutational profile (Supplementary Data 8). Here, only genes with any alteration in at least 10 samples were included (22 genes, see Supplementary Fig. 6). Mutations were then further split into the type of alteration (short variant (SV), amplification (AMP), deletion (DEL), rearrangement (RA) and whether they have a known somatic impact, resulting in a total of 176 variables measured across 98 samples. For the mutations, the MOFA likelihood parameter was set to “bernoulli”, for all others, it was left at the default (gaussian). The top-15 factors were considered for downstream analysis. One sample (FB209) for which we only had RNA-seq and mutation data was excluded from MOFA analysis.Selection of top contributing features per factor for visualizationFor visualization, we selected the top 10 features with the highest weight per factor. Only features with absolute weights of at least 0.05 were included. To avoid showing only highly correlated features from the same data type, we selected at most five features per data type.Identifying disease-specific factorsTo identify disease-specific factors, we compared the values of each factor between every single tumor type and all others. Significance was assessed using a two-sided two-sample Wilcoxon test. In addition, for each factor and tumor type, we calculated an area under the receiver operating characteristic curve (AUROC) using the factor value as a binary classifier distinguishing a given tumor type from all others. Factors that specifically separate a certain tumor type from all others will thus have high AUROC values for this tumor type.Calculation of patient specificity per factorPatient specificity per factor was calculated using a linear regression model. We ran one model per patient and factor, using binarized patient ID as a predictor and factor value as the response variable, and then defined patent specificity as the maximum R-squared obtained per factor. This corresponds to the maximum amount of variance that can be attributed to a single patient per factor.Immunohistochemistry against c-METImmunohistochemical evaluation was performed using anti c-MET (SP44) rabbit monoclonal primary antibody (catalog # ab227637, Ventana Medical Systems, Tucson, AZ), dilution 1:50. The staining was carried out according to the manufacturer’s protocol on the BenchMark XT platform from Ventana utilizing the ultraView detection kit.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles