Multiomic profiling of medulloblastoma reveals subtype-specific targetable alterations at the proteome and N-glycan level

Subject detailsThis research complies with all relevant ethical regulations. Investigations were performed in accordance with local and national ethical rules of patient’s material and have, therefore, been performed in accordance with the ethical standards laid down in an appropriate version of the 1964 Declaration of Helsinki. The study protocol was approved by the Ethics Committee of the Hamburg Chamber of Physicians. All patients and parents or legal adult representatives provided informed consent in written format permitting scientific use of the data. There was no compensation provided for participation. All samples underwent anonymization.In house patient samples for main cohort and biological validation cohortFFPE Medulloblastoma samples of tumors within the years 1976–2022 were obtained from tissue archives from neuropathology units in Munich (Ludwig-Maximilians-University), Heidelberg (University Hospital Heidelberg), Hannover (Hannover Medical School (MHH)), Aachen (RWTH Aachen University Hospital), Augsburg (University of Augsburg) and Hamburg (University Medical Center Hamburg-Eppendorf). Some of these samples were collected as part of the HIT-MED study, which is a registry for developing treatments in children and adolescents with aggressive pediatric brain tumors such as Medulloblastoma and Ependymoma. The validation samples (both technical and biological validation) were a subset from all the samples collected from all the different institutions and HIT-MED. Some samples (Supplementary Data 1c, Supplementary Data 11) were part of SIOP-PNET5. SIOP-PNET5 is a clinical trial (NCT02066220) within HIT-MED, in which the primary outcomes are identification of long-term damage to disease and therapy, therapy deacceleration in low-risk patients to name a few. The PNET5 study protocol planned “comprehensive genome-wide investigations of medulloblastoma” as exploratory analyses without pre-defined methods. The present analysis was not a planned SIOP-PNET5-MB study question, but was done from archival material of PNET5-participants from the author’s own institution and informed consent of the trial participants. To avoid potential interference with the analysis of SIOP-PNET5-MB trial analyses, the inclusion of these patients was discussed with the PNET5 principal investigator (Stefan Rutkowski) and the analyses were classified not to interfere with predefined SIOP-PNET5-MB study hypotheses. Included PNET5 samples were used for all the analyses in this study, but excluded from survival analysis, since this clinical trial is still not yet published. Details of the samples used in this study can be found in Supplementary Data 1c (n = 6, in the main cohort) and Supplementary Data 11 (n = 2, in the biological validation cohort).Tumor samples were fixed in 4% paraformaldehyde, dehydrated, embedded in paraffin, and sectioned at 10 µm for microdissection using standard laboratory protocols. For further information on clinical details of samples, please refer to Supplementary data 1c and Supplementary Data 11 (ncurrent study main cohort with successful proteome subtype assignment = 62, nForget et al. (PMID: 302050439) with successful proteome subtype assignment = 38, nArcher et al. (PMID: 30205044) with successful proteome subtype assignment = 45, nPetralia et al. (PMID: 33242424) = 22, ntechnical validation cohort = 57, nbiological validation cohort = 30). An overview of all measured protein samples can be found in Supplementary Table 3.Medulloblastoma cell linesThe human Medulloblastoma cell lines DAOY (Ca#HTB-186) and D283med (Ca#HTB-185) were obtained from ATCC, Manassas, VA, USA. DAOY and D283med were authenticated using Eurofins using STR-profiling analysis. UW473 was kindly provided by Michael Bobola. All lines were used as Standards for TMT batches. Cells were cultivated in DMEM (Dulbecco’s Modified Eagle Medium, PAN-Biontech) supplemented with 10 % FCS at 37 °C, 5% CO2.Publicly available datasetsFor the data integration and harmonization of in-house and publicly available DNA Methylation data the following datasets were used: Archer et al.16: 42 FF MB samples, accessible as a subset of European Genome-phenome Archive ID: EGAS00001001953. Forget et al.17: 38 FF MB samples, accessible via Gene Expression Omnibus (GSE104728). For the analysis of RNA Expression data, processed and normalized data from the following datasets were used: Cavalli et al. (2017)5: 763 MB samples, accessible via Gene Expression Omnibus (GPL22286)5. For the data integration and harmonization of in-house and publicly available proteome data, the following datasets were included: Archer et al.16: 45 FF MB samples, available via the MassIVE online repository (MSV000082644, Tandem Mass Tag- (TMT) label-based protein quantification); Forget et al.17: 39 FF MB samples, available via the PRIDE archive (PXD006607, stable isotope labeling by amino acids in cell culture- (SILAC) label-based protein quantification); Petralia et al.15, 23 FF MB samples, available through the Clinical Proteomic Tumor Analysis Consortium Data Portal (https://cptac-data-portal.georgetown.edu/cptacPublic/) and the Proteomics Data Commons (https://pdc.cancer.gov/pdc/, Tandem Mass Tag- (TMT) label-based protein quantification). For validation of determined proteome subtypes, as well as the investigation of the proteome profile of ELP1 mutated SHH MB, a dataset published by Waszak et al. 29. was used (23 FF MB samples, available via the PRIDE archive (PXD016832, Data independent acquisition label free protein quantification).Sample preparation and data acquisitionDNA methylation profilingDNA methylation data was generated from FFPE tissue samples. DNA was isolated using the ReliaPrepTM FFPE gDNA Miniprep system (Promega) following the manufacturer’s instructions. 100–500 ng DNA was used for bisulfite conversation using the EZ DNA Methylation Kit (Zymo Research). Then the DNA Clean & Concentrator-5 (Zymo Research) and the Infinium HD FFPE DNA Restore Kit (Illumina) were applied. Infinium BeadChip array (EPIC) using manufacturer’s instructions was then used to quantify the methylation status of CpG sites on an iScan (Illumina, San Diego, USA). Data has been deposited using accession numbers GSE222478 and GSE243768 (linked to Series GSE243796). Additionally, previously published data measured on Infinium Human Methylation 450 BeadChip array (450 K) were included from EGAS0000100195316 from GSE10472817, and GSE13005178.Proteome profiling (main cohort, FFPE samples)FFPE MB tissue sections were deparaffinized with N-heptane for 10 min and centrifuged for 10 min at 14,000 g. The supernatant was discarded. Proteins were extracted in 0.1 M triethyl ammonium bicarbonate buffer (TEAB) with 1% sodium deoxycholate (SDC) at 99 °C for 1 h. Sonification was performed for ten pulses at 30% power, to degrade DNA, using a PowerPac™ HC High-Current power supply (Biorad Laboratories, Hercules, USA)) probe sonicator. For cell lines, proteins were extracted in 0.1 M TEAB with 1% SDC at 99 °C for 5 min. Sonification was performed for six pulses.The protein concentration of denatured proteins was determined by the Pierce BCA Protein assay kit (Thermo Fischer Scientific, Waltham, USA), following the manufacturer’s instructions. 60 μg of protein for each tissue lysate and 30 μg protein for each cell lysate were used for tryptic digestion. Disulfide bonds were reduced, using 10 mM dithiothreitol (DTT) for 30 min at 60 °C. Alkylation was achieved with 20 mM iodoacetamide (IAA) for 30 min at 37 °C in the dark. Tryptic digestion was performed at a trypsin:protein ratio of 1:100 overnight at 37 °C and stopped by adding 1% formic acid (FA). Centrifugation was performed for 10 min at 14,000 g to pellet precipitated SDC. The supernatant was dried in a vacuum concentrator (SpeedVac SC110 Savant, (Thermo Fisher Scientific, Bremen, Germany)) and stored at −80 °C until further analysis.For the main cohort, 50 μg sample per patient and internal reference, TMT-10 plex labeling (Thermo Fischer Scientific, Waltham, USA), was performed, following the manufacturer’s instruction. All 70 patients were run in 8 total TMT 10-plexes. Sample assignment to batches was performed in a semi-randomized manner, according to the four main molecular subtypes. In each batch, 1–2 internal reference samples were included, composed of equal amounts of peptide material from all 70 samples and cell lines. Isobarically labeled peptides were combined and fractionated, using high pH reversed-phase chromatography (ProSwiftTM RP-4H, Thermo Fischer Scientific Bremen, Germany) on an HPLC system (Agilent 12000 series, Agilent Technologies, Santa Crara, USA). Separation was performed using buffer A (10 mM ammonium hydrogen carbonate (NH4HCO3) inH2O) and buffer B (10 mM NH4HCO3 in ACN) within a 25-min gradient, linearly increasing from 3 to 35% buffer B at a flow rate of 200 nl/min. In total, 13 fractions were collected for each batch, dried in a vacuum concentrator, resuspended in 0.1% FA to a final concentration of 1 mg/ml and subjected to high pH liquid chromatography coupled mass spectrometry (LC-MS). All LC-MS measurements were performed on a UPLC system (Dionex Ultimate 3000, Thermo Fisher Scientific, Bremen, Germany, trapping column: Acclaim PepMap 100 C18 trap ((100 μm × 2 cm, 100 Å pore size,5 μm particle size); Thermo Fisher Scientific, Bremen, Germany), analytical column: Acclaim PepMap 100 C18 analytical column ((75 μm × 50 cm, 100 Å pore size, 2 μm particle size); Thermo Fisher Scientific, Bremen, Germany)), coupled to an quadrupole-orbitrap-iontrap mass spectrometer (Orbitrap Fusion, Thermo Fisher Scientific, Bremen, Germany). Separation was performed using buffer A (0.1% FA in H20) and buffer B (0.1% FA in H20) within a 60-min gradient, linearly increasing from 2-30% buffer B at a flow rate of 300 nl/min. Eluting peptides were analyzed, using a DDA-based MS3 method with synchronous precursor selection (SPS), as described by McAlister et al. 79. For MS—raw data please refer to the PRIDE archive (PXD039319).Proteome profiling for biological and technical validation cohortThe deparaffinization and quantification were conducted as previously described.20 µg of the provided samples were dissolved to a concentration of 70% ACN. 2 µL carboxylate modified magnetic beads (GE Healthcare Sera-Mag™, Chicago, USA) at 1:1 (hydrophilic/hydrophobic) in methanol were added following the SP3-protocol workflow80. Samples were shaken at 1400 rpm for 18 min RT and the supernatant was removed. Beads were washed two times with 100% ACN and two times with 70% EtOH. After resuspension in 50 mM ammonium bicarbonate, disulfide bonds were reduced in 10 mM DTT for 30 min, alkylated in the presence of 20 mM IAA for 30 min in the dark and digested with trypsin (sequencing grade, Promega) at 1:100 (enzyme:protein) at 37 °C overnight while shaking at 1400 rpm. Peptides were bound in 95% ACN and shaken at 1400 rpm for 10 min RT. The supernatant was and the beads were again two times with 100% ACN. Elution of peptides was performed with 20 µL 2% DMSO in 1% formic acid (FA). Samples were dried in a vacuum centrifuge and stored at −20 °C until further use.For the measurement samples were resuspended in 0.1% FA to a final concentration of 1 mg/ml and measured on either a Quadrupole Orbitrap hybrid mass spectrometer (QExactive, Thermo Fisher Scientific) or on a quadrupole-ion-trap-orbitrap MS (Orbitrap Fusion, Thermo Fisher) in orbitrap-orbitrap configuration. For MS—raw data please refer to the PRIDE archive (PXD048767.).
Quadrupole Orbitrap hybrid mass spectrometer set-up
Chromatographic separation of peptides was achieved by nano UPLC (nanoAcquity system, Waters) with a two-buffer system (buffer A: 0.1% FA in water, buffer B: 0.1% FA in ACN). Attached to the UPLC was a peptide trap (100 µm × 20 mm, 100 Å pore size, 5 µm particle size, Acclaim PepMap 100 C18 trap, Thermo Fisher Scientific) for online desalting and purification followed by a 25-cm C18 reversed-phase column (75 µm × 200 mm, 130 Å pore size, 1.7 µm particle size, Peptide BEH C18, Waters). Peptides were separated using an 80-min gradient with linearly increasing ACN concentration from 2% to 30% ACN in 65 min. The eluting peptides were analyzed on a Quadrupole Orbitrap hybrid mass spectrometer (QExactive, Thermo Fisher Scientific). Here, the ions being responsible for the 15 highest signal intensities per precursor scan (1 × 106 ions, 70,000 Resolution, 240 ms fill time) were analyzed by MS/MS (HCD at 25 normalized collision energy, 1 × 105 ions, 17,500 Resolution, 50 ms fill time) in a range of 400–1200 m/z. A dynamic precursor exclusion of 20 s was used.

Quadrupole-ion-trap-orbitrap mass spectrometer set-up
Chromatographic separation of peptides was achieved with a two-buffer system (buffer A: 0.1 % FA in water, buffer B: 0.1% FA in ACN). Attached to the UPLC was a peptide trap (100 μm × 200 mm, 100 Å pore size, 5 μm particle size, Acclaim PepMap 100 C18 trap, Thermo Fisher Scientific) for online desalting and purification followed by a 25 cm C18 reversed-phase column (75 μm × 250 mm, 130 Å pore size, 1.7 μm particle size, Peptide BEH C18, Waters). Peptides were separated using an 80-min gradient with linearly increasing ACN concentration from 2% to 30% ACN in 65 min. Eluting peptides were ionized using a nano-electrospray ionization source (nano-ESI) with a spray voltage of 1800, transferred into the MS, and analyzed in data-dependent acquisition (DDA) mode. For each MS1 scan, ions were accumulated for a maximum of 240 ms or until a charge density of 1 × 106 ions (AGC Target) was reached. Fourier-transformation-based mass analysis of the data from the orbitrap mass analyzer was performed, covering a mass range of 400–1200 m/z with a resolution 60,000. Peptides with charge states between 2+ and 5+ above an intensity threshold of 1 × 105 were isolated within a 2 m/z isolation window from each precursor scan and fragmented with a normalized collision energy of 25% using higher energy collisional dissociation (HCD). MS2 scanning was performed at a resolution of 17,500 on the quadrupole-ion-trap-orbitrap MS in orbitrap-orbitrap configuration, covering a mass range from 100 m/z and accumulated for 50 ms or to an AGC target of 1 × 105. Already fragmented peptides were excluded for 15 s.
Histology and ImmunohistochemistryFFPE tissue samples were sectioned into 2 µm thick slices, according to standard laboratory protocols. Immunohistochemical stainings were performed on an automated staining machine (Ventana BenchMark XT, Roche Diagnostics, Mannheim, Germany). The following primary antibodies were used: ALDH1A3 (NBP2-15339, Novus Biologicals, 1:1000), c-myc (Z2734RL, Zeta Corporation, 1:25), TENASCIN C (SAB4200782, Sigma-Aldrich, 1:1000), PALMD (NBP2-55156, Novus Biologicals, 1:750). Further information on the antibodies and staining program can be found in Supplementary Table 1.Transcriptome profilingMaxwell RSC RNA FFPE Kit was used to isolate RNA from 10 × 10 µm sections of FFPE tissue (PROMEGA Maxwell RSC RNA FFPE kit). RNA 6000 Nano Chip on an Agilent 2100 Bioanalyzer (Agilent Technologies) was used to analyse RNA integrity. From 400 ng total per sample, ribosomal RNA was depleted with the help of the RiboCop rRNA Depletion Kit (Lexogen) followed by RNA sequencing library generation using the CORALL Total RNA-Seq Library Prep Kit (Lexogen), followed by the Lexogen CORALL total RNA-Seq V2 Library Prep Kit with UDIs (according to manufacture protocol, short insert size version). Illumina NextSeq2000 machine using the P3 Reagents/100 cycle kit as paired-end sequencing 2 × 57 bp (+2× index read 12 bp). Data have been deposited under accession number GSE243795.Metabolic and amino acid profiling13C-Labeled Metabolite Yeast Extract (Catalog No. ISO-1, ISOtopic solutions e.U.) LOT: 20211007 and Canonical Amino Acid Mix (Catalog No. MSK-CAA-1, Cambridge Isotope Laboratories, Inc. (CIL)) were prepared according to instructions. Tissue sections of sSHH and tSHH medulloblastoma samples were deparaffinized by two 5 min washes in xylene. 20 µL of 13C-Labeled Metabolite Yeast Extract and 1 µL of diluted 0.1 M Canonical Amino Acid Mix were added, and samples were then homogenized in 180 µL water using the TissueLyser (Qiagen N.V., Netherlands) at 20 Hz for 2 min. Afterwards, protein precipitation and metabolite extraction were achieved by adding ice-cold methanol twice (800 µL and 400 µL) and 80% methanol (200 µL). The supernatant was combined and dried in a vacuum concentrator centrifuge, and stored at −20 °C until further use.Polar and polar ionic metabolites were analyzed by single ion monitoring (SIM) mass spectrometry coupled to ion chromatography and IC-SIM-MS raw data processing was performed as described by van Pijkeren and Egger et al. 81. using a quadrupole orbitrap mass spectrometer (Exploris 480, Thermo Fisher Scientific) and an ICS-6000 (Thermo Fisher Scientific).Amino acids were analyzed by multiple reaction monitoring (MRM) mass spectrometry using a triple quadrupole mass spectrometer coupled to ultra-high performance liquid chromatog-raphy (UPLC). Amino acids were separated using an Acquity Premier UPLC system (Waters) equipped with an Atlantis Premier BEH C18 AX column (1.7 μm, 2.1 × 150mm, Waters) heated to 45 °C. A gradient of mobile phase A (water, 0.1% formic acid (FA)) and mobile phase B (acetonitrile, 0.1% FA) was applied as followed: 1% B at 0.350 mL/min for 1 min, to 20% B in 1 min at 0.350 mL/min, to 40% B in 0.5 min at 0.350 mL/min, to 95% B in 1.5 min at 0.450 mL/min, hold for 0.5 min, for re-equilibration, switch to 1% B in 0.1 min at 0.450 mL/min, hold for 0.1 min at 0.450 mL/min and hold for 1.3 min at 0.350 mL/min. Samples were measured on a Xevo-TQ XS Mass spectrometer (Waters) equipped with an electrospray ionization source operated in positive ion mode. The mass spectrometer was operated in multiple reac-tion monitoring (MRM) mode using individual cone and collision voltages for each amino acid and its internal standard (Supplementary data 1). Raw files were analyzed by MS Quan in Waters Connect (Waters, V1.7.0.7). Details on MRM settings per metabolite and internal standard can be found in Supplementary Table 2.For MS raw data of the metabolites and amino acids please refer to MetaboLights repository82 MTBLS9830 and MTBLS9836, respectively.N-Glycan profiling100 µg of protein for 18 samples was denatured, reduced, and alkylated as described above. Samples were concentrated by 3 kDa Amicon Ultra centrifugal filters (Merck Millipore, R0NB30416) with 100 mM NH4HCO3 to exchange the buffer and retain globular particles above 3 kDa. Thirty units of PNGase F were added to each sample and incubated in a 37 °C Thermomixer for 24 h. After PNGase F digestion, purified N-glycans were eluted by Sep-Pak C18 cartridges (Water, WAT023590) with 5% acetic acid and dried in a speed vacuum. The purified N-glycans were then permethylated using an optimized solid-phase permethylation method and analyzed via LC-MS measurement as mentioned here83. Glycan data has been deposited at GlycoPOST84 with the identifier GPST000414.Raw data processingProcessing of DNA methylation array dataIdat files generated using the above protocol were processed in R (Version 4.0.5). The files were read using the minfi package (Version 1.36.0)85. Differentially methylated probes/CpG sites were found using the limma package (Version 3.46.0)86, corrected for multiple testing using Benjamini Hochberg (cut-off 5% FDR). M-values of 10,000 differentially methylated CpG sites which could cluster subtypes based on biological differences were selected for further analysis. Similarly, DMR analysis was performed using DMRcate package (V4.30.0). For DMR analysis, we set a min of 10 CpGs per DMR (<1000 nt from each other) to minimize gene overlap, which resulted in ~9000 DMRs with each DMR having 10–200 CpGs.Processing of Proteome raw data for main cohort
Processing of Proteome raw data for the integrated cohort
Obtained raw data from in-house generated and publicly available (Archer et al. 16, TMT 10-Plex; Petralia et al. 15, TMT 11-Plex). TMT-based LC-MS measurements were processed with the Andromeda algorithm, implemented in the MaxQuant software (Max Plank Institute for Biochemistry, Version 1.6.2.10)87 and searched against a reviewed human database (downloaded from Uniprot February 2019, 26,659 entries).). The Carboxymethylation of cysteine residues was set as a fixed modification. Methionine oxidation, N-terminal protein acetylation and the conversion of glutamine to pyroglutamate were set as variable modifications. Peptides with a minimum length of 6 amino acids and a maximum mass of 6000 Da were considered. The mass tolerance was set to 10 ppm. The maximum number of allowed missed cleavages in tryptic digestion was two. A false discovery rate (FDR) value threshold <0.01, using a reverted decoy peptide databases approach, was set for peptide identification. Quantification was performed, based on TMT reporter intensities at MS3 level for LC-MS3 in-house data and at MS2 level for LC-MS2 data, acquired by Archer et al. 16 and Petralia et al. 15. All studies were searched separately. Fractions for each TMT batch were searched jointly.
For stable isotope labeling by amino acids in cell culture (super-SILAC) data, acquired by Forget et al. 17, log2 transformed SILAC ratios were directly obtained from the MassIVE online repository (MSV000082644).
For the external validation the dataset published by Waszak et al. 29. was used. The DIA raw data spectra were downloaded from PRIDE and processed using Data Independent Acquisition with Neural Networks (DIA-NN, version 1.8.1)88. The spectra were searched against a peer-reviewed human FASTA database (downloaded from UniProt April 2020, 20,365 entries). A spectral library was generated in silico by DIA-NN using the same FASTA database. Smart profiling was enabled for library generation. Methionine oxidation, carboxymethylation of cysteine residues as well as N-terminal methionine excision were set as variable modifications. The maximum number of variable modifications was set to three, the maximum number of missed cleavages was two. The peptide length range was set from 7 to 30. Mass accuracy, MS1 accuracy, and the scan window were optimized by DIA-NN. An FDR < 0.01 was applied at the precursor level—decoys were generated by mutating target precursors’ amino acids adjacent to the peptide termini. Interference removal from fragment elution curves as well as normalization were disabled. Neural network classifier was set to single-pass mode and the fixed-width center of each elution peak was used for quantification.

Processing of the biological and technical validation cohorts
The spectra were searched with the Sequest algorithm integrated in the Proteome Discoverer software (v 3.0.0.757), Thermo Fisher Scientific) against a reviewed human database (downloaded from Uniprot in June 2021, Containing 20,683 entries)). Carbamidomethylation was set as fixed modification for cysteine residues and the oxidation of methionine, and pyro-glutamate formation at glutamine residues at the peptide N-terminus, as well as acetylation of the protein N-terminus were allowed as variable modifications. A maximum number of 2 missing tryptic cleavages was set. Peptides between 6 and 144 amino acids where considered. A strict cutoff (FDR < 0.01) was set for peptide and protein identification. Quantification was performed using the Minora Algorithm, implemented in Proteome discoverer.
Processing of N-Glycan raw dataN-Glycan raw data were opened with Xcalibur Qual Browser (Version No 4.2.28.14). MaxQuant were used for extracting all the detected masses and m/z from MS raw data of permethylated reducing N-glycans. An in-house Python-script was used to extract and calculate monosaccharide compositions based on the molecular weight of each derivatized N-glycan89. The N-glycan structures were identified, matched to N-glycan compositions and quantified using the Xcalibur, Glycoworkbench 2.1 and Skyline software (Version No 21.1.0.278)83. Further statistical analysis was performed with the Perseus software.Processing of raw transcriptome dataRaw fastq files of human samples were processed in usegalaxy.eu90. Low quality reads were detected using FastQC (Galaxy Version 0.73+galaxy0), and Trimmomatic (Galaxy Version 0.38.1) was used for trimming poor quality reads (reads with average quality <20). Reads were aligned to the GRh38 human reference genome using STAR aligner (Galaxy Version 2.7.8a+galaxy1). Gene expression was quantified with featureCounts (Galaxy Version 2.0.1+galaxy2) and VST-normalized files were generated by DEseq2 (Galaxy Version 2.11.40.7+galaxy2). Further processing of data was performed with R (v4.2.1). Transcriptome data was combined with publicly available transcriptome data16. Batch corrected with HarmonizR26.Processing of DNA methylation array dataRaw signal intensities for EPIC and 450 K files were read individually. Since ~93% of the loci of 450 K array are also present on EPIC array, they can be combined using minfi’s combineArrays(). After combining the two arrays they can be output as a virtual array. In this study, 450 K array was the output virtual array since a greater number of samples were measured on 450 K.The detection P value was used to identify sample quality and filter out bad quality samples (none were excluded, n = 0). Further, probes having bad quality (n = 49,091), probes with single nucleotide polymorphism (n = 12,868) and probes present on X and Y chromosomes (n = 8777) were filtered out. After normalization and probe filtering, the m-values log2(M/U) where methylation intensity is denoted by M and unmethylation intensity denoted by U were used for further analysis.Data normalization and integrationNormalization and integration of DNA methylation array dataSingle-sample noob normalization (ssNoob) was performed since we combined samples from different arrays (EPIC and 450 K). The detailed method development has been mentioned91,92.Normalization and integration of Proteome dataPrior to data integration, protein abundances were handled separately for each dataset. TMT reporter intensities were log2 transformed and median normalized across columns. Technical variances between TMT batches were corrected, using HarmonizR framework (Version 0.0.0.9). As described here26, mean subtraction across rows was applied to batch-effect corrected TMT reporter intensities to mimic SILAC ratios, prior to data integration. Log2 transformed super SILAC ratios were median normalized across columns prior to data integration.Processed data from individual studies was combined based on the UniProt identifier, data harmonization was performed as described above. Combined, harmonized protein abundances were mean-scaled across rows. Out of 176 analyzed cases, 9 patients were excluded from further analysis, as high blood protein yields, suppressing tumor-specific signals, were detected from LC-MS/MS measurements (Supplementary data 1a).For the external validation cohort protein abundances were log2 transformed and median normalized across columns. Samples were assigned to proteome subtypes individually. Protein abundances were reduced to the 3998 proteins, considered in the main cohort. Harmonized protein abundances from the main cohort were integrated with each individual sample. Mean row normalization was performed to adjust values from validation samples to the main cohort. Pearson correlation-based hierarchical clustering, with average linkage was applied using the Perseus software (Max Plank Institute for Biochemistry, Version 1.5.8.5)93.For biological and technical validation cohort the data was processed and harmonized as described above. For the biological validation, one sample had to be excluded due to high blood protein yields as described above. The proteome subtypes for the biological validation were assigned via the ACF classifier94. The proteome subtypes for the technical validation were taken from the main cohort. Protein abundances were treated as above.Normalization of N-Glycan dataN-Glycan intensities were log2 transformed and median normalized across columns to compensate for injection amount variations.Quantification and statistical analysisDimensionality reduction and hierarchical clusteringNonlinear Iterative vertical Least Squares (NIPALS) PCA and hierarchical clustering were performed in the R software environment (version 4.1.3). For Principal component calculation and visualization, the mixOmics package (Version 6.19.4.)31 was used in Bioconductor (version 3.14). Hierarchical clustering was performed based on pheatmap package (version 1.0.12) and ComplexHeatmap (Version 2.6.2)95. Pearson correlation was applied as a distance metric. Ward.D linkage was used. Pairwise complete correlation was used, to enable the consideration of missing values.Consensus clusteringTo determine the ideal number of clusters from proteome and DNA-methylation data, Consensus Clustering was applied on normalized and integrated datasets, using the ConsensusClusterPlus package (Version 1.6)96, in the R software environment (version 4.1.3). In correspondence with the current maximum number of suspected MB subtypes, the number of clusters was varied from 2 to 12 and calculated with 1000 subsamples for all combinations of two clustering methods (Hierarchical clustering (HC) and partition around medoids (PAM)) and three distance metrics (Euclidean, Spearman, Pearson). The Ward’s method was applied for linkage. Missing value tolerant pairwise complete correlation was used, to enable the consideration of missing values. For each sample, the cluster certainty was calculated by how many times under the application of different distance metrics (Euclidean, Spearman, Pearson) and clustering approaches (k-medoids, hierarchical clustering) a sample was associated with a certain cluster, while allowing a total number of six clusters.Differential analysis and visualizationStatistical testing was carried out, using the Perseus software93. ANOVA testing was performed for the comparison across multiple subgroups/subtypes. Factors, identified with p value < 0.05 were considered statistically significant differential abundant across groups. For the identification of subtype-specific biomarkers, Students t-testing was applied (p value < 0.05, Foldchange difference > 1.5). Visualization of t-test results and abundance distributions across groups was performed in PRISM (GraphPad, Version 5) and Microsoft excel (Version 16.5.).Functional annotation of data setsREACTOME- based97 Gene Set Enrichment Analysis was performed by using the GSEA software (version 4.1, Broad Institute, San Diego, CA, USA)98. 1000 permutations were used. Permutation was performed based on gene sets. A weighted enrichment statistic was applied, using the signal-to-noise ratio as a metric for ranking genes. No additional normalization was applied within GSEA. As in default mode, gene sets smaller than 15 and bigger than 500 genes were excluded from analysis. For visualization of GSEA results, the EnrichmentMap (version 3.3)99 application within the Cytoscape environment (version 3.8.2)100 was used. Gene sets were considered if they were identified at an FDR < 0.25 and a p value < 0.1. For gene-set-similarity filtering, data set edges were set automatically. A combined Jaccard and Overlap metric was used, applying a cutoff of 0.375. For gene set clustering, AutoAnnotate (version 1.3)99 was used, using the Markov cluster algorithm (MCL). The gene-set-similarity coefficient was utilized for edge weighting.Survival curvesKaplan-Meier curves were generated for the overall survival of 121 patients. All Kaplan-Meier curves and log-rank test p values were generated with PRISM (GraphPad, Version 5). A conservative log-rank test (Mantel-Cox) was used for the comparison of survival curves. A significant difference between curves was assumed at a p value < 0.05.Copy number frequency plots of Proteome and DNA Methylome dataCopy number analysis was performed on samples having both methylation and proteomic data (N = 115). Samples from 450 K and EPIC array were read in separately as mentioned above. Data were read using read.metharray.sheet() and read.metharray.exp() using the minfiData package (Version 0.36.0)85. For normalization, preprocessIllumina normalization using MsetEx data containing control samples for normalization of 450 K array data, while for EPIC array data minfidataEPIC (Version1.16.0)85 was used. IlluminaHumanMethylation450kanno.ilmn12.hg19 and IlluminaHumanMethylationEPICanno.ilm10b4.hg19 were used to generate the annotation files of 450 K and EPIC array data respectively.Individual sample CNV plots were generated as mentioned in the Conumee package (Version 1.24.0) vignette, and the segmentation information from each sample was saved and used later for generation of cumulative CNV plot using CNAppWeb tool101(cut-off> = |0.2|) for gain or loss). The segmentation information for all samples belonging to one subtype were combined into a single file in subgroup specific manner and then read into CNAppWeb tool.Combining the segmentation information from proteome data and methylome data in subgroup specific manner, Pearson correlation-based distance plot was generated.To map the protein abundancies to each of the chromosomes, protein names were converted to their respective gene names and a column containing mapping information for these genes was added. Copynumber (Version 1.30.0) package in R was used to generate segmentation information for these proteins. CNAppWeb tool using the cut-off mentioned above was used to map the protein abundancies to respective chromosomes.Integration of proteome and DNA methylome dataDIABLO from mixOmics (Version 6.19.4)31 was used for integration of proteome and methylome data to correlate the two data types. Proteome data (3990 proteins,115 samples) and methylome data (10,000 differentially methylated CpG sites, 115 proteins) were pre-processed as mentioned above. Steps followed were same as explained in the mixOmics vignette. Briefly, datasets were integrated, an output variable containing information about which subgroup the samples belong to was also supplied. Each data set is broken down into components (5 components for this study) or latent variables which are associated with the data. Components were selected using fivefold cross validation repeated 50 times and since the groups were imbalanced lowest overall error rate and centroid distance was used. For each dataset and for each component sparse DIABLO was applied which will select variables contributing maximally to the selected component. sPLS-DA was applied to the selected variables to generate the correlation circus plot (cut-off 0.7) which gives the variables that are either positively or negatively correlating with each other. DMRs between each methylome subtype was found in a pairwise manner, corrected for multiple testing using Benjamini Hochberg (cut-off 5% FDR) and integrated with proteome data in mixOmics.Global correlation of proteome and DNA methylation dataTo check for overall correlation between the two datasets, subgroup-specific (pWNT = 13, pSHHt = 29, pSHHs = 6, pG4 = 36, pG3 = 11, pG3Myc = 20) pearson correlation (cut-off 0.7) was performed between the proteome (3990 proteins and 115 samples) and methylome (381,717 probes and 115 samples) in R (Version 4.0.5. The data was subsetted for correlation value ≥ 0.7 and matches of proteins to their respective probes using Python script in anaconda JupyterLab (Version 3.0.14). Non-subgroup specific pearson correlation between the proteome and methylome data was similarly performed with focus on potential biomarkers for each subgroup and their correlation with methylation probes. Scatterplots of biomarker’s protein abundance and the M-values of CpG sites of its own gene (crossing the pearson correlation cut-off of 0.7) were plotted to confirm the correlations. For correlating DMRs and proteins, mean of all CpG sites belonging to each DMR was taken to find correlation between all DMRs and proteinsFor correlation of CCT complex components, all samples for which we had all three datasets were considered (n = 60) and Pearson correlation ≥0.7 was plotted using circlize(Version 0.4.15) and corrplot (Version 0.92) package in R (Version 4.3.0).Quantification of immunohistochemical stainingsImmunostained tissue sections were digitalized using a Hamamatsu NanoZoomer 2.0-HT C9600 whole slide scanner (Hamamatsu Photonics, Tokyo, Japan). Slide images were exported using NDP view v2.7.43 software. Digital image analysis was performed using ImageJ/Fiji software102 after white balance correction in Adobe Photoshop 2022 (Adobe Inc., San Jose, USA). Tumor areas were labeled via manually drawn regions of interest (ROIs). Tissue areas not eligible for quantification (e.g., non-tumorous tissue, technical or digital artifacts) were excluded from the analysis. Total tumor tissue areas were measured in grayscale-converted images via consistent global thresholding (0, 241) and subsequent pixel quantification within the ROIs. DAB-positive pixels (i.e., brown immunostaining) were quantified on a three-tiered intensity scale after application of the color deconvolution plugin. In detail, pixels were successively quantified within three distinct thresholds [0, 134 (strong/3+); 135, 182 (medium/2+); and 183, 203 (weak/1+)]. Based on the conventional Histo-score, pixel quantities of strong, medium and weak intensity were multiplied by three, two and one, respectively, and then summed up. The hereby generated score is referred to as a digital Histoscore (DH-score).Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles