Proteomic landscape of epithelial ovarian cancer

Ethics statement

This study was approved by the Medical Ethics Committee of Zhejiang Cancer Hospital (IRB-2020-155) and the Medical Ethical Committee of Westlake University (20190401GTN0009). The methodologies employed in this study adhered to the ethical standards outlined in the Declaration of Helsinki. Informed consent was signed before sample collection.

For the purposes of quality control within our mass spectrometry analysis, liver samples were sourced from a single eight-week-old male C57BL/6 mouse. Upon collection, these samples were promptly stored at −80 °C to ensure their preservation until required for protein extraction. It is important to note that these mouse liver peptides were solely employed as a technical reference to validate the accuracy and reproducibility of our mass spectrometry procedures and were not utilized for any comparative proteomic analyses. As such, their use does not bear on the scientific findings related to the human clinical samples that are central to our study. The use of a single male mouse was deemed sufficient for the technical purpose it served, which is why additional consideration of sex in the study’s design and analysis was not applicable.

All experimental procedures involving animals were conducted in strict accordance with ethical standards and were approved by the Laboratory Animal Resources Center of Westlake University, under the approval number 19-027-GTN.

Sample collection and histological analysis

In our study on ovarian cancer, we focused solely on female patients because this type of cancer occurs only in women. Therefore, our research did not include a comparison of sexes or genders.
We did not collect gender identity through self-report since our study was based on the biological characteristics of the disease, which are specific to female reproductive organs.

Surgically resected EOC tissue samples were collected from 753 patients, comprising 555 primary EOC patients who underwent primary debulking surgery (PDS cohort), 74 primary EOC patients treated with neoadjuvant chemotherapy (NACT cohort), and 124 relapsed EOC patients (RLP cohort; Supplementary Data 1). Additionally, 108 non-carcinoma patients were included for comparison, consisting of 33 normal cases with uterine myoma or cervical cancer but without histologically documented ovarian involvement, 48 benign cases and 31 borderline cases. All patients were diagnosed between 2006 and 2018 following the WHO Classification of Tumors: Female Genital Tumors (fifth edition). Tissue specimens were collected without any clinical factor-based selection criteria other than availability and were stored at −80 °C. Samples of the NACT cohort were collected at interval debulking surgery. Prior to pathological examination, the tissue specimens were embedded in optimal cutting temperature (OCT) compound and subsequently sectioned on a freezing microtome. Two senior pathologists independently confirmed the histologic subtypes and proportions of tumor nuclei (>60%) using hematoxylin and eosin-stained histological sections. Surgical staging was determined according to the 1988 International Federation of Gynecology and Obstetrics (FIGO) staging system.
A total of 180 plasma samples were collected immediately prior to surgery from 34 patients with normal or benign ovarian tissues and 134 primary patients with high-grade serous carcinoma from the PDS-EOC cohort.

Supplementary Data 1 contains comprehensive patient data, including age at diagnosis, residual tumor size, histotype, tumor grade, FIGO stage, lymph metastasis status, chemotherapy frequency, recurrence status, recurrence-free survival time, pre-treatment levels of CA125 and HE4, CA125 levels after the last chemotherapy cycle, and the administration of Bevacizumab or PARP inhibitor therapy. All patients included in this study received platinum and taxane therapy. Any additional treatments with Bevacizumab or PARP inhibitors are indicated in Supplementary Data 1. Patients were divided into two groups based on their response to adjuvant therapy. Those who relapsed within 6 months following the last cycle were classified as the resistant group, while those who relapsed more than 6 months after the last cycle were identified as the sensitive group.

For external validation, 57 tissue samples from 49 primary HGSOC patients with PDS and 30 paired plasma samples were collected between 2018 and 2019. All patients were diagnosed as FIGO stage III or IV. These patients received a minimum of six cycles of platinum-based chemotherapy following PDS. Detailed clinical information is listed in Supplementary Data 1.

Batch design

In the discovery cohort, tissue samples were randomly distributed across 68 batches to minimize batch effects. Multiple replicate samples were designed to monitor the quality during Pressure Cycling Technology (PCT)-assisted sample preparation and PulseDIA on the same Q Exactive HF hybrid Quadrupole-Orbitrap (QE-HF) (Thermo Fisher Scientific).
During sample preparation, each batch contained one mouse liver sample, and 121 biological replicates (different ovarian tissue samples dissected from the same patient) were included across the batches. During MS acquisition, each batch included one pooled peptide sample combined from all samples, and 132 technical replicates (the same peptide sample run twice) were included across the batches (Supplementary Data 1). Plasma samples were randomly distributed into 12 batches. Twelve biological replicates were designed during sample preparation, while during MS acquisition, each batch contained one pooled peptide sample labeled with the TMT126 channel.

Proteomics data acquisition

Approximately 1 mg of fresh frozen specimens was weighed and washed sequentially with ethanol solutions to remove OCT compound [109]. The process included an initial wash with 70% ethanol, followed by a rinse with water, and subsequent washes with increasing ethanol concentrations (70%, 85%, and 100%) for efficient OCT removal, each involving vortexing and supernatant discarding steps. A four-step PCT-assisted lysis and digestion was then performed, generating peptide samples for each tissue specimen [49,109,110]. The procedure involved lysing tissues in urea-thiourea buffer within PCT-MicroTubes under pressure cycling, followed by reduction and alkylation with TCEP and IAA. Lys-C (enzyme-to-substrate ratio = 1:40) and trypsin (enzyme-to-substrate ratio = 1:50) were then sequentially added for proteolytic digestion under pressure cycling. The reaction was quenched with TFA, and peptides were obtained by centrifugation, yielding the peptide samples for analysis. The peptide samples were desalted, dried, redissolved in buffer A (2% ACN, 0.1% formic acid), and their concentrations were measured prior to MS analysis [109].
Generated peptides were injected and separated over a 30-minute LC gradient on a nanoflow DIONEX UltiMate 3000 RSLC nano System connected to a Q Exactive HF-X hybrid Quadrupole-Orbitrap (Thermo Fisher Scientific, San Jose, USA). PulseDIA parameters were set with two schemes of complementary and discontinuous isolation windows across two injections, each with 1 m/z overlap between adjacent windows [49]. MS1 scans covered a range of 390–1210 m/z at 60,000 resolution, with an AGC target of 3e6 and a maximum ion injection time of 80 ms. MS2 scans were performed at 30,000 resolution with an AGC target of 1e6 and a maximum ion injection time of 50 ms. The two parts of the PulseDIA raw files were each analyzed using DIA-NN (1.7.12) against the spectral library. The spectral library for ovarian tissue specimens contains 130,735 proteotypic peptides and 10,780 protein groups as previously released [110], while the spectral library for mouse liver samples for quality control contains 134,856 proteotypic peptides and 8764 protein groups. In the DIA-NN settings, RT profiling was performed, and other parameters were set to default. The false discovery rates for precursors and proteins were both controlled below 1%, and the precursor quantities from the two injections were merged by their average values using the R program named Pulsedia_DIANN_OpenSWATH_SpectronautResult_combine (https://github.com/guomics-lab/PulseDIA). The combined peptide matrix was converted into a protein matrix using the mean of the top 3 precursor intensities in ProteomeExpert [111].

Peptides were generated from plasma samples after depleting 14 high-abundance plasma proteins.
Plasma was mixed with the High Select™ Top14 Abundant Protein Depletion Resin (Thermo Fisher Scientific, San Jose, USA) and incubated to deplete high-abundance proteins [112]. After incubation, the proteins were reduced and alkylated with TCEP and IAA, followed by a two-step trypsin digestion at a 1:100 enzyme-to-substrate ratio, and the reaction was halted by adding TFA. The peptides were then desalted using SOLAμ™ HRP columns (Thermo Fisher Scientific, San Jose, USA), dried in a vacuum concentrator, and resuspended in MS buffer A for concentration measurement. A 16-plex labeling using TMTpro reagents (Thermo Fisher Scientific, San Jose, USA) was performed for 5 μg of peptides [112], and 16 samples from each batch were pooled together for high-pH fractionation using basic pH reversed-phase liquid chromatography [112]. The 30 concatenated fractions per batch were initially separated over a 60-min gradient from 7% to 30% buffer B (buffer A: 2% ACN, 0.1% formic acid; buffer B: 98% ACN, 0.1% formic acid) and then analyzed using data-dependent acquisition (DDA) mode on a nanoflow DIONEX UltiMate 3000 RSLC nano System (Thermo Fisher Scientific, San Jose, USA) connected to an Orbitrap Exploris 480 mass spectrometer (Thermo Fisher Scientific, San Jose, USA). The mass spectrometer was operated in positive mode, equipped with a FAIMS Pro interface. Optimal compensation voltages were set at −48 V and −68 V with a cycle time of 1 s per FAIMS experiment. MS1 scans were performed at a resolution of 60,000 with a normalized AGC target of 300% over a mass range of 375–1800 m/z. Dynamic exclusion was customized with an exclusion duration of 40 s. MS2 scans were carried out at a resolution of 30,000 with a normalized AGC target of 200%, using an isolation window of 0.7 m/z and setting the first mass at 100 m/z. Normalized HCD collision energy was set to 36%, Turbo-TMT was enabled, and MS/MS data were recorded in centroid mode.
The raw MS data were analyzed by Proteome Discoverer (Version 2.5.0.400, Thermo Fisher Scientific) using a FASTA file (downloaded on 2018-02-09) containing 20,259 reviewed Homo sapiens protein sequences. The Proteome Discoverer settings were configured with trypsin as the protease, allowing up to two missed cleavages. Static modifications included carbamidomethylation (+57.021464) on cysteine, TMTpro (+304.207145) on lysine residues, and acetylation (+42.010565) on peptides’ N-termini. Variable modifications were oxidation (+15.994915) on methionine and acetylation (+42.010565) on peptides’ N-termini. Precursor and product ion mass tolerances were set to 10 ppm and 0.02 Da, respectively, with peptide-spectrum match validation at 1% target FDR (strict) and 5% target FDR (relaxed). Normalization was conducted against the total peptide amount, and all other parameters were maintained at default settings. Protein expression levels were calculated as grouped abundance ratios using the pooled sample labeled by the TMT126 channel for batch alignment.

Preprocessing of the protein matrix and quality control

Protein counts for each sample were summarized into four groups, namely normal, benign, borderline, and carcinoma groups. Outliers with fewer proteins in each group were identified using Tukey’s fences (k = 1.5), resulting in the exclusion of 50 samples from 34 patients. The protein matrix of solid specimens was then standardized by quantile normalization, and the missing values were imputed as 0.8 times the minimum value.
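As a rough illustration of two of the preprocessing steps described above, the following Python sketch flags samples whose protein count falls below Tukey's lower fence (k = 1.5) and imputes missing values as 0.8 times the minimum. It is a minimal sketch under stated assumptions, not the authors' code: the function names and toy sample IDs are hypothetical, the inclusive quartile method is an assumption, and the global matrix minimum is used for imputation because the text does not say whether the minimum is taken per protein or over the whole matrix. Quantile normalization is omitted for brevity.

```python
import statistics


def low_count_outliers(protein_counts, k=1.5):
    """Return sample IDs whose protein count lies below Tukey's lower fence.

    protein_counts maps a (hypothetical) sample ID to the number of proteins
    identified in that sample, within one group (e.g. carcinoma).
    """
    values = list(protein_counts.values())
    # Quartiles via the inclusive method; this choice is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower_fence = q1 - k * (q3 - q1)
    return [sample for sample, count in protein_counts.items()
            if count < lower_fence]


def impute_min_fraction(matrix, fraction=0.8):
    """Replace missing entries (None) with fraction * global observed minimum."""
    observed = [v for row in matrix for v in row if v is not None]
    fill = fraction * min(observed)
    return [[fill if v is None else v for v in row] for row in matrix]


# Toy data, purely illustrative.
counts = {"s1": 5000, "s2": 5100, "s3": 4900, "s4": 5050, "s5": 1200}
print(low_count_outliers(counts))  # ['s5']
print(impute_min_fraction([[4.0, None], [2.0, 10.0]]))
```

Only the lower fence is checked here because the text describes excluding samples with fewer proteins; a symmetric filter would also test against q3 + k * (q3 - q1).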
Unsupervised clustering of six groups, namely normal, benign, borderline, PDS-EOC, RLP-EOC, and NACT-EOC groups, was performed using the global proteome, and 23 samples with incorrect grouping were excluded.

To evaluate the reproducibility during sample preparation and MS acquisition, Pearson correlation coefficients were calculated (a) among mouse liver samples, (b) among pooled samples, (c) between technical replicates, and (d) between biological replicates using log2(intensity). Potential batch effects derived from designed batches, different columns, and injected peptide amounts were assessed through unsupervised clustering of pooled samples and ovarian tissue specimens.

In the protein matrix of plasma samples, proteins with a missing value rate higher than 70% were excluded. Batch correction based on the designed batch was then performed using ComBat in BatchServer for the remaining protein matrix of 1660 proteins [113]. After batch correction, reproducibility during sample preparation and MS acquisition was evaluated by calculating the median coefficient of variation (CV) between biological replicates (using the ratio) and among pooled samples (using log2(abundance)), respectively. The batch effect of the designed batch and MS machines was assessed through principal component analysis (PCA) of the proteomics data.

The selection of upregulated proteins along with the increased malignancy and their validation in plasma

One-way analysis of variance (ANOVA) was performed among five ovarian tissue groups: normal, benign, borderline, early-stage carcinoma (FIGO stage I and II) of the PDS cohort and late-stage carcinoma (FIGO stage III and IV) of the PDS cohort. Proteins with a Benjamini–Hochberg (B–H) adjusted p-value < 0.05 were selected for Mfuzz clustering (Supplementary Data 3). As a result, 8741 proteins were classified into 20 clusters among five groups.
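The B–H adjustment used for this selection can be sketched in a few lines of Python. This is an illustrative re-implementation of the standard Benjamini–Hochberg step-up procedure (the study itself reports using R's p.adjust); the function name and the toy p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values, one per protein (illustrative only).
raw = [0.01, 0.04, 0.03, 0.5]
adj = bh_adjust(raw)
selected = [i for i, q in enumerate(adj) if q < 0.05]
print(adj, selected)
```

With these toy values only the first protein passes the 0.05 threshold after adjustment, even though three raw p-values are below 0.05.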
Two-sided unpaired Welch’s t test was also performed to identify dysregulated proteins (a) between normal and carcinoma groups of the PDS cohort using log2(abundance); (b) between non-carcinoma and carcinoma groups of primary HGSOC plasma samples using the ratio. Considering that ovarian carcinoma can originate not only from ovarian cells but also from FTE cells, we implemented a control measure to reduce potential biases. Four proteins (LYPLA2, MED17, RAB27B, and VMP1), which have been previously reported to exhibit significant upregulation in FTE compared to OSE (p-value < 0.05) [36], were excluded from our list of identified dysregulated proteins.

The criteria for the potential biomarkers of ovarian cancer were as follows: (a) membership in one of the seven clusters that exhibited upregulation along with increased malignancy, with membership values > 0.4; (b) B–H adjusted p-value < 0.05 by two-sided unpaired Welch’s t test and fold change > 2 between normal and carcinoma groups of ovarian tissue samples (Supplementary Data 3); (c) annotation as part of the human secretome and membrane proteome by The Human Protein Atlas (Supplementary Data 3); (d) B–H adjusted p-value < 0.05 by two-sided unpaired Welch’s t test and fold change > 1.2 between non-carcinoma and carcinoma groups of ovarian plasma samples (Supplementary Data 3).

Plasma protein classifiers to distinguish ovarian carcinoma and non-carcinoma patients

Firstly, we identified secreted proteins associated with malignancy and validated these proteins in plasma samples. Subsequently, utilizing the randomForest package, models were built using either single proteins or combinations of two to eight proteins to distinguish carcinoma patients from non-carcinoma patients.

For the protein matrix of the eight selected potential biomarkers, missing values were imputed as 0. We employed the R package randomForest (version 4.6.14) to build a thousand trees with five-fold cross-validation.
Initially, we constructed nine models: one using all eight features and eight additional models, each employing one of these features individually. For the model encompassing all eight features, we calculated the average value of the mean decrease accuracy for each feature across the five-fold cross-validation as an importance value. Subsequently, we excluded the least important protein in sequence to construct models using seven to two features. The total area under the curve (AUC) was calculated for 168 plasma samples when each was assigned to the test set. Statistical differences between receiver operating characteristic (ROC) curves of different models were evaluated using a bootstrap test with the pROC package [114].

Histotype-specific differentially expressed proteins (DEPs) and pathways

First, two-sided unpaired Welch’s t test was performed to identify dysregulated proteins (B–H adjusted p-value < 0.05 and fold change > 2) between each histological subtype of primary carcinoma by PDS and normal ovarian tissues. Considering that ovarian carcinoma can originate not only from ovarian cells but also from FTE cells, we implemented a control measure to reduce potential biases. Fourteen proteins (CDKN2AIPNL, DDB2, H1-0, H1-10, H1-1, HMGB2, LYPLA2, MED17, PHGDH, PRKAG1, PTMS, RAB27B, TNRC6B, and VMP1), which have been previously reported to exhibit significant dysregulation between FTE and OSE (p-value < 0.05) [36], were excluded from our list of identified dysregulated proteins. Second, one-way ANOVA was carried out among the five histological subtypes of the PDS cohort, which showed that 4534 proteins were differentially expressed among the five groups (B–H adjusted p-value < 0.05).
Additionally, 2709 proteins were identified as dysregulated both by two-sided unpaired Welch’s t test and one-way ANOVA.

The criteria for the histotype-specific DEPs were as follows: (a) DEPs were defined as those with a B–H adjusted p-value < 0.05 by two-sided unpaired Welch’s t test and fold change > 2 between normal and each histotype group; (b) B–H adjusted p-value < 0.05 by one-way ANOVA; (c) DEPs present in only one histotype.

Unsupervised clustering was then performed for these histotype-specific DEPs using Ward’s minimum variance method. In each cluster, the major histotype to which the DEPs belong was chosen, and pathway enrichment was performed for the DEPs of this major histotype using Metascape.

Univariable and multivariable Cox regression analysis

Proteins with a missing value ratio of less than 70% were included for univariable Cox regression analysis. Residuals of the linear regression models were calculated to remove the potential effect of age at diagnosis on protein expression. These residuals were then standardized using rank-based inverse normal transformation. After standardization, univariable Cox regression analysis was performed to identify prognostic proteins with a p-value < 0.05 based on the likelihood p-value. Kaplan–Meier plots were drawn for representative proteins to show the significant relationship between protein expression and survival, with the optimal cut point for each protein determined by surv_cutpoint.

Univariable Cox regression analysis was also performed for clinical factors, and missing values in CA125 and HE4 levels were imputed as median values.
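The age adjustment and rank-based inverse normal transformation applied to protein expression before Cox regression can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function names and toy values are hypothetical, a simple one-covariate least-squares fit stands in for the linear regression, and the Blom offset (c = 3/8) is an assumption because the text does not state which rank offset was used.

```python
from statistics import NormalDist, mean


def residuals_on_age(expression, age):
    """Residuals of a least-squares fit of expression on age."""
    mean_age, mean_expr = mean(age), mean(expression)
    sxx = sum((a - mean_age) ** 2 for a in age)
    sxy = sum((a - mean_age) * (e - mean_expr)
              for a, e in zip(age, expression))
    slope = sxy / sxx
    intercept = mean_expr - slope * mean_age
    return [e - (intercept + slope * a) for a, e in zip(age, expression)]


def rank_inverse_normal(values, c=3 / 8):
    """Rank-based inverse normal transform; ties receive their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Toy data for one protein across five patients (illustrative only).
age = [45, 52, 60, 67, 73]
expression = [10.2, 11.0, 10.1, 12.4, 11.9]
standardized = rank_inverse_normal(residuals_on_age(expression, age))
print(standardized)
```

The transformed values depend only on the ranks of the residuals, which makes the subsequent Cox model less sensitive to skewed or heavy-tailed intensity distributions.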
To determine the prognostic proteins’ independence of clinical factors, multivariable Cox regression analysis was performed for each protein to adjust for the effects of four prognostic clinical factors.

To validate the prognostic proteins identified in our study, we performed a comparative analysis with the potential prognostic proteins pinpointed in Chowdury et al.’s paper (linear regression model, p-value < 0.05) [42].

Targeted genomic sequencing

For the targeted genomic sequencing, a 295-gene panel was employed for four balanced patient groups: primary sensitive (N = 27), primary resistant (N = 26), relapsed sensitive (N = 26), and relapsed resistant (N = 17). No other specific inclusion criteria were applied for these samples, and we did not include patients who underwent NACT. Somatic DNA was extracted from fresh frozen tumor tissues using the NucleoSpin TriPrep Kit (Macherey-Nagel, Germany), and patient-matched genomic DNA was extracted from peripheral blood lymphocytes using the NucleoSpin Blood Kit (Macherey-Nagel, Germany) according to the manufacturer’s instructions. The quality of isolated genomic DNA was verified through agarose gel electrophoresis and concentration measurement using the Qubit® DNA Assay Kit in a Qubit® 3.0 Fluorometer (Invitrogen, USA).

Extracted DNA was fragmented into 180–280 bp using a hydrodynamic shearing system (Covaris, Massachusetts, USA). DNA fragments underwent end repair, 3′-end adenylation and ligation-mediated PCR (LM-PCR). The fragments were then hybridized to probes designed for each targeted gene, and non-hybridized ones were washed out. Real-time PCR was performed to estimate the product magnitude from LM-PCR.
After library quality assessment, clusters of the index-coded samples were generated using the Illumina PE Cluster Kit (Illumina, USA) on a cBot Cluster Generation System, and then high-throughput sequencing was conducted on an Illumina platform to generate 150 bp paired-end reads.

Read pairs were discarded as sequence artifacts if either read contained adapter contamination (>10 nucleotides aligned to the adapter, allowing ≤10% mismatches), more than 10% uncertain bases, or more than 50% low-quality bases (Phred quality <5). More detailed quality control statistics are summarized in Supplementary Data 7.

Valid sequencing data were mapped to the reference genome (GRCh37/hg19) using Burrows-Wheeler Aligner (BWA) software (http://github.com/lh3/bwa) [115]. BAM files were sorted, and duplicate-marking was done using SAMtools [116] and Sambamba [117]. Somatic single-nucleotide variants (SNVs) and insertions/deletions (indels) were retrieved with MuTect (v 3.1-0-g72492bb) (http://github.com/broadinstitute/mutect) and Strelka (v 1.0.14) (http://github.com/Illumina/strelka), respectively. Germline SNVs and indels were called using Genome Analysis Toolkit (GATK, v 3.1-0-g72492bb). Mutations in coding regions were manually checked using Integrative Genomics Viewer (IGV, version 2.3.34), and filtered variants were annotated using Oncotator (version 1.5.1.0) (http://github.com/broadinstitute/oncotator) and Variant Effect Predictor (VEP, v 83) (http://github.com/Ensembl/ensembl-vep). Copy number variations were analyzed using CNVkit v0.9.9 [118].

Bioinformatic analysis for genomic and proteomic data

Firstly, Fisher’s exact test was performed to evaluate the associations between each gene mutation (combining germline and somatic mutations) and chemoresistance in the 295-gene panel. This panel included 14 genes with a direct or indirect role in homologous recombination repair (HRR) [107].
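For a single gene, the association test above reduces to a 2×2 table of mutation status against chemotherapy response. The following Python sketch computes a two-sided Fisher's exact p-value from the hypergeometric distribution. It is an illustrative implementation, not the authors' code: the function name and the counts in the example are hypothetical, and the two-sided rule (summing all tables no more probable than the observed one) is the common convention but is assumed here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Hypothetical layout: rows are mutated / wild-type, columns are
    resistant / sensitive.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated-and-resistant patients.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(low, high + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Illustrative counts only: 8 mutated-resistant, 2 mutated-sensitive,
# 1 wild-type-resistant, 5 wild-type-sensitive.
print(round(fisher_exact_two_sided(8, 2, 1, 5), 5))  # 0.03497
```

Exact integer binomial coefficients from math.comb keep the calculation stable for cohort sizes like those in the sequenced groups.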
The associations between HRR mutations and chemoresistance were also evaluated.

Next, two-sided unpaired Welch’s t test was performed to identify dysregulated proteins between (i) chemosensitive HGSOC patients with HRR mutations versus chemoresistant ones without any HRR mutations; (ii) chemosensitive versus chemoresistant HGSOC patients in the relapsed cohort. Dysregulated proteins were defined as those with a p-value less than 0.05 and a fold change greater than 1.5. Lastly, pathway enrichment for these dysregulated proteins was performed using Metascape and STRING.

Targeted proteome by MRM

Quantification of prognostic proteins was performed using multiple reaction monitoring (MRM) in tissue and plasma samples. For tissue samples, 71 out of 281 prognostic proteins were quantified by MRM, while for plasma samples, 51 out of 241 prognostic proteins were quantified by MRM (Supplementary Data 6). Common internal retention time (CiRT) standard peptides were used for retention time prediction, with 13 and 12 peptides selected from OVLib [110] and a published blood spectral library [119], respectively (Supplementary Data 6). Peptides were separated at a flow rate of 0.2 mL/min over a 15-min LC gradient from 10% to 40% buffer B (buffer A: 0.1% formic acid aqueous solution; buffer B: 0.1% formic acid in acetonitrile solution) on a Jasper™ HPLC system (SCIEX, CA, USA). The ionized peptides were transferred into a Triple Quad™ 4500MD (SCIEX, CA, USA) for analysis.

A total of 388 transitions of 100 peptides from tissue samples and 389 transitions of 101 peptides from plasma samples were selected and analyzed within a ±1 min time window using time-scheduled acquisition.
The target scan time per cycle was set to 2.5 s for tissue samples and 1.7 s for plasma samples.

Machine learning

To predict one-year relapse after the last chemotherapy, we first identified prognostic proteins in the global proteomic data of the discovery cohort, then verified these prognostic proteins using targeted proteomics and optimized models by machine learning. Finally, we evaluated the predictive utility of the final model using an independent validation cohort. The discovery cohort consisted of primary HGSOC patients with at least six cycles of platinum-based chemotherapy from the PDS cohort. We excluded patients with an inconclusive outcome of recurrence within one year, resulting in 400 tissue samples from 347 patients and 141 plasma samples from 131 patients (Supplementary Data 1).

Prognostic proteins were identified by univariate Cox analysis and two-sided unpaired Welch’s t test. For ovarian tissues, 281 prognostic proteins met both criteria (p-value < 0.05 by univariate Cox analysis and p-value < 0.05 by two-sided unpaired Welch’s t test between patients relapsing within one year and those relapsing after one year). For plasma samples, 241 prognostic proteins met either of the two criteria mentioned above. Then, 71 out of 281 prognostic proteins from ovarian tissues and 51 out of 241 prognostic proteins from plasma were quantified by MRM. Forty tissue proteins and 34 plasma proteins were verified using the MRM assay (Supplementary Data 6). Two immunoglobulins among the verified tissue proteins were excluded. Thus, 38 tissue proteins and 34 plasma proteins were left to build the predictive model using the eXtreme Gradient Boosting (XGBoost) algorithm.

Seven clinical factors, including age at diagnosis, residual tumor size, FIGO stage, lymph metastasis, CA125 and HE4 levels before the treatment, and CA125 at the last cycle of chemotherapy, and verified prognostic proteins quantified by MRM were used to select features to optimize three predictive models (A, B, and C).
Model A was based on clinical factors only, while Models B and C were based on clinical and protein features from tissue and plasma samples, respectively.

We randomly split the discovery cohort into a training set and an internal test set at a ratio of 3:1. Then, one hundred iterations of 60% under-sampling of the training set were performed to build models using XGBoost. Two parameters, namely subsample (from 0.5 to 1 in steps of 0.05) and learning rate (from 0.1 to 0.3 in steps of 0.04), were optimized. The features were ranked by frequency in each model, and the top 5 to 15 features were selected to build models for the entire training set using XGBoost. The other four parameters, namely gamma (from 0 to 0.2 in steps of 0.05), max_depth (from 3 to 10 in steps of 1), colsample_bytree (from 0.1 to 1 in steps of 0.1), and min_child_weight (from 1 to 5 in steps of 1), were optimized. The model with the highest accuracy in the discovery cohort was finally selected. The independent validation set was used to evaluate the predictive utility of the final model.

We utilized the CPTAC cohort as an external validation set to verify the generalizability of our model beyond the Chinese population. This cohort had 32 samples with measurements obtained by both Johns Hopkins University (JHU) and Pacific Northwest National Laboratory (PNNL). To avoid redundancy, we removed the 32 duplicate samples assayed by PNNL, resulting in a final set of 126 unique samples. The protein matrix of these samples underwent Z-score normalization for standardization. As the CPTAC dataset lacked two clinical factors and the expression data for the AGRE5 protein, we adapted tissue model B by retaining the original parameters of the remaining eleven protein features.
This revised model was then applied to predict one-year recurrence in the CPTAC cohort.

Statistics and reproducibility

All patient diagnoses were established between 2006 and 2018, adhering to the WHO Classification of Tumors: Female Genital Tumors (5th edition). Tissue specimens were collected based solely on their availability, without any clinical factor-based selection criteria. While a formal sample-size calculation was not performed, we ensured that each analyzed group contained at least 10 samples, a number deemed sufficient for statistical purposes.

To verify the reproducibility of our proteomic data, biological replicates were utilized during experimentation. Any additional replication data not reported in the manuscript, whether successful or unsuccessful, are not available.

Protein counts from individual samples were categorized into four groups: normal, benign, borderline, and carcinoma. We identified and excluded outliers with abnormally low protein counts in each category using Tukey’s fences (k = 1.5), resulting in the removal of 50 samples from 34 patients. Unsupervised clustering based on global proteome profiles was applied to six designated groups: normal, benign, borderline, primary debulking surgery epithelial ovarian cancer (PDS-EOC), relapsed epithelial ovarian cancer (RLP-EOC), and neoadjuvant chemotherapy epithelial ovarian cancer (NACT-EOC); 23 samples that clustered incorrectly were excluded from the analysis.

Statistical analyses were conducted using R software (versions 4.0.5 and 4.3.1). Within the stats package (v4.3.1), we performed analysis of variance (ANOVA), Welch’s t test, and principal component analysis (PCA). The Benjamini–Hochberg procedure was utilized to adjust p-values for multiple comparisons using the p.adjust function. We calculated correlation coefficients with the corrplot package (v0.92) and conducted soft clustering using the Mfuzz package (v2.60.0).
The randomForest package (v4.6.14) was employed to develop plasma protein classifiers to differentiate between carcinoma and non-carcinoma cases. Cox proportional hazards regression analysis was carried out using the survival package (v3.5-7). Heatmaps were generated with the pheatmap package (v1.0.12), employing ward.D2 linkage for protein clustering. Prognostic predictive models were built using the xgboost function in the xgboost package (v1.6.0.1), with SHAP values derived from the SHAPforxgboost package (v0.1.3).

Language polishing

During the preparation of this work the authors used ChatGPT in order to improve language and readability. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
