A multi-classifier system integrated by clinico-histology-genomic analysis for predicting recurrence of papillary renal cell carcinoma

PatientsThis study was approved by the Institutional Review Boards of the First Affiliated Hospital of Sun Yat-sen University, Sun Yat-sen University Cancer Center, Renji Hospital of Shanghai Jiao Tong University, Peking University First Hospital, and Affiliated Yantai Yuhuangding Hospital of Qingdao University. The informed consent was waived because patients were not directly recruited for this study.In this study, we used 589 formalin-fixed, paraffin-embedded (FFPE) tissue samples from 589 patients with pRCC who were treated between January 2008 and December 2018. Inclusion criteria were patients with sporadic unilateral pRCC, stage I–III, who underwent resection without neoadjuvant therapy or adjuvant therapy, and for whom clinicopathological characteristics, follow-up information, and fixed tumor tissue were available. Two genitourinary pathologists (Y.C. and B.L.) reassessed all of these samples. The training set (382 cases) included 189 patients from South China (First Affiliated Hospital and Cancer Center of Sun Yat-sen University) and 193 patients from East China (Renji Hospital of Shanghai Jiao Tong University). The independent validation set included 207 cases from North China and comprised 121 patients from Peking University First Hospital and 86 patients from Affiliated Yantai Yuhuangding Hospital of Qingdao University. For all patients in our cohorts from China, baseline imaging assessments were conducted within 3 months following surgery. Beyond the initial assessments, patients diagnosed with stage I disease underwent annual evaluations until the occurrence of disease recurrence, metastasis, or death, whichever occurred first. For patients with stage II and stage III disease, evaluations were scheduled every 3–6 months during the first 3 years, followed by annual assessments until the occurrence of disease recurrence, metastasis, or death, whichever occurred first. Imaging assessments were collaboratively performed by urologists and radiologists at each site. CT scans (preferred) or MRIs (when CT was unavailable or impractical) were utilized for imaging the chest, abdomen, and pelvis. Additionally, bone scans, brain imaging, and other supplementary imaging procedures were undertaken as indicated by symptoms. For the TCGA set, RNA-seq data (level 3), clear diagnostic whole-slide images (WSIs) and clinical data were downloaded from the Genomic Data Commons portal (https://portal.gdc.cancer.gov/) on June 1, 2023. 204 cases with stage I–III, complete RNA-seq data, clear diagnostic WSIs and follow-up data were finally selected for further analysis. Cases with conflicting information were thoroughly re-evaluated and discussed again using all available information to reach a final diagnosis.Genome-wide RNA-seq data analysisTo generate lncRNA expression profiles, we obtained, as a discovery set, a panel of 53 fresh-frozen tumor samples with paired adjacent normal tissue from patients with pRCC treated between January 2016 and June 2020 at First Affiliated Hospital, Cancer Center of Sun Yat-sen University, and from Renji Hospital of Shanghai Jiao Tong University. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). RNA purity and integrity were analyzed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). The libraries were built and sequenced by CapitalBio Corporation (Beijing, China). Briefly, total RNA was subjected to removal of ribosomal RNA using a Ribo-Zero rRNA removal kit (Illumina, USA). A NEBNext Ultra II RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) was used to generate sequencing libraries following the manufacturer’s protocols, and the library quality was monitored on an Agilent Bioanalyzer 2100 (Agilent). Fragments were finally sequenced on a HiSeq 2000 platform (Illumina).Quality control and pre-processing of FASTQ files were done using fastp to obtain the clean reads (clean data)40. Obtained RNA-seq paired-end clean data were then analyzed using Hisat241, Samtools 1.942, Stringtie 1.3.543, and DESeq244. Briefly, human reference genome (Homo_sapiens.GRCh38.84.gtf.gz) was obtained from Ensembl (ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz). Then, the reference genome index was created by the build-index function in the Hisat2 software (http://ccb.jhu.edu/software/hisat2) package with default options. The alignment of paired-end reads from each sample to the reference genome was performed using Hisat2 with default settings. After the alignment, the generated SAM files were sorted to BAM files using Samtools 1.9 (http://samtools.sourceforge.net). Subsequently, Stringtie 1.3.5 was used to assemble the transcripts using BAM files as inputs. We then used Stringtie and its prepDE.py to generate raw read count matrices for genes and transcripts. The genes were annotated by GENCODE version 22. Those lncRNAs located in the sex chromosomes were filtered. Only lncRNAs expressed in at least 80% of the tumor samples (counts ≥1) were included, and, finally, 8219 lncRNAs were selected for subsequent differential expression analysis. Differential expression analysis was performed using the DESeq2 package in R software. 40 lncRNAs were identified with a log2 fold change of more than 1, and the false discovery rate (FDR) was less than 10−25 (Supplementary Table 1). Raw sequencing data analyzed in this study have been uploaded to the Gene Expression Omnibus at the National Center of Biotechnology Information and can be found under accession number: GSE180777 or (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE180777).For the TCGA set, the expression of lncRNAs was derived from RNA-seq data downloaded from the Genomic Data Commons portal (https://portal.gdc.cancer.gov/).qRT–PCRFor each sample of FFPE tumor tissue from the training set and the independent validation set, serial sections were stained with hematoxylin and eosin for histological confirmation of the presence (>80%) of tumor cells. Three 20-μm tissue sections from each sample were used to obtain sufficient RNA. RNA was extracted with the Qiagen FFPE RNeasy kit and the HiPure FFPE RNA Kit (Magen, Guangzhou, China) following the manufacturer’s instructions. For qRT–PCR, total RNA (2 μg) was reverse transcribed using PrimeScript RT Master Mix (Takara, Shiga, Japan), and qPCR was performed on triplicate samples in a SYBR Green Reaction Mix (Yeasen, Shanghai, China) with ABI 7900HT Fast Real Time PCR System (Applied Biosystems, Carlsbad, CA, USA). The sequences of the primers used in this study are listed in Supplementary Table S1, and the results were normalized to the expression levels of ACTB using the 2−ΔCt method; ΔCt is the difference of Ct values between the lncRNA of target gene and the internal reference (ACTB).The construction of the lncRNA-based classifierA multivariate LASSO Cox regression model was employed to select four lncRNAs, which were then used to generate a lncRNA-based risk score for RFS for each patient, as described in Eq. (1). The expression levels of the four lncRNAs for the training and independent validation sets were obtained from qRT-PCR assays, while for the TCGA set, they were extracted from RNA-seq data. Given the distinct data types, the applied cutoffs varied accordingly. In the training set, patients were divided into high-risk and low-risk groups using the median risk score of 0.9800 as the cutoff. This same cutoff was employed for classifying patients in the independent validation set into respective risk groups. For the TCGA set, patients were divided into high-risk and low-risk groups, using a distinct median risk score (1.8100) as the cutoff.Sample preparation and image pre-processing for WSIA representative H&E-stained tumor slide (the one containing the highest grade of tumor in the specimen) was created for each patient in the training and independent validation sets20,45. Next, a WSI in the SVS file format was created for each representative slide using a KF-PRO-020 scanner (KFBIO, Ningbo, China), at 40 equivalent magnification (0.25 μm/pixel). Then, each representative slide was scanned with KF-PRO-020 scanner at 40× equivalent magnification to generate a WSI in the SVS file format. A WSI with the 40× resolution typically contained an order of 100,000 ×10,000 pixels—multiple orders of magnitude larger than images currently feasible for classification by deep learning methods46,47. The WSIs were annotated using the ASAP 1·8 platform, available at https://github.com/computationalpathologygroup/ASAP/releases. Next, a manual annotation method was applied to annotate the tissue regions and map the tumor area on all WSIs using polygons to draw the outline. This step was independently checked by another pathologist48. To preserve the prognostic information present at high resolution, WSIs were divided into multiple non-overlapping image regions known as tiles at 10× and 40× resolutions. Each pixel at 40× represents a physical size of approximately 0.24 × 0.24 μm. By creating a grid of potential tiles starting from the top-left corner of the slide picture, including areas outside the tumor segmentation, tiling was carried out. Candidates were tiles with an overall tissue percentage of more than 75% (compared to the whole area in each tile)45. Tiles were extracted with OpenSlide from level 0, converted to numpy arrays, and resized with OpenCV using the resize function (https://docs.opencv.org/3.4.0/da/d54/group__imgproc__transform.html), with interpolation set to cv2.INTER_CUBIC for up-sampling and cv2.INTER_AREA for down-sampling, and saved in a lossless format (as TIFF files). To prepare each tile for the later prediction model, non-tissue-containing white background was removed from each tile using a series of RGB filters, and then color was normalized using the Macenko method20.Construction and evaluation of the deep learning WSI-based classifier182 cases from the training set with a so-called distinct outcome, either good or poor, were used as the development set to obtain clear ground truth for developing a prognostic score utilizing deep learning on digital pathological images. 76,348 tiles at 10× resolution and 1,789,634 tiles at 40× resolution in the developing cohort were used to train the deep learning models. Then, the developing cohort was randomly split into three sets—a discovery set (1,322,009 tiles from 136 WSIs), a tuning set (209,600 tiles from 18 WSIs), and a holdout internal test set (334,373 tiles from 28 WSIs)—for development of the outcome classifier45. The model was built around multiple instance learning and comprised a MobileNet V3 representation network49,50, a Noisy-AND pooling function51, and a fully connected classification network similar to the one used by Skrede et al.20. The entire network was trained end-to-end (i.e., directly from image to patient outcome), and each training iteration used a batch size of 32 collections with 64 tiles each. In brief, this convolutional neural network was a modified MobileNetV3 architecture that had been pre-trained on ImageNet and fine-tuned by transfer learning on the development set, which took tiles from an image and output a slide-level 4 probability (risk score) of tumor recurrence20,45. We added positional transforms, such as a horizontal flip and a rotation of 0°, 90°, 180°, or 270°, at random to the training data. We used a gradient approximation method that significantly lowers memory usage during training, which allowed us to use these tiles20,45. The Noisy-AND pooling function applied a trained non-linear function on tile representation averages. This function enhances robustness against tiles not representing the ground truth and, together with the large number of tiles, alleviates the issues of spatial heterogeneity. The network processed each tile in the WSI during inference. Using TensorFlow 2.4.0, the models were trained beyond apparent convergence, and each 10× model was evaluated at iteration 5000, 5500, and so on up to iteration 15,000. Each 40× model was evaluated at iterations 10,000, 11,000, and so on up to iteration 30,000. The models were selected from each network training using the performance in the tuning set from developing cohort with the C-index as metric, resulting in optimal models for each resolution. The predictive accuracy of the WSI-based score with the 10× resolution was higher than that with the 40× resolution.We used the internal holdout test set from the development set for internal assessment, and we independently evaluated any potential performance variability related to randomization in data splitting using a four-fold cross-validation scheme. On the holdout test set, the WSI-based model with the 10× resolution was able to predict tumor recurrence possibilities with a C-index of 0.723 (95% CI 0.553–0.994), and, upon four-fold cross-validation in the development set, the C-index ranged from 0.634 to 0.721, with a mean of 0.688 across the four values. Furthermore, the WSI-based model with the 40× resolution had lower performance than that with the 10× resolution, which achieved a C-index of 0.675 (95% CI 0.514–0.727), and, upon four-fold cross validation in the development set, the C-index ranged from 0.581 to 0.643, with a mean of 0.609 across the four values. Thus, we chose the WSI-based model with the 10× resolution for the subsequent analyses. In the process of training and validating the multi-classifier system, 141,029 tiles at 10× resolution from 382 WSIs in the training set, 79,866 tiles at 10× resolution from 207 WSIs in the independent validation set, and 81,323 tiles at 10× resolution from 204 WSIs in the TCGA set were included for analysis. Using the WSI based model, we calculated a risk score (with 10× resolution) for each patient in the training, independent validation, and TCGA sets. Patients in all three sets were divided into high-risk and low-risk groups using the median risk score from the training set (0.2857) as the cutoff. The source code for the deep learning model is available online (https://github.com/guichengpeng1/WSI-based-deep-learning-classifier-in-papillary-renal-cell-carcinoma).OutcomesThe primary outcome was RFS, defined as the time from surgery to local recurrence and distant metastasis10. The secondary outcome was OS, defined as the time from surgery to death from any cause.Statistical analysisLASSO Cox regression analysis was used to select the most useful prognostic markers among candidate lncRNA and to construct a lncRNA-based classifier for predicting the RFS of patients with pRCC in the training set. We used the Kaplan–Meier method to analyze the correlation between variables and survival. We used the Cox regression model for multivariate survival analysis and Cox regression coefficients to generate a prognostic model and a nomogram. HRs and 95% confidence intervals (CIs) were calculated using the Cox proportional hazards model. Harrell’s C-indexes were calculated to examine discrimination52. All statistical tests were performed with R software version 4.1.0 (R Foundation for Statistical Computing, Vienna, Austria). Statistical significance was set at p < 0.05.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles