Noninvasive multi-cancer detection using blood-based cell-free microRNAs

Noninvasive screening tests for MCED that analyze circulating cell-free nucleic acids and/or proteins in body fluids, especially blood, have attracted considerable attention over the last decade. In this study, we report the development and validation of a serum 4-miRNA diagnostic model and demonstrate that, in three large independent validation sets totaling 8597 participants (4875 cancer patients across 13 cancer types and 3722 non-cancer individuals), the 4-miRNA model can detect 12 cancer types simultaneously with high sensitivities (> 90% for 9 cancer types, and ≥ 75% for 3 cancer types) while still achieving a very high specificity of ~ 99%. In addition, the observation that the diagnostic indices for the post-surgery serum samples were reduced to normal levels suggests the potential utility of the model for monitoring response to treatment and detecting recurrence.

Importantly, our model was able to detect early-stage cancers with high sensitivity. Specifically, in Validation Set 1 of lung cancer patients, the model detected stage I and II cancers at a sensitivity ranging from 98.4 to 99.6% (Fig. 4D). In Validation Sets 2 and 3, while individual patient-level stage information was not available, aggregate stage information was provided for 6 of the 12 cancer types examined. First, all gastric cancer patients were stage I or II, so the 100% sensitivity of our model applies to early-stage gastric cancer. Second, 88% and 93% of bladder and prostate cancer patients, respectively, had node-negative disease. Thus, with 99% and 98% sensitivity for these two cancers, the sensitivity for stage I or II bladder and prostate cancers should be very high as well. Third, 66% and 70% of esophageal and liver cancer patients, respectively, were stage I or II. It is reasonable to speculate that the sensitivity for stage I or II disease in these two cancers is not far from the 92% and 84% sensitivity reported for all stages combined.
In summary, based on the data currently available in the three validation sets, we conclude that our 4-miRNA model achieves high sensitivity for stage I or II disease of six cancer types (lung, gastric, bladder, prostate, esophageal, and liver).

Of note, the original studies that generated the eight miRNA microarray datasets analyzed in this study also proposed miRNA panels for detecting each of the eight cancer types (lung, ovarian, liver, bladder, esophageal squamous, gastric, prostate and glioma), respectively. These eight miRNA panels included 41 unique miRNAs with only one overlapping miRNA, hsa-miR-6724-5p, which occurred in the liver and bladder cancer panels. While some of these panels demonstrated higher performance characteristics than our model for their respective cancer types, this is expected given their specific focus. However, if these panels were used sequentially to detect the eight cancer types together, the cumulative incidence of false positives would be approximately 33% based on the published performance matrix. In contrast, our model, which detects 12 cancer types simultaneously, achieves a false positive rate of less than 1%.

Among the four miRNAs used in our model, hsa-miR-5100 has been reported to be overexpressed in lung, gastric, oral squamous cell carcinoma, and pancreatic cancers [20-24]. In addition, hsa-miR-1228-5p has been implicated as overexpressed in hepatocellular carcinoma and kidney clear cell carcinoma [25,26], while hsa-miR-663a has been found to be overexpressed in colon cancer and metastatic prostate cancer [27,28]. Gene set enrichment and network analysis showed that transforming growth factor beta-1 (TGFB1), a gene regulated by hsa-miR-663a, was implicated in signaling pathways across multiple cancer types including colorectal cancer, pancreatic cancer, gastric cancer, renal cell carcinoma, hepatocellular carcinoma and leukemia.
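
The roughly 33% cumulative false-positive figure quoted above for sequential use of the eight single-cancer panels can be illustrated with a small calculation. The sketch below is not the authors' computation: it assumes the panels err independently, and the per-panel specificity of 95% is a hypothetical value chosen only to show how quickly false positives compound. The 99.3% specificity for the single model is the value assumed elsewhere in this Discussion.

```python
from math import prod


def cumulative_false_positive_rate(specificities):
    """Chance that a cancer-free person gets at least one positive result
    when every test in the list is run, assuming independent errors."""
    return 1.0 - prod(specificities)


# Hypothetical: eight single-cancer panels, each with 95% specificity.
# These are illustrative numbers, not the published panel performances.
eight_panels = [0.95] * 8
sequential_fpr = cumulative_false_positive_rate(eight_panels)

# A single multi-cancer model at 99.3% specificity.
single_model_fpr = cumulative_false_positive_rate([0.993])

print(f"eight panels in sequence: {sequential_fpr:.1%}")
print(f"single model:             {single_model_fpr:.1%}")
```

Under these assumptions, eight panels that each look quite specific on their own still flag about one in three cancer-free individuals, which is the effect the comparison in the text relies on.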
The observation that the PI3K-Akt and MAPK signaling pathways are among the most regulated by the top 50 miRNAs suggests that the miRNAs originate from the cancer cells rather than from reactive stromal fibroblasts, tumor-associated immune cells, or biopsy-induced wound-related changes [29]. Taken together, these data support the use of these miRNAs as potential biomarkers for early cancer detection across multiple cancer types.

Several commercial assays for MCED have emerged in recent years. Most of these tests use next generation sequencing (NGS) technology to evaluate either methylation or fragmentation patterns of circulating tumor DNA [30-33]. The most prominent MCED test is the Galleri test, which examines > 100,000 targeted methylated regions and > 1,000,000 CpG dinucleotides. In the prospective, case-controlled Circulating Cell-free Genome Atlas (CCGA) study, Galleri achieved an overall sensitivity of 67.6% across 12 pre-specified cancer types at stages I-III and 99.5% specificity [30]. However, the sensitivity was only 16.8% for stage I and 40.4% for stage II. An MCED test not based on NGS technology is CancerSEEK, which assesses four biomarker classes (aneuploidy, DNA methylation, mutations and proteins). In its latest retrospective, case-controlled study of 566 cancer patients across 12 cancer types and 566 non-cancer controls, it showed an overall sensitivity of 61% and a specificity of 98.2% [34]. The sensitivity dropped to 49.8% for stage I-III cancers. In summary, these MCED tests generally showed modest sensitivities in the range of 60–70% when a high specificity of about 99% was required, and the sensitivities dropped further for stage I or II cancers. Compared to these assays, our diagnostic model, while much simpler, demonstrated substantially higher sensitivities in the range of 90–100% for 9 out of 12 cancer types in large validation cohorts totaling almost 8600 participants.
More importantly, our model achieves similarly high sensitivities for stage I or II cancers.

The clinical utility of these MCED assays must ultimately be demonstrated in prospective screening trials with asymptomatic individuals. For example, Galleri was evaluated in the prospective screening study PATHFINDER, which analyzed 6621 participants aged ≥ 50 years with 1 year of follow-up [35]. The study detected a cancer signal in 92 (1.4%) participants and confirmed 35 as true positives, resulting in a 38% positive predictive value (PPV). In addition, 121 participants had cancer diagnosed by the end of the 1-year follow-up, which corresponds to a 29% sensitivity for Galleri. In its latest prospective observational study, SYMPLIFY, with 5461 symptomatic participants referred from primary care, of whom 368 (6.7%) were diagnosed with a cancer, Galleri achieved 66.3% sensitivity and 98.4% specificity [36]. For our 4-miRNA diagnostic model, assuming a screening population with a 1% cancer incidence rate, 90% sensitivity and 99.3% specificity, the model would provide a PPV of 56%, substantially higher than the 3.7–4.4% PPVs for the four single-cancer screening tests recommended by USPSTF [37-39].

It is worth noting that a simple four-parameter diagnostic model like the one described here not only costs significantly less, but can also be developed into an in vitro diagnostic (IVD) test using RT-qPCR that is capable of decentralized testing. This is an advantage over NGS-based tests, which are usually implemented as laboratory developed tests (LDTs). These characteristics are important for driving adoption and increasing affordability of MCED tests, which are intended for high-risk or at-risk members of the general public, especially those from low-income communities.

We acknowledge that the current study is a computational analysis using public datasets. Experimental validation and investigation of the role of the 4 miRNAs will shed light on the mechanistic basis of the predictive power of these miRNAs.
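
The PPV and sensitivity figures quoted above follow from simple ratios, which the sketch below reproduces. It assumes the standard Bayes-rule definition of PPV and takes the 1% incidence stated in the text as the prevalence of the screened population; the function and variable names are illustrative and not taken from the study.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV from Bayes' rule: true positives over all positive calls."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Scenario assumed in the text for the 4-miRNA model:
# 1% cancer incidence, 90% sensitivity, 99.3% specificity.
model_ppv = positive_predictive_value(0.01, 0.90, 0.993)

# PATHFINDER counts as quoted in the text.
signals_detected = 92       # participants with a cancer signal
confirmed_cancers = 35      # signals confirmed as true positives
cancers_at_one_year = 121   # cancers diagnosed by end of follow-up

pathfinder_ppv = confirmed_cancers / signals_detected
pathfinder_sensitivity = confirmed_cancers / cancers_at_one_year

print(f"4-miRNA model PPV (assumed scenario): {model_ppv:.1%}")
print(f"PATHFINDER PPV:                       {pathfinder_ppv:.1%}")
print(f"PATHFINDER sensitivity:               {pathfinder_sensitivity:.1%}")
```

Because PPV depends strongly on prevalence, the same function can be rerun with a different incidence to see how the estimate would shift in a lower- or higher-risk screening population.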
In particular, validating these miRNAs in different cohorts using different molecular techniques such as PCR is crucial before the current study results can be considered definitive, and is also critically important for developing these miRNAs into a lower-cost and practical diagnostic assay for clinical use. These will be the focus of our future work and are beyond the scope of the current study.

In summary, our study has provided proof-of-concept data for developing a blood screening test based on expression profiles of circulating cell-free miRNAs for 12 cancer types, which account for 50% of estimated new cancer cases and 63% of cancer deaths in the US in 2022 [2].

Methods

Study design and construction of train and validation datasets

We identified eight serum miRNA microarray datasets from Gene Expression Omnibus (GEO) [10,11]. After removing redundant cases, we assembled three large datasets that were independent of each other: a lung cancer dataset (n = 3744) [10,12], a combined dataset formed by merging the ovarian, liver and bladder cancer datasets (n = 3792) [10,13-15], and a combined dataset formed by merging the esophageal squamous cell, gastric, prostate and glioma cancer datasets (n = 3877) [11,16-19].

Based on these three large datasets, we constructed a large training set ('Train Set') that included 1408 cancer patients from 7 cancer types (208 lung cancer patients and 200 patients each for ovarian, liver, bladder, esophageal, gastric, and prostate) and 1408 age- and gender-matched non-cancer controls for the development of a diagnostic model for detecting multiple cancer types. All the remaining cases formed three separate independent validation sets (Fig. 1A and B). Details of how the cancer case and control samples for the Train Set and Validation Sets were selected are described in the Supplemental Methods.
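
The sample counts in this design can be cross-checked against the validation total reported in the Discussion. The short sketch below only re-adds numbers stated in the text; the dictionary keys are illustrative labels, and it assumes, as the text states, that every case not used for training went into a validation set.

```python
# Sizes of the three assembled datasets, as stated in the text.
dataset_sizes = {
    "lung": 3744,
    "ovarian_liver_bladder": 3792,
    "esophageal_gastric_prostate_glioma": 3877,
}

# Train Set: 208 lung cancer patients plus 200 each for six other cancer
# types, with an equal number of matched non-cancer controls.
train_cancer = 208 + 6 * 200
train_controls = train_cancer
train_total = train_cancer + train_controls

# Everything not used for training forms the three validation sets.
validation_total = sum(dataset_sizes.values()) - train_total

print(f"Train Set cancer patients: {train_cancer}")
print(f"Train Set total:           {train_total}")
print(f"Validation total:          {validation_total}")
```

The result agrees with the 8597 validation participants (4875 cancer patients and 3722 non-cancer individuals) cited in the Discussion.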
