Mapping extrachromosomal DNA amplifications during cancer progression

Disease progression, including metastasis, is a leading cause of death from cancer as tumors acquire resistance and become increasingly less responsive to therapies1,2. Characterizing the genomic features of primary untreated and metastatic treated tumors is critical to improving our understanding of the processes driving cancer progression3,4. Cancer is driven by genomic alterations, including focal DNA amplifications, in which DNA segments containing oncogenes or oncogenic regulatory elements are multiplied, resulting in oncogene transcription and activation5. Amplifications may occur through mechanisms tethered to chromosomes, forming homogeneously staining regions (HSRs), or by excising and circularizing DNA segments to form extrachromosomal DNA (ecDNA) elements6,7. HSRs and ecDNAs both create gene amplification, but their functional consequences may vary8,9. EcDNAs replicate with the linear genome but lack centromeres, resulting in uneven segregation and enabling rapid accumulation of ecDNAs in tumor cell nuclei9,10. If the ecDNA endows the tumor cell with a competitive advantage, cells containing ecDNAs undergo selection, creating a dominant tumor cell clone driven by an ecDNA-activated oncogene11. EcDNAs are detected in most human cancer types at the time of diagnosis and are enriched in poor-prognosis tumor types such as glioblastoma, sarcoma and esophageal carcinoma8. However, the role of ecDNAs in advanced cancers remains unclear.

The genes carried on or activated by ecDNAs include ERBB2, EGFR and CDK4, which are targets of commonly used inhibitors for the treatment of patients with cancer. In addition, oncogenes that are considered undruggable are detected on ecDNAs, such as MYC, TERT and MCL1. In fact, all genes known to be focally amplified in cancer are detected on ecDNAs in some tumors8,12,13.
The discovery of ecDNA clusters that appear to function as hubs where transcriptional machinery is assembled and shared9,14, the absence of centromeres that results in uneven segregation11,15, the detection of ecDNA sequences in micronuclei16,17 and the enrichment of enhancer elements on ecDNA molecules18,19 contribute to the hypothesis that proteins regulating ecDNA-related processes may represent potent drug targets. Effective targeting of ecDNA elements requires understanding the role of ecDNA during cancer progression.

Here we have compared ecDNA frequencies and properties in cancers at the time of diagnosis and at later stages of disease to evaluate whether ecDNAs act as drivers of tumor evolution11. We determined the presence of ecDNAs through a computationally intensive and standardized analysis pipeline to uniformly process 8,060 whole-genome sequencing (WGS) datasets generated from biopsy specimens obtained from patients at cancer diagnosis and in patients with advanced pretreated and/or metastatic cancer, including 231 cases with multiple time-separated specimens.

ecDNAs are frequently detected in advanced tumors

We determined the incidence of ecDNA in progressed tumors through analysis of WGS datasets from 4,170 advanced cancer samples, derived from 4,170 patients, available through the Hartwig Medical Foundation (HMF)20. The HMF cohort included tumors from 2,333 pretreated patients, 1,191 untreated patients and 646 patients with unknown treatment status. We compared HMF results with those derived from analyzing the whole genomes of 3,464 newly diagnosed tumors and 226 pretreated tumors from The Cancer Genome Atlas and the International Cancer Genome Consortium (TCGA–ICGC)8 and 100 matching primary-recurrent pairs from the Glioma Longitudinal Analysis (GLASS) consortium21.
The datasets were analyzed using AmpliconSuite-pipeline (v.0.1344.2) to detect focally amplified genomic loci and reconstruct the structures of the resulting amplicons from the whole-genome sequences from all 8,060 samples. The AmpliconSuite-pipeline includes the AmpliconArchitect22 method to derive amplicon structures and the AmpliconClassifier to assign amplicons to an amplicon class (Supplementary Table 1)23. Amplicons carrying a circular amplicon structure signature were classified as ecDNA, and noncircular amplicons were grouped into the chromosomal amplification (ChrAmp) class23. In total, across 8,060 tumors, we detected 2,602 ecDNA amplicons and 8,594 ChrAmp amplicons. We further assigned sample-level classes, labeling tumors containing at least one ecDNA amplicon as ecDNA and samples with at least one noncircular amplicon as ChrAmp. Tumors lacking amplicons were labeled ‘no focal somatic copy-number amplification’ (NoAmp).

To be able to compare ecDNA frequencies between cohorts, we determined whether tumor purity and sequencing depth impacted the sensitivity of amplicon detection. We observed that fewer ecDNAs were detected in samples with an average coverage of less than 10× (Extended Data Fig. 1a). Additionally, we found a significant difference in ecDNA frequency between ICGC and HMF samples in tumor purity bins 0.3–0.4 and 0.4–0.5 (Extended Data Fig. 1b). Comparisons in the TCGA cohort were limited by low sample numbers, following filtering of the <10× samples. Based on this observation, we additionally removed samples with tumor purity less than 0.4 from comparisons between cohorts. As a result, 2,196 TCGA–ICGC and 3,045 HMF tumors passed all filtering criteria.
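The sample-level labels and quality filters described above can be sketched in a few lines of Python. This is an illustrative sketch, not the AmpliconSuite-pipeline itself: the `Sample` record, its field names and the toy values are hypothetical, and the precedence rule (a tumor carrying both ecDNA and noncircular amplicons is labeled ecDNA) is an assumption. The two thresholds, a mean coverage of at least 10× and a tumor purity of at least 0.4, follow the text.

```python
from dataclasses import dataclass, field

MIN_COVERAGE = 10.0  # mean sequencing coverage threshold (x), from the text
MIN_PURITY = 0.4     # tumor purity threshold, from the text


@dataclass
class Sample:
    """Hypothetical per-tumor record; field names are illustrative only."""
    sample_id: str
    coverage: float
    purity: float
    amplicon_classes: list = field(default_factory=list)


def sample_class(sample):
    """Assign one label per tumor (assumed precedence: ecDNA, ChrAmp, NoAmp)."""
    if "ecDNA" in sample.amplicon_classes:
        return "ecDNA"
    if "ChrAmp" in sample.amplicon_classes:
        return "ChrAmp"
    return "NoAmp"


def passes_filters(sample):
    """Keep samples that meet both the coverage and the purity threshold."""
    return sample.coverage >= MIN_COVERAGE and sample.purity >= MIN_PURITY


# Toy data, not taken from any cohort.
samples = [
    Sample("toy-1", 35.0, 0.62, ["ecDNA", "ChrAmp"]),
    Sample("toy-2", 8.5, 0.80, ["ecDNA"]),    # dropped: coverage below 10x
    Sample("toy-3", 40.0, 0.35, ["ChrAmp"]),  # dropped: purity below 0.4
    Sample("toy-4", 30.0, 0.55, ["ChrAmp"]),
    Sample("toy-5", 28.0, 0.90),              # no amplicons detected
]

kept = {s.sample_id: sample_class(s) for s in samples if passes_filters(s)}
print(kept)
```

Filtering before comparing cohorts matters because a low-coverage or low-purity sample labeled NoAmp may simply reflect reduced detection sensitivity rather than a true absence of amplicons.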
These samples were then used to construct a tissue-matched primary cancer cohort (n = 1,490) consisting of newly diagnosed and untreated TCGA–ICGC tumors and an advanced cancer cohort (n = 2,440) comprising metastatic and/or pretreated tumors from TCGA–ICGC and HMF, by including only tumor types represented by at least 20 samples in both primary and advanced cohorts (Fig. 1a and Extended Data Fig. 1c). After applying the same filters on 508 paired primary and recurrent/metastatic specimens, a longitudinal cohort consisting of 306 multi-time-point samples from 153 patients was created across TCGA, HMF and GLASS cohorts (Extended Data Fig. 1d).

Fig. 1: Sample classification. a, Schematic dataset overview. b, Overview of sample classification for 1,490 patients in the primary cancer cohort and 2,440 patients in the advanced cancer cohort. Only tumor types with at least 20 patients in each cohort were included. c, Average number of ecDNA and ChrAmp amplicons detected per ecDNA patient and ChrAmp patient, respectively. Tumor lineages represented by at least 20 tumors in both cancer cohorts are included. Numbers in parentheses indicate the number of patients. Points represent mean values, and error bars show a 95% CI. P values were computed using a two-sided Mann–Whitney U test. d, Percentage of ecDNA samples. e, The average number of distinct ecDNA amplicons per sample in primary and advanced cancer cohorts, showing tumor lineages represented by at least 20 tumors in both cohorts. P values were computed using a one-sided binomial test with the ecDNA-carrying tumor fraction in the primary cancer cohort as a null probability in d and using a one-sided Mann–Whitney U test in e; comparisons are not significant unless noted otherwise. f, Number of kataegis events normalized by the number of intervals present on ecDNA or ChrAmp amplicons in the primary and advanced cohorts, respectively. Numbers indicate the number of amplicons. Bars represent mean values, and error bars show 95% CIs.
P values were computed using a two-sided Mann–Whitney U test. Asterisks indicate level of significance: *1.00 × 10−2 < P ≤ 5.00 × 10−2, **1.00 × 10−3 < P ≤ 1.00 × 10−2, ***1.00 × 10−4 < P ≤ 1.00 × 10−3 and ****P ≤ 1.00 × 10−4. NS, not significant; GBM, glioblastoma multiforme; SARC, sarcoma; KIRC, kidney renal clear cell carcinoma; PACA, pancreatic cancer; PAEN, pancreatic cancer endocrine neoplasms; BLCA, bladder urothelial carcinoma; LUAD, lung adenocarcinoma; LICA, liver cancer; COADREAD, colorectal cancer; PRAD, prostate adenocarcinoma; HNSC, head and neck squamous cell carcinoma; ESCA, esophageal carcinoma; BRCA, breast invasive carcinoma; STAD, stomach adenocarcinoma; OV, ovarian serous cystadenocarcinoma; UCEC, uterine corpus endometrial carcinoma.

At least one ecDNA was detected in 346 tumors (23.2%) from the primary cancer cohort and 777 tumors (31.8%) from the advanced cancer cohort (Fig. 1b and Extended Data Fig. 2a). A significantly larger fraction of the advanced cancer cohort harbored ecDNA and ChrAmp amplifications, and the average number of ecDNA and ChrAmp amplicons per tumor was comparable between cohorts (Fig. 1c). We performed a resampling analysis in which tumor-type distribution was equal between cohorts, which confirmed that the increase in ecDNA and ChrAmp frequencies in advanced cohort tumors was independent of tumor lineage (Extended Data Fig. 2b). We confirmed high frequencies of samples containing ecDNA amplicons in glioblastoma (76%), esophageal carcinoma (52%) and bladder carcinoma (50%) from the primary cancer cohort (Fig. 1d)8. The fraction of ecDNA samples and the average number of ecDNAs per sample significantly increased in the advanced cancer cohort for clear cell renal carcinoma, esophageal carcinoma, and colorectal, prostate and breast cancer (Fig. 1e). In contrast, we observed a significant decrease in ecDNA sample fraction and ecDNA count in glioblastoma, sarcoma, head and neck carcinoma and ovarian carcinoma.
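As a worked illustration of the one-sided binomial test named in the Fig. 1 legend, the sketch below asks how surprising 777 ecDNA-positive tumors out of 2,440 would be if the advanced cohort shared the primary cohort fraction (346 of 1,490). The paper applies this test per tumor lineage; applying it to the pooled totals is an assumption made here purely for illustration, as are the helper names `binom_log_pmf` and `binom_upper_tail`, so the printed value is not a result reported by the authors.

```python
import math


def binom_log_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def binom_upper_tail(k, n, p):
    """One-sided P(X >= k), summed in log space to avoid underflow."""
    logs = [binom_log_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    return math.exp(peak) * sum(math.exp(v - peak) for v in logs)


# Pooled counts from the text (pooling is an assumption of this sketch).
primary_pos, primary_n = 346, 1490
advanced_pos, advanced_n = 777, 2440

null_p = primary_pos / primary_n  # primary fraction used as null probability
p_value = binom_upper_tail(advanced_pos, advanced_n, null_p)

print(f"null probability: {null_p:.3f}")
print(f"observed fraction: {advanced_pos / advanced_n:.3f}")
print(f"one-sided P value: {p_value:.3g}")
```

Working in log space is a design choice: with thousands of samples the individual binomial terms fall below the smallest representable float if computed directly.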
ChrAmp sample fraction and ChrAmp amplicon counts followed similar patterns (Extended Data Fig. 2c–e). These observations suggested that the driving roles of ecDNA and chromosomal amplicons may vary by tumor lineage.

We evaluated the genomic characteristics of amplicons and found that the presence of an oncogene on the amplicon is the major determinant of amplicon complexity, which is a composite value based on the distribution of copy numbers assigned to reconstructions of the focal amplification’s genome structure and the total number of genomic segments comprising an amplicon23. This was true for both ecDNA and ChrAmp (Extended Data Fig. 3a–c). Amplicon complexity, copy number and size did not significantly differ between primary and advanced cancer cohorts. Increased genome ploidy, whole-genome duplication and microsatellite instability, but not homologous recombination, were associated with higher rates of ecDNA and contributed to the increased rates of ecDNA in the advanced cohort (Extended Data Fig. 3d–g and Extended Data Fig. 4a–d). The observed increased frequency of ecDNA in tumors of the advanced cohort is thus, in part, explained by the higher levels of ploidy and whole-genome duplication.

Localized hypermutation (kataegis) has been reported to occur frequently on ecDNAs in primary tumors24,25. We confirmed the frequent co-occurrence of kataegis on ecDNA and ChrAmp amplicons in primary cancer tumors (Fig. 1f). As localized hypermutations often happen in the context of single- and double-strand DNA break repair26, we normalized the frequency of clustered mutation events by the number of amplicon intervals. Kataegic clustered mutation events were detected at significantly higher rates in oncogene-containing, but not non-oncogenic, ecDNAs from the advanced cancer cohort relative to the primary cancer cohort (Extended Data Fig. 4e).
The significant difference in kataegis frequency was also observed among breast cancers, the largest cohort of a single tumor type within our datasets (Extended Data Fig. 4f). Our results suggest that ecDNAs containing oncogenes and kataegis are most likely to be detected as tumors progress.

Clinical associations of ecDNA across cancers

We previously showed that the presence of an ecDNA amplicon is associated with poor prognosis in newly diagnosed tumors8. We confirmed this association in the primary and advanced cancer cohorts (Fig. 2a). A multivariate analysis that additionally considered primary tumor location, primary versus advanced cohort, sex, age across multiple bins, whole-genome doubling status, microsatellite instability status, homologous recombination status and tumor stage showed that the presence of ecDNA was associated with an increased hazard ratio (P < 0.001 ecDNA versus NoAmp, P = 0.002 ChrAmp versus NoAmp; P values by multivariate Cox proportional-hazard model; Extended Data Fig. 5a).

Fig. 2: Clinical associations. a, Five-year Kaplan–Meier survival curves by amplification category using patients. The P value derived from comparing the survival curves was based on a log-rank test in the primary and advanced cohorts, separately. b, Distribution of the number of distinct ecDNA and ChrAmp amplicons by pretreatment status across primary, untreated advanced cancers and pretreated advanced cancer tumors. Pretreated advanced cancer tumors show a significantly higher number of distinct ecDNAs and ChrAmps per tumor compared to primary cancer or untreated advanced cancer tumors (two-sided Mann–Whitney U test). Y axis represents the number of distinct ecDNA and ChrAmp amplicons detected per tumor. Numbers indicate patient counts. All tumors with available pretreatment information were included in the analysis. Points represent mean values, and error bars show 95% CIs.
c, Distribution of the number of distinct ecDNA and ChrAmp amplicons by the number of pretreatments received across pretreated HMF advanced cancers. P value was calculated using a two-sided Mann–Kendall trend test. Points represent mean values, and error bars show a 95% CI. Only patients with available clinical information were included. Numbers indicate the number of patients. d, Distribution of the number of distinct ecDNA and ChrAmp amplicons by different prebiopsy treatment types in the advanced cancer cohort. ‘Untreated’ category only includes tumors from the advanced cohort. Number of patients per category is shown at the bottom. Only treatment types used in more than 50 patients are shown. P values were calculated using a two-sided Mann–Whitney U test. Points represent mean values, and error bars show a 95% CI.

Many but not all patients included in HMF have previously undergone cancer therapy, which can alter the genomic properties of the tumor27. Most untreated HMF patients (n = 542) were newly diagnosed with metastatic cancer4. We observed that the ecDNA count per tumor was significantly higher in untreated HMF tumors (0.4, 95% confidence interval (CI): 0.33, 0.47) than in the primary cancer cohort (0.34, 95% CI: 0.30, 0.39; P = 0.045, Mann–Whitney U test; Fig. 2b and Extended Data Fig. 5b). Next, we compared untreated HMF cancers to HMF tumors that had been exposed to anticancer treatment before the tumor biopsy collection. Pretreated HMF tumors showed a further significant increase (0.57, 95% CI: 0.50, 0.63, P = 3.8 × 10−3; Fig. 2b). A resampling analysis in which the number of samples per tumor type was equal between the primary cancer cohort, the untreated advanced cancer cohort and the treated advanced cancer cohort demonstrated that the ecDNA frequency increase following therapy exposure is independent of tumor type (Extended Data Fig. 5c).
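A tumor-type-balanced resampling of this kind could look roughly as follows. The text does not spell out the procedure, so the approach here (downsampling every cohort to the smallest per-type sample count, once, with a fixed seed) is an assumption, as are the function names and the toy data. A real analysis would repeat the draw many times and summarize the resulting distribution.

```python
import random
from collections import Counter, defaultdict


def balanced_resample(cohorts, rng):
    """Draw equal numbers of samples per tumor type from every cohort.

    cohorts maps a cohort name to a list of (tumor_type, ecdna_count) pairs.
    Only tumor types present in all cohorts are kept (assumed rule).
    """
    by_type = {name: defaultdict(list) for name in cohorts}
    for name, samples in cohorts.items():
        for tumor_type, count in samples:
            by_type[name][tumor_type].append(count)

    shared_types = set.intersection(*(set(d) for d in by_type.values()))
    drawn = {name: [] for name in cohorts}
    for tumor_type in sorted(shared_types):
        k = min(len(by_type[name][tumor_type]) for name in cohorts)
        for name in cohorts:
            picks = rng.sample(by_type[name][tumor_type], k)
            drawn[name].extend((tumor_type, c) for c in picks)
    return drawn


def mean_count(samples):
    """Mean ecDNA count per tumor in a resampled cohort."""
    return sum(c for _, c in samples) / len(samples)


# Toy data only; tumor-type codes follow the abbreviations used in Fig. 1.
cohorts = {
    "primary": [("BRCA", 0), ("BRCA", 1), ("BRCA", 0), ("PRAD", 0), ("PRAD", 0)],
    "untreated_advanced": [("BRCA", 1), ("BRCA", 0), ("PRAD", 1)],
    "pretreated_advanced": [("BRCA", 2), ("BRCA", 1), ("BRCA", 0), ("BRCA", 1),
                            ("PRAD", 1), ("PRAD", 0)],
}

balanced = balanced_resample(cohorts, random.Random(0))
type_counts = {name: Counter(t for t, _ in s) for name, s in balanced.items()}
for name, samples in balanced.items():
    print(name, dict(type_counts[name]), round(mean_count(samples), 2))
```

After balancing, any remaining difference in mean ecDNA count between cohorts cannot be attributed to one cohort simply containing more tumors of an ecDNA-rich lineage.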
Grouping of HMF patients by the number of pretreatments demonstrated that the ecDNA frequency increase correlated with the number of therapies received (Fig. 2c and Extended Data Fig. 6a). We repeated this analysis in two tumor types with at least 20 samples per pretreatment group and observed the same trend in colorectal cancer, but not in breast cancer (Extended Data Fig. 6b). Further grouping of previously treated HMF patients by treatment class showed that chemotherapy had the strongest association with ecDNA frequency (Fig. 2d and Extended Data Fig. 6c). Tumors from patients treated with targeted therapy contained fewer ecDNAs compared to untreated tumors in the advanced cohort. Targeted therapies may specifically inhibit oncogenes carried on ecDNAs, which has been related to ecDNA genome reintegration as a mechanism of therapy resistance28. We evaluated whether pretreatment with a targeted inhibitor altered the ratio of oncogene target-carrying ecDNAs to chromosomal amplifications by comparing the observed ratio to a randomly sampled background distribution from comparable untreated cohorts. We found that the observed ratio was significantly higher than the background distribution, suggesting that treatment using inhibitors of oncogenes amplified on ecDNAs did not result in the formation of ChrAmps (Extended Data Fig. 6d).

To investigate whether different types of chemotherapy showed different associations with the number of ecDNAs, we categorized chemotherapy mechanisms into the following three types: antimetabolite, DNA damage agent and tubulin inhibitor. HMF patients pretreated with a tubulin inhibitor had a higher ecDNA frequency (Extended Data Fig. 6e). The trend observed in the ecDNA counts mirrored that of the ChrAmp counts, which may indicate that antitubulin therapy results in genomic instability that leads to the formation of new amplicons (Extended Data Fig. 6e,f)29,30.
These observations implicate newly acquired focal amplifications as a marker for therapy response and suggest that specific anticancer therapies may act as drivers of amplicon formation.

ecDNAs are preferentially preserved over time

Among patients whose tumors have been sequenced as part of TCGA and HMF, a subset (n = 131) was enrolled multiple times, resulting in WGS profiles from multiple time points31. The availability of longitudinal datasets provides an opportunity for evaluation of the stability and evolution of ecDNA structure. Time-separated whole-genome tumor sequences were also available through the GLASS consortium (n = 100)21,32,33. We constructed a cohort of 153 patients with multiple whole genomes passing quality filters (Extended Data Fig. 1d). The dataset includes 70 glioblastomas and gliomas, 18 prostate cancers, 16 breast cancers and 49 matched samples from other tumor types.

In total, 343 amplicons were detected at the first time point (T1), of which 55 amplicons were extrachromosomal. At time point 2 (T2), 258 amplicons were detected, including 61 ecDNAs. To determine how often amplicons were maintained over time, we determined amplicon similarity in a pair-wise fashion23. An amplicon similarity metric ranging from 0 to 1 was computed between two amplicons with overlapping territory based on shared breakpoints and genomic content. Specifically, 30 of 55 (54.5%) ecDNA and 46 of 288 (16%) ChrAmp T1 amplicons were found to match a T2 amplicon with a statistically significant similarity score. In most cases, amplicons classified as either ecDNA or ChrAmp maintained the amplicon class at T2, with 30 of 36 T1-ecDNA/T2-ecDNA amplicons and 46 of 51 T1-ChrAmp/T2-ChrAmp amplicons (Fig. 3a). Similarly, 82% of T1 samples classified as ecDNA/ChrAmp/NoAmp were assigned to the same class at T2 (Extended Data Fig. 7a). We evaluated the amplicon location and structure of five HMF-derived T1-ecDNA amplicons that were initially classified as ChrAmp at T2.
Those ChrAmp amplicons were detected in tumors with tumor purity >0.7 and mean tumor genome sequence coverage >93×, substantiating that the amplicon classification was accurate. Genomic reintegration of ecDNA elements has been observed in response to treatment28. However, we did not detect sequence reads linking the T2-ChrAmp amplicons to regions outside their original location in the genome (Extended Data Fig. 7b–f). We, therefore, suggest that the classification change from ecDNA to ChrAmp is not the result of reintegration but of clonal selection; that is, the ecDNA clone is dominant in the T1 tumor but has been outcompeted by a clone driven by a ChrAmp amplicon in T2.

Fig. 3: Longitudinal amplicon analysis. a, Sankey plot showing amplicon classification over time. Only amplicon pairs with statistically significant similarity were included (n = 91). Colors reflect amplicon classification, and numbers indicate the number of amplicons retained between two time points over all amplicons from the first tumor in the corresponding amplicon category. b, The fraction of ecDNA and ChrAmp amplicon pairs retained between the first and the second tumor. Numbers in parentheses indicate the numbers of first tumor amplicons also detected in the second tumor, over the number of all first tumor amplicons. P value was calculated using the chi-square test for tumors 1 and 2. OR, odds ratio.

At both time points, the fraction of ecDNA amplicons with a matching ecDNA amplicon in the reciprocal tumor was significantly higher compared to the fraction of matching ChrAmp amplicons, showing that ecDNA amplifications are more likely to be retained over time (Fig. 3b). Amplicon pairs did not show significant differences in amplicon complexity, amplicon copy number or amplicon size (Extended Data Fig. 8a–c).

Next, we evaluated clustered mutation event frequency, as we found higher rates of kataegis in ecDNAs from the advanced cancer cohort compared to the primary cancer cohort.
Confirming our observations from the singleton cohorts, we found that the number of clustered mutation events was significantly higher in ecDNA compared to ChrAmp amplicons (Extended Data Fig. 8d). The fraction of amplicons containing one or more clustered mutation events was significantly higher in ecDNA as well as ChrAmp amplicons that were shared, compared to amplicons that were private to one of the two time points. This finding was true when counting clustered mutations at T1 as well as at T2 (Fig. 4a,b). Conversely, T1 ecDNAs and T1 ChrAmps were more likely to be preserved at T2 when marked by a clustered mutation event (Extended Data Fig. 9a,b). Further separating amplicons by oncogene status suggested that these results are independent of whether an oncogene is present on the amplicon, although the analysis was limited by smaller numbers (Extended Data Fig. 9c,d).

Fig. 4: Clustered mutation events by amplicon category. a, The fraction and the number of ecDNA and ChrAmp amplicons with overlapping clustered mutation events in the T1 tumor. P values were computed using a binomial test (two-sided) with the fraction in the private category as a null probability for ecDNA and ChrAmp, respectively. b, The fraction and number of ecDNA and ChrAmp amplicons with overlapping clustered mutation events in the T2 tumor. P values were computed using a binomial test (two-sided) with the fraction in the private category as a null probability for ecDNA and ChrAmp, respectively.

We evaluated the variant allele fractions of clustered and nonclustered mutations on ecDNA and ChrAmp amplicons. Clustered mutations showed significantly higher variant allele fractions compared to nonclustered mutations at both T1 and T2 (Fig. 5a). There was no statistically significant difference in variant allele fraction between clustered mutations detected in private compared to shared ecDNAs.
To complement this analysis and adjust for possible differences in tumor purity and ploidy, we inferred mutation cancer cell fractions. Mutations on shared ecDNAs showed significantly higher cancer cell fractions compared to mutations on private ecDNAs (Fig. 5b). Both shared and private T2 clustered mutation events were carried at significantly higher cancer cell fractions compared to nonclustered mutations. Comparable patterns were observed among ChrAmp amplicons (Extended Data Fig. 10). Combined, the differences observed between variant allele and cancer cell fraction levels of shared and private ecDNAs and ChrAmps reflect that shared ecDNAs have undergone selection over a longer period of time. In addition, the higher variant allele and cancer cell fraction of clustered relative to nonclustered mutations suggest that clustered mutations generally occurred earlier in the amplicon lifetime.

Fig. 5: Variant allele fraction by mutational category. a,b, Comparison of (a) VAFs and (b) CCFs of different mutational categories detected on longitudinally shared or private ecDNA amplicons. Boxplots represent minimum (0th percentile), maximum (100th percentile), first and third quartiles and median with outliers excluded. P values were calculated using a two-sided Mann–Whitney U test. VAFs, variant allele fractions; CCFs, cancer cell fractions.
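The retention comparison from the longitudinal analysis can be approximated from the counts given in the text: 30 of 55 T1 ecDNA amplicons and 46 of 288 T1 ChrAmp amplicons had a matching T2 amplicon. The sketch below computes the odds ratio and a Pearson chi-square test without continuity correction. Whether a correction was applied, and whether Fig. 3b uses exactly these counts, is not stated in the text, so the function name `chi_square_2x2` and the printed values should be read as illustrative rather than as the authors' reported statistics.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its P value; for one degree of freedom the
    upper tail equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# T1 amplicon counts from the text.
ecdna_total, ecdna_retained = 55, 30
chramp_total, chramp_retained = 288, 46

ecdna_lost = ecdna_total - ecdna_retained      # 25
chramp_lost = chramp_total - chramp_retained   # 242

odds_ratio = (ecdna_retained * chramp_lost) / (ecdna_lost * chramp_retained)
chi2, p_value = chi_square_2x2(ecdna_retained, ecdna_lost,
                               chramp_retained, chramp_lost)

print(f"ecDNA retained:  {ecdna_retained / ecdna_total:.1%}")
print(f"ChrAmp retained: {chramp_retained / chramp_total:.1%}")
print(f"odds ratio: {odds_ratio:.2f}")
print(f"chi-square: {chi2:.2f}, P value: {p_value:.2g}")
```

An odds ratio well above 1 in this table corresponds to the statement that ecDNA amplicons are more likely than chromosomal amplicons to be retained between time points.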
