Mitochondrial DNA mosaicism in normal human somatic cells

Landscape of mtDNA heteroplasmy in normal cells

We explored 2,096 whole-genome sequences (WGSs) of clones expanded from nonneoplastic healthy single cells collected from the colorectal epithelium (431 crypts from 20 individuals) [6], fibroblasts (334 cells from 7 individuals) [5] and hematopoietic stem and progenitor cells (HSPCs; 1,331 cells from 4 individuals) [7,8] (Fig. 1a and Supplementary Tables 1 and 2). In addition, we analyzed 31 WGSs from tumors, including 19 matched colorectal carcinoma bulk tissues from individuals who donated normal colorectal clones and 12 clones established from adenomatous polyps from one individual with MUTYH-associated polyposis [6].

Fig. 1: Landscape of mtDNA heteroplasmy in normal cells. a, Experimental design. b, mtDNA mosaicism identified in 2,096 normal clones from 31 donors. Donor, developmental phylogeny, total number of mtDNA variants, maximum clone-VAF, tissue type and donor age for each clone are shown. Shared variants among clones from an individual are shown in yellow. c, Landscape of mosaic mtDNA variants. Variants were classified as heavy (external circle) or light (internal circle) strands according to the mutated pyrimidine. Mutation types and consequences are represented by colors and shapes. The heavy strand replication origin region (Orib-OH) is highlighted in yellow. d, Strand-asymmetric mutational spectrum across three tissue types. e, Strand bias of C:G>T:A and T:A>C:G base substitutions according to the heavy strand replication origin (Orib-OH; m.16,197-191). The log2-transformed ratio between the numbers of heavy and light strand mutations is shown. f, Distribution of clone-VAFs and phasing of mtDNA alterations in a fibroblast clone, 10_ARL10-4_001D4. H, heavy strand; L, light strand.

Using the variant allele frequencies (VAFs) of the somatically acquired mutations in nDNA, we verified the clonality of the clones (Extended Data Fig. 1a). The average mtDNA read-depth was 6,931× from normal clones (188× to 40,421×; Extended Data Fig. 1b,c), allowing for robust assessment of mtDNAs in a single clone to a heteroplasmy level of ~0.3%. For more systematic analysis, we established and applied a locus-specific background noise matrix (Methods; Extended Data Fig. 1d–g and Supplementary Table 3). To trace the developmental origin of mtDNA alterations, we constructed the early embryonic phylogeny of the clones using shared somatic nDNA mutations [3,5]. Of note, the first branching in each phylogeny was close to the first cell division in life, as reported previously [3,5], given the VAFs of lineage-defining variants in the matched bulk blood tissues (Extended Data Fig. 2 and Supplementary Note 1).

Overall, we identified 6,451 mosaic mtDNA base substitutions and insertions and deletions (InDels) from the normal clones, revealing an average of 3.1 mtDNA alterations per clone (Fig. 1b and Supplementary Table 4). Most clones (92.4%; 1,937 of 2,096) exhibited one or more mtDNA alterations, and approximately 18% of the clones (383 of 2,096) carried one or more nearly homoplasmic mtDNA alterations (defined as VAF > 90%). We believe that the VAFs of mtDNA alterations in each clone (referred to as clone-VAFs hereafter) approximate the original levels in the clone’s founder cell, as clone-VAFs were overall consistent throughout cell culture (Extended Data Fig. 3a–c and Supplementary Note 2). Additionally, direct genome sequencing of colorectal crypts obtained via laser-capture microdissection (LCM) revealed a highly similar mtDNA mutational landscape, indicating minimal culture-associated bias in mutational diversity (Extended Data Fig. 3d–g and Supplementary Note 2).

The spectrum of mtDNA base substitutions was predominantly composed of transitions (C:G>T:A and T:A>C:G base substitutions; collectively 95%; Fig. 1c). These alterations exhibited an extreme level of replication-strand asymmetry, as previously observed in cancers [31,32].
Generally (outside the heavy strand replication origin; m.192-16,196), mutated cytosine bases of C:G>T:A alterations were predominantly on the heavy strand (92.5%), despite the scarcity of cytosines on the strand (n_cytosine:n_guanine = 1:2.4; Fig. 1d). Similarly, mutated thymine bases of T:A>C:G alterations were predominantly on the light strand (63.4%), despite their relative rarity on the strand (n_thymine:n_adenine = 1:1.3; Fig. 1d). Additionally, the strand asymmetry was reversed within the replication origin (m.16,197-191; Fig. 1c,e), where the bidirectional mtDNA replication process is operative [36,37]. Collectively, these observations suggest that mtDNA variant acquisition is tightly coupled with the strand-asymmetric mtDNA replication process, as speculated previously [31]. However, the strand asymmetry was not completely uniform across cell types (P = 3.3 × 10^−52, Pearson’s chi-squared test; Fig. 1d), implying that the mtDNA replication processes may be slightly different across cell types.

We occasionally observed localized acquisition of multiple mtDNA variants [32]. For example, 12 substitutions, with similar clone-VAFs (1.1–2.5%), were detected in a fibroblast clone (Fig. 1f). These were predominantly T:A>C:G substitutions (11 of 12), and six of them were enriched in a localized region (m.7,318-8,388) with direct evidence of coclonality in phasing, suggesting that a single mutational hit may create multiple mutations in mtDNA, like kataegis in nDNA [38].

Two origins of mtDNA alterations

Using shared patterns in the developmental phylogenies and tissues, we categorized the origin of mosaic mtDNA alterations into the following two main groups: (1) heteroplasmy in the fertilized egg (termed HetFE variants; n = 409 alterations, 153 events when collapsed) and (2) postzygotic mutations acquired in somatic lineages (termed postzygotic mutations; n = 6,042; Fig. 2a,b and Extended Data Fig. 4a). Briefly, consistent with their presence from the first cell of life, HetFE variants were shared by multiple clones and/or tissues in a particular individual. In contrast, postzygotic mutations were predominantly confined to one or a few clones (n = 5,652; either as singletons (n = 5,276) or coincidentally recurrent mutations (n = 376); referred to as postzygotic simple (PZsimple) mutations). A small subset of postzygotic mutations (n = 390 from 32 mtDNA sites) were recurrent across multiple clones and not confined to a specific donor (referred to as postzygotic recurrent (PZrecurrent) mutations), suggesting a higher mutation rate at these sites compared to other mtDNA loci.

Fig. 2: Fertilized egg-originated variants in mtDNA. a, A simple diagram showing how mtDNA alterations were categorized. b, Schematic diagram illustrating different shared patterns of mtDNA alterations according to their origin. c,d, Examples of HetFE variants: m.16,400 C>T (c) and m.7,496 T>C (d). Early clonal phylogenies, reconstructed using somatic nDNA mutations, are shown. Branch lengths are proportional to the number of somatic nDNA mutations. Clone-VAF in each clone, caVAF and blood-VAF are represented by bar plots at the bottom. Two pie charts in c and d indicate the proportions of mutant clones among clones of the individual (left) and among clones of other individuals (right). e, Linear correlation between the caVAF and blood-VAF. The blue line represents the regression line, and the shaded area indicates its 95% confidence interval. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation. f, Landscape of HetFE variants detected. The colors in ‘number of clones’ represent the number of clones for each individual with corresponding values indicated alongside. The numbers within the yellow circles indicate the count of detected HetFE variants with the caVAF within the specified ranges. The gray area represents the range of caVAF that cannot be detected when considering the number of clones for each individual. The average landscape of HetFE variants is shown on the right. g, The VAF distribution of heteroplasmic mtDNA variants in offspring obtained from bulk blood of 407 mother–offspring pairs. h, A bar plot categorizing HetFE variants found in the mother’s bulk blood (HetFE variants in 407 maternal blood) and the offspring’s bulk blood (HetFE variants in 407 offspring’s blood).

mtDNA heteroplasmy in the fertilized egg

Annotating mtDNA mosaicism with early developmental phylogenies enabled us to capture HetFE variants [5]. For example, the m.16,400 C>T substitution was shared by 14 fibroblast clones (51.9% of 27 clones) established from DB2 (Fig. 2c). Despite its high prevalence in DB2, the variant was extremely rare in clones from other donors (0.1%; two of 2,069 clones). Similarly, the m.7,496 T>C substitution was recurrently but exclusively observed in HC19, including three normal colorectal clones (13.0% of 23 clones) and their matched colorectal cancer tissue (Fig. 2d). In both cases, the mutant clones converged at the first node of each phylogeny (Fig. 2c,d). These patterns strongly suggest that the most recent common ancestor (MRCA) cell, possibly the fertilized egg, carried the heteroplasmic variants. Consistent with their pregastrulation timing, these variants were also found in matched bulk blood tissues with substantial VAFs (0.584 and 0.149, respectively; Fig. 2c,d).

Overall, we categorized 153 variant events as HetFE variants (Supplementary Table 5). They include 391 variants shared by multiple clones in an individual (6.1% of the total mtDNA variants; 135 events when collapsed) and 18 variants that were singletons among clones but shared with matched blood tissues.
These variants were twofold enriched in the D-loop (m.16,024-576) and 1.5-fold depleted in the rRNA regions (m.648-1,601 and m.1,671-3,229) compared to PZsimple mutations [39] (P = 0.0031 and 0.0363, respectively, two-sided Fisher’s exact test; Extended Data Fig. 4b).

Then, we inferred the original heteroplasmy levels in the fertilized egg of HetFE variants. Notably, we observed that the average clone-VAF value of a HetFE variant across all clones from a donor (referred to as clone-averaged VAF (caVAF); Extended Data Fig. 4c) closely correlated with the heteroplasmy level in the matched polyclonal blood tissue (R = 0.967, P = 2.3 × 10^−16, Pearson’s correlation; Fig. 2e). We speculated that a plausible mechanistic link between these two independent values was the heteroplasmy level in its origin; although clone-VAFs of a HetFE variant may fluctuate across clonal lineages with aging, the average (caVAF) would remain overall stable from the original heteroplasmy level, consistent with our computational simulation (Extended Data Fig. 4d). Similarly, as the VAF from the bulk blood tissues (blood-VAF) inherently represents an averaged VAF among many polyclonal blood cells, it should also closely reflect the initial heteroplasmy level. We extended our speculation to the correlation of VAFs of heteroplasmic mtDNA variants between buccal–buccal and/or buccal–blood tissues in 19 monozygotic twins (Extended Data Fig. 4e,f). Therefore, we used caVAF as a proxy for the heteroplasmy level in the fertilized egg of a HetFE variant (Supplementary Note 3).

Most donors (80.6%; 25 of 31) carried one or more HetFE variants with caVAFs over 0.03% (Fig. 2f). Twelve individuals (39%) had HetFE variants with substantial caVAFs (>4%). As expected, the statistical power for capturing HetFE variants was associated with the number of clones in a donor. For example, a HetFE variant was identified with caVAF as low as 0.047% from HC02 (22 clones). In contrast, the minimum caVAF of a HetFE variant was sevenfold lower (0.0067%) in KX008 (364 clones). Considering the detection sensitivity, we profiled the average landscape of HetFE variants, which showed ~2 HetFE variants over 0.5% heteroplasmy level per fertilized egg (Fig. 2f).

Notably, we believe that the actual number of HetFE variants is higher than we observed, as our detection thresholds were ~0.02% for most donors. Given that a fertilized egg typically contains ~100,000 mtDNA copies [40], HetFE variants detectable in this study should be shared by at least 20 mtDNA copies in the first cell, and those restricted to a smaller number of mtDNA copies would likely be undetectable. In addition, we speculate that the origin of most HetFE variants found in this study was the maternal germline rather than new acquisitions in the fertilized egg, as newly acquired mutations would be restricted to a single mtDNA copy.

To validate our findings, we explored WGSs from bulk blood tissues of 294 families (including 407 mother–offspring pairs) [41]. We discovered 425 heteroplasmic variants (>0.5% VAF) in the polyclonal blood of offspring, which are most likely HetFE variants in offspring (Fig. 2g). We further found that ~20% of the variants were heteroplasmic in the polyclonal blood of the mother (likely HetFE variants of the mother; Fig. 2g,h, Extended Data Fig. 4g and Supplementary Note 4).
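The caVAF definition and the detection-limit reasoning earlier in this section reduce to simple arithmetic, sketched below in Python. This is an illustration only, not the pipeline used in the study: the function names and the four-clone example values are hypothetical, and the only figures taken from the text are the ~0.02% threshold, the ~100,000 mtDNA copies per fertilized egg and the two minimum caVAFs (0.047% and 0.0067%).

```python
import math


def clone_averaged_vaf(clone_vafs):
    """Mean clone-VAF over ALL clones of a donor.

    Clones that lack the variant must be included with a value of 0,
    otherwise the average overestimates the original heteroplasmy.
    """
    if not clone_vafs:
        raise ValueError("need at least one clone")
    return sum(clone_vafs) / len(clone_vafs)


def copies_at_threshold(threshold_fraction, copies_in_zygote=100_000):
    """Number of mutant mtDNA copies that corresponds to a heteroplasmy level."""
    return threshold_fraction * copies_in_zygote


# Hypothetical donor with four clones: two carry the variant, two do not.
example_cavaf = clone_averaged_vaf([0.9, 0.0, 0.1, 0.0])

# ~0.02% detection threshold applied to ~100,000 copies in the fertilized egg.
copies_needed = copies_at_threshold(0.0002)

# Ratio of the minimum caVAFs reported for HC02 (22 clones) and KX008 (364 clones).
fold_difference = 0.047 / 0.0067

print(f"caVAF of the example donor: {example_cavaf:.2f}")
print(f"copies at a 0.02% threshold: {copies_needed:.0f}")
print(f"HC02 versus KX008 minimum caVAF: {fold_difference:.1f}-fold")
```

Including the zero-VAF clones in the average is the design point: caVAF is meant to estimate the heteroplasmy of the common ancestor, so lineages that have lost the variant carry as much information as those that retained it.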
Our findings collectively indicate that (1) mtDNA heteroplasmy in the fertilized egg is not rare, likely being continuously generated in the germline; (2) a substantial fraction of HetFE variants are transmitted to the next generation [39], despite the purification process during oogenesis in the maternal germline lineage [42,43] and (3) these variants are one of the sources of mtDNA mosaicism observed in aged somatic cells.

mtDNA turnover and drift in somatic lineages

Across clones, the clone-VAFs of a HetFE variant were shifted away from the initial heteroplasmy level toward both extremes (0% or 100%) (Fig. 3a; two examples in Fig. 2c,d). For instance, the m.16,256 C>T mutation in DB10 (39 clones), which had a caVAF of 0.32, was observed as homoplasmic in 11 clones (28.2%) and almost wild type in 25 clones (64.1%; Fig. 3b). Two possible underlying scenarios are (1) an early embryonic bottleneck during the progressive mtDNA copy number reduction in the cleavage stage of early embryogenesis [44,45] and (2) lifetime drift through continuous mtDNA turnover in each somatic lineage [46,47] (Fig. 3c).

Fig. 3: mtDNA turnover leading to mtDNA drift over a lifetime. a, Clone-VAFs of HetFE variants across clones. Clones with a complete absence of corresponding HetFE variants (that is, clone-VAF = 0) are not shown. The colors in age and ‘number of clones’ represent age and the number of clones for each individual, respectively, with corresponding values indicated alongside. Each column represents each HetFE variant, and a vertical line connects clones carrying the same HetFE variant. Gray bars (bottom) show caVAF values for each HetFE variant. b, Clone-VAF distributions of a HetFE variant, m.16,256 C>T, in DB10. A gray dashed line indicates the caVAF. c, Schematic diagram illustrating clone-VAF dynamics by early embryonic bottleneck and lifetime drift. Expected mtDNA copy numbers per cell (bottom) are shown. Two alternative drift models (mitotic and homeostatic turnovers) are depicted in different colors. d, Clone-VAF distributions of HetFE variants with similar caVAF values in four individuals of different ages. e, Linear correlation between age and fixation index for HetFE variants of caVAF > 0.01. Circles represent variants colored by caVAF values. A gray line and the shaded area represent the regression line and its 95% confidence interval. Pearson’s correlation coefficient and P value are presented. Two-sided Pearson’s correlation. f, Clone-VAF difference of a HetFE variant between all possible clone pairs based on their embryonic branching time. Differences were computed when at least one clone-VAF exceeded 0.5 from each pair. Cell generations were calculated using a fixed mutation rate previously reported [5]. g, Average mitotic turnover counts to reach homoplasmy according to caVAF from the simulation studies. h, Estimated turnover rates for each HetFE variant in mitotic and homeostatic turnover models. Error bars represent the range of 50 simulated results with the lowest mean squared errors (MSEs) of 1,000,000 simulations per HetFE variant, with circles representing average values. HetFE variants with caVAF > 0.005 were included. Dark gray lines and shaded areas represent average turnover rates and their 95% confidence intervals in each tissue type. EEM, early embryonic mutation; CG, cell generation.

The early embryonic mtDNA bottleneck arises from the lack of mtDNA replication until a certain stage of embryogenesis [48,49] (Fig. 3c). If each embryonic cell has one or only a few mtDNA copies at a certain stage, the heteroplasmy level can be quantized according to the composition of founder mtDNAs in each embryonic cell.

In parallel, mtDNAs are lost and newly replicated in somatic lineages [12,50,51] (for example, cell-cycle-dependent mtDNA duplication and random segregation by half in two daughter cells in dividing cells (mitotic turnover) or cell-cycle-independent homeostatic mtDNA replacement in nondividing cells (homeostatic turnover); Fig. 3c). These processes can shift heteroplasmy levels slightly but continuously over time, producing a substantial cumulative effect over a lifetime. Of these two nonexclusive scenarios, our observations indicate that the lifetime drift is dominant.

First, purification of HetFE variants was age-dependent and much weaker in clones from young donors (for example, clones established from an aborted 19-week-old fetus; Fig. 3d,e and Extended Data Fig. 5a). This suggests that purification was not fixed in the early stages of human life. Second, sister clone pairs that branched out at a later time point did not exhibit more similar heteroplasmy levels of a HetFE variant than clone pairs that diverged earlier (Fig. 3f). For example, clone pairs that had an MRCA cell at the ~30th cell generation, which was much later than the early embryonic bottleneck, showed tremendous heterogeneity in clone-VAFs of a HetFE variant (Fig. 3f).

Finally, the computational simulation suggested that the lifetime drift model alone was sufficient to explain the skewed distribution of clone-VAFs in a HetFE variant. Simulation studies using the mitotic turnover model (Extended Data Fig. 5b,c) indicated that 1,440 rounds of mtDNA mitotic turnovers shifted a HetFE variant with 10% initial heteroplasmy level (caVAF) to homoplasmy (100%) in ~10% of the clones when clones had 750 basal mtDNA copy numbers (a turnover was defined as replication of an mtDNA for n times, where n is the basal mtDNA copy number in a somatic cell; Fig. 3g). Likewise, simulations assuming homeostatic turnover (Extended Data Fig. 5d,e) suggested a similar conclusion, but ~50% of rounds were necessary for a similar effect under the same conditions (Supplementary Note 5).

Based on the clone-VAF distributions of HetFE variants, the maximum likelihood mtDNA turnover rates across cell types were inferred (14.3, 20.8 and 17.9 mitotic turnovers per year, or 6.5, 11.5 and 9.4 homeostatic turnovers per year for the colon epithelium, fibroblasts and HSPCs, respectively; Fig. 3h). Although we believe that mitotic and homeostatic mtDNA turnovers are the predominant mechanisms for colorectal epithelium and fibroblasts, respectively, the relative balance between the two turnover models in each cell type is uncertain.

Postzygotic mtDNA mutations

Across the 2,096 clones, 6,042 mtDNA variants (93.7% of all the variants) were categorized as postzygotic mutations, newly acquired in each somatic lineage. As mentioned above, 32 mtDNA loci showed an elevated mutation rate with 390 PZrecurrent variants (Supplementary Table 6 and Supplementary Note 6). These mutations were predominantly located in the hypervariable regions of the D-loop, homopolymer sequences or both [33,52] (Extended Data Fig. 6a). Interestingly, mutations in a hotspot (m.414 T>G) were recurrently found in clones with ultraviolet (UV) light exposure (estimated using UV-associated somatic mutations in the nDNA of a clone [53]), suggesting UV-dependent acquisition [54,55] (Extended Data Fig. 6b).

Excluding HetFE variants and PZrecurrent mutations, we detected 5,652 PZsimple mtDNA alterations.
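To make the mitotic turnover model from the preceding section concrete, the following Python sketch follows one daughter lineage per clone: in each round every mtDNA is duplicated and a random half is kept. It is a toy illustration under stated assumptions, not the simulation used in the study. The function names are hypothetical, and the copy number (30) and round count (200) are deliberately far smaller than the 750 copies and 1,440 rounds quoted in the text so that the example runs quickly.

```python
import random


def mitotic_turnover(mutant, copies, rng):
    """One round: duplicate every mtDNA, then keep a random half (one daughter cell)."""
    doubled_mutant = 2 * mutant
    # Draw `copies` molecules without replacement from the duplicated pool of
    # 2 * copies; indices below `doubled_mutant` stand for mutant molecules.
    picked = rng.sample(range(2 * copies), copies)
    return sum(1 for index in picked if index < doubled_mutant)


def simulate_clone_vafs(initial_vaf, copies, rounds, clones, seed=0):
    """Return the final heteroplasmy of `clones` independent lineages."""
    rng = random.Random(seed)
    start = round(initial_vaf * copies)
    vafs = []
    for _ in range(clones):
        mutant = start
        for _ in range(rounds):
            if mutant == 0 or mutant == copies:
                break  # lost or fixed: no further change is possible
            mutant = mitotic_turnover(mutant, copies, rng)
        vafs.append(mutant / copies)
    return vafs


# Toy parameters (assumptions for illustration only).
vafs = simulate_clone_vafs(initial_vaf=0.1, copies=30, rounds=200, clones=200)
mean_vaf = sum(vafs) / len(vafs)
lost = sum(1 for v in vafs if v == 0.0) / len(vafs)
fixed = sum(1 for v in vafs if v == 1.0) / len(vafs)
print(f"mean clone-VAF {mean_vaf:.3f}, lost {lost:.2f}, homoplasmic {fixed:.2f}")
```

Because the sampling is neutral, the average over clones stays near the starting heteroplasmy while individual clones move toward 0% or 100%, which is the qualitative behavior that motivates using caVAF as a proxy for the original level.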
Unlike somatic mutations in nuclear genomes, PZsimple mutations are challenging to count in absolute terms, as mutations with clone-VAFs below our detection threshold (~0.3%) would remain undetected. Indeed, the crude number of PZsimple mutations detected in clones was not substantially correlated with age (R = 0.282, P = 0.131, Pearson’s correlation; Fig. 4a). Instead, the overall heteroplasmy levels of PZsimple mutations in clones displayed stronger clock-like properties: PZsimple mutations with higher clone-VAF were more frequent in aged donors than young donors (Fig. 4b), and the sum of the clone-VAFs of all detected PZsimple mutations in a clone (referred to as SVAF) showed more measurable characteristics. For example, in an older individual (DB8; 93 years old), 55% of clones (26 of 47) had an SVAF of ~1.0 by 1–3 clone-specific PZsimple mutations (Fig. 4c). In contrast, in a young individual (HC10; 37 years old), all clones exhibited an SVAF far below 1.0 (0.55 versus 0, P = 5.2 × 10^−6, two-sided Fisher’s exact test; Fig. 4c). Of note, there was no significant difference in the crude number of PZsimple mutations between the clones of the two individuals (Extended Data Fig. 6c). The average SVAF in the clones of an individual exhibited a strong positive correlation with age (Extended Data Fig. 6d). The correlation became stronger when the age of individuals was converted to turnover numbers from birth using the cell-type-specific turnover rates estimated from HetFE variants (R = 0.787, P = 2.5 × 10^−7, Pearson’s correlation; Fig. 4d).

Fig. 4: Postzygotic mtDNA variants toward homoplasmy in aged cells. a, Linear correlation between the average PZsimple mutation count and age across 31 individuals. The gray line and the shaded area represent the regression line and its 95% confidence interval. One individual aged 0 was not included in the regression. Vertical lines indicate the range of PZsimple mutation counts per clone in an individual. b, Proportions of clones with maximum clone-VAFs across 31 individuals (individual ages indicated in parentheses). Individuals are sorted by tissues, then by age, in ascending order. c, Bar plots of SVAF for each clone in two individuals, DB8 (93 years old; top) and HC10 (37 years old; bottom), with developmental phylogenies. Bar plots include up to the top three clone-VAF PZsimple mutations. Pie charts categorize clones based on SVAF. d, Linear correlation between average SVAF and total mitotic turnovers across 31 individuals. The total number of mitotic turnovers was calculated using the rates estimated by HetFE variants for each tissue type. The gray line and shaded area represent the regression line and its 95% confidence interval. One individual aged 0 was not included in the regression. Vertical lines indicate the range of SVAF per clone in an individual. (a,b,d, Clones with high UV exposure were excluded to remove UV radiation’s impact. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation.) e, Comparison of clone proportions with maximum clone-VAFs of PZsimple mutations in fibroblast clones with low and high UV-derived nDNA mutation burdens in three donors. f, Estimated mtDNA mutation rate in 31 individuals under mitotic and homeostatic turnover models (individual ages indicated in parentheses). Error bars represent the range of 50 simulated results with the lowest MSEs of 10,000 simulations per individual, with circles representing average values.

Interestingly, we observed that a few fibroblast clones with a higher amount of lifetime UV-light exposure exhibited a higher SVAF of PZsimple mutations than those with a lower amount of lifetime UV-light exposure (P = 7.7 × 10^−4, two-sided Fisher’s exact test; Fig. 4e). This indicates that UV exposure accelerated mtDNA turnovers in the cellular lineage. We speculate that UV exposure damages mtDNA, which is followed by mtDNA degradation and triggers additional mtDNA replication for its replacement [21]. Of note, the mtDNA mutational signatures in clones with a higher UV exposure were similar to the other clones (Extended Data Fig. 6e), indicating that UV light does not directly lead to PZsimple mutations fixed in mtDNA.

With the mtDNA turnover rates estimated using HetFE variants and the landscape of detectable PZsimple mutations, we estimated the absolute number of mtDNA alterations that newly appear in every mtDNA replication. In all individuals and both turnover models, the absolute mtDNA mutation rates converged to 5.0 × 10^−8 alterations per base pair (bp) replication (Fig. 4f). Interestingly, our estimate was within the range of error rates of polymerase γ (POLG), the mitochondrial genome’s DNA polymerase [56,57]. This convergence supports the conclusions that (1) endogenous mtDNA replication is the dominant process for mtDNA mutation acquisition in somatic cells [31,58,59] and (2) both turnover models (and their turnover rates) are reliable. Given the ~750 mtDNA copies in a single somatic cell, our absolute mutation rate implies that an average of 0.31 de novo PZsimple mtDNA alterations are acquired per daughter cell per cell division.

Selective pressure of mtDNA mutations in normal cells

To understand the selective pressure on PZsimple mutations, we calculated the dN/dS ratio [60,61,62]. The ratio of missense or truncating mutations to synonymous mutations was not substantially higher than that of mtDNA mutations randomly generated according to the mtDNA mutational signature, indicating general neutrality in mutation acquisition (Fig. 5a).
However, truncating mutations exhibited lower clone-VAFs than synonymous mutations in all three cell types, with no mutations exceeding 90% clone-VAFs, suggesting constrained expansion of mtDNAs carrying inactivating mutations due to functional disadvantage when reaching homoplasmy (P = 0.0211, 0.0017 and 0.0013 for the colon epithelium, fibroblasts and HSPCs, respectively, two-sided Fisher’s exact test; Fig. 5b). These observations were consistent with previous observations in cancer tissues [31,32].

Fig. 5: Selection and transcription of mtDNA variants. a, dN/dS ratios for missense and truncating mutations in each tissue type with simulated null distributions. Error bars represent 25th and 75th percentiles from simulations (10,000 simulations for each donor). b, Clone-VAF distribution of synonymous, missense and truncating mutations. Low clone-VAF variants (<0.1) were not included. Two-sided Fisher’s exact test, *P < 0.05, **P < 0.01; NS, not significant. Exact P values are 0.0211, 0.0017 and 0.0013 for the colon epithelium, fibroblasts and HSPCs, respectively. c,d, log2-transformed fold changes of expression levels. Normalized read counts from a clone were compared to the average normalized read counts among wild-type (other) clones in HC13 (c; 22 clones) and HC17 (d; 14 clones). Red and blue diamonds represent clones with truncating and missense mutations, respectively. Black circles indicate wild-type clones. Boxplots illustrate log2-transformed fold change variation in wild-type clones with median values, interquartile ranges (IQRs) and whiskers (1.5× IQR). The yellow box highlights the gene with truncating mutations. e, Scatter plots delineating clone-VAFs in genome versus transcriptome sequences according to functional consequences of mutations. Mutations in rRNA and tRNA are further subcategorized based on the secondary structure of the RNA they impact (color-coded). The gray lines represent the diagonal line y = x. f,g, Violin plots illustrating how clone-VAF between the genome and transcriptome varies based on tRNA (f) and rRNA (g) secondary structures. The y axis represents the log2-transformed ratio of clone-VAF in RNA to clone-VAF in DNA. Mutations with clone-VAF lower than 0.01 were excluded. One-sided Wilcoxon test.

Despite the expansion constraint, 15 truncating mutations displayed high clone-VAFs among the clones (clone-VAF > 0.6), accompanied by upregulated RNA expression levels of mtDNA genes (Fig. 5c,d). This phenomenon is likely attributable to a compensatory response where transcript degradation is inhibited when the protein product is dysfunctional [63,64]. The similarity in clone-VAFs between genome and transcriptome sequences indicates that this inhibitory effect does not distinguish between wild-type and truncated mtDNA (Fig. 5e).

We further compared the clone-VAFs of PZsimple mutations in genome and transcriptome sequences (Fig. 5e). Although most mtDNA mutations showed similar clone-VAFs in both, a subset of tRNA mutations exhibited elevated clone-VAFs in transcriptomes, which is consistent with a previous report [65]. In contrast, a subset of rRNA mutations showed reduced clone-VAFs in transcriptomes. These mutations were predominantly clustered within stem regions of tRNA and rRNA (P = 0.0158 and 0.0329 for tRNA and rRNA mutations, respectively, one-sided Wilcoxon test; Fig. 5f,g). We speculated that these mutations influence the stability and regulation of these RNAs, leading to tRNA accumulation and rRNA degradation [65,66].

mtDNA copy number and structural variations (SVs) in normal cells

The average mtDNA copy number was ~750 per cell (per diploid nuclear genome), but large variations in mtDNA copy number were observed across clones, even in an individual (Fig. 6a). For example, mtDNA copy numbers among the clones of HSPCs from KX004 ranged from ~20 to 3,700.
There was no apparent correlation between median mtDNA copy number and age (R = 0.127, P = 0.381, Pearson’s correlation). Notably, interclonal mtDNA copy number variations were less substantial in colorectal clones (Fig. 6a). Despite these variations, gene expression levels of mtDNA and nDNA genes were not substantially altered among the clones, suggesting that the mtDNA copy number is not a bottleneck for the transcription of mtDNA genes, at least at the resting stage (Extended Data Fig. 7a,b).

Fig. 6: mtDNA copy number in somatic cells and mtDNA in cancer. a, mtDNA copy number distributions among 31 individuals, sorted by tissue type, then by age in ascending order. Black dots and red bars represent clones (n = 2,096) and mean values, respectively. b, Read-depth of the mitochondrial genome showing large deletions in two colorectal clones (HC06-14 and HC21-16). Yellow lines represent the deleted regions. c,d, Mutation number (c) and SVAF (d) in normal colorectal clones (blue) and matched colon cancer tissues (red) are correlated with the number of mitotic turnovers across 19 donors. The age of donors is shown in parentheses. Vertical lines indicate the range across clones in each donor. Red and blue lines represent regression lines. e, The proportion of truncating mtDNA mutations within normal colorectal clones and colorectal cancer tissues. Two-sided Fisher’s exact test. f, Distributions of mtDNA copy numbers in normal colorectal clones and matched cancer tissues among 19 individuals. Boxplots illustrate median values with IQRs and whiskers (1.5× IQR). g, The linear correlation between tumor cell fraction and mtDNA copy numbers per diploid nuclear genome in cancer tissues. The gray line and the shaded area represent the regression line and its 95% confidence interval, respectively. Copy number values at 0% and 100% tumor cell fractions are shown by extrapolation. Pearson’s correlation coefficient and P value are provided. Two-sided Pearson’s correlation. TME, tumor microenvironment.

Two colorectal clones had notable SVs within their mtDNA (Fig. 6b and Extended Data Fig. 7c), with deletions of 10,951 bp and 3,389 bp, respectively, at approximately 45% heteroplasmy levels. As expected, gene expression levels in the deleted loci were lower than in the flanking regions (P < 0.05, Wald test; Extended Data Fig. 7d). Notably, these large deletions have been observed in cancers at a similar frequency [32]. Our findings illustrate that SVs can occur in normal clones [67]; however, these rare events involve only approximately 0.1% of normal cells.

Accelerated mtDNA turnover in tumorigenesis

In 19 matched colorectal cancer tissues, we observed, on average, more detectable mutations (5.3 versus 3.8; P = 0.0301, Wilcoxon signed rank exact test; Fig. 6c) and higher SVAF values (P = 8.5 × 10^−4, Wilcoxon signed rank exact test; Fig. 6d) than normal clones from the same donor. Our findings suggest an elevated mtDNA mutation rate, turnover rate or both during tumor initiation and clonal evolution [68]. Consistent with this speculation, in 12 clones established from MUTYH-associated adenomatous polyps [6], homoplasmic mtDNA mutations were more frequently observed in lineages with more driver mutations (Extended Data Fig. 7e).

We further investigated detectable PZsimple mutations in 70 colorectal carcinomas (19 matched and 51 unrelated colorectal cancers [69]; Supplementary Table 7). Qualitatively, colorectal cancers exhibited a notably higher prevalence of truncating mutations with >0.6 VAFs than normal clones (0.0203 versus 0.0026, P = 1.5 × 10^−4, two-sided Fisher’s exact test; Fig. 6e). This finding suggests increased accumulation of deleterious mutations in colorectal cancers, as observed previously [32].

Finally, compared to the mtDNA copy numbers in normal clones, 19 matched colon cancer tissues demonstrated biased copy number changes (per diploid nuclear genome) toward either gain or loss of mtDNA copies at face value (Fig. 6f). To gain insights into the mtDNA copy numbers in pure colon cancer cells without co-existing tumor microenvironmental cells, such as infiltrating lymphocytes, we correlated mtDNA copy numbers of cancer tissues with their tumor cell fractions estimated from genome sequences [70] and found a strong positive linear relationship (R = 0.715, P = 5.7 × 10^−4, Pearson’s correlation; Fig. 6g). Extrapolation of the regression line suggested ~1,266 mtDNA copies per diploid nuclear genome at 100% tumor cell fraction, which is 70% higher than in normal colorectal clones. Indeed, we confirmed an mtDNA copy number increase in colon cancer cells by WGSs of 14 colon cancer organoids (100% tumor cell fraction; 1,224 mtDNA copies per diploid cancer cell; Extended Data Fig. 7f). The underlying reason for the mtDNA copy number gain in cancer cells is uncertain.

Similarly, mtDNA copy numbers in cancer tissues were negatively correlated with the amount of infiltrating CD3+ T cells (Extended Data Fig. 7g). Genome sequencing of T cells sorted from the peripheral blood suggested that there were ~123 mtDNA copies per T cell (Extended Data Fig. 7f), which was close to the value extrapolated from the regression line (Fig. 6g).
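The extrapolation described above can be read as a two-component mixture: a bulk tumor sample contains a fraction f of cancer cells and a fraction 1 − f of microenvironmental cells, so its mtDNA copy number per diploid genome is expected to vary linearly with f. The Python sketch below illustrates that reading under explicit assumptions. The function names are hypothetical, the tumor fractions are invented, and the per-T-cell value of ~123 copies is used as a stand-in for all non-tumor cells purely for illustration; only the values 1,266 and 123 come from the text.

```python
import math


def expected_bulk_copies(tumor_fraction, tumor_copies=1266.0, tme_copies=123.0):
    """Expected mtDNA copies per diploid genome in a tumor/TME mixture."""
    return tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * tme_copies


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical, noise-free samples at four invented tumor cell fractions.
fractions = [0.2, 0.4, 0.6, 0.8]
bulk_copies = [expected_bulk_copies(f) for f in fractions]

slope, intercept = fit_line(fractions, bulk_copies)
copies_at_0 = intercept            # extrapolated to 0% tumor cells
copies_at_100 = slope + intercept  # extrapolated to 100% tumor cells
print(f"extrapolated copies at 0%: {copies_at_0:.0f}, at 100%: {copies_at_100:.0f}")
```

With real data the points scatter around the line, and the two extrapolated end points are only as reliable as the tumor cell fraction estimates and the assumption that the non-tumor compartment is homogeneous.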
