Fundamental equations linking methylation dynamics to maximum lifespan in mammals

Ethics

We utilized publicly available data as described in ref. 25. Below, we detail the species involved, institutions, and relevant ethics protocol numbers. Mammalian samples, including yellow-bellied marmots, were collected under the UCLA Institutional Animal Care and Use protocol (#2001-191-01, renewed annually) and with permission from Colorado Parks and Wildlife (TR917, renewed annually). Plains zebra samples were collected under a protocol approved by the Research Safety and Animal Welfare Administration, University of California, Los Angeles: ARC #2009-090-31, originally approved in 2009. General mammalian samples were approved by the Animal Welfare and Ethics Review Board, University of Rochester Committee on Animal Resources (UCAR), Animal Protocol #101939/UCAR-2017-033. Human samples were covered by the University of California, Los Angeles (IRB#15-001454, IRB#16-000471, IRB#18-000315, IRB#16-002028) and the Oxford Research Ethics Committee in the UK (reference 10/H0605/1). Voles were managed under the Institutional Animal Care and Use Committee (IACUC) of Cornell University (protocol #2013-0102), following NIH guidelines. Deer mice were handled by the Peromyscus Genetic Stock Center, University of South Carolina, with approval from the IACUC of the University of South Carolina (protocol #2356-101506-042720). Horses were covered under the UC Davis IACUC protocols (#19037, #20751, and #21455). The naked mole rat study was approved by the University of Rochester Committee on Animal Resources (protocol #2009-054). Beluga whale research was authorized under NMFS Research Permit 932-1905-00/MA-009526 and MMPA Permit #20465, issued by the National Marine Fisheries Service (NOAA). Bowhead whale studies were approved by Fisheries and Oceans Canada (DFO), LFSP S-19/20-1007-NU, and Animal Care approval (AUP) FWI-ACC-2019-14. Killer whale research was conducted under NMFS General Authorization No. 781-1725 and scientific research permits 781-1824-01, 16163, 532-1822-00, 532-1822, 10045, 18786-03, 545-1488, 545-1761, and 15616. Humpback whale research was approved under various permits, including NMFS permits (21485, 16325, 20465, 14245, 633-1483, 633-1778, 932-1905), the Canadian Department of Fisheries and Oceans, and IACUC #NWAK-18-02. For cats, the research was approved by the Clinical Research Ethical Review Board of the Royal Veterinary College (URN: 2019 1947-2). The elephant study was authorized by the management of each participating zoo, reviewed by zoo research committees where applicable, received IACUC approval (#18-29) at the National Zoological Park (Smithsonian’s National Zoo), and was endorsed by the Elephant Taxon Advisory Group and Species Survival Plan. Rat studies were approved by the Institutional Animal Ethics Committee of SVKM’s NMIMS University, Mumbai (approval no. CPCSEA/IAEC/P-6/2018), adhering to CPCSEA guidelines from the Government of India. Dog research was approved by the Animal Care and Use Committee of the National Human Genome Research Institute (NHGRI) at NIH (protocol #8329254). Bat research was conducted with approval from the University of Maryland IACUC (protocol FR-APR-18-16). Cattle research was approved by the University of Nebraska IACUC (approval #1560). Mice research was approved by the University of Texas Southwestern Medical Center (APN 2015-100925, renewed every 3 years). Apodemus mouse research was approved by the University of Edinburgh Ethical Review Committee (UK Home Office Project License PP4913586). Spiny mouse research was conducted under the approval of the University of Kentucky (protocol #2019-3254). Finally, shrews and other small species from the Museum of Biological Diversity at The Ohio State University were managed under The Ohio State University IACUC (protocol #2017A00000036).

Statistics and reproducibility

We do not present findings from specific experiments.
Rather, DNA samples were collected opportunistically from available freezer-stored materials provided by our collaborators. Data collection and analysis were not conducted blind to variables such as tissue type. Methylation measurements were taken from different animals, ensuring that no animal was measured more than once. Our linear regression modeling and correlation tests assume normality. Severe outliers were excluded to arrive at reliable rate-of-change estimates. To this end, we developed an outlier removal algorithm described below. Without it, the empirical data align poorly with the mathematical formulas (see Fig. 6). Below, we outline the quality control measures for our samples and the statistical methods used in each analysis. Additional details are provided in Supplementary Note 1.

Selection of dog methylation data

We analyzed methylation profiles from N = 742 blood samples derived from 93 dog breeds (Canis lupus familiaris). Primary characteristics (sex, age, average life expectancy) for the breeds utilized are presented in Supplementary Data 1. Standard breed weight and lifespan were aggregated from several sources as detailed in ref. 26. We created consensus values based on the American Kennel Club and the Atlas of Dog Breeds of the World. Lifespan estimates were calculated as the average of the standard breed across sexes, compiled from numerous publications consisting primarily of surveys of multi-breed dog ages and causes of death from veterinary clinics and large-scale breed-specific surveys, which are often conducted by purebred dog associations. Sources for the median lifespan per dog breed are reported in ref. 26. We calculated the maximum lifespan for dog breeds by multiplying the median lifespan by a factor of 1.33, i.e., MaxLifespan = 1.33 ∗ MedianLifespan. Our results are qualitatively unchanged if other multipliers are used. Detailed values on the dog breeds are reported in Supplementary Data 1. Median lifespans of the 93 breeds ranged from 6.3 years (Great Dane, average adult breed weight = 64 kg) to 14.6 years (Toy Poodle, average adult breed weight = 2.3 kg). Median lifespan estimates were based on the combined findings of multiple large-scale breed health publications, utilizing the median and maximum ages for each breed.

We identified 3 dog breeds (Otterhound, n = 4; Weimaraner, n = 3; Saint Bernard Dog, n = 2) as outlier strata for which the rate of change in methylation could not be meaningfully estimated according to the following exclusion criteria. In addition to their small sample sizes, their age ranges are narrow, all with SD (R) < 0.1, resulting in extreme AROCM values > 0.5 (Supplementary Data 1). The remaining dog breeds all had AROCM values no larger than 0.34. In summary, the following criteria should be considered for our dog data or other similar data in the future.

Small sample size, i.e., n < 3.

Low standard deviation in relative age, i.e., SD (R) < 0.1.

Bad linear regression fit of AROCM, i.e., R² < 0.2.
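The three criteria listed above can be read as a simple filter applied to each stratum. The following is a minimal sketch under that reading; the function name `keep_stratum`, its argument names, and the example values are hypothetical, and only the default thresholds are taken from the list above.

```python
def keep_stratum(n, sd_rel_age, r_squared, min_n=3, min_sd=0.1, min_r2=0.2):
    """Return True if a stratum passes all three criteria listed above.

    n          -- number of samples in the stratum
    sd_rel_age -- SD(R), the standard deviation of relative age
    r_squared  -- R^2 of the linear regression used to estimate AROCM
    """
    if n < min_n:             # small sample size
        return False
    if sd_rel_age < min_sd:   # too little spread in relative age
        return False
    if r_squared < min_r2:    # poor linear regression fit
        return False
    return True


# Hypothetical strata (values invented for illustration only).
print(keep_stratum(n=2, sd_rel_age=0.20, r_squared=0.5))   # False: n < 3
print(keep_stratum(n=12, sd_rel_age=0.05, r_squared=0.5))  # False: SD(R) < 0.1
print(keep_stratum(n=12, sd_rel_age=0.20, r_squared=0.6))  # True
```

Keeping the thresholds as keyword arguments makes it easy to reuse the same filter with a different SD(R) cutoff for another dataset.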

Selection of mammalian species/tissue strata

The raw data included 249 species-tissue strata from 133 unique species (Fig. 6). We selected strata with sufficient sample sizes and no influential outliers. Similar to the dog data, we excluded strata for the following reasons.

Small sample size of n < 3.

Low standard deviation in relative age, i.e., SD (R) < 0.06, to avoid strata with constant ages.

Strata with AROCM values out of range (estimate < −1 or > 10) were omitted if the values in derived/adjacent age intervals were not outliers. Toward this end, derived AROCMs were calculated for different age intervals within the same species/tissue stratum. For example, a severely outlying value AROCM[0, 0.1 ∗ L] was declared an outlier if both AROCM[0, 0.2 ∗ L] and AROCM[0, 0.3 ∗ L] fell within the range (−1, 10) for the same stratum.
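One possible reading of the last rule is sketched below: an estimate is flagged only when it lies outside (−1, 10) while the AROCMs from the derived age intervals of the same stratum all lie inside that range. The function names and example values are hypothetical, and the source does not give the authors' implementation, so this is an interpretation rather than their algorithm.

```python
def in_range(arocm, low=-1.0, high=10.0):
    """True if an AROCM estimate lies strictly inside (low, high)."""
    return low < arocm < high


def is_outlier(primary, derived):
    """Flag `primary` when it is out of range while every AROCM from the
    derived age intervals of the same stratum is within range."""
    return (not in_range(primary)) and all(in_range(d) for d in derived)


# Hypothetical values: AROCM[0, 0.1*L], then AROCM[0, 0.2*L] and AROCM[0, 0.3*L].
print(is_outlier(25.0, (3.1, 2.4)))  # True
print(is_outlier(4.0, (3.1, 2.4)))   # False
```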

For the mammalian data, we opted for a more lenient SD(R) cutoff than for the dog data. Given the greater number of strata in the mammalian data compared to the dog data (229 versus 94), we selected a less strict SD(R) threshold of 0.06 instead of 0.1 to ensure sufficient strata for analysis.

For many of the 133 unique mammalian species, several tissue types were available. The species characteristics such as maximum lifespan come from an updated version of the AnAge database (refs. 7, 25). We analyzed S = 229 different species/tissue strata defined on the entire age range [0, L] (Supplementary Data 3). Out of the 229 strata, 100 involved blood, 73 skin, 26 brain, and 15 liver. Fewer strata were available for other age ranges. For example, S = 128 for the young age group (defined by [0, 0.1 ∗ L]) and S = 221 for the old age group (defined by [0.1 ∗ L, L]).

Methylation platform

Both dog and mammalian data were generated using the same HorvathMammalMethylChip40 platform, which offers high coverage of approximately 36K conserved CpGs in mammals (ref. 28). To minimize technical variation, all data were generated by a single lab (Horvath) using a single, consistent measurement platform. Preprocessing and normalization were performed using the SeSAMe method to define beta values for each probe (ref. 56). The chip manifest file is available on the Gene Expression Omnibus (GEO) under platform GPL28271 and on our GitHub page (ref. 28).

Chromatin states

Following the pan-mammalian aging study of the Mammalian Methylation Consortium, we grouped the CpGs into 54 universal chromatin states that were covered by at least 5 CpGs each (ref. 27). These 54 chromatin states encompass those associated with both constitutive and cell-type-specific activity across a variety of human cell and tissue types (ref. 57). In their 2022 study, Vu and Ernst employed a hidden Markov model approach to generate a universal chromatin state annotation of the human genome.
This was based on data from over 100 cell and tissue types sourced from the Roadmap Epigenomics and ENCODE projects. These chromatin states are characterized in relation to 30 histone modifications, the histone variant (H2A.Z), and DNase I hypersensitivity measurements. We and others have previously found that strong age-related gain of methylation can be observed in bivalent promoter states and other states that are bound by Polycomb group repressive complex 2 (PRC2 binding sites) (refs. 27, 41, 42, 43). To facilitate a detailed analysis of PRC2 binding, we split each chromatin state into 2 subsets denoted by StateName+ and StateName- according to PRC2 binding (+ for yes and – for no). For example, BivProm2+ is the set of 552 CpGs that reside in bivalent chromatin state 2 and are bound by PRC2 (Supplementary Data 5).

Adjusted rate of change and adjusted correlation

The relative age R, defined as the ratio of age to maximum lifespan, is crucial for disentangling the relationship between rates of change and maximum lifespan (see Methods). The standard deviation of relative age, SD (R), reflects the sample ascertainment, collection, and design. In many real datasets, SD (R) varies across species due to uneven sampling, which may primarily include young or old animals in some species. This variability in SD (R) can dilute the signal between the rate of change and maximum lifespan while affecting Cor(Methyl, Age), as evidenced in our simulation and empirical studies. Adjusting for SD (R) amplifies the inherent biological signal in both measures. Applying these formulas to the methylation data allowed us to present fundamental equations that link the rate of change in methylation in specific chromatin states (e.g., bivalent promoter regions) to maximum lifespan in mammals.

Here we present a mathematical formalism that links three measurements: i) the rate of change in the biomarker across the life course, ii) the Pearson correlation between age and the biomarker, and iii) the standard deviation of relative age. In most empirical datasets, the standard deviation of relative age is correlated with the Pearson correlation between age and the biomarker (Supplementary Fig. 2), which reflects the idiosyncrasies of the sample collection. The standard deviation of age has a confounding effect on both the rate of change and the correlation between a biomarker and age. To study and eliminate this confounding effect, we introduce two concepts: adjusted rate of change and adjusted Pearson correlation. We present mathematical propositions describing the conditions under which strong relationships between the rate of change and lifespan can be observed.

In the following, we derive general equations that link the rate of change (also known as gradient or slope) of any continuous biomarker of aging (denoted as $M\in {\mathbb{R}}$) to the species maximum lifespan. For example, M could denote mean methylation in a particular chromatin state. Assume ${\bf{M}}=({M}_{1},\ldots,{M}_{n})$ and ${\bf{A}}=({A}_{1},\ldots,{A}_{n})$ are two numeric vectors of n samples for the biomarker M and the Age variable.
We will be using the following definitions surrounding the sample mean, sample variance and standard deviation, coefficient of variation, sample covariance, and Pearson correlation.$$\overline{{{{\bf{M}}}}} \doteq \frac{1}{n}{\sum }_{i=1}^{n}{M}_{i},\\ {{{\rm{Var}}}}\,({{{\bf{M}}}}) \doteq \frac{1}{n}{\sum }_{i=1}^{n}{({M}_{i}-\overline{{{{\bf{M}}}}})}^{2},\\ {{{\rm{SD}}}}\,({{{\bf{M}}}}) \doteq \sqrt{{{{\rm{Var}}}}\,({{{\bf{M}}}})}\\ {{{\rm{CoefVar}}}}\,({{{\bf{M}}}}) \doteq \frac{{{{\rm{SD}}}}\,({{{\bf{M}}}})}{\overline{{{{\bf{M}}}}}},\\ {{{\rm{Cov}}}}\,({{{\bf{M}}}},\, {{{\bf{A}}}}) \doteq \frac{1}{n}{\sum }_{i=1}^{n}({M}_{i}-\overline{{{{\bf{M}}}}})({A}_{i}-\overline{{{{\bf{A}}}}}),\\ {{{\rm{Cor}}}}\,({{{\bf{M}}}},\, {{{\bf{A}}}}) \doteq \frac{{{{\rm{Cov}}}}({{{\bf{M}}}},\, {{{\bf{A}}}})}{\sqrt{{{{\rm{Var}}}}\,({{{\bf{M}}}})\,*\, {{{\rm{Var}}}}\,({{{\bf{A}}}})}}.$$
(10)
Next, we define the rate of change, ROC(M; A), as the change in M resulting from a 1-year increase in age (calendar age in units of years). Statistically speaking, the rate of change, ROC(M; A), is the slope/coefficient β1 in the univariate linear regression model below,$${M}_{i}\,=\,{\beta }_{0}+{\beta }_{1}{A}_{i}+{\epsilon }_{i},$$where the index i refers to the i-th tissue sample and the expected value of the error term ϵi is assumed to be zero. The rate of change can be estimated by the least squares or the maximum likelihood estimator, ${\widehat{\beta }}_{1}$. Furthermore, it can be expressed in terms of the Pearson correlation coefficient and standard deviations as follows$${{{\rm{ROC}}}}({{{\bf{M}}}};{{{\bf{A}}}})\,\doteq \,{\widehat{\beta }}_{1}\,=\,\frac{{{{\rm{Cor}}}}({{{\bf{M}}}},\,{{{\bf{A}}}}){{{\rm{SD}}}}\,({{{\bf{M}}}})}{{{{\rm{SD}}}}\,({{{\bf{A}}}})}.$$
(11)
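Equation (11) can be checked numerically. The sketch below uses the 1/n definitions of equation (10) and compares the least-squares slope with Cor(M, A) · SD(M) / SD(A); the helper names and the toy values are invented for illustration and are not from the study.

```python
import math


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Sample covariance with the 1/n convention of equation (10)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def sd(x):
    return math.sqrt(cov(x, x))


def cor(x, y):
    return cov(x, y) / (sd(x) * sd(y))


def roc(m, a):
    """Least-squares slope of the regression of m on a."""
    return cov(m, a) / cov(a, a)


# Toy values invented for illustration (ages in years, mean methylation).
age = [1.0, 2.5, 4.0, 6.0, 9.0, 12.0]
methyl = [0.21, 0.25, 0.24, 0.31, 0.36, 0.40]

slope = roc(methyl, age)
via_cor = cor(methyl, age) * sd(methyl) / sd(age)
print(abs(slope - via_cor) < 1e-12)
```

The identity holds for any normalization of the variance (1/n or 1/(n − 1)) as long as the same convention is used for the covariance and both standard deviations.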
To arrive at a unit-less biomarker, which lends itself to comparisons with other biomarkers, we standardize M to have mean zero and standard deviation one, by scaling it as below,$$Scaled{M}_{i}\,\doteq \,\frac{{M}_{i}-\overline{{{{\bf{M}}}}}}{{{{\rm{SD}}}}\,({{{\bf{M}}}})}.$$In our dataset, we do not observe a significant correlation between SD (M) and lifespan (L), see Supplementary Fig. 16. Using SD (ScaledM) = 1, equation (11) becomes$${{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}})\,\doteq\, {\widehat{\beta }}_{1}\,=\,\frac{{{{\rm{Cor}}}}({{{\bf{ScaledM}}}},\,{{{\bf{A}}}})}{{{{\rm{SD}}}}\,({{{\bf{A}}}})}\,=\,\frac{{{{\rm{Cor}}}}({{{\bf{M}}}},\,{{{\bf{A}}}})}{{{{\rm{SD}}}}\,({{{\bf{A}}}})}$$
(12)
where the latter equation used the fact that the Pearson correlation, Cor, is invariant with respect to linear transformations. To reveal the dependence on species maximum lifespan, it is expedient to define relative age as the ratio of age and maximum lifespan:$${R}_{i}=\frac{{A}_{i}}{L}.$$
(13)
Since the standard deviation is the square root of the variance, one can easily show that SD (A) = SD (R) ∗ L. Combining equations (12) and (13) results in$${{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}})=\frac{{{{\rm{Cor}}}}({{{\bf{M}}}},\,{{{\bf{A}}}})/{{{\rm{SD}}}}\,({{{\bf{R}}}})}{L}.$$
(14)
Since Pearson’s correlation is scale-invariant, the following equalities hold, and we will use the three expressions interchangeably: Cor(M, A) = Cor(ScaledM, A) = Cor(M, R).
Proposition 1
Relationship between ROC and lifespan. If the following condition holds across all strata,$${{{\rm{Cor}}}}({{{\bf{M}}}},\, {{{\bf{A}}}})/{{{\rm{SD}}}}\,({{{\bf{R}}}})\,\approx\, constant,$$
(15)
then equation (14) implies$${{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}})\,\approx \,\frac{constant}{L}.$$
(16)

Due to sampling bias and uneven distributions of relative age, the strong condition (15) is usually not satisfied (see, for example, Supplementary Fig. 2f). We propose a simple adjustment to formulate a weaker, more realistic assumption that leads to a conclusion similar to equation (16). To this end, we rewrite equation (14) as follows:$${{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}})\times {{{\rm{SD}}}}\,{({{{\bf{R}}}})}^{1-p}=\frac{{{{\rm{Cor}}}}({{{\bf{M}}}},\,{{{\bf{A}}}})/{{{\rm{SD}}}}\,{({{{\bf{R}}}})}^{p}}{L},$$
(17)
which multiplies both sides of equation (14) by ${{\rm{SD}}}\,{({\bf{R}})}^{1-p}$, where p is a power parameter. Next we define:$$Adj.{{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}},\,p)\, \doteq \, {{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}})\times {{{\rm{SD}}}}\,{({{{\bf{R}}}})}^{1-p},\\ {{{\rm{Adj}}}}{{{\rm{.Cor}}}}({{{\bf{M}}}}| {{{\bf{R}}}},\,p)\, \doteq \, \frac{{{{\rm{Cor}}}}({{{\bf{M}}}},\,{{{\bf{R}}}})}{{{{\rm{SD}}}}\,{({{{\bf{R}}}})}^{p}}.$$
(18)
Note that if SD (R) remains constant across strata, indicative of a perfect design, the adjustment essentially involves multiplying or dividing by a constant, irrespective of the power p. This means the adjustment leaves the relationship between ROC and lifespan unchanged. On the other hand, if SD (R) fluctuates across strata, indicative of an imperfect study design, the adjustments have the potential to enhance the signal. Further, note that Adj.ROC becomes the standard definition of the ROC with p = 1. Conversely, p = 0 implies that Adj.Cor(M∣R, p) = Cor(M, R). We introduce this terminology for several reasons. To begin with, equation (17) can be succinctly written as follows$$Adj.{{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}},\,p)=\frac{{{{\rm{Adj}}}}{{{\rm{.Cor}}}}({{{\bf{M}}}}| {{{\bf{R}}}},\,p)}{L}.$$
(19)
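The definitions in equation (18) and the identity in equation (19) can be illustrated with a short numerical check. This is a sketch under stated assumptions: the helper names, the toy ages and methylation values, and the maximum lifespan of 20 years are all hypothetical.

```python
import math


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def sd(x):
    return math.sqrt(cov(x, x))


def cor(x, y):
    return cov(x, y) / (sd(x) * sd(y))


def adj_cor(m, age, lifespan, p):
    """Adj.Cor(M | R, p) = Cor(M, R) / SD(R)^p, with R = A / L."""
    r = [a / lifespan for a in age]
    return cor(m, r) / sd(r) ** p


def adj_roc(m, age, lifespan, p):
    """Adj.ROC(ScaledM; A, p) = ROC(ScaledM; A) * SD(R)^(1 - p)."""
    r = [a / lifespan for a in age]
    mu, s = mean(m), sd(m)
    scaled = [(v - mu) / s for v in m]          # mean 0, SD 1
    slope = cov(scaled, age) / cov(age, age)    # ROC(ScaledM; A)
    return slope * sd(r) ** (1 - p)


# Toy stratum invented for illustration.
age = [1.0, 2.5, 4.0, 6.0, 9.0, 12.0]
methyl = [0.21, 0.25, 0.24, 0.31, 0.36, 0.40]
lifespan = 20.0

for p in (0.0, 0.25, 1.0):
    lhs = adj_roc(methyl, age, lifespan, p)
    rhs = adj_cor(methyl, age, lifespan, p) / lifespan
    print(p, abs(lhs - rhs) < 1e-12)
```

With p = 1 the left-hand side reduces to the unadjusted ROC(ScaledM; A), and with p = 0 the numerator on the right-hand side reduces to Cor(M, R).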
The following material outlines the specific conditions required for the validity of the equation below:$$\,{{\mbox{Adj.ROC}}}\,\approx \frac{c}{L},$$where c is a constant. Here, the approximation sign ≈ indicates a strong linear correlation across strata when assessed on a logarithmic scale. We start with the log-transformed version of equation (19):$$\log (Adj.{{{\rm{ROC}}}}({{{\bf{ScaledM}}}};{{{\bf{A}}}},\,p))=\log ({{{\rm{Adj}}}}{{{\rm{.Cor}}}}({{{\bf{M}}}}| {{{\bf{R}}}},\,p))-\log (L),$$
(20)
where we assume that the natural logarithm ($\log$) is applicable, i.e., the adjusted ROC and the adjusted correlation take on positive values.

The above-mentioned definitions and equations apply to each stratum (e.g., each dog breed). Assuming there are S total strata, we introduce a superscript in various quantities, e.g., we write ${L}^{(s)}$ and ${{\rm{Adj.Cor}}}({{\bf{M}}}^{(s)}| {{\bf{R}}}^{(s)},\,p)$, where s = 1, 2, …, S. Define the following 3 vectors that have S components each$${{{\bf{log.L}}}}=\left(\log ({L}^{(1)}),\log ({L}^{(2)}),\ldots,\log ({L}^{(S)})\right)\\ {{{\bf{log.Adj.Cor}}}}(p)={\left(\log \left({{{\rm{Adj}}}}{{{\rm{.Cor}}}}({{{{\bf{M}}}}}^{(s)}| {{{{\bf{R}}}}}^{(s)},\,p)\right)\right)}_{1\le s\le S}\\ {{{\bf{log.Adj.ROC}}}}(p)={\left(\log \left(Adj.{{{\rm{ROC}}}}({{{{\bf{ScaledM}}}}}^{(s)};{{{{\bf{A}}}}}^{(s)},\,p)\right)\right)}_{1\le s\le S}.$$For each vector on the left-hand side, we can form the sample mean and sample variances across S strata,$$\overline{{{{\bf{log.Adj.Cor}}}}}= \frac{1}{S}{\sum }_{s=1}^{S}\log ({{{\rm{Adj}}}}{{{\rm{.Cor}}}}({{{{\bf{M}}}}}^{(s)}| {{{{\bf{R}}}}}^{(s)},\,p)),\\ {{{\rm{Var}}}}\,({{{\bf{log.Adj.Cor}}}})= \frac{1}{S}{\sum }_{s=1}^{S}{\left(\log ({{{\rm{Adj}}}}{{{\rm{.Cor}}}}({{{{\bf{M}}}}}^{(s)}| {{{{\bf{R}}}}}^{(s)},\,p))-\overline{{{{\bf{log.Adj.Cor}}}}}\right)}^{2}.$$We will present several propositions and outline their proofs. In some cases, we provide only a rough outline, as exact derivations would require more complex formalism. The following critical condition (C1) states that lifespan does not correlate with adjusted age correlation on the log scale:$${{{\rm{Cor}}}}({{{\bf{log.L}}}},\,{{{\bf{log.Adj.Cor}}}}(p))=0.$$
(21)
Condition Cor(log.L, log.Adj.Cor(p)) = 0 holds when species lifespan and the Adj.Cor(p) are independent across strata. Our methylation data suggest that this condition is approximately satisfied for certain chromatin states (Fig. 7).
Proposition 2
If (C1) holds, then$${{{\rm{Cor}}}}\left({{{\bf{log.L}}}},\,{{{\bf{log.Adj.ROC}}}}(p)\right)=\frac{-1}{\sqrt{1+{{{\rm{Var}}}}\,({{{\bf{log.Adj.Cor}}}})/{{{\rm{Var}}}}\,({{{\bf{log.L}}}})}}.$$

Proof
Denote vectors x = log.L and y = log.Adj.ROC. With equation (20) we find that the covariance$${{{\rm{Cov}}}}({{{\bf{x}}}},\,{{{\bf{y}}}})={{{\rm{Cov}}}}({{{\bf{log.L}}}},\,{{{\bf{log.Adj.Cor}}}})-{{{\rm{Cov}}}}({{{\bf{log.L}}}},\,{{{\bf{log.L}}}})$$By assumption, the first term is zero, which entails that$${{{\rm{Cov}}}}({{{\bf{x}}}},\,{{{\bf{y}}}})=-{{{\rm{Cov}}}}\left({{{\bf{log.L}}}},\,{{{\bf{log.L}}}}\right)=-{{{\rm{Var}}}}\,({{{\bf{x}}}})$$Similarly, Cov(log.L, log.Adj.Cor) = 0 implies that$${{{\rm{Var}}}}\,({{{\bf{y}}}})= {{{\rm{Var}}}}\,({{{\bf{log.L}}}})-2*{{{\rm{Cov}}}}({{{\bf{log.L}}}},\,{{{\bf{log.Adj.Cor}}}})+{{{\rm{Var}}}}\,({{{\bf{log.Adj.Cor}}}})\\= {{{\rm{Var}}}}\,({{{\bf{log.L}}}})+{{{\rm{Var}}}}\,({{{\bf{log.Adj.Cor}}}})$$Thus, the assumption implies that$${{{\rm{Cor}}}}({{{\bf{x}}}},\,{{{\bf{y}}}})= \frac{{{{\rm{Cov}}}}(x,\,y)}{\sqrt{{{{\rm{Var}}}}\,({{{\bf{x}}}}){{{\rm{Var}}}}\,({{{\bf{y}}}})}}\\= -\frac{{{{\rm{Var}}}}\,({{{\bf{x}}}})}{\sqrt{{{{\rm{Var}}}}\,({{{\bf{x}}}})\left({{{\rm{Var}}}}\,({{{\bf{x}}}})+{{{\rm{Var}}}}\,({{{\bf{log.Adj.Cor}}}})\right)}}\\= -\frac{1}{\sqrt{1+{{{\rm{Var}}}}\,({{{\bf{log.Adj.Cor}}}})/{{{\rm{Var}}}}\,({{{\bf{log.L}}}})}}.$$
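The algebra of the proof can be verified on synthetic data. The sketch below draws hypothetical log-lifespans for 60 strata, constructs a log.Adj.Cor vector whose sample covariance with log.L is zero by construction (so that (C1) holds in-sample), forms log.Adj.ROC through equation (20), and compares the observed correlation with the closed-form expression of Proposition 2. All numbers are simulated and the variable names are assumptions.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def cor(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


random.seed(1)
S = 60  # hypothetical number of strata
log_l = [random.uniform(math.log(2.0), math.log(200.0)) for _ in range(S)]
raw = [random.gauss(0.0, 0.3) for _ in range(S)]

# Remove the linear component along log_l so that (C1) holds in-sample.
beta = cov(raw, log_l) / cov(log_l, log_l)
mean_log_l = mean(log_l)
log_adj_cor = [e - beta * (x - mean_log_l) for e, x in zip(raw, log_l)]

# Equation (20): log.Adj.ROC = log.Adj.Cor - log.L
log_adj_roc = [c - x for c, x in zip(log_adj_cor, log_l)]

observed = cor(log_l, log_adj_roc)
ratio = cov(log_adj_cor, log_adj_cor) / cov(log_l, log_l)
predicted = -1.0 / math.sqrt(1.0 + ratio)
print(abs(observed - predicted) < 1e-9)
```

The smaller the variance ratio, the closer both quantities move toward −1, which is the content of Proposition 3.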
The following proposition is a direct consequence of Proposition 2.
Proposition 3
If (C1) holds and the ratio$${{{\rm{Ratio}}}}(p)=\frac{{{{\rm{Var}}}}\,({{{\bf{log.Adj.Cor}}}}(p))}{{{{\rm{Var}}}}\,({{{\bf{log.L}}}})}\, \approx \, 0,$$
(22)
then$${{{\bf{log.Adj.ROC}}}}\,\approx \,\overline{{{{\bf{log.Adj.Cor}}}}}-{{{\bf{log.L}}}}.$$
Proposition 3 implies that lifespan and Adj.ROC follow a nearly perfect inverse linear relationship on the log scale (Cor ≈ −1) if Var (log.Adj.Cor) ≪ Var (log.L). The latter condition is typically satisfied in real data because the range of lifespans across strata is often much larger than the range of the Adj.Cor values, which is the case for our data from the Mammalian Methylation Consortium.
Proof
Proposition 2, combined with the assumption that Ratio(p) ≈ 0, leads to the conclusion that Cor(log.Adj.ROC, log.L) ≈ −1. Given that a Pearson correlation nearing negative one indicates an almost perfect linear relationship, this finalizes the proof.
We are now ready to state the main proposition.
Proposition 4
The linear relationship between log.Adj.ROC and log.L. If (C1, equation (21)) holds and the squared coefficient of variation in Adj.Cor(p) is much smaller than the squared coefficient of variation in L, i.e.,$${{{\rm{Ratio}}}}(p)=\frac{{{{\rm{CoefVar}}}}{({{{\rm{Adj}}}}{{{\rm{.Cor}}}}(p))}^{2}}{{{{\rm{CoefVar}}}}{(L)}^{2}}\,\approx\, 0$$
(23)
then$${{{\bf{log.Adj.ROC}}}}\,\approx \,\overline{{{{\bf{log.Adj.Cor}}}}}-{{{\bf{log.L}}}}.$$
(24)

Proof
In the following, we will show that the assumption (equation (23)) implies equation (22) in Proposition 3. We will use the following Delta method approximation for computing the variance of f(X) of a random variable X,$${{{\rm{Var}}}}\,(f(X))\approx f^{\prime} {({{{\rm{E}}}}(X))}^{2}{{{\rm{Var}}}}\,(X),$$where Var (X) and E(X) denote the variance and expectation of X, respectively. With $f(x)=\log (x)$, $f{^\prime} (x)=1/x$ and X = Adj.Cor(p), the above approximation results in$${{{\rm{Var}}}}\,(\log ({{{\rm{Adj}}}}{{{\rm{.Cor}}}}(p)))\,\approx\, \frac{{{{\rm{Var}}}}\,({{{\rm{Adj}}}}{{{\rm{.Cor}}}}(p))}{{{{\rm{E}}}}{({{{\rm{Adj}}}}{{{\rm{.Cor}}}}(p))}^{2}}={{{{\rm{CoefVar}}}}}^{2}({{{\rm{Adj}}}}{{{\rm{.Cor}}}}(p))$$where CoefVar( ⋅ ) denotes the coefficient of variation. Analogously, we have$${{{\rm{Var}}}}\,(\log (L))\,\approx\, \frac{{{{\rm{Var}}}}\,(L)}{{{{\rm{E}}}}{(L)}^{2}}={{{{\rm{CoefVar}}}}}^{2}(L).$$Therefore, (23) implies (22) and concludes the proof.
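The Delta method step used in the proof, Var(log X) ≈ CoefVar(X)², can be illustrated by simulation. In this sketch X is a hypothetical positive variable with a coefficient of variation of about 0.1; the distribution, sample size, and seed are arbitrary choices made for illustration.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def var(x):
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def coef_var(x):
    return math.sqrt(var(x)) / mean(x)


random.seed(2)
# Hypothetical positive variable: mean about 1, SD about 0.1.
x = [1.0 + 0.1 * random.gauss(0.0, 1.0) for _ in range(20000)]

var_log = var([math.log(v) for v in x])
cv_squared = coef_var(x) ** 2
print(var_log, cv_squared)  # the two values should be close to each other
```

The approximation is first order, so it degrades as the coefficient of variation grows; this is worth keeping in mind when the variable of interest, such as lifespan across species, varies over orders of magnitude.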
Condition (23) is approximately satisfied in the mammalian data and the dog data: in the mammalian data, CoefVar(Adj.Cor(p)) = 0.28 and CoefVar(L) = 0.91, resulting in Ratio(p) = 0.095. In the dog breed data, CoefVar(Adj.Cor(p)) = 0.12 and CoefVar(L) = 0.16, resulting in Ratio(p) = 0.56. The judicious choice of the adjustment power p resulted in lower coefficients of variation, as can be seen in the comparison with the unadjusted values: CoefVar(Cor/SD) = 0.68 for the mammalian data and 0.24 for the dog data.

Exponentiating both sides of equation (24), we arrive at$${{{\rm{Adj}}}}.{{{{\rm{ROC}}}}}^{(s)}\,\approx\, \frac{c(p)}{{L}^{(s)}}$$
(25)
where $c(p)=\exp \left(\overline{{{{\bf{log.Adj.Cor}}}}}\right)$ is some constant. The choice of the parameter p will be discussed in the following.

Criteria for choosing the power p in the adjustment

Our aforementioned equations utilize the parameter p, which underlies our definitions of the adjusted correlation and the adjusted ROC. Choosing p = 1 results in standard (non-adjusted) versions of the ROC, but opting for a lower value of p can be advantageous for the following three reasons. First, Proposition 4 states that a strong linear relationship between log.Adj.ROC and log.L holds if p is chosen to minimize the coefficient of variation function: $C(p)={{{\rm{CoefVar}}}}\left({{{\rm{Adj}}}}{{{\rm{.Cor}}}}(p)\right)$. Since the coefficient of variation is sensitive to outliers, we find it expedient to use a robust alternative known as the quartile coefficient of dispersion (QCOD):$$\,{{\mbox{QCOD}}} \,(p)\, \doteq \, \frac{{Q}_{3}(p)-{Q}_{1}(p)}{{Q}_{3}(p)+{Q}_{1}(p)},$$
(26)
where Q1(p) and Q3(p) denote the first and third quartile of the distribution of Adj.Cor(p). In our empirical studies, we chose p so that it minimized QCOD (equation (26)), i.e.,$${p}_{optimal}=\arg {\min}_{p}\,{\mbox{QCOD}}\,\left[\log ({{{\rm{Adj}}}}{{{\rm{.Cor}}}}(p))\right]$$
(27)
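The selection of p can be sketched as a grid search over candidate powers. The example below is illustrative only: it applies the QCOD of equation (26) directly to Adj.Cor(p) = Cor / SD(R)^p (rather than to its logarithm as written in equation (27)), uses the "inclusive" quartile method of Python's `statistics.quantiles`, and works on hypothetical strata constructed so that Cor = 0.9 · SD(R)^0.25. The function names, the grid, and the data are assumptions.

```python
import statistics


def qcod(values):
    """Quartile coefficient of dispersion, (Q3 - Q1) / (Q3 + Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / (q3 + q1)


def choose_power(cors, sds, grid):
    """Return the p in `grid` that minimizes QCOD of Cor / SD(R)^p across strata."""
    scores = {p: qcod([c / s ** p for c, s in zip(cors, sds)]) for p in grid}
    return min(scores, key=scores.get)


# Hypothetical strata: per-stratum SD(R) and age correlation Cor(M, R).
sds = [0.08, 0.12, 0.15, 0.20, 0.25, 0.30]
cors = [0.9 * s ** 0.25 for s in sds]
grid = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00

print(choose_power(cors, sds, grid))  # prints 0.25 for this constructed example
```

Because the toy correlations were built to scale as SD(R)^0.25, the adjusted correlations are constant at p = 0.25 and the dispersion vanishes there; real data would give a shallower minimum.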
Using the QCOD-based criterion, we determined $p_{optimal}$ = 0.1 for our dog data and $p_{optimal}$ = 0.25 for our mammalian methylation dataset (refer to Supplementary Fig. 1). Had we employed the coefficient of variation in place of the QCOD, our choice of p would have been consistent across both datasets, as depicted in Supplementary Fig. 1. This alignment between the coefficient of variation and QCOD is well documented in the statistical literature (refs. 58, 59).

The second reason for choosing the power p relates to an undesirable correlation between the age correlation Cor(M, A) and the standard deviation of relative age, SD (R) (Supplementary Fig. 2). Our simulation studies suggest that this positive correlation results from imperfect sample ascertainment or study design. This can be mitigated by choosing p so that the adjusted age correlation Adj.Cor(M∣R, p) exhibits a weaker correlation with SD (R). In the mammalian data, p = 0.25 leads to a non-significant correlation between Adj.Cor(M∣R, p) and SD (R) (Supplementary Fig. 2g, h).

Third, our simulation studies, designed to emulate our mammalian lifespan data, indicate that with large sample sizes per stratum, Adj.Cor(M∣R, p) converges to a value close to 1.0 for p = 0.25 (Supplementary Note 2). This value of 1.0 significantly simplifies the equations. We use simulations to study the relationship between the age correlation and the standard deviation of relative age as a function of the data ascertainment (Supplementary Fig. 17). Further, we explore the effect of the adjustment power p in Supplementary Fig. 18. The coefficient of variation displays a U-shape as the power increases, hence a minimum is achievable. The optimal adjustment power is achieved at 0.25 for most cases. Overall, these results suggest that p = 0.25 is a good choice for our mammalian methylation study.

Relation between ${{\rm{AROCM}}}_{young}$ and ${{\rm{AROCM}}}_{old}$
Here, we provide an outline of how to derive a relationship between the rate of change in young animals and that in older ones in the s-th species-tissue stratum, i.e.,$${{{{\rm{AROCM}}}}}_{young}^{(s)}\,=\,c\, * \, {{{{\rm{AROCM}}}}}_{old}^{(s)},$$
(28)
where c denotes a constant. We start out by commenting on our definition of relative age. When dealing with prenatal samples (whose chronological ages take negative values), it can be advantageous to slightly modify the definition of relative age as $R=\frac{A+GT}{L+GT}$, by including gestation time (GT) to avoid negative relative ages. For simplicity, we will assume that our data only contains postnatal samples, allowing us to define relative age as $R=\frac{A}{L}$. Empirically, we find that the non-linear relationship between ScaledM and relative age in each stratum can be approximated using the following function:$${{{{\rm{ScaledM}}}}}_{i}^{(s)}= f(R;\gamma )\\= {\gamma }_{0}^{(s)}+{\gamma }_{1}^{(s)}g(R),$$
(29)
where ${\gamma }^{(s)}=({\gamma }_{0}^{(s)},{\gamma }_{1}^{(s)})$ are stratum-specific constants. Our empirical studies demonstrate that the following log-linear function fits the data quite well.$$g(R)=\left\{\begin{array}{ll}10R-1\quad &R \, \ge \, 0.1\\ \log (10R)\quad &R\, < \, 0.1\end{array}\right.$$
(30)
Note that the first derivative of g() is given by$$g^{\prime} (R)=\left\{\begin{array}{ll}10\quad &R\, \ge \, 0.1\\ 1/R\quad &R\, < \,0.1\end{array}\right.$$
(31)
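The transformation g in equation (30) and its derivative in equation (31) translate directly into code. The sketch below also compares the stated derivative with a central finite difference; the function names and the step size are illustrative assumptions.

```python
import math


def g(r):
    """Piecewise transformation of relative age, equation (30)."""
    return 10.0 * r - 1.0 if r >= 0.1 else math.log(10.0 * r)


def g_prime(r):
    """First derivative of g, equation (31)."""
    return 10.0 if r >= 0.1 else 1.0 / r


def numeric_derivative(f, r, h=1e-6):
    """Central finite-difference approximation of f'(r)."""
    return (f(r + h) - f(r - h)) / (2.0 * h)


for r in (0.02, 0.05, 0.5):
    print(r, g_prime(r), numeric_derivative(g, r))
```

Both branches give g(0.1) = 0 and g′(0.1) = 10, so the function and its first derivative are continuous at the cutoff R = 0.1.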
Assuming a linear relationship between ScaledM and A (equation (3)) and a suitably chosen midpoint A0, one can approximate AROCM as follows$${{{\rm{AROCM}}}}= \frac{{{\Delta }}{{{\rm{ScaledM}}}}}{{{\Delta }}A}\\ \approx \frac{d}{dA}({{{{\rm{ScaledM}}}}}^{(s)}){| }_{{A}_{0}}\\= \frac{d}{dA}f(\frac{A}{L}){| }_{{A}_{0}}\\= \frac{d}{dR}f(R){| }_{{R}_{0}}\frac{1}{L}\\= {\gamma }_{1} * g^{\prime} ({R}_{0})\frac{1}{L}$$
(32)
where R0 = A0/L represents the relative age of a young or old individual, and we used the chain rule of calculus. We define the AROCM in young and old animals as the first derivative evaluated at Ayoung and Aold, respectively. These ages should be chosen so that the corresponding relative ages Ryoung and Rold take on values <0.1 and >0.1, respectively. With equations (31) and (32), we find$${{{{\rm{AROCM}}}}}_{young}\,= \,{\gamma }_{1}\times \frac{1}{{R}_{young}L}\\ {{{{\rm{AROCM}}}}}_{old}\,= \, {\gamma }_{1}\times \frac{10}{L}$$
(33)
With superscripts denoting the s-th species-tissue stratum, it implies the following linear relationship between the two aging rates$${{{{\rm{AROCM}}}}}_{young}^{(s)}=\frac{1}{10{R}_{young}^{(s)}}\times {{{{\rm{AROCM}}}}}_{old}^{(s)}.$$
(34)
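Equations (32) to (34) can be traced with a small numerical example. In this sketch the slope parameter, maximum lifespan, and the two evaluation points are hypothetical values chosen for illustration; only the formulas come from the text.

```python
def g_prime(r):
    """Derivative of g from equation (31)."""
    return 10.0 if r >= 0.1 else 1.0 / r


def arocm(gamma1, r0, lifespan):
    """Equation (32): AROCM is approximated by gamma1 * g'(R0) / L."""
    return gamma1 * g_prime(r0) / lifespan


# Hypothetical stratum (all values invented for illustration).
gamma1, lifespan = 1.5, 40.0
r_young, r_old = 0.05, 0.6   # relative ages below and above the 0.1 cutoff

young = arocm(gamma1, r_young, lifespan)
old = arocm(gamma1, r_old, lifespan)

# Equation (34): the ratio depends only on the young relative age.
print(young / old, 1.0 / (10.0 * r_young))
```

Neither gamma1 nor L appears in the ratio, which is why the proportionality constant is expected to be similar across strata whenever the young samples sit at similar relative ages.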
Since the young groups for all strata are defined with the same cutoff of R = 0.1, ${R}_{young}^{(s)}$ would take similar values across all strata, which implies that ${{{{\rm{AROCM}}}}}_{young}^{(s)}=c\times {{{{\rm{AROCM}}}}}_{old}^{(s)}$. Empirically, we can verify the latter relationship (Fig. 4). Across species-tissue strata, we find that c has a mean value of 7.33 and a standard deviation of 6.8.

There is an analogous relationship between ${{\rm{AROCM}}}_{young}$ and ${{\rm{AROCM}}}_{old}$ when a different function g is used. For instance, when the function ${g}_{2}(R)=\log (10R)$ (for all values of R) is used, we can derive the relationship$${{{{\rm{AROCM}}}}}_{young}^{(s)}=\frac{{R}_{old}^{(s)}}{{R}_{young}^{(s)}}\times {{{{\rm{AROCM}}}}}_{old}^{(s)}.$$Consequently, ${{{{\rm{AROCM}}}}}_{young}^{(s)}$ is still proportional to ${{{{\rm{AROCM}}}}}_{old}^{(s)}$, provided that the ratio $\frac{{R}_{old}^{(s)}}{{R}_{young}^{(s)}}$ remains approximately constant across all strata. We compared g(R) and ${g}_{2}(R)$ in our mammalian data as shown in Supplementary Fig. 19. The median correlation across all species is the highest using g(R) (r = 0.76), compared to the original relative age (r = 0.73) and ${g}_{2}(R)$ (r = 0.74).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
