Quantification of circadian rhythms in mammalian lung tissue snapshot data

Correlations between circadian expressed genes contain information about the phase differences between their oscillating transcript levelsFigure 1Circadian gene expression levels form circles in gene expression space (A) Circadian oscillations of PER3 (red), ARNTL (blue) and RORC (light pink). There is a 6-h phase difference between PER3 and RORC and a 12-h phase difference between PER3 and ARNTL. Blue bars mark subjective night, white bars mark subjective dawn, red bars mark subjective day and light pink bars mark subjective dusk. (B) Negative correlation between PER3 and ARNTL. (C) Positive correlation between PER3 and TEF. (D–F) Circular relationship between PER3, RORC and ARNTL. The colors in B-F indicate sample collection times. All plots show gene expression data of mouse lung tissue from Zhang et al. as log2(counts).In this study we compare four gene expression datasets (Table 1): two lung datasets from murine multi-organ studies25,26, one lung dataset from a multi-organ study of baboon27 and data of healthy and cancerous lung rissue from the TCGA-LUAD study. The datasets of Zhang, Wolff and Mure are gene expression time series of 12 (Mure and Wolff) or 24 (Zhang) lung tissue samples. These samples were collected throughout the day and are annotated with their sampling times. In contrast, the TCGA-LUAD study did not have a circadian focus and the samples do not have any time labels. It includes 51 normal samples and 58 tumor samples, each normal sample is matched with at least one tumor sample of the same patient.Figure 1A shows the circadian rhythms of PER3, ARNTL and RORC transcript levels over two days in mouse lung gene expression time series data from Zhang et al.25. The transcript levels of PER3, ARNTL and RORC oscillate with a 24-h period and their circadian transcript level oscillations have stable amplitudes. Within each 24-h period, there are fixed time differences, referred to as phase differences (\(\Delta \varphi\)), between the times of the transcript level peaks. The phase differences are gene pair specific. For example, there is an approximate 12-h phase difference between PER3 and ARNTL and an approximate 6-h phase difference between PER3 and RORC (Fig. 1A).The phase differences between circadian expressed genes are linked tightly to the Spearman rank correlation between their transcript levels (\(r_{S}\)) and this \(\Delta \varphi\)-\(r_{S}\)-relationship generally applies to pairs of sine curves (Supplementary Fig. S1). Circadian expressed genes whose transcript levels oscillate with an approximate 12-h phase difference, e.g., PER3 and ARNTL, have a negative linear relationship of their transcript levels (Fig. 1B) with a strong anti-correlation (\(r_{S} \approx\) − 0.97). Likewise, circadian expressed genes whose transcript levels oscillate with an approximate zero-hour phase difference, e.g., PER3 and TEF, have a positive linear relationship of their transcript levels (Fig. 1C) and a strong positive correlation (\(r_{S} \approx 0.98\)).There are also several circadian expressed genes whose transcript levels oscillate with intermediate phase differences between zero hours and 12 h. This results in ellipsoid transcript level relationships with low correlation values. In the special case of an approximate 6-h phase difference between the transcript level oscillations, e.g., PER3 and RORC or ARNTL and RORC, the pattern in their shared gene expression space is circular (Fig. 1D,E). Therefore, the correlations between the transcript levels of PER3 and RORC (\(r_{S} \approx 0.13\)) or ARNTL and RORC (\(r_{S} \approx\) − 0.03) are almost zero.For an intact circadian clock, the oscillating transcript levels of all circadian expressed genes form a high dimensional cycle in gene expression space. Figure 1F shows the three-dimensional cycle in the shared gene expression space of PER3, RORC and ARNTL. The cycle is colored for the collection times of the samples. Like in a phase portrait28, each location on this cycle in circadian gene expression space encodes another circadian time.Dataset-specific lists of phase-sorted circadian genes based on their transcript-level correlationsFigure 2Phase-sorted circadian genes in mouse lung (A) Normalized circadian oscillations of 100 genes that oscillate in-phase to PER3. (B) Normalized circadian oscillations of 100 genes that oscillate out-of-phase to PER3. (C) Normalized circadian oscillations of genes that oscillate with a 6-h phase difference to PER3. (D) Relationship between Spearman rank correlations and phase differences between the oscillations of PER3 and the genes in (A–C). (E,F) Relationship between the Spearman rank correlation, regression accuracies \(R^{2}\) and amplitudes of the genes from (A–C). All plots show gene expression data of mouse lung tissue from Zhang et al.The oscillating transcript levels of all circadian expressed genes within the same tissue form a high dimensional cycle in gene expression space (Fig. 1F shows a three-dimensional example from the Zhang dataset). Also gene expression samples that do not originate from gene expression time series data and are not annotated with collection times form this cycle in circadian gene expression space. The spatial position of the samples in the cycle gives information on their circadian time. We expect that the analysis of a high number of circadian expressed genes can give a good representation of the circadian cycle. For each mouse dataset and for the baboon dataset, we used the relationship of phase differences and transcript level correlations of circadian expressed genes to find species- and age-specific lists of circadian expressed genes that give a good representation of the circadian cycle.We chose PER3 as circadian starting gene because it has a reliable circadian expression19 and computed the Spearman rank correlations, \(r_{S} \in [-1, 1]\), between the transcript level of PER3 and the transcript levels of all other genes. Then we sorted all genes for \(r_{S}\) and chose the top 100 and the bottom 100 genes for dataset specific circadian gene lists. This results in two groups of circadian expressed genes in each list, which differ by their phase difference (\(\Delta \varphi\)) to PER3. The transcript levels of the bottom 100 genes oscillate out-of-phase to PER3 (\(r_{S} < -0.7\) and \(\Delta \varphi \approx 12\) h in Fig. 2B) and the transcript levels of the top 100 genes are in-phase to PER3 (\(r_{S} > 0.7\) and \(\Delta \varphi \approx 0\)h in Fig. 2A). Both gene groups contain known clock genes and genes who are known for being regulated by clock genes. The quality of the circadian rhythmicity of the transcript levels can be quantified by \(R^{2}\) of a harmonic fit, which shows that the transcript level oscillations of many genes in both gene groups are very circadian (\(R^{2} > 0.5\) for \(r_{S} \approx \pm 1\) in Fig. 2E). There are some circadian genes in the lists whose transcript level oscillations have high amplitudes (Fig. 2F).Unfortuntaly, the in-phase and out-of-phase gene groups are not sufficient for a complete representation of the different times of the circadian day29 in gene expression space. The transcript levels of the in-phase gene group peak during subjective day, while the out-of-phase genes have a transcript level trough. Thus, a temporally unlabelled gene expression sample with high transcript levels of the in-phase gene group and low levels of the out-of-phase gene group can be recognised as a sample from subjective day (red time bars in Fig. 1A). Similarly, the transcript levels of the in-phase gene group have a trough at subjective night, while the out-of-phase transcript levels peak, and a temporally unlabelled sample with this gene expression pattern can be assigned to subjective night (blue time bars in Fig. 1A). Finally, a sample with intermediate transcription levels of both gene groups can belong subjective dawn or subjective dusk. The gene expression pattern is not sufficient to distinguish between these two times of the day, which are 12 h apart from each other (white vs light pink time bars in Fig. 1A).The time assignment ambiguity can be resolved when we include additional circadian genes, whose oscillating transcript level peak 6 h (\(\Delta \varphi\) = 6h, light pink curve in Fig. 1A) after the peak in PER3 transcript level. These genes have their peak in transcript levels while the in-phase gene groups’ (\(\Delta \varphi\) = 0 h) transcript levels are falling, and they have a trough in transcript levels when the in-phase gene groups’ (\(\Delta \varphi\) = 0h) transcript levels are rising. A temporally unlabelled gene expression sample in which the transcript levels of the in-phase and out-of-phase genes are close to their mesors will have either a peak or a trough in the transcript levels of the 6-h phase difference genes and can therefore be unambiguously assigned to either subjective dawn or subjective dusk.As we have shown in Fig. 1, the addition of a 6-h-phase-difference gene group improves the representation of the circadian cycle and the circadian gene regulatory network in the dataset-specific circadian gene lists. Circadian genes whose transcript levels oscillate with a 6-h phase difference to each other have circular relationships in gene expression space, which result in almost zero correlations of the transcript levels (Fig. 1C,E). However, we cannot find genes with a six-hour phase difference to PER3 solely on the basis of a zero correlation (\(r_{S} \approx\) 0) between their transcript levels and the PER3 transcript level, because also non-circadian genes have vanishing correlations. This is why we find the 6-h-phase-difference gene group by choosing genes with low correlations of their transcript levels to the transcript levels of PER3 and ARNTL (\(-\,0.4< r_{S} < 0.4\) in Fig. 2D) and good circadian rhythm (\(R^{2} > 0.8\) in Fig. 2E) of these genes’ gene expression time series. The rhythmicity threshold excludes non-oscillatory expressed genes. The obtained \(\Delta \varphi\) = 6 h gene group consists of genes which have their peak in transcript level either 6 h before or 6 h after the peak of the PER3 transcript level (Fig. 2C,D). Figure 2D shows the generic relationships between rank correlation coefficients and phases of circadian genes. It confirms that we can infer phase relationships from the correlation patterns of those genes.We get five lists with \(\Delta \varphi\) = 0 h, \(\Delta \varphi\) = 12 h, and \(\Delta \varphi\) = 6 h gene groups for the five gene expression time series datasets of lung tissue of mouse and baboon (five supplementary Tables). Then we get two shorter circadian gene lists by choosing genes which are shared across these lists of lung genes. The short lists are then applied to study gene expression snapshots of human lung tissue.Phase differences between the transcript levels of Per3 and 12 other circadian genes are conserved in the lung tissues of mouse, baboon and humanFigure 3Comparison of conserved clock gene phase relationships in mouse and baboon lung to clock gene relationships in normal/cancerous human lung tissue (A) Relationship between the Spearman rank correlation and the phase difference between the oscillations of PER3 and our 13 circadian mammalian lung genes for gene expression time series data of mouse lung (Zhang, Wolff) and baboon lung (Mure). Marker symbols indicate gene expression datasets. (B) Spearman rank correlations of PER3 and our 13 circadian mammalian lung genes for mouse lung, baboon lung and healthy (left) and cancerous (right) human lung tissue from TCGA-LUAD. The human gene expression snapshots from TCGA-LUAD allow a calculation of Spearman rank correlations, but not of phase differences between circadian genes. Therefore the human Spearman rank correlations are plotted as lines that cross all possible phase differences. The data in (A) is repeatedly shown for comparison and the line colors correspond to the gene labels in (A).By applying our gene selection exemplified in Fig. 2, we obtained four specific lists of circadian expressed genes in mouse lung. There is one list for the data from25 and three lists for the data from26, with mice of three different age groups. For a comparison of the circadian gene regulatory networks of all mouse datasets we combined all four gene lists into a shared murine circadian gene list of 107 genes (Table 3). It includes all genes, which are present in all four murine circadian gene lists. The resulting murine circadian gene list still consists of three gene groups, whose transcript levels oscillate with three different phase differences (\(\Delta \varphi\) = 0 h, \(\Delta \varphi\) = 12 h, or \(\Delta \varphi\) = ± 6 h) to the oscillating transcript levels of PER3. In the 6-h phase difference gene group, only RORC and CRY1 were shared across all our murine circadian gene lists. The 12-h phase difference gene group contains 52 genes, which have their peak in gene expression 8.4–15.6 h before the peak in PER3 gene expression. In the zero-hour phase difference gene group are 53 genes, which have their peak in gene expression 3.3 h before to 2.5 h after the peak in Per3 gene expression (Fig. 4Supplementary Fig. S2).We wanted to get a circadian gene set, which can give information on the circadian clock also in non-murine mammalian lung tissue. Therefore, we constructed a shared mammalian circadian gene list of lung tissue of mice and baboon by choosing all homologous genes, which intersect between the shared murine circadian gene list and our circadian gene list of the baboon dataset. The resulting list consists of 13 genes (Table 3). There are eight genes (MTHFD1L, PIK3IP1, HLF, CIPC, TEF, KLF9, PFKFB3, DBP), whose transcript level peak 4.1 h before to 2.9 h after the peak in PER3 transcript level. Three genes (NPAS2, PDIA6 and ARNTL) have their peak in transcript level 7.5–18.7 h before the peak in PER3 transcript level. RORC is the only remaining gene of the 6-h phase difference gene group (Fig. 3A).Correlations between circadian genes in TCGA-LUAD snapshot data shows differences between healthy and cancerous TCGA-LUAD samplesWe compare now the mammalian circadian gene regulatory networks in mouse lung and baboon lung to the circadian gene regulatory networks of healthy and cancerous human lung tissue from TCGA. Unlike the gene expression time series data from mouse and baboon, the TCGA samples are temporally unlabelled gene expression snapshots. Without temporal annotation of the TCGA data it is impossible to compute any phase differences, \(\Delta \varphi\), of circadian transcript oscillations. However, it is still possible to compute Spearman rank correlations \(r_{S}\) between the transcript levels of the gene expression snapshots.We computed the Spearman rank correlations, \(r_{S}\), between the transcript level of PER3 and the transcript levels of the genes of the curated circadian gene list for healthy and cancerous human lung tissue samples. Then we marked these \(r_{S}\) as axis intersections in the \(r_{S}\)-\(\Delta \varphi\)-plot of the mouse data and the baboon data (Fig. 3B). When the \(r_{S}\)-line of a circadian gene is located closely to the \(r_{S}\)-\(\Delta \varphi\)-cluster of mouse and baboon of the same gene, it is likely that the \(\Delta \varphi\) of this gene is similar to the \(\Delta \varphi\) in mouse and baboon. This is the case for many \(r_{S}\)-lines of circadian genes in the healthy adjacent lung tissue samples from the TCGA-LUAD study (Fig. 3B).In the cancerous lung tissue of the TCGA-LUAD samples, the circadian clock is altered. There is a strong alteration in the circadian behaviour of ARNTL and NPAS2, which are both shifted closer towards \(r_{S} = 0\) (Fig. 3B). Additionally, RORC is shifted from \(r_{S} \approx 0.2\) to \(r_{S} \approx -0.4\). In contrast, the correlations between PER3, TEF, HLF and DBP are not altered in lung cancer (see also Supplementary Fig. S3C). If ARNTL or NPAS2 are chosen as reference gene instead of PER3, the \(r_{S}\) of all genes are densely located around \(r_{S} \approx 0\) in the cancerous TCGA-LUAD data (Supplementary Fig. S3A+B).Weak correlation pattern of circadian expressed genes in human lung cancerFigure 4Circadian gene correlations in mouse and baboon lung and normal/cancerous human lung tissue (A) Spearman rank correlations between all 107 genes of the shared murine circadian gene list for mouse lung gene expression data from Zhang et al. and Wolff et al. The complete caption of the horizontal and vertical axes is given in Table 3. (B,C) Spearman rank correlations between all 13 genes of the shared mammalian circadian gene list for gene expression data of mouse lung (from Zhang et al.), baboon lung (from Mure et al.) and TCGA-LUAD. (D) Cosine similarity of the correlation matrices of all aforementioned datasets for the shared murine circadian gene list and (E) the shared mammalian circadian gene list.Circadian correlation matrices (CCM)12 provide an overview of the pairwise correlations between the gene expressions of all genes of a given circadian gene list. As the correlations between the expressions of circadian genes are closely linked to the phase differences between their oscillations, they also provide insight into all the phase differences between these genes. The calculation of circadian correlation matrices is also possible for gene expression snapshot data, allowing us to compare the circadian expressed genes in mouse and baboon lung tissue to their homologs in human lung tissue from TCGA-LUAD. We computed circadian correlation matrices for the shared murine circadian gene list (Fig. 4A) and the shared mammalian gene list (Fig. 4B,C). We also used cosine similarities (Equation 4) to quantify the strength of the differences between the circadian correlation matrices (Fig. 4D) of the different datasets for the same gene list.The circadian gene expression matrices of the shared murine circadian gene list show a clear separation of the two large gene groups with \(\Delta \varphi\) = 0 h and \(\Delta \varphi\) = 12 h (Fig. 4A). There is a high conservation of this correlation pattern across all murine datasets. Accordingly, the cosine similarities of the CCMs of the shared murine circadian gene list are very high for all mouse datasets, while the cosine similarities between the CCMs of mouse and baboon and of mouse and human are much lower (Fig. 4D). This might indicate the presence of mouse specific circadian genes or mouse specific phase relationships with low variability between mouse datasets.For the shared mammalian circadian gene list, a pattern of distinct \(\Delta \varphi\) = 0 h, \(\Delta \varphi\) = 12 h, and \(\Delta \varphi\) = ± 6 h genes is seen in the circadian correlation matrices (Fig. 4B) of mouse and baboon, indicating a high similarity between their circadian rhythms. This pattern is similar, but altered in the circadian correlation matrix of the healthy human lung data (Fig. 4C). More specifically, the oscillatory gene expressions of MTHFD1L, KLF9 and PFKFB3 from the \(\Delta \varphi\) = 0 h gene group have potential phase shifts in healthy human lung, but the oscillations of the transcripts of PIK3IP1, HLF, CIPC, TEF, PER3, DBP, the \(\Delta \varphi\) = 12 h and the \(\Delta \varphi\) = ± 6 h genes might have similar phases as in mouse and baboon. Accordingly, the cosine similarities of the circadian correlation matrices of the shared mammalian circadian gene list are higher when healthy human lung tissue is compared to mouse and baboon than when cancerous human lung tissue is compared to mouse and baboon (Fig. 4E). However, the similarity of healthy human lung tissue to cancerous human lung tissue is higher than the similarities of healthy human lung tissue to mouse and baboon lung tissue. This can result from human-specific phase deviations of the circadian genes. Additionally, the healthy human lung tissue was collected in vicinity of the tumor tissue, which could also have increased its similarity to the tumor tissue.Overall, the phase differences between our 13 circadian genes from the gene regulatory network of the circadian clock in mammalian lung tissue have a high conservation in mouse and baboon. Many of these phase differences are also conserved in the healthy human lung tissue samples of the TCGA-LUAD study, but there is a potentially altered circadian clock in the cancerous human lung tissue samples of the TCGA-LUAD study.PCA based reconstruction of circadian gene expression rhythms in the TCGA-LUAD dataset shows altered circadian rhythms in human lung cancerFigure 5PCA-based reconstruction of circadian gene expression in mouse lung and normal/cancerous human lung tissue (A) Circular relationship between PC1 and PC2 after a PCA of the gene expressions depicted in Fig. 2. (B) Linear relationship between recorded time and \(\phi\). (C) Reconstructed and recorded oscillations of ARNTL (black line and circles) and PER3 (red line). (D) Relationship between PER3 and RORC in paired normal/tumor samples. (E) Reconstructed oscillations of ARNTL (black line and circles), PER3 (red line) and RORC (green line) in paired normal/tumor samples using the shared mammalian gene list. (A,B,C) show mouse lung data from Zhang et al. (D,E) show TCGA-LUAD data.We test the representation of the mammalian circadian clock gene regulatory network in our gene lists using Principal Component Analysis (PCA) based reconstructions of the circadian times of temporally annotated mouse and baboon gene expression samples. The principal components (PCs) of a multidimensional gene expression dataset are the combinations of gene expressions with the greatest variance.A PCA of the transcript levels of our selected dataset-specific circadian genes for the mouse dataset of25 results in a circular relationship of the first and second principal components (Fig. 5A). The spatial position of the gene expression samples on the circle is related to their collection time (colour coded in Fig. 5A+B). More specifically, there is a linear relationship between the collection time and the angle on the circle \(\phi = arctan\left( \dfrac{PC2}{PC1}\right)\) with \(\phi \in [0, 24]\)h. The reconstructed circadian rhythms of the transcript levels of ARNTL (black line in Fig. 5B) and PER3 (red line in Fig. 5B) match the recorded gene expression time series. A successful reconstruction of the circadian gene expression time series is also indicated by the fact that the reconstructed and recorded transcript levels of PER3 and ARNTL have the same phase difference \(\Delta \varphi\).We see the circular relationship of the first and the second principal component for the application of each of our circadian gene lists on the mouse datasets and the baboon dataset (Supplementary Fig. S4). The mouse data from26 includes groups of mice of three different ages. We combined them into one group to get a more heterogeneous dataset that might be more similar to human gene expression data collected in the clinic. Then we used the first and second principal component of the gene expression levels of the genes from our shared mammalian circadian gene list from this dataset to reconstruct its gene expression time series. There is a good time series reconstruction for this test case (Supplementary Fig. S5), which supports a good representation of the circadian gene regulatory network of the mammalian lung in this genelist.In order to quantify the strength of the circadian clock in lung cancer, we used the shared mammalian circadian gene list (Table 3) to assign potential circadian times to 51 temporally unannotated healthy human lung tissue samples from the TCGA-LUAD study. There are no time labels available to give a direct indication of the quality of the reconstruction. However, there is a circular relationship between the transcript levels of PER3 and RORC for an intact mammalian clock. We used \(\phi = arctan\left( \dfrac{RORC}{PER3}\right)\) as proxy for the true circadian time (left plot in Fig. 5C). The phase differences between the reconstructed gene expression time series of ARNTL (\(R^{2} \approx\) 0.76, blue line in Fig. 5D), PER3 (\(R^{2} \approx\) 0.61, red line in Fig. 5D) and RORC (\(R^{2} \approx\) 0.19, green circle in Fig. 5C) are consistent with the expected phase differences between ARNTL, PER3 and RORC for a healthy mammalian circadian clock (Supplementary Fig. S6).Next, we used the reconstructed circadian times of the normal samples from the TCGA data to temporally sort their 58 paired adjacent tumour samples. We observe that the circadian rhythms of the transcript levels of PER3 (\(R^{2} \approx\) 0.11), ARNTL (\(R^{2} \approx\) 0.11) and RORC (\(R^{2} \approx\) 0.02) are weakened in lung cancer (Supplementary Fig. S6). Additionally, the phase difference between the circadian rhythms of the transcript levels of PER3 and RORC is increased by 8 h in the cancerous lung tissue. This indicates an alteration of the gene regulatory network of the circadian clock in lung cancer.

Hot Topics

Related Articles