Identification of shared gene expression programs activated in multiple modes of torpor across vertebrate clades

StrokeofGenus enables cross-comparison of bulk RNA-seq torpor gene expression patterns across non-model speciesWe established an analysis pipeline able to reconstruct patterns of gene expression within bulk RNA-seq datasets, identify gene orthology across de novo transcriptomes, and compare torpor gene expression patterns across non-traditional model organisms (Fig. 1A). First, bulk RNA-seq datasets for each species were fed into the de novo transcriptome assembly program Trinity22. From here, the pipeline bifurcates. Along one track, we calculated gene expression for each sample using Trinity’s native RSEM capability and the de novo transcriptome23. Gene expression information for each species was then fed to the non-negative matrix factorization R package CoGAPS17 to reveal the underlying structure of the data, including time point and tissue-specific patterns of gene expression. Along the other track, to obtain ortholog information across species, the de novo transcriptome was translated using Transdecoder25, and each species’ predicted proteome was fed into the orthology program OMA standalone26. OMA standalone identified the most similar orthologs for every gene across the species in question. Last, we input the OMA orthology information and CoGAPS-identified gene expression patterns into the transfer learning R package ProjectR18 to quantify the presence and strength of gene expression patterns across species.Fig. 1Overview of project design. (A) Flowchart for a novel analysis pipeline to reconstruct gene expression patterns and identify shared gene expression across species. The key software used in each step of the analytic pipeline is listed above each arrow. Table of datasets included in the study showing (B) time points and (C) tissues contained in each dataset. Cells for IBA-associated time points have been greyed out in species that do not undergo IBA.Multiple approaches have been proposed to compare gene expression across species1,16,34 (Table 1). Unlike prior approaches, StrokeofGenus has both an adaptable pipeline, increasing usability, and has demonstrated utility to identify shared gene expression across species. It applies a de novo transcriptome assembly approach, enabling the analysis of non-model organisms with poor or missing reference annotations. Finally, StrokeofGenus is the only approach offering both dataset structure detection and orthology identification, and it is unique for its application of transfer learning to directly detect shared gene expression.Table 1 Comparison of usability and utility for approaches to compare gene expression across datasets from different species.StrokeofGenus is available from GitHub and includes a comprehensive vignette. The entire StrokeofGenus pipeline is initiated via the command line and can be completed using only three commands. Although each component of StrokeofGenus was independently developed, the pipeline automates filetype compatibility, file system organization, and output visualization across all tools incorporated in the pipeline, simplifying user experience.Publicly-available bulk RNA-seq datasets are available studying different forms of dormancy in species from multiple clades35,36,37. To identify shared patterns of gene expression throughout dormancy within vertebrate species, we selected a shortlist of eleven datasets in eight species with overlapping time points and tissues4,5,7,8,9,10,12,13,14,15,35 (Fig. 1B, C). The datasets represent both mammals and reptiles that undergo variations of torpor including hibernation, daily torpor, brumation, and aestivation. The sampled tissues were enriched for central nervous system tissues and metabolically important tissues such as the liver. In addition, to further facilitate intra-species comparisons using StrokeofGenus, we generated a new 13LGS RNA-seq dataset that includes tissues for which data are publicly available in three other 13LGS hibernation datasets4,10,35.Between 7,578 and 19,422 orthologous genes were identified between all species comparisons (Supplementary Table 1). The number of genes reconstructed by Trinity is driven by the number of sequencing reads provided to the program38. Accordingly, larger numbers of genes were reconstructed for species with more samples (Supplementary Table 1). Species with greater numbers of reconstructed genes also had more orthologs identified across comparisons than those with fewer reconstructed genes, although this also appears to be impacted by evolutionary distance (Supplementary Table 1).Nonnegative matrix factorization identifies tissue- and time point-specific patterns of gene expressionTo simultaneously uncover the distinct transcriptomic states and state-specific gene expression of the publicly-available hibernation RNA-seq datasets, StrokeofGenus applies the matrix factorization tool CoGAPS. CoGAPS deconvolves the patterns of coexpression that cumulatively compose the variation within a transcriptomic dataset. Each learned pattern represents phenomena that direct gene expression in a sample, including technical artifacts or biological processes (e.g. tissue type, age, or torpor state). A weight is calculated for each gene for each pattern, with a high weight indicating the gene’s expression greatly contributes to the pattern. Samples also receive a weight for each pattern, with a high weight meaning the pattern is driving a lot of expression in that sample. To attribute functionality to a pattern, we must rely on available metadata. For example, we infer that a pattern enriched in torpid samples over euthermic samples is a torpor pattern. Gene expression associated with that pattern represents torpor-specific gene expression programs.For almost all the datasets analyzed, CoGAPS was able to identify tissue and/or time point-specific patterns of gene expression (Fig. 2, S1). All datasets produced time point-specific patterns, except the monito del monte dataset, which only produced tissue-specific patterns (Fig. S1D). CoGAPS was also able to deconvolve which time points within each dataset represented distinct transcriptomic states. We directed CoGAPS to identify between two and ten patterns for each dataset’s gene expression matrix. The results for each number of patterns were visualized as a heatmap showing the enrichment for each pattern in each sample, which were used to identify the tissue and time point-specificity of each pattern and optimize pattern number. At low pattern numbers, samples were separated by tissue (Fig. S2A). As the number of patterns increased, time point and tissue/time point-specific patterns emerged (Fig. 2A). To determine how many patterns to derive from a dataset, we set the cutoff for hibernation-related patterns as the maximum number for which each pattern showed tissue and/or time point specificity but not sample specificity. If the number of patterns increased beyond this point, samples from the same time point and tissue would separate into sample-specific patterns (Fig. S2B). We reasoned that only patterns with greater signal than sample-specific gene expression noise were biologically meaningful and useful for the purposes of discovery.Fig. 2Identification of tissue and time point-specific gene expression patterns in torpor datasets. Heatmaps of tissue/time point-specific gene expression patterns in the (A) Chinese alligator dataset 1, (B) bearded dragon dataset, (C) 13LGS 4 dataset, and (D) 13LGS 2 dataset. Each row is a sample and each column is a CoGAPS-derived pattern. Select pattern-specific genes that align with prior analyses are arrayed next to the relevant patterns. All figure panels colored on the same scale with blue representing low pattern weight and red representing high pattern weight.After optimizing the number of found patterns, we determined that some samples obtained from apparently distinct phases of hibernation actually represent equivalent transcriptomic states. For instance, the period immediately following exit from torpor was not transcriptomically distinct from an euthermic time point further temporally separated from torpor in the Australian bearded dragon (Fig. 2B). We also found that the time points immediately preceding and following IBA— Arousal to IBA and Entrance to Torpor— are not distinct from Late Torpor in 13LGS (Fig. 2D). Our results corroborate the original publications’ findings, which were identified using principal component analysis (PCA) and random forest clustering techniques4,5,10, respectively. Our results demonstrate that the principal transcriptomic states characterizing hibernation in small mammals are euthermia, torpor, and IBA (Fig. 2C, D). Therefore, for following analyses, samples from Summer Active, Prehibernation, Posthibernation, and Spring Dark time points are considered euthermic and patterns found in those samples are considered euthermic patterns. Similarly, Arousal to IBA, Entrance to Torpor, and Late Torpor samples are considered torpid samples and patterns in those sample are considered torpid patterns.Tissue and time point specificity manifested differently across datasets. In some datasets, individual patterns show combined tissue and time point specificity (Fig. 2A, B). In other datasets, separate tissue-specific patterns were discovered along with distinct time point-specific patterns that were shared across tissues (Fig. 2D, S2A). We observe that the latter case occurred in 13LGS 2, a dataset composed of closely-related tissues, whereas the former occurred in datasets containing more distantly-related tissues. This is likely because more closely-related tissues share essentially the same gene expression changes while more distantly-related tissues have distinct gene expression programs during torpor.To identify genes with tissue- and state-specificity, we calculated the pattern marker genes for each pattern in each dataset (Supplementary Table 2, listed in order of pattern specificity). Each gene receives a pattern weight for each pattern. A high pattern weight for a gene reflects high expression in the samples enriched for that pattern. A gene may have high pattern weights in multiple patterns (Fig. 3B), but genes that are pattern markers only have high pattern weight in a single pattern (Fig. 3A). The top pattern-specific genes from our analysis aligned with those identified in prior analyses via pairwise comparisons, such as the circadian clock gene DBP upregulated in the Chinese alligator hypothalamus in summer8 and the transcriptional repressor TGIF2 upregulated in the torpid heart of the bearded dragon5 (Fig. 2).Fig. 3Shared gene expression patterns across datasets. (A) Diagram of the procedure of transfer learning to identify shared gene expression. Dot plots of tissue and time point-specific sharing in the 13LGS 1 dataset of patterns from the (B) 13LGS 3 (retina), (C) 13LGS 2 (brain), and (D) 13LGS 4 (liver) datasets. Dashed boxes surround shared and closely-related tissue types. Non-shared tissues are displayed at 25% opacity. Numbers to the right of each plot show the variance within that tissue. (E) Dot plots of tissue and time point-specific sharing in the 13LGS 2 dataset of patterns from the 13LGS 4 dataset. (F) Dot plots of time point-specific sharing in the 13LGS 4 dataset of patterns from the 13LGS 2 dataset. Each point represents a single sample. RPE = retinal pigment epithelium.Transfer learning reveals shared tissue and time point-specific gene expression across datasetsTo determine whether there are shared torpor gene expression patterns across datasets, StrokeofGenus applies the transfer learning R package ProjectR18. ProjectR takes as input gene weights from a pattern learned in one dataset and tests for their enrichment in the samples of a second dataset (Fig. 3A). A higher ProjectR score means a sample’s gene expression profile is similar to the tested pattern. If a pattern from a tissue and/or time point in one dataset shows enrichment in the equivalent samples of another dataset, this indicates that the datasets have shared gene expression programs. As proof of principle, we first applied ProjectR to different datasets generated from the same species with overlapping tissues. For this purpose, we used our 13LGS 1 dataset, which profiles multiple tissues, as a scaffold to compare with other 13LGS datasets that profile only one or a few tissues.We first explored shared time point- and tissue-specific patterns of gene expression between the 13LGS 3 and 13LGS 1 datasets (Fig. 3B, S4C). The 13LGS 3 includes only retinal tissue. Retinal euthermic Pattern 14 and torpid Pattern 11 found in the 13LGS 1 dataset (Fig. S1A), showed significant enrichment in the euthermic and torpid samples, respectively, in the 13LGS 3 dataset (Fig. S4C, Pattern 14 P = 3.3e-4, Pattern 11 P = 4.6e-3, student’s t-test). Similarly, torpid Pattern 2 from the 13LGS 3 dataset (Fig. S1G) demonstrated the greatest variance in neural-related tissues in 13LGS 1, highlighting the tissue-specificity of this pattern across datasets. It also shows significant separation of torpid samples in the retina (Fig. 3B, torpor/euthermia P = 2.5e-3, torpor/3 days post hibernation P = 5.7e-3, torpor/14 days post hibernation P = 7.7e-3), while the euthermic Pattern 1 showed enrichment in the euthermic Posthibernation retinal samples in 13LGS 1 (Fig. 3B days post hibernation/torpor P = 3.7e-3, 14 days post hibernation/torpor P = 8.4e-3). We found similar shared gene expression for both 13LGS 2 and 13LGS 4 with 13LGS 1.We further compared the 13LGS 2 dataset, which contains only brain tissue samples, with the 13LGS 1 dataset (Fig. 3C, S4D). Pattern 6 in the 13LGS 2 dataset, which is enriched in torpid samples (Fig. 2D), showed greater variance within the four neural-related tissues than the liver samples in the 13LGS 1 dataset (Fig. 3C). In the hypothalamus, Pattern 6 showed significant enrichment in torpid samples relative to euthermic time points (torpor/euthermia P = 0.014, torpor/3 days post hibernation P = 0.031, torpor/14 days post hibernation P = 0.022). Similarly, Pattern 5, which is enriched in euthermic samples (Fig. 2D), demonstrated greatest variance in neural-related tissues and showed significant segregation between euthermic Posthibernation time points and torpid time points in the 13LGS 1 hypothalamus samples (Fig. 3C days post hibernation/torpor P = 5.1e-4, 14 days post hibernation/torpor P = 3.5e-3). Pattern 7 from the 13LGS 1 dataset, which was enriched in the euthermic 14 days post hibernation hypothalamus samples (Fig. S1A), was also enriched in euthermic samples in the hypothalamus of the 13LGS 2 dataset, while the torpid Pattern 5 (Fig. S1A) showed enrichment in the torpid 13LGS 2 hypothalamus samples (Fig. S4D, Pattern 7 euthermia/torpor p < 2.2e-16, Pattern 5 euthermia/torpor p < 2.2e-16).The 13LGS 4, which includes only liver tissue, and 13LGS 1 datasets also demonstrate shared gene expression patterns (Fig. 3D, S4E). Pattern 2 from the 13LGS 4 dataset is associated with torpid samples (Fig. 2C). When projected into the 13LGS 1 dataset, this pattern showed greatest variance within liver samples and significant enrichment in torpor liver samples relative to euthermic time points (Fig. 3D, torpor/euthermia P = 3.2e-3, torpor/3 days post hibernation P = 2.7e-3, torpor/14 days post hibernation P = 1.1e-3), consistent with a shared torpor-specific gene expression program. Gene Pattern 1 of the 13LGS 4 dataset is enriched in euthermic time points (Fig. 2C). Projection of pattern 1 into the 13LGS 1 dataset also demonstrated greatest variance in liver samples and significant separation of active liver samples from torpid samples (Fig. 3D, euthermia/torpor P = 5.1e-6, 3 days post hibernation/torpor P = 2.8e-3, 14 days post hibernation/torpor P = 1.8e-3). Though ProjectR identified differences in gene expression between time points in the liver, CoGAPS analysis of the 13LGS 1 dataset only produced a general liver Pattern 20 that did not identify any time point-specific gene expression (Fig. S1A). This demonstrates that ProjectR is able to discern gene expression differences between samples when other sensitive methods are unable to do so. Interestingly, when pattern 20 of the 13LGS 1 dataset was projected into the 13LGS 4 dataset, it was enriched in euthermic time points (Fig. S4E, post hibernation/entrance to torpor P = 0.016). These results cumulatively demonstrate that StrokeofGenus can identify shared gene expression programs across multiple datasets generated from different laboratories.To demonstrate that the functionality of StrokeofGenus is not limited to mammals, we additionally compared two datasets from the Chinese alligator and found shared euthermic and torpor gene expression for each tissue shared between the two datasets (hypothalamus, skeletal muscle, and liver), though the limited number of replicates prevents statistical comparisons (Fig. S4A, B). These results further demonstrate the ability of ProjectR to identify shared gene expression programs despite disparate collection times.We also found shared gene expression patterns across tissues within the 13LGS. For example, the euthermic brain Pattern 5, torpid Pattern 6, and IBA Pattern 7 found in the 13LGS 2 (Fig. 2D) demonstrate significant enrichment in the matching time points of the 13LGS 4 (Fig. 3E, Pattern 5 euthermia/entrance to torpor P = 1.1e-4, Pattern 6 entrance to torpor/euthermia P = 1.3e-3, Pattern 7 IBA/entrance to torpor P = 6.2e-4). Similarly, projection of euthermic Pattern 1, torpid Pattern 2, and IBA Pattern 3 liver patterns into the 13LGS 2 dataset demonstrated coordinated upregulation in the corresponding time points in the forebrain (Fig. 3F, Pattern 1 euthermia/torpor P = 1.1e-4, Pattern 2 torpor/euthermia P = 1.6e-4, Pattern 3 IBA/torpor P = 2.1e-4). Thus, within the same species, patterns of gene expression in hibernation are shared not only across datasets but also across tissues.Torpor gene expression programs are shared across species and different forms of torporTransfer learning provides an opportunity to quantify the degree of sharing of torpor gene expression patterns across species and even across forms of torpor. Therefore, we used ProjectR in StrokeofGenus to test for gene expression conservation in the most common tissues across datasets–brain, liver, skeletal muscle, and white adipose–representing different taxonomic groups and forms of torpor.To determine whether we could identify shared torpor gene expression patterns between species, we compared liver torpor expression between the grizzly bear and the 13LGS (most recent common ancestor (MRCA) ~ 90 million years ago (MYA), Fig. S5A), both of which hibernate. Pattern 3 from the grizzly dataset showed significant enrichment in torpid liver samples, whereas Pattern 2 is enriched in euthermic liver samples (Fig. S1C). When projected into the 13LGS 4 dataset, these two grizzly patterns showed significant enrichment in the torpid and euthermic 13LGS samples, respectively (Fig. 4A, Pattern 3 entrance to torpor/euthermia P = 3.6e-3, Pattern 2 euthermia/entrance to torpor P = 1.8e-4). We performed the reciprocal comparison, projecting the 13LGS 4 dataset into the grizzly dataset, and found that Pattern 2, enriched in torpid time points, and Pattern 1, enriched in euthermic time points (Fig. 2C), showed enrichment in the corresponding grizzly liver samples (Fig. 4B, Pattern 2 euthermia/torpor P = 0.016, Pattern 1 euthermia/torpor P = 9.2e-4). Pattern markers from the source dataset with orthologs in the target dataset showed matching expression in the target dataset (Fig. S5B, C). Fig. 4Shared gene expression patterns across species. (A) Dot plots displaying time point-specific sharing in the 13LGS 4 dataset of torpor and euthermic patterns from liver samples in the grizzly dataset. (B) Dot plots displaying time point-specific sharing in the liver samples of the grizzly dataset of torpor and euthermic patterns from the 13LGS 4 dataset. (C) Dot plots displaying time point-specific sharing in the white adipose tissue (WAT) samples of the Chinese alligator 2 dataset of torpor and euthermic patterns from the WAT samples of the grizzly dataset. (D) Dot plots displaying time point-specific sharing in the WAT samples of the grizzly dataset of torpor and euthermic patterns from the WAT samples of the Chinese alligator 2 dataset. (E) Dot plots displaying time point-specific sharing in the hypothalamus samples of the Chinese alligator 2 dataset of torpor and euthermic patterns from the brain samples of the bearded dragon dataset. (F) Dot plots displaying time point-specific sharing in the brain samples of the bearded dragon dataset of torpor and euthermic patterns from the hypothalamus samples of the Chinese alligator 2 dataset. (G) Dot plots displaying time point-specific sharing in the skeletal muscle samples of the Chinese alligator 1 dataset of torpor and euthermic patterns from the skeletal muscle samples of the bearded dragon dataset. (H) Dot plots displaying time point-specific sharing in the skeletal muscle samples of the bearded dragon dataset of torpor and euthermic patterns from the skeletal muscle samples of the Chinese alligator 1 dataset. Each point represents a single sample. Dot plots displaying the enrichment of biological process gene sets in torpor (T) and euthermic (E) patterns in (I) liver in 13LGS 3 and grizzly, (J) in adipose in Chinese alligator 2 and grizzly, (K) in hypothalamus/brain in Chinese alligator 2 and bearded dragon, and (L) in muscle in Chinese alligator 1 and bearded dragon. The X-axis corresponds to the CoGAPS pattern and the Y-axis corresponds to the gene set. Dot size reflects the normalized effect size (NES) and the shade the -log(p-value) of enrichment.We also found that Pattern 3 and Pattern 2 from the grizzly dataset were enriched in the equivalent time points in bat liver (MRCA ~ 80 MYA, Fig. S5A, S6B), though small sample numbers prevented statistical comparison, and Patterns 4 and 5 in the bat showed equivalent enrichment in the liver samples of the grizzly (Fig. S6A, Pattern 4 euthermia/torpor P = 2.0e-8, Pattern 5 torpor/euthermia P = 0.014). This demonstrates the shared torpor gene expression across divergent mammalian species.To identify whether shared torpor gene expression patterns are detectable over greater taxonomic distances, we compared white adipose tissue in the Chinese alligator 2 to the grizzly (MRCA ~ 320 MYA, Fig. S5A). When projected into the Chinese alligator 2 dataset, euthermic adipose Pattern 4 from the grizzly (Fig. S1C) showed enrichment in euthermic adipose samples (Fig. 4C). Similarly, projection of the torpid grizzly Pattern 5 (Fig. S1C) into the Chinese alligator 2 dataset demonstrated segregation between torpid and euthermic replicates (Fig. 4C), though small sample numbers prevented statistical comparisons in either case. Programs associated with euthermic adipose tissue in the Chinese alligator 2 dataset, Pattern 15 (Fig. S1B), show enrichment in euthermic adipose tissue in grizzly (Fig. 4D, euthermia/torpor P = 0.012). Taken together, these results suggest shared gene expression programs in torpor across vertebrate classes and hundreds of millions of years of evolutionary divergence.To determine if distinct forms of torpor also share similar gene expression programs, we compared the Chinese alligator, which enters torpor in response to cold, to the bearded dragon, which enters torpor in response to extreme heat (MRCA ~ roughly 280 MYA, Fig. S5A). Euthermic and torpid brain patterns from bearded dragon showed enrichment in Chinese alligator 2 euthermic and torpid brain samples, respectively, though sample number precluded statistical comparisons (Fig. 4E). Similarly, euthermic and torpid hypothalamus patterns 4 and 5 from the Chinese alligator 2 dataset showed significant enrichment in the euthermic and torpid samples of the bearded dragon brain, respectively (Fig. 4F, Pattern 4 two days post torpor (2D)/torpor P = 0.017, two months post torpor/torpor P = 0.020, Pattern 5 torpor/2D, P = 0.048, torpor/2M P = 0.082). Shared gene expression was also found between Chinese alligator brain patterns 2 and 3 and bearded dragon brain, though the separation between time points did not rise to statistical significance (Fig. S6E). Shared torpor-specific gene expression between Chinese alligator and bearded dragon is also apparent in skeletal muscle (Fig. 4G, H, Chinese alligator 1 dataset Pattern 3 2D/torpor P = 3.1e-3, 2 M/torpor P = 1.2e-4, Pattern 4 torpor/2D P = 0.017, torpor/2M P = 3.1e-4). We found greater sharing for patterns where conserved genes had greater pattern weight, meaning comparisons can be hampered over large taxonomic distances (Fig. S5D, E). Not only is torpor gene expression shared across species and over large taxonomic distances, but across forms of torpor employed under opposite environmental conditions, such as extreme heat and cold.We further found that Patterns 1 and 2 from the Djungarian hamster hypothalamus, which undergoes daily torpor, showed time point-specific enrichment in the hypothalamus of Chinese alligator 1, which undergoes brumation, though the limited number of samples precludes statistical analysis (MRCA ~ 320 MYA, Fig. S5A, S6C). Though the Chinese alligator euthermic hypothalamus Pattern 1 did not show significant time point enrichment in hamster hypothalamus, the torpid hypothalamus Pattern 2 was significantly enriched in torpid over euthermic hamster hypothalamus (Fig. S6D, Pattern 2 torpor/euthermia P = 0.0276). Torpor gene expression is shared across forms of torpor with very different torpid bout lengths.To discern which biological processes comprise shared gene expression programs, we applied gene set enrichment analysis (GSEA). We found enriched biological processes that agree with known phenotypes in torpor. Enrichment for gene sets was calculated for gene expression patterns within each species (Supplementary Table 11), then gene sets with similar enrichment were identified across species. As expected, mRNA production was downregulated in torpor across species and tissues2 (Fig. 4I, J, L). Metabolic functions were upregulated in euthermic liver for both 13LGS and grizzly2 while immune cell development was selectively downregulated in torpid samples39 (Fig. 4I). Adipose function was upregulated during euthermia in both grizzly and Chinese alligator 22 (Fig. 4J). As has been found in the Yakut ground squirrel brain40, actin fiber assembly was upregulated in both grizzly and Chinese alligator torpid adipose samples (Fig. 4J). Similar to the results in liver and adipose, terms relating to tissue function were enriched in euthermic muscle in Chinese alligator 1 and bearded dragon (Fig. 4L). Torpid animals retain muscle mass despite long periods of inactivity2 and in both Chinese alligator 1 and bearded dragon genes controlling maintenance of cell number were upregulated in torpid muscle samples (Fig. 4L). Synapses retract during torpor before being rapidly regrown following a return to euthermia3. In both Chinese alligator 2 and bearded dragon brain samples, axon extension is upregulated in euthermic samples (Fig. 4K). Interestingly, oligodendrocyte differentiation is also enriched in the euthermic brain in both Chinese alligator 2 and bearded dragon (Fig. 4K). Demyelination of the hippocampus with the oligodendrocyte toxin cuprizone directs neurons to a dormant, axon-protective state in mice41. Not only do diverse species that employ different forms of torpor show shared torpor gene expression programs, but these genes also regulate shared torpor-related biological processes.Interestingly, metabolic aspects of the form of torpor reflect differences in gene expression. The grizzly bear is a fat-storing hibernator, meaning its white adipose tissue performs lipid catabolism throughout torpor. In contrast, the Syrian hamster is a food-storing hibernator, whose white adipose tissue performs both lipid anabolism and catabolism during torpor. Differences in gene expression in white adipose tissue between food- and fat-storing hibernators has previously been found12. To assess whether these gene expression differences constitute programmatic differences, we compared torpor gene expression patterns between Syrian hamster and grizzly bear white adipose tissue (MRCA ~ 90 MYA, Fig. S5A). Though the Syrian hamster euthermic Pattern 1 showed no time point enrichment in the grizzly adipose, the torpid Pattern 3 showed enrichment in euthermic time points over torpid time points (S6F, euthermia/torpor P = 0.0041). Similarly, the Grizzly euthermic adipose Pattern 4 shows no time point enrichment in Syrian hamster, but the Grizzly torpid adipose Pattern 5 showed enrichment in euthermic hamster time points (S6G, euthermia/torpor P = 0.17, IBA/torpor P = 0.0025). In sum, we found opposite pattern enrichment between torpor and euthermic time points in grizzly bear and Syrian hamster white adipose. The Syrian hamster dataset only covers white adipose tissue, so we cannot determine whether gene expression programs are shared between fat- and food-storing hibernators in other tissues.Reference genome annotation produces qualitatively equivalent results to de novo transcriptomeThe rate of production of high-quality reference genomes for nontraditional model organisms has rapidly increased42,43, though some torpor model organisms still lack a reference genome44,45,46,47. StrokeofGenus includes the functionality to use a reference genome rather than generate a de novo transcriptome, which we demonstrate in 13LGS (GCF_016881025.1). An average of 33% more orthologs were identified between 13LGS and classical model organisms using the genome (Table S1). We hypothesize that this is because the genome does not have the same constraints surrounding tissue number and read depth as the de novo transcriptome. Nonnegative matrix factorization identified equivalent gene expression patterns for each dataset tested and transfer learning between 13LGS dataset 2 and the grizzly bear liver found similar shared pattern enrichment between genome and de novo transcriptome (Fig. 2D, E, Fig. S1G, Fig. S7A-E).Artificially-induced torpor demonstrates shared gene expression with natural torporMultiple groups have developed approaches to induce torpor in non-heterotherm species48,49. Li et al. induced torpor in the MCF7 human cell line by inhibiting the chloride channel cystic fibrosis transmembrane regulator (CFTR) and analyzed the effects of their treatment using RNA-seq. We applied StrokeofGenus to interrogate whether torpor induced by CFTR inhibition phenocopies naturally-occurring torpor.We identified three patterns corresponding to untreated, 1-hour, and 24-hour GlyH-101-treated MCF7 samples (Fig. 5A). The MCF7 cell line is derived from breast ductal adenocarcinoma cells48. Breast ductal cells derive from stromal adipocyte progenitor cells50, so we searched for shared gene expression between the MCF7 dataset and adipose samples from the grizzly dataset. Grizzly Pattern 4, found in euthermic adipose, showed significant enrichment in untreated MCF7 samples over 24 h-treated samples (P = 0.0080), and grizzly Pattern 5, found in torpor adipose, showed significant enrichment in 24 h-treated MCF7 samples over untreated (P = 0.013) (Fig. 5B). Similarly, MCF7 Pattern 1, found in untreated samples, showed significant enrichment in euthermic over torpid grizzly adipose (Pattern 1 euthermia/torpor P = 3.5e-5), and MCF7 Pattern 3, found in 24-hour treated samples, showed significant enrichment in torpid over euthermic grizzly adipose (Pattern 3 torpor/euthermia P = 6.5e-5) (Fig. 5C). These findings cumulatively demonstrate that torpor induced by CFTR inhibition in MCF7 cells shows shared gene expression with natural torpor in grizzly adipose, suggesting that CFTR inhibition successfully phenocopies in vivo torpor.Fig. 5Shared gene expression between artificially-induced and natural torpor. (A) Heatmap of treatment-specific gene expression patterns in the MCF7 dataset. Each row is a sample and each column is a CoGAPS-derived pattern. Blue represents low pattern weight and red represents high pattern weight. (B) Dot plots displaying treatment-specific sharing in the MCF7 dataset of torpor and euthermic patterns from adipose samples in the grizzly dataset. (C) Dot plots displaying time point-specific sharing in adipose samples in the grizzly dataset of torpor and euthermic patterns from the MCF7 dataset. Each point represents a single sample.Limitations of the studyStrokeofGenus relies on the identification of one-to-one gene orthology. This approach simplifies the identification of shared gene expression, but may lead to loss of signal and nuance especially in circumstances of species with genome expansion events.StrokeofGenus was designed to use de novo transcriptomes. In this approach, gene reconstruction and subsequent ortholog discovery are impacted by sample tissue type diversity and sequencing depth. However, StrokeofGenus is also able to use reference genomes which will not be impacted by those same factors.Shared expression pattern discoverability by StrokeofGenus is heavily dependent on study design. The determination of the correct number of patterns can be impacted when there are limited biological replicates. For instance, the Chinese alligator 2 dataset has only two replicates per condition and some sample-specific patterns were found before all condition-specific patterns were determined (Fig. S1B). Best practice is to generate outputs for a few patterns beyond when the first sample-specific pattern is identified. If more condition-specific patterns resolve, the sample-specific pattern can be disregarded because it is independent of condition-specific patterns and therefore does not impact the identification of condition-specific genes. Confidence in a pattern’s identity can be amplified by including a robust number of biological replicates (Fig. 2, Fig. S1).Genes that are specifically downregulated in a transcriptomic state can be extracted from pattern weight outputs of CoGAPS. However, CoGAPS does not include a function to specifically generate a list of significant pattern-specific downregulated genes, so users would have to manually extract them from the CoGAPS outputs.Also, StrokeofGenus has no function to directly determine which genes are driving the sharing of patterns across species. A strong inference can, however, be made by first identifying the global sharing of a pattern in the target dataset, and then identifying pattern markers whose expression in the target dataset aligns with that sharing (Fig. S5B, C). This limitation could be eliminated in the future by the development of computational techniques to discern projection drivers. Overall, however, these limitations of StrokeofGenus are readily surmounted with robust study design.

Hot Topics

Related Articles