Evolution and stress response potential of the plant splicing factor U1C

Sequence identification and phylogenetic analysis of U1C in plants

A BLAST search using the Arabidopsis U1C protein sequence was performed to identify other plant U1C genes in Phytozome. After removing truncated sequences and sequences lacking the U1C (C2H2-type zinc finger) motif, 114 putative U1C sequences from 72 plant species remained, including eight algae, one basal angiosperm, three bryophytes, 14 monocots, and 46 eudicots. Three species contained four copies of the U1C gene, six species contained three copies, 21 species contained two copies, and 42 species contained a single U1C candidate gene (Table S1). We also constructed phylogenetic trees for the U1A and U1-70 K genes of these 72 species for comparison with U1C (Tables S2, S3).

No more than four copies of the U1C gene were identified in any species. Notably, four copies of U1C were found in Kalanchoe laxiflora (milky widow’s thrill) and Gossypium hirsutum (upland cotton), which coincides with recent polyploidy events [17]. Three copies of this gene were observed in two Brassicaceae species: cabbage and turnip mustard. One or two copies were observed in common plant species, including bryophytes (moss and liverwort), monocots (switchgrass, sorghum, and maize), and dicots (sunflower, cashew, carrot, apple, potato, tomato, cassava, medicago, chickpea, cowpea, flax, cotton, and olive trees). As expected, all algal species contained only one U1C gene. Other common plants with a single copy include monocots (rice, millet, and barley) and dicots (lettuce, Arabidopsis, Capsella, papaya, cucumber, castor bean, common bean, red clover, cacao, strawberry, orange, peach, Populus, and grape). Unlike U1C, the U1A and U1-70 K genes have no more than three copies. Within individual species, the copy numbers of U1A and U1-70 K also differed from that of U1C.
For example, among algae, Botryococcus braunii has two copies of the U1A gene and Porphyra umbilicalis has two copies of the U1-70 K gene; rice has only one copy of the U1C gene but two copies of the U1A gene, and barley has only one copy of the U1C gene but two copies of the U1-70 K gene. This suggests that the copy-number pattern of U1C differs from those of the other U1 proteins, U1A and U1-70 K.

Phylogenetic trees were constructed for U1C, U1A, and U1-70 K. All sequences clustered into five major clades that follow the general plant phylogenetic classification: algae (yellow), bryophytes (green), monocots (pink), basal angiosperms (purple), and dicots (purple). The phylogenetic trees display a clear topology and high bootstrap values for each clade (dark pink) (Figs. S1–S3 and the leftmost panel in Fig. S4). In addition, the yellow branch (algae) formed the base of the phylogenetic tree and was far from the remaining three branches, indicating that algal sequences are distant from the U1 family of genes in other plants. One sequence each from Arabidopsis halleri and Arabidopsis thaliana formed a small clade at the top of the tree, along with other Brassicaceae species (Brassica and Capsella), consistent with their close phylogenetic relationships. In contrast, the U1 family genes of the model monocotyledon (Oryza sativa, rice) clustered with those of other monocotyledons. Where a species had two or three sequences, these tended to cluster closely together in the same subclade.

Comparative analysis: gene structure and DNA motifs

Analysis of the evolutionary path of the plant U1C gene and determination of its conserved function require a comparison of its gene structure and conserved gene motifs. The gene structure model and the corresponding conserved motifs were connected to a phylogenetic tree (Fig. S4). To facilitate observation of gene structure, we constructed the longest coding sequence of the U1C gene.
Surprisingly, most of the U1C sequences exhibited a four-exon, three-intron organization (Fig. S4, middle panel), indicating that their gene structure is highly conserved in plant genomes. This suggests that U1C genes are functionally conserved in most species. However, a few exceptions were observed. Two maize U1C sequences contained eight and five exons, and one sorghum U1C sequence contained six exons. In addition, Brassicaceae species had the smallest U1C gene structures among the plant species examined. The largest variation within a subclade was observed among algae. The Porphyra umbilicalis U1C sequence has one exon and no introns. In contrast, the U1C sequences of Chlamydomonas reinhardtii and Volvox carteri contained seven exons, and the U1C sequence of Dunaliella salina contained eight exons.

Variation in gene structure was minimal among U1C genes. Therefore, we attempted to determine whether the motif compositions of the cDNA sequences reflected any differences. Multiple EM for Motif Elicitation (MEME) analysis showed that approximately 100 U1C genes in plants have similar sequence characteristics, and eight or nine of the first ten motifs were located in similar sequence positions (Fig. S4, right panel), with little difference among basal angiosperms, monocots, and dicots. However, sequences with only two or three motifs were observed exclusively in the algal and bryophyte clades, even in sequences with gene structures seemingly similar to those of other plant species. Thus, there is no necessary link between gene structure and the appearance of motifs.

Conservation of protein domains and genomic synteny analysis of plant U1Cs

Protein domains and conserved amino acid (aa) motifs were analysed alongside a phylogenetic tree (Fig. 1). All peptides contained a U1C C2H2 zinc finger domain at the amino terminus (Fig. 1, middle panel). MEME was used to annotate the ten most prominent aa motifs, which are presented as coloured boxes (Fig. 1, right panel).
Again, algal and bryophyte species had the fewest motifs among all observed species. Monocots possess an intermediate number of motifs. This suggests a divergence of plant U1C proteins among clades. In addition, the number of motifs appears to have increased during evolution. As expected, large variations in gene structure may lead to a reduction in the number of conserved motifs. This analysis suggests that algal protein sequences may be less conserved than those of other plant species.

Figure 1. Protein structure organization and identification of conserved amino acid motifs among plant U1C genes. (A) Vertical phylogenetic tree (left panel), protein structure (middle panel), and conserved amino acid motifs from MEME analysis, with the top ten conserved motifs indicated by different coloured boxes (right panel). The middle panel shows the U1 zinc finger. The small coloured boxes in the right panel represent different motifs. (B) In the MEME analysis, the first ten conserved amino acid motifs are indicated by different coloured boxes. The logos and detailed information of the ten amino acid motifs from MEME are shown below. The relative size of the symbols indicates their frequency in the sequence. The height of each symbol is proportional to the frequency of occurrence of the corresponding residue at that position. Black vertical lines in the phylogenetic analysis represent the break at that particular branch.

Synteny analysis of the exons revealed that the U1C gene structures of 46 plant species were highly conserved (Fig. S5). The middle part of the gene structure of all members was highly similar.
However, this similarity diminished at both ends of the gene, suggesting that although the regulation of U1C may vary owing to differences in the terminal regions, its regulation of gene expression is somewhat similar across gene families because of the low similarity between coding sequences.

Clustering associated with synteny followed the phylogenetic tree, with the exception of the bryophyte Physcomitrella patens, which clustered with the dicots at the top of the tree.

Analysis of promoter and tissue/stimulus-specific expression

The PlantCARE web tool was used to scan the 2-kb upstream (5′-flanking) sequence of U1C genes (Table S4) [18], and the expression profiles of these genes were analysed using data retrieved from the eFP browser, Phytozome, RiceXPro, and the ROAD v2 database (Figs. 2, 3 and Figs. S6–S9). Tissue-specific cis-elements were sparsely represented in these sequences, with only five types detected (Fig. 4 and Table S4). As expected, the CAT-box (for meristem expression) was the most represented and was detected upstream of the transcription start sites. Owing to their CAT-box abundance, U1Cs may play a crucial role in newly formed tissues and are expected to fulfil the growing demand for transcription and downstream biological processes. HD-Zip 1 was found in five sequences, RY elements in five sequences, and motif 1 in one sequence. The GCN4_motif was noticeably present only in Brassicaceae species, which was surprising and may be related to its importance in seed development and germination [19]. In the dicot Arabidopsis, U1C is highly expressed in developing embryos, maturing siliques, stigma, ovaries, seeds, and shoot apex, implying a potential role for the CAT-box and GCN4_motif of this gene in driving shoot-tip expression during flowering (Figs. 4 and S6). Notably, its expression was downregulated in flower and leaf tissues at the seedling and adult plant stages. However, in P. trichocarpa, seedlings had a high transcript abundance (Fig. S7).
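The cis-element counts reported above (for example, an element "found in five sequences") amount to tallying promoter-scan hits per element. The following is a minimal sketch of such a tally, assuming the scan results are available as (gene, element, position) records. The gene IDs, positions, and the `tally_elements` helper are hypothetical placeholders for illustration; they are not values from Table S4 and not part of PlantCARE itself.

```python
from collections import defaultdict

# Hypothetical PlantCARE-style hits: (gene_id, element_name, position relative to the TSS).
# Gene IDs and positions are illustrative placeholders, not data from Table S4.
hits = [
    ("geneA", "CAT-box", -150),
    ("geneA", "CAT-box", -900),
    ("geneA", "RY-element", -420),
    ("geneB", "CAT-box", -60),
    ("geneB", "HD-Zip 1", -1300),
    ("geneC", "GCN4_motif", -700),
]

def tally_elements(hits, window=2000):
    """Return {element: (total hits, number of distinct promoters)} within the upstream window."""
    total = defaultdict(int)
    genes = defaultdict(set)
    for gene, element, pos in hits:
        if -window <= pos < 0:  # keep only hits inside the upstream window
            total[element] += 1
            genes[element].add(gene)
    return {element: (total[element], len(genes[element])) for element in total}

summary = tally_elements(hits)
# List elements by the number of promoters that carry them, most widespread first.
for element, (n_hits, n_genes) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{element}: {n_hits} hits in {n_genes} promoters")
```

Counting distinct promoters separately from total hits keeps a promoter with several copies of one element from inflating the "found in N sequences" figure.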
Figure 2 shows the expression of two U1C genes in the tissues of each of three species. Glyma.02G152300 was the most abundant in new tissues. Solyc01g067830.2 in tomato is highly expressed in fruits, and PGSC0003DMP400038064 in potato is highly expressed in tubers. Another U1C gene, Glyma.10G021900, was expressed at lower levels in soybean than the former, consistent with the results obtained from the Phytozome database (Fig. S8). The expression of another U1C gene, Solyc06g00860.2, was similarly reduced in tomato. PGSC0003DMP400039624 was expressed at high levels in potato (Fig. 2). In monocots, one maize U1C gene was exclusively upregulated in the tassel and ear (male and female florets), implying a role in fertilisation and floral development. In rice, the highest expression was observed in the newly developing shoot apical meristem tissues and young leaves. In Brachypodium, shoot tissues showed the highest expression levels in very young and older plants (Fig. 3). In addition, based on RiceXPro, we found that the rice U1C gene was highly expressed in the embryo and endosperm 42 days after flowering (Fig. S8). In summary, U1C genes are mainly expressed in organs that act as a source of energy and nutrients for future growth or in processes that prepare such organs.

Figure 2. Tissue expression of representative U1Cs in selected dicots. Expression data downloaded from the plant eFP browser and log-transformed (base 2) are presented as a heatmap.

Figure 3. Tissue expression of representative U1Cs in selected monocots. Expression data downloaded from the plant eFP browser and log-transformed (base 2) are presented as a heatmap.

Figure 4. Summary of promoter motifs identified in plant U1C genes that putatively confer tissue specificity. Five motifs are represented by circles of different colors and labelled along the 2-kb upstream promoter region (straight line) for each plant U1C gene.
CAT-box: cis-acting regulatory element related to meristem expression; GCN4 motif: cis-regulatory element involved in endosperm expression; HD-Zip1: element involved in differentiation of palisade mesophyll cells; RY-element: cis-acting regulatory element involved in seed-specific regulation.

Plants are sessile organisms that rely on gene expression to tolerate and adapt to changing environments [20]. To identify the most significant internal and external stimuli responsible for regulating plant U1Cs, we explored the stimulus-specific motifs in these sequences (Fig. 5). Twenty motifs were identified in the promoter region of plant U1Cs. The most abundant stimulus-specific cis-elements in the U1C sequences were the TGA-element motifs, which are auxin-responsive, followed by drought-responsive (ABRE) and low-temperature-responsive (LTR) elements (Fig. 5). Correspondingly, Arabidopsis U1C expression was induced by auxin treatment (Fig. S6A). These results provide insights into the shoot-specific response of Arabidopsis U1C to drought, oxidative stress, wounding, and osmotic stress. In contrast, heat stress and Phytophthora infection upregulated this gene in all species (Fig. S9A). The expression of Arabidopsis AT4G03120 obtained from the eFP browser under different abiotic stimuli (time series from 0 to 24 h) [21] showed early induction under heat stress (at 1–3 h, which subsided at 4 h in shoots but remained induced in roots), but slight shoot-specific downregulation under ROS-generating stress conditions (e.g. osmotic, drought, oxidative, genotoxic, and wounding stress) and root-specific downregulation under salt stress (Fig. S9A). In rice, we obtained expression data for the U1C gene under drought, cold, high-temperature, and salt stress from the ROAD v2 database (Fig. S9B); U1C expression fluctuated under these conditions. Under heat and salt stress, expression showed a trend of upregulation followed by downregulation. Cold stress caused an early induction of expression.
In addition, we verified the expression of the rice U1C gene (LOC_Os02g16640) under different abiotic stimuli (0–12 h time series) using qPCR. The U1C gene was induced early in shoots and then gradually weakened under low temperature (1–3 h) and salt stress (1–6 h) but was induced again after 12 h in roots. Shoots and roots showed similar trends under drought stress (early induction at 1–6 h), and the Cd treatment caused steady induction (at 1–12 h) in shoots. In roots under the Cd treatment, the opposite trend was observed, except that expression at 6 h did not differ significantly from the control (Fig. S9C).

Figure 5. Summary of promoter motifs identified in plant U1C genes that putatively confer response to internal and external stimuli. Twenty identified motifs are represented by symbols of different colors and are labelled along the 2-kb upstream promoter region (straight line) for each plant U1C gene. ABRE: ABA binding response element; ARE: anaerobic response element; AT-rich sequence: cis-element for maximal elicitor-mediated activation; AuxRR-core: cis-acting regulatory element involved in auxin responsiveness; CGTCA-motif: cis-acting regulatory element involved in MeJA-responsiveness; GARE-motif: gibberellin-responsive element; GC-motif: enhancer-like element involved in anoxic specific inducibility; LTR: cis-acting element involved in low-temperature responsiveness; MBS: MYB binding site involved in drought-inducibility; P-box: gibberellin-responsive element; SARE: salicylic acid-responsive element; TATC-box: cis-acting element involved in gibberellin-responsiveness; TCA-element: cis-acting element involved in salicylic acid responsiveness; TC-rich repeats: cis-acting element involved in defense and stress responsiveness; TGACG-motif: cis-acting regulatory element involved in MeJA-responsiveness; TGA-element: auxin-responsive element; WUN-motif: wound-responsive element.
Black vertical lines represent the break at that particular branch.

Biotic treatments of leaf tissues with pathogens, including Botrytis (no change), Phytophthora infestans (significant induction), and Pseudomonas (no change), and with bacterial elicitors such as Flg22 (no change), suggest a possible pathogen-specific role for AT4G03120 in biotic responses. Hormone time-series data did not suggest any significant upregulation by any hormone, at least at the seedling stage, perhaps because regulation is finer at the cellular level than at the organ level. The eFP data suggested strong induction during embryonic development in cotyledons and roots (strongest), especially in the torpedo meristem region. Another notable strong induction was observed in stigma and ovaries (Fig. S6B), similar to the tassel and ear induction of maize U1C. Therefore, the corresponding cis-elements predicted in their promoter regions can be used as hypothetical targets for transcriptional regulation, mainly in seeds, through osmotic or osmotic-like stress and heat stress (Fig. 4).

In conclusion, significant differences were observed in the motif composition of U1Cs in plants, indicating the complexity of transcriptional regulation. Additionally, each promoter can be regulated by a combination of internal and external cues.

AS profile analysis and splicing isoforms

A comparison of gene structures of a few representative U1Cs, U1As, and U1-70Ks across algae, bryophytes, monocots, and dicots revealed that alternative transcripts did not occur frequently in the U1C, U1A, and U1-70 K gene families (Figs. 6, S10, S11). However, some species have genes with multiple annotated transcripts, including Physcomitrella patens (4), Hordeum vulgare (9), and Glycine max (3) for U1C and Hordeum vulgare (11) for U1-70 K. Even for these splice isoforms, gene structures did not differ significantly from the representative transcript forms, and most sequence differences were observed in the UTRs.
Thus, U1 family genes may not undergo large splicing changes and likely act uniformly on substrates, with conserved zinc-finger motifs across isoforms. Splicing-induced functional differences in U1C, U1A, and U1-70 K were assumed to result in only minor changes in substrate gene expression.

Figure 6. AS profile analysis of U1C. Summary of annotated alternatively spliced transcript isoforms for identified U1C genes. Gene models were screened from the Phytozome v13.0 website.

Interaction network and homology modelling

We also investigated the role of U1C in biological regulation. Using the U1C proteins of Arabidopsis, rice, human, mouse, and yeast (YLR298C) as queries, U1C protein interaction networks were constructed using the STRING web tool (Fig. S12A and Table S5). Two plant species (Arabidopsis and rice), two animal species (Homo sapiens and mouse), and a yeast (Saccharomyces cerevisiae) were included. Interestingly, humans and mice share nine of the ten proteins, whereas yeast U1C appears to bind to a different batch of spliceosomal proteins in addition to the familiar U1A. Furthermore, compared with humans, the two plant species shared only two of the eleven protein interactors, suggesting that these two species have different protein networks for their own U1Cs. However, further validation is required to elucidate specific molecular functions.

U1C is the central component of the U1 snRNP and interacts with several other snRNP proteins. Understanding the protein structure is crucial for subsequent biochemical and functional analyses. A 3-D model of plant U1C proteins was constructed by homology modelling, based on the template and crystal structures of human U1 snRNPs (Figs. S12B and S13). Although the crystal structure of human U1C is incomplete, the RRM domain containing the zinc finger motif (residues 1–38 in Arabidopsis) has been resolved in this crystal structure. Therefore, this result is sufficient to explain the existence of RRM domains in plants.
Furthermore, conservation of the RRM domain was analysed. We observed 26 residues (49.1%) with a ConSurf grade over 7, 12 residues (22.6%) with a ConSurf grade over 9, and an amino acid conservation level of more than 80% at 51 sites (96.23%) (Fig. S14). RNA-binding sites were conserved, except for Arg3 and Tyr12 (Fig. S12B). Thr11, Thr14, His15, Ser19, Gln23, and Tyr28 form hydrogen bonds with RNA and play crucial roles in RNA binding. Arg3 interacts with RNA in almost all species. However, it was replaced by Leu, Gly, Val, and Ser in Miscanthus sinensis, Sorghum bicolor, Hordeum vulgare, and Brassica oleracea capitata, thereby influencing electrostatic interactions with RNA (Fig. S14). Tyr12 is highly conserved in all spermatophytes, except Hordeum vulgare. In summary, the RNA-binding domain of plant U1C is highly conserved, except for rare genetic mutations in some species.
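The conservation percentages quoted above can be cross-checked with simple arithmetic. The sketch below searches for the number of scored positions that reproduces all three reported percentages. The total is not stated in the text, so the value it recovers is an inference from the reported counts, and the labels and helper names are illustrative only.

```python
# Counts and percentages as reported in the text (Fig. S14):
# (label, residue count, reported percentage, decimal places used in the text)
reported = [
    ("ConSurf grade over 7", 26, 49.1, 1),
    ("ConSurf grade over 9", 12, 22.6, 1),
    ("conservation above 80%", 51, 96.23, 2),
]

def percent(count, total, ndigits):
    """Percentage of `count` out of `total`, rounded as in the text."""
    return round(100 * count / total, ndigits)

def consistent_totals(rows, max_total=200):
    """Totals (number of scored positions) that reproduce every reported percentage."""
    largest = max(count for _, count, _, _ in rows)
    return [
        total
        for total in range(largest, max_total + 1)
        if all(percent(c, total, nd) == p for _, c, p, nd in rows)
    ]

# Assumption: the three percentages share one denominator, which the text does not state.
totals = consistent_totals(reported)
print("totals consistent with all three percentages:", totals)
for total in totals:
    for label, count, pct, nd in reported:
        print(f"{label}: {count}/{total} = {percent(count, total, nd)}% (reported {pct}%)")
```

Under that shared-denominator assumption, the three figures are mutually consistent only if the analysis covered 53 positions.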
