Characterization and comparative analysis of sericin protein 150 in Bombyx mori

Identification of SP150 gene in B. mori
To identify the homolog of the major sericin gene P150 described previously in G. mellonella and E. kuehniella, we performed a BLAST search against the B. mori genome. We identified two adjacent homologous regions in the genomic sequence that were predicted to be parts of two separate B. mori genes. Most of the homologous sequence belonged to LOC101737213, and the remaining C-terminus was predicted to belong to a separate gene, LOC119629229. Remarkably, the B. mori putative protein sequence shared 47.5% identity with G. mellonella SP150 across 70 C-terminal amino acid residues. We hypothesized that the predicted gene models were incorrect and that the sequences of both B. mori genes were part of a single large SP150 gene.

To test our hypothesis, we prepared cDNA libraries from the anterior, middle, and posterior silk glands (ASG, MSG, and PSG) of last-instar larvae and aligned the silk gland-specific RNA-seq data with the reference genome. The alignment revealed the absence of an intergenic region between LOC101737213 and LOC119629229 (Fig. 1A). To verify that the two putative B. mori genes constituted a single gene, we designed intron-spanning RT-PCR primers to link the last two exons of LOC101737213 to the second exon of LOC119629229 (Fig. 1B). As shown in Fig. 1C, the amplified cDNA fragments supported our hypothesis of a single SP150 gene.

Fig. 1. The revised gene model for the B. mori SP150 gene consisting of two putative genes, LOC101737213 and LOC119629229. (A) The RNA-seq reads from B. mori middle silk glands were mapped to the genomic region of Chromosome 12 (NC_051369.1: 1,270,000–1,280,000, strand flipped). The number of reads bridging each exon junction is indicated at the midpoint of the arc. The absence of bridging reads shown between LOC101737213 and LOC119629229 indicates that the previously identified intergenic region is incorrect. (B) Predicted exon–intron structure of the revised SP150 gene model (green), consisting of four exons and three introns.
Gene models from the current NCBI annotation are shown for comparison: LOC101737213 (pink); LOC119629229 (blue); MAD (Mothers against dpp; gray); and the KWMTBOMO06993 gene model from SilkBase (brown). The bottom figure shows an approximately 2.2-kb region with chromosomal coordinates and the positions of the primer pairs used in this study. (C) Validation of the 3′ region of the revised SP150 gene model by RT-PCR and agarose gel electrophoresis. The expected product sizes for primer pairs #1F-#1R, #2F-#2R, and #3F-#3R were 1003 bp, 313 bp, and 93 bp, respectively. The electrophoretogram contains a 1 kb ladder (lane 1) and a 100 bp ladder (lane 5).

The resulting gene model of B. mori SP150 spans approximately 20 kb and comprises four exons and three introns (Fig. 1A, B). The first two exons encode a signal peptide and are part of a short, non-repetitive N-terminal sequence. The third exon is notably large and comprises 94% of the open reading frame (ORF), featuring two central repetitive regions flanked by unique sequences. The last exon contains a short ORF that ends with a stop codon. Altogether, the gene encodes a protein of 4552 amino acids, including a 19-amino acid signal peptide.

Putative SP150 protein
The predicted protein product of the SP150 gene in B. mori is a large protein with a predicted molecular mass of 467 kDa, comprising 4552 amino acid residues. It begins with a 19-amino acid signal peptide, followed by a 616-amino acid non-repetitive central segment. This is followed by 45 repeats of a 30-amino acid motif (repeat 1), a 34-amino acid non-repetitive linker, and 73 repeats of a 35-amino acid motif (repeat 2). The protein ends with a non-repetitive C-terminus spanning 388 amino acids. The complete amino acid sequence is listed in Supplementary Text S1. As shown in Fig. 2, SP150 contains two types of highly conserved threonine-rich repeat blocks (Supplementary Table S4).
The SP150 protein is hydrophilic, although less so than Ser1 (hydropathy index of −0.672 compared with −1.118 for Ser1). The B. mori SP150 contains more than 27% threonine, 14% serine, and 12% alanine residues. The C-terminus (encoded by the last exon) contains a short, conserved three-cysteine motif, CXCXCX (Supplementary Table S5).

Fig. 2. Amino acid sequence logos showing the conservation pattern in the two repeat types of the B. mori SP150 protein. The hydrophobicity of amino acids is indicated by color: hydrophilic (blue; RKDENQ); neutral (dark gray; SGHTAP); hydrophobic (orange; YVMCLFIW). The height of each letter indicates the degree of conservation at that position.

Compared with the SP150 proteins of G. mellonella and E. kuehniella, the SP150 of B. mori was almost three times larger, less hydrophilic, and contained fewer serine residues. Except for the C-terminal amino acids, there were minimal similarities among the SP150 proteins of these species (Supplementary Table S6).
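The hydropathy index and residue percentages quoted above are simple per-residue averages. The following is a minimal sketch of how such values can be computed, assuming the Kyte-Doolittle scale (the text does not name the scale that was used); the function names and the toy sequence are hypothetical and are not taken from SP150.

```python
# Kyte-Doolittle hydropathy values per residue.
# Assumption: the text does not state which hydropathy scale was used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Mean hydropathy per residue; more negative means more hydrophilic."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def composition_percent(seq: str, residue: str) -> float:
    """Percentage of positions in seq occupied by the given residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count(residue.upper()) / len(seq)


if __name__ == "__main__":
    toy = "TTSATTSAPT"  # hypothetical threonine-rich fragment, not SP150
    print(f"GRAVY: {gravy(toy):.3f}")
    print(f"Thr:   {composition_percent(toy, 'T'):.1f}%")
```

Applied to the full sequence in Supplementary Text S1, the same two functions would give whole-protein values comparable to those reported, provided the same scale was used.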
SP150 mRNA is specifically expressed in MSG
To determine the specific expression of B. mori SP150 in the silk glands, we used RNA-seq data from three SG-specific cDNA libraries and quantified transcript abundance using the kallisto software (see “Materials and methods”). As shown in Table 1, the highest transcript abundance of SP150 was found in the MSG, but this level remained substantially lower than that of sericins 1–3. In addition, we reanalyzed publicly available RNA-seq data for silk gene expression from previous experiments [24,35] using a similar approach to calculate the transcript levels. Consistent with our results, the publicly available data also indicated that the highest TPM (transcripts per million) for SP150 was detected in the anterior region of the MSG (Table 1).
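For readers unfamiliar with the unit, TPM normalizes each transcript's estimated read count by its effective length and then rescales so that all values in a sample sum to one million. The sketch below illustrates that arithmetic only; it is not the kallisto implementation, and the transcript names and numbers are invented for illustration.

```python
def tpm(est_counts, eff_lengths):
    """Convert estimated counts and effective lengths to TPM values.

    Each transcript's rate is count / effective length; rates are then
    scaled so that they sum to 1e6 within the sample.
    """
    if len(est_counts) != len(eff_lengths):
        raise ValueError("counts and lengths must have the same size")
    rates = [c / l for c, l in zip(est_counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0 for _ in rates]
    return [1e6 * r / total for r in rates]


if __name__ == "__main__":
    # Hypothetical values, not taken from Table 1.
    names = ["long_low_expr", "short_high_expr", "mid"]
    counts = [500.0, 9000.0, 1500.0]
    lengths = [13000.0, 3000.0, 1500.0]
    for name, value in zip(names, tpm(counts, lengths)):
        print(f"{name}\t{value:.1f}")
```

Because of the length normalization, a very long transcript such as SP150 needs many more reads than a short transcript to reach the same TPM, which is worth keeping in mind when comparing it with shorter silk genes.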
Table 1. Comparison of RNA-seq data from three studies. Transcript quantification of silk genes in anterior (ASG), middle (MSG), and posterior (PSG) silk glands was computed by the software kallisto in transcripts per million (TPM) units. A-MSG denotes the anterior part of the MSG, M-MSG the middle part of the MSG, and P-MSG the rear part of the MSG; “–” denotes “no detection”. Sources of the RNA-seq data include this study, BioProject PRJNA559726, and BioProject PRJDB8614 (see Supplementary Table S1 for details).

To further confirm the tissue specificity of SP150 expression, we isolated mRNAs from different parts of the silk glands and control tissues (intestine, integument, fat body, head, testis, and ovary) from day 3–5 last-instar larvae. We then prepared cDNA and performed qPCR. As shown in Fig. 3, we also analyzed the expression of MAD, a gene adjacent to SP150, and three genes encoding sericins, ser1, ser2, and ser3. The results showed that B. mori SP150 was specifically expressed in the middle silk glands, whereas MAD was ubiquitously expressed.

Fig. 3. Quantitative PCR (qPCR) analysis of silk gland-specific gene expression in nine larval tissues of B. mori. The expression levels of the genes ser1, ser2, ser3, and SP150 were measured, as well as that of the MAD gene, which is located adjacent to SP150, for comparison. Statistical differences were evaluated using Student’s t-test (see Supplementary Table S3). The error bars indicate the standard deviation. The results show that B. mori SP150 is specifically expressed in middle silk glands, whereas MAD is ubiquitously expressed.

Finally, we isolated RNA from several tissues of last-instar larvae (wandering stage) and performed northern blotting. As shown in Fig. 4, the amount of SP150 transcript was very low, close to the detection limit. The SP150 RNA transcript was notably large and primarily localized in the middle parts of the SG, similar to the pattern observed for ser1.

Fig. 4. Northern blot analysis of the tissue-specific expression of SP150. Total RNA samples from various tissues were analyzed with 32P-labeled cDNA fragments specific to SP150. A Ser1 probe was used as a control. FB fat body, INT integument, GUT intestine, PSG posterior SG, MS2 posterior part of middle SG, MSG1 anterior part of middle SG, ASG anterior SG.

Quantitative proteomic analysis of silk samples
To investigate the presence of SP150 in B. mori cocoons, we performed mass spectrometry (MS) proteomic analyses of wild-type cocoon silk. Data were analyzed using the Andromeda search engine integrated into the MaxQuant software [25,26], and the relative protein abundances were determined through label-free quantification. We identified 118 proteins, with a false discovery rate (FDR) of 1% for protein identification. The consistency of protein intensities between biological replicates was robust, as summarized in Supplementary Table S7.

To evaluate the effect of solvent on peptide identification, we also dissolved the cocoons in saturated LiSCN (5% protein solution) and compared the peptide yields with those obtained with 8 M urea. As shown in Table 2, LiSCN yielded slightly more peptides overall. However, the number of specific peptides attributed to SP150 remained low. Our proteomic analysis revealed that SP150, similar to Ser2 and Muc-12, was detected in cocoon silk at a low level, close to the detection limit of our instruments. In contrast, Ser1 and Ser3 were identified as the most abundant components of cocoon silk, with concentrations at least five orders of magnitude higher than SP150 (Fig. 5A).
Table 2. Comparison of the results of proteomic analysis of cocoon samples from B. mori. Counts of detected peptides from silk proteins in cocoon samples dissolved in 8 M urea and LiSCN. “–” indicates “no detection”.

Fig. 5. Proteomic analysis of B. mori silk proteins. (A) Analysis of wild-type cocoons as previously described [22]; and (B) data from individual cocoon layers obtained from a public repository [24]. Label-free quantification (LFQ) of silk proteins from cocoons was calculated using MaxQuant. LFQ intensities were log2-transformed. Relative protein contents in cocoon silk were analyzed using MaxQuant/Andromeda (eight experiments). Error bars indicate the standard deviation. The proteomic analysis confirmed that Ser1 and Ser3 were the most abundant silk components. The other proteins, including SP150, Ser2, and Muc-12, were detected in cocoon silk at low levels.

To further confirm protein abundance in the cocoons, we reanalyzed existing proteomics data from the public repository [24] using our new SP150 annotation. The abundance of SP150, Muc-12, and Ser2 in the cocoons is shown in Fig. 5B. All three proteins were present at very low levels, with SP150 and Ser2 showing the highest levels in the innermost cocoon layer (layer 1). Overall, these results confirm that SP150 is present at low levels in cocoons, comparable to the levels observed for Ser2 and Muc-12.

Synteny in regions coding for SP150 genes across Lepidoptera
A previous study on the pyralid moths G. mellonella and E. kuehniella showed that all known sericin genes, except SP150, are located within a cluster of orthologous genes in the same chromosomal region [14]. That study also revealed some local rearrangements and duplications in this region, including an increase in the copy number of several sericin genes in G. mellonella compared with related moth species [14].

Our data showed similar microsynteny for SP150 between B. mori and G. mellonella or E. kuehniella (Fig. 6). The SP150 gene is located on a different chromosome than the other sericin genes, in a conserved region between the genes encoding metalloprotease 1 and croquemont 1. As shown in Fig. 6, the region on chromosome 12 of B. mori has well-conserved synteny, comprising more than 40 genes, except for an inversion that positions the SP150 region in the reverse orientation relative to the adjacent genes.

Fig. 6. Microsynteny maps of SP150 genes and their flanking genes across species. Horizontal color blocks indicate chromosomal segments in each species. Homologous genes and gene orientation are represented by left- and right-pointing triangles, with homologous pairs connected by lines. SP150 homologs are highlighted in red.

SP150 may be related to Muc-12
To investigate the evolution of SP150, we performed a BLAST search for homologous sequences in insect genomes, using the conserved sequence encoding the C-terminal end of the protein as a query. We found no obvious orthologs in non-lepidopteran insects, suggesting that SP150 is a Lepidoptera-specific gene. Furthermore, there were no SP150 orthologs in members of the superfamily Papilionoidea.

SP150 proteins are highly divergent, making it difficult to align homologous proteins from different lepidopteran families, with the exception of the conserved C-terminus.
The most prominent conserved motif is the three-cysteine sequence (CXCXCX), which is located 12–29 amino acids away from the C-terminus.

Interestingly, another silk gland-specific protein, mucin-12, also contains a three-cysteine (CXCXCX) motif and resembles SP150 in its size and repetitive structure. To better understand the relationship between the two proteins, we constructed a dendrogram using the C-termini of SP150 and Muc-12 from representatives of different lepidopteran families (Fig. 7). The resulting phylogenetic tree was robust and clearly distinguished the SP150 and Muc-12 clades, although the sequences from the most primitive species showed less clear separation (Fig. 7A). The sequence alignment and consensus sequences are shown in Fig. 7B.

Fig. 7. Relationship between lepidopteran SP150 and Muc-12 sequences. (A) Maximum likelihood phylogenetic tree based on the alignment of the C-terminal amino acid sequences of SP150 and Muc-12 homologs from selected lepidopteran species. The Nesw_1 transcript from N. swammerdamellus (Incurvarioidea), the most primitive in this group, was selected for tree rooting. See Supplementary Table S8 for sequence details. (B) Alignment of the C-terminal regions of SP150 proteins from representative lepidopteran species. Sequences include the characteristic CXCXCX region, which is well conserved between species.
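The CXCXCX motif described in this section can be located in a protein sequence with a simple pattern search. The sketch below is one possible approach, not the procedure used in the study: the function name and toy sequence are hypothetical, and the distance is assumed to mean the number of residues between the end of the motif and the C-terminus, which the text does not define explicitly.

```python
import re

# C, any residue, C, any residue, C, any residue.
CXCXCX = re.compile(r"C.C.C.")


def find_cxcxcx(seq: str):
    """Return (start_index, residues_after_motif) for each CXCXCX match.

    start_index is 0-based. residues_after_motif counts the residues
    between the end of the motif and the C-terminus (an assumed
    definition of "distance from the C-terminus").
    Matches are non-overlapping, scanned left to right.
    """
    seq = seq.upper()
    return [(m.start(), len(seq) - m.end()) for m in CXCXCX.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical C-terminal fragment, not a real SP150 sequence.
    toy = "MKTCACGCS" + "A" * 12
    for start, tail in find_cxcxcx(toy):
        print(f"motif at index {start}, {tail} residues before the C-terminus")
```

Running the same search over the C-terminal regions of SP150 and Muc-12 homologs would show whether each candidate carries the motif at a comparable position.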
