In-silico analysis of cattle blood transcriptome to identify lncRNAs and their role during bovine tuberculosis

Quality filteringThe data of 48 samples was downloaded from the EBI-ENA database. The 48 samples are of 12 Holstein calves at 8- and 20-WPI. The quality control by Fastp showed that, on average, all 48 samples were passed with a value of 99.28%. Approximately 0.38% of reads were identified as low quality, while 0.31% were deemed short (refer to Supplementary Table 2). The average GC content was determined to be 46.84%, and following the filtering process, the Q30 base content reached 97.18%.Mapping and assemblyThe pre-processed reads underwent mapping against the cow genome version ARS-UCD1.2, obtained from NCBI. The average mapping percentage across all samples was determined to be 98.17%, falling within a range of 97.83% to 98.37%. Subsequently, the mapped reads from all samples were assembled, revealing an average of 54,481 transcripts, with individual samples ranging from 53,883 to 55,120. After merging all the transcripts from all samples and reassembling the samples using Stringtie, the number of transcripts per sample was refined to 38,050. Detailed mapping and assembly results can be found in Supplementary Table 3. This process aims to ensure comprehensive coverage and accuracy in capturing the transcriptional landscape, facilitating downstream analyses and interpretations.Identification of long noncoding RNAsAfter merging, a total of approximately 420,938 transcripts were acquired and annotated with 16 distinct class codes. As illustrated in Fig. 1A, nearly half (49.93%) of the transcripts were assigned Class code “J”, denoting multi-exon transcripts with at least one junction match. Furthermore, 20.82% of the transcripts bore Class code “C”, indicating transcripts contained in the reference (intron compatible). The remaining class codes were linked to less than 10% of the transcripts each. Table 1 provides further details, indicating that approximately 74,162 transcripts of class codes I, U, and X were extracted.Figure 1Identification of lncRNAs—(A) Pie chart showing the proportions of sequences of each class code. (B) Pie chart showing the proportions of each type of lncRNAs. (C) Figure showing the distribution of identified long noncoding RNAs across the chromosomes of cow.Table 1 Table showing the number of sequences eliminated and retained in each step during the extraction of long noncoding RNAs.Following analysis using length filter, ORF filter, Pfam RPSBLAST, and CPC2 annotation, 51,812 transcripts successfully passed all filters, being considered potential lncRNAs. As depicted in Fig. 1B, among these, 18,373 transcripts were of class code I, 24,001 transcripts were of class code U, and 9,438 transcripts were of class code X. BLAST analysis against the NONCODE database v6 identified similarities in 21,708 out of 51,812 sequences. Among these, around 489 sequences showed both 100% similarity and coverage. It’s noteworthy that there were no hits with 100% similarity in the miRBase (hairpin and mature) and tRNA databases. The chromosomal localization of the identified lncRNAs, plotted using the phenogram tool43, is depicted in Fig. 1C. This figure illustrates that chromosome 10 contained 6.77% of the identified lncRNAs. On average, chromosomes 5, 18, and 19 harboured approximately 5.69% of the lncRNAs, while chromosomes 27 and MT had the fewest number of identified lncRNAs.Differential expression analysisDifferential expression analyses were conducted for both genes and lncRNAs under two conditions: (I) between infected and uninfected states and (II) between different time points. Statistically, the genes and lncRNAs showing an FDR value < 0.05 were considered as significant. In infection-based analysis, about 6 genes and 87 lncRNAs were identified to be differentially expressed at 8 WPI, while about 6 genes and 30 lncRNAs were found to be differentially expressed at 20 WPI. In timepoint-based analysis, 25 genes and 73 lncRNAs were found to be differentially expressed in infected 8 verses 20 WPI and 199 genes and 137 lncRNAs were found to be differentially expressed in uninfected 8 verses 20 WPI. Surprisingly, uninfected 8 vs 20 WPI showed highest number of DEGs and DElncRNAs. As per the original paper4, this difference is due to the changes in age of the animals during the experiment. The number of genes and lncRNAs differentially expressed at each condition were mentioned in Table 2.
Table 2 Table showing the number of differentially expressed genes (DEGs) and differentially expressed lncRNAs (DElncRNAs) obtained in (I) between infected and uninfected and (II) between different time points.The Circos plots, illustrated in Fig. 2A and B, provide a visual representation of the chromosomal localization of differentially expressed genes and lncRNAs under various conditions. In the visualization, the outermost black circle signifies the chromosomes of cow. The inner concentric circles depict the positions of differentially expressed genes and lncRNAs on chromosomes in the order infected 8 vs 20 WPI, uninfected 8 vs 20 WPI, infected vs uninfected 8 WPI and infected vs uninfected 20 WPI. Lines extending outward from the central line signify up-regulation, while lines moving inward denote down-regulation of the genes and lncRNAs. The figure shows that in comparison to genes, more lncRNAs were differentially expressed in two analysis conditions. Time-based analysis showed more differentially expressed genes and lncRNAs, while infected and uninfected based analysis showed fewer genes and lncRNAs. In the infected and uninfected-based analysis, the number of genes remained constant between the two time points, while the count of differentially expressed lncRNAs was higher at 8 WPI compared to 20 WPI. These visualizations provide a clear overview of the genomic distribution and expression patterns of genes and lncRNAs under different conditions and timepoints.Figure 2Figure showing the chromosomal localisation of (A) differentially expressed genes and (B) long noncoding RNAs. The outermost black circle signifies the chromosomes of cow. The inner concentric circles depict the positions of DEGs and DElncRNAs identified in different conditions in the order infected 8 vs 20 WPI, uninfected 8 vs 20 WPI, infected vs uninfected 8 WPI and infected vs uninfected 20 WPI.Gene ontology analysisThe gene ontologies were represented in Fig. 3A with level 2 Gene Ontologies (GOs) including biological process, molecular function and cellular component. In the analysis of biological process level 2 GOs, a total of 12 categories were identified. Notably, 20.33% (159) of the genes were associated with cellular processes (GO:0009987), followed by 17.9% (140) annotated with metabolic processes (GO:0008152). Immune system processes (GO:0002376) were represented by 3.2% (25) of the genes. Figure 3B illustrates the annotation of DEGs with 4 distinct molecular function level 2 GOs, with 42.03% (116) of the genes annotated with binding (GO:0005488) and 7.25% (20) with molecular transducer activity (GO:0060089). Figure 3C displays the annotation of DEGs with 2 cellular component level 2 GOs, where 61.66% (156) were associated with cellular anatomical entities (GO:0110165), and 38.34% (97) were linked to protein-containing complexes (GO:0032991). Figure 4 shows the diversity of the pathways of the differentially expressed genes across different categories of the KEGG and Reactome databases. The KEGG database pathway analysis showed that most pathways were in Organismal systems, followed by the environmental information processing category, while very few were in the genetic information processing category. As per the Reactome database, the highest number of pathways were in the signal transduction category, followed by the metabolism of proteins category. In contrast, only one pathway was found in protein localisation, DNA replication, programmed cell death and muscle contraction categories.Figure 3Figure showing the pie charts representing the Level 2 Gene ontologies (GOs) of the differentially expressed genes of all conditions—(A) biological process GOs, (B) molecular function GOs and (C) cellular component GOs.Figure 4Figure showing the distribution of the categories of pathways annotated using KEGG and Reactome databases.Apart from this, a total of 8 different genes were annotated with biological process Immune system process (GO:0002376) and only 1 gene was annotated with pathways in the Reactome immune system category, which were annotated with different biological process GOs including amide biosynthetic process (GO:0043604), peptide biosynthetic process (GO:0043043) and chromatin remodelling (GO:0006338). Supplementary Table 4 offers detailed information on gene ontologies, covering biological process, molecular function, and cellular component, EGGNOG annotation, KEGG/Reactome pathways, and pathway categories of all the differentially expressed genes identified in various conditions. These results enhance our understanding of the functional roles and pathways associated with the differentially expressed genes under diverse conditions.Co-expression analysisIn the co-expression analysis using Weighted Gene Correlation Network Analysis (WGCNA), modules were identified under two conditions: (I) between infected and uninfected states and (II) between different time points. No outliers were detected in both infection and timepoint-based analyses. The parameters like soft power and minimum module size were determined based on the data. For certain conditions, when the scale-free topology fit index failed to reach values above 0.8 for reasonable powers, the soft power was selected as 9, following guidance from the WGCNA tutorial and detailed in Supplementary Table 5. In infected-uninfected-based analysis, at 8 WPI, three modules were identified with module sizes ranging from 17 to 58. At 20 WPI, four modules were identified with module sizes ranging from 5 to 16. In timepoint-based analysis, four modules were identified, with module sizes ranging from 15 to 48, representing different time points in infected individuals. In infected-uninfected-based analysis, at 8 WPI, five interactions between differentially expressed (DE) genes and lncRNAs were identified with a threshold value of 0.1. At 20 WPI, 30 interactions were identified with a threshold value of 0.01, indicating weak interactions. In the timepoint-based analysis between different time points of infected individuals, 41 interactions between DE genes and lncRNAs were identified with a threshold value of 0.1. These findings provide insights into the co-expression patterns and interactions between DE genes and lncRNAs under different conditions and time points, shedding light on potential regulatory networks associated with the host’s response to infection.Cis–trans analysisIn the co-expression analysis of lncRNAs and gene pairs, the cis and trans analyses were conducted. In infection-based analysis, at 8 WPI, no cis-acting pairs were identified. At 20 WPI, two cis-acting pairs were found in which lncRNAs were located within the genes. In timepoint-based analysis between different timepoints of infected individuals, a total of 9 cis-acting pairs were identified. Out of these, one pair had the lncRNA downstream of the gene. Four pairs had the lncRNA upstream of the gene. Four pairs had the lncRNA within the gene. These findings, detailed in Table 3, provide information about the spatial relationships between co-expressing lncRNAs and genes under different conditions and time points.
Table 3 Table showing the number of trans, cis and different categories of cis co-expressing interactions between differentially expressed genes and long noncoding RNAs—(I) between infected and uninfected and (II) between different time points.Gene–transcription factor interaction analysisIn the analysis of transcription factors, 10 motifs were identified within the differentially expressed genes in both conditions. Three motifs (Motif 1, 2 and 9) with a p-value cut-off of 0.05 were related to the transcription factors. Six different transcription factors were identified to match the three motifs; among them, two belong to the ZNF263 family, and the other four transcription factors were identified as SP2, SP1, NFKB1, and NFATC2. The transcription factors SP2 and SP1 were found to match Motif-1, NFKB1 and NFATC2 were found to be matching to Motif-9 and ZNF263 was found to be matching to both Motif-1 and Motif-2. Supplementary Table 6 provides detailed information about the identified transcription factors, including their associations with specific motifs.Gene miRNA analysisIn the miRNA analysis, 125 microRNAs were found to target approximately 48 differentially expressed genes in both analysis conditions. In infected and uninfected based analysis, miRNA bta-mir-326, bta-mir-328, bta-mir-1291, bta-mir-2364, bta-mir-2447, bta-mir-2455, bta-mir-2897, and bta-mir-3431 were found to be targeting one differentially expressed gene at 8 WPI. No miRNAs were found to target differentially expressed genes at 20 WPI in this specific analysis.Network visualisationUsing the Cytoscape tool with the yfiles organic layout, the interaction network of differentially expressed lncRNAs, co-expressing genes with transcription factors, and associated pathways or biological process GOs was visualized. Figures 5 and 6 depict plots illustrating the comparisons between infected and uninfected conditions and between different time points. In these figures, blue diamond-shaped nodes represent differentially expressed genes, black circular nodes represent differentially expressed lncRNAs, red triangular nodes signify transcription factors interacting with the differentially expressed genes, and green rectangular nodes represent pathways or biological process GOs. In Fig. 5A, represents network obtained data of infected versus uninfected at 8 WPI. This figure shows 3 DEGs, 3 DElncRNAs and 5 TFs. Of the 3 DEGS, gene H3FA was annotated with nucleic acid binding GO and co-expressing with 3 DElncRNAs, gene B2M was annotated with immune system (Reactome) pathway and co-expressing with 1 DElncRNA and gene TMSB4X was annotated with Haemostasis (Reactome) and co-expressing with 1 DElncRNA. In Fig. 5B, network obtained data of infected versus uninfected at 20 WPI. This figure shows 4 DEGs, 12 DElncRNAs and 6 TFs. Of the 4 DEGS, gene Loc101903126 was annotated with Disease (Reactome) and co-expressing with 10 DElncRNAs, gene MSTRG.15670 was annotated with Reverse transcriptase pathway and co-expressing with 1 DElncRNA and gene MSTRG.15980 was annotated with peptidyl-dipeptidase inhibitor activity and co-expressing with 9 DElncRNAs and MSTRG.5660 was annotated with Cell cycle (Reactome) and co-expressing with 8 DElncRNAs. In Fig. 6, network obtained data of infected 8 versus 20 WPI. This figure shows 9 DEGs, 39 DElncRNAs and 6 TFs. Of the 9 DEGS, gene ADGRG1 was unannotated and co-expressing 2 DElncRNAs, gene MSTRG.12261 was unannotated and co-expressing with 1 DElncRNA, gene MSTRG.13282 was annotated with Transport of small molecules (Reactome) and co-expressing with 6 DElncRNAs, gene MSTRG.1679 was annotated with Signal Transduction (Reactome) and co-expressing with 1 DElncRNA, gene MSTRG.17885 was annotated with Disease (Reactome) and co-expressing with 8 DElncRNAs, gene MSTRG.20895 was annotated with Disease (Reactome) and co-expressing with 1 DElncRNA, gene MSTRG.3665 was annotated with Organismal Systems (KEGG) and co-expressing with 7 DElncRNAs, gene MSTRG.4064 was unannotated and co-expressing with 12 DElncRNAs and gene MSTRG.5840 was annotated with Disease (Reactome) and co-expressing with 1 DElncRNA. These visualizations provide a comprehensive overview of the regulatory network, showcasing the relationships between differentially expressed genes, lncRNAs, transcription factors, and associated pathways or biological processes. The yfiles organic layout, based on the force-directed layout paradigm, helps in understanding the structural organization and potential interactions within the complex network. The different shapes and colours of nodes add clarity to the visualization, making it easier to interpret the roles of various elements in the network.Figure 5Figure showing the Network diagrams of differentially expressed genes with co-expressing long noncoding RNAs and transcription factors identified in (A) between infected and uninfected at 8 WPI and (B) between infected and uninfected at 20 WPI.Figure 6Figure showing the Network diagrams of differentially expressed genes, with co-expressing long noncoding RNAs, Transcription factors identified between different time points of infected (8 vs 20 WPI).

Hot Topics

Related Articles