The alternative splicing landscape of infarcted mouse heart identifies isoform level therapeutic targets

Animal care and ischemic reperfusion modelThe C57BL/6 J mice utilized in this study were procured from the animal center at the Institute of Zoology, Chinese Academy of Sciences (Beijing, China). All experimental procedures involving animals were ethically conducted, having received approval from the Institutional Care and Ethical Committee of the Institute of Zoology, Chinese Academy of Sciences (animal permit number: IOZ20190057). The study adhered to the principles outlined in the Guide for the Care and Use of Laboratory Animals (NIH publication 8th Edition, 2011).The IR model involved mice being anesthetized with isoflurane, followed by a left thoracotomy to expose the heart, and a transient ligation of the left main descending coronary artery (LCA) for 40 min, succeeded by reperfusion for 72 hr. Conversely, the MI model encompassed the permanent ligation of the left anterior descending coronary artery and ended at 72 hr post-surgery. The Sham group underwent a similar surgical procedure but without LCA ligation. Subsequent to surgical interventions, mice were accommodated in well-ventilated cages, provided with autoclaved diets, and maintained in a specific pathogen-free facility with a 12-hr light cycle. Euthanasia was executed through isoflurane inhalation followed by cervical dislocation.Library preparation and sequencingTotal RNA was isolated from the samples using Trizol reagents (Invitrogen, 15596018CN), where infarct core and peripheral regions were used in the MI and MISham groups, and infarct core tissue was used in the IR and IRSham groups. We assessed the RNA integrity of all samples using Agilent 2100 to exclude possible errors caused by their degradation. All samples showed good RNA quality, with RIN (RNA Integrity Number) of 8.5 or higher, with most samples greater than 9 (Supplementary Figure S2). The SQK-PCB109 (Oxford Nanopore Technologies) kit was used to prepare a full-length transcriptome sequencing library. In detail, 2 μM VN Primer (VNP) and 10 mM dNTPs were added and incubated at 65 °C for 5 min. Then snap cooled the samples and add 5xRT Buffer, RNaseOUT, and 10 μM Strand-Switching Primer (SSP) at 42 °C for 2 min. Next, 1 μl of Maxima H Minus Reverse Transcriptase was added, which allowed the full-length transcript to be reversely transcribed into cDNA in vitro for further PCR. Sequencing uses PromethION (FLO-PRO002) to generate fast5 files and the sequencing was scheduled to run for 72 hr or stop early when there are less than 10 active pores. Reads with quality score (Q score) < 7 were filtered out to ensure the quality of sequencing. Next-generation sequencing (Illumina)-based RNAseq was applied to the same samples. In total, 16 samples were sequenced for four groups (MI, IR, MI-Sham, and IR-Sham) with four samples in each group.Basecalling and full-length transcript identificationFast5 files were basecalled by Guppy (version 5.0.11) to generate fastq files. Pychopper (version 2.5.0) was used to identify full-length sequences from the filtered reads and NanoFilt (version 2.8.0)15 was used to remove low-quality (Q < 7) and relatively short sequences (L < 500) to ensure the accuracy of novel transcripts. After these filters, the retained reads are called High-confidence high-quality reads (HQ-FL, 50% of all FL data in terms of bp were retained, Supplementary Table S3). The sequence quality and length are visualized using NanoPlot (version 1.38.1)15. GENCODE M26 (GRCm39) transcript reference and genome reference (primary assembly) with the main annotation were used in sequence alignment and quantification12.Novel transcript identificationTo achieve full-length-transcript identification of transcribed genes, we established a bioinformatic workflow (Fig. 1A–C) that utilizes full-length transcript sequences from ONT to obtain all isoforms (Fig. 1B); we additionally performed Illumina sequencing for benchmarking and correction of sequencing errors in ONT reads. The discovery of novel transcripts is achieved through alignment and collapse with FLAIR (version 1.5)16 and SQANTI3 (version 4.2)17. Firstly, the clean fastq files from the Nanopore RNA-Sequencing were merged and aligned to genomic reference by Minimap2 (version 2.21)18 with the parameters “-ax splice -uf -k14–secondary = no” to genomic reference. Secondly, the alignment result is further processed to correct the splice junction via FLAIR using the given annotation files. Then we obtained reliable and high-quality transcripts after collapsing redundant sequences. Finally, we used SQANTI3 to identify the ORF region of transcripts, polyA tails, and other information.Fig. 1Overview of our studies and sequencing results. (A) The overall framework of our pipeline can be roughly divided into four parts, in which the identification of novel transcripts can be further subdivided in (B), and the operations and software involved in these steps are described in (C). (D) Percentage of sequences identified to VPNs and SSPs (Primers found) and chimeric sequences recovered by the software (Rescue) as well as removed sequences (Unusable) in the Oxford nanopore data. (E) Percentage of novel genes versus known genes resulting from the identification of novel transcripts. (F) Number of extended reference genomes after fusion of novel transcripts with known transcripts in GENCODE.We then performed stringent length filtering (minimum of 500 bp) and aligned the reads to the reference genome sequence (GRCm39) to obtain all junctions. We further filtered the reads using FLAIR to remove sequence redundancy and intrinsic errors in ONT reads, in particular filtering out low-abundance, low-quality regions (Fig. 1C). After annotating the transcripts via SQANTI3, a total of 33,335 high-quality non-redundant transcripts (isoforms) were obtained. The transcripts were mapped to a total of 21,702 non-overlapping chromosomal regions, of which 9,874 (45.5%) do not contain annotated exons of genes, but belong to either intergenic regions (8,235, 37.95%), anti-sense regions (1,637, 7.54%), or introns of annotated genes (2, 0.009%), which are hereafter referred to as ‘novel genes’ (novelGene). Our discovery of novel transcripts expands the number of known transcriptomes in mice by 16.5%, from the original 140,530 transcripts in GENCODE M26 to a total of 163,713 (Fig. 1F), in which 9,874 of the 63,318 genes (15.59%) identified were novelGene (Fig. 1E).We next focused particularly on the proportion of AS categories containing protein-coding transcripts since proteins function as effector molecules and factors responding to environmental changes. A total of 16,853 (53.35%) transcripts were detected by ONT sequencing encoded proteins (Fig. 2D) and were classified into seven categories: NIC, ISM, Intergenic, Antisense, Genic, Fusion and NNC. The number of novel transcripts coding for proteins identified in each category was significantly higher in the experimental group (MI and IR) compared to the control group (Sham) (Mann-Whitney test p < 0.05 for intergroup comparisons across categories of AS, as denoted by an asterisk in Fig. 2L). The formation of mature mRNA in vertebrates involves the cleavage and polyadenylation of precursor mRNA, and the resulting polyadenylation requires the recognition of an A-rich hexamer called the polyA signal (PAS)19, which tends to be located at an uneven distance of 10–30 bp upstream of the polyA. Among the main similar features of the transcripts was the median length, which was similar across categories (Supplementary Figure S1C). However, there was a significant difference in the PAS-to-polyA distance for NIC and Intergenic compared to known transcripts (FSM) (FSM = −19.89 ± 7.71 bp, NIC = −19.32 ± 8.07 bp, t-test p-value < 0.0001; and Intergenic = −19.39 ± 12.92 bp, t-test p value = 0.0041, negative sign represents PAS is located upstream of polyA) (Supplementary Figure S1D).Fig. 2Identification of novel transcripts and extension of reference annotations. (A) Schematic representation of the differences between different types of transcripts. (B) Sequence length distribution of different types of transcripts. (C) Percentage of each transcript type in different sequence length intervals. (D) Percentage of protein-coding and non-coding transcripts in different types. (E) Percentage of transcripts with only one exon (Mono-Exon) and multiple-exon transcripts (Multi-Exon). (F) Distribution of the number of transcripts of different lengths. The number of all transcripts (G), novel transcripts (H), and protein-coding novel transcripts (I) detected in MI/IR and IR-Sham/MI-Sham. (J) Number of transcripts detected in different groups of samples (Mann-Whitney test, asterisks indicate p < 0.05). (K) The number of different categories of novel transcripts. (L) The number of different categories of novel protein-coding transcripts (Mann-Whitney test, asterisks indicate p < 0.05).Merging of original annotation and novel transcript informationThe position information of the identified novel transcripts was added to the transcript annotation file acquired from GENCODE M26 by AGAT20 suite (version 0.8.1) to get an updated transcripts annotation for transcript quantification. We used the script “agat_sp_merge_annotations.pl” from the AGAT suite to merge two transcript annotations, and then manually filled in the missing annotation field and removed the redundant transcripts.Characteristics of alternative splicing in the mouse heartFurthermore, the alternative splicing landscape of myocardial infarction and myocardial ischemic reperfusion tissue was obtained by annotating non-redundant transcripts. Transcripts that could be mapped to annotated genes were divided into multiple categories using SQANTI3 (Fig. 2A). The largest category was Full Splice Match (FSM), which represents perfect matches to known transcripts (11,097 or 35.13%). This was followed by Intergenic, which refers to transcripts detected in the intergenic region (7,769 or 24.59%). Additionally, Novel in Catalog (NIC) was identified, which contains annotated splice junctions or new combinations formed by annotated exons (5,527, 17.50%). Incomplete Splice Match (ISM), which corresponds to transcripts matching a subsection of known transcripts (4,987 or 15.79%). Antisense, which does not overlap with reference genes but is antisense to annotated genes (1,537, 4.87%). In contrast, 629 (2.00%) genic transcripts were identified with a mixture of annotated introns and exons. Other categories were even rarer, including Fusion, Novel Not in Catalog (NNC), and Genic Intron. It is noteworthy that among the novel transcripts, a large proportion (67.09%) was multi-exon (Fig. 2E).We then analyzed the AS patterns (composition of different classes of novel isoforms) between the two groups of mice and found that there were significant differences between the MI and the MI-Sham group, as well as between the IR and the IR-Sham group (Asterisks imply Mann-Whitney test p < 0.05; Fig. 2K,L). This suggests isoforms can be thought of as disease-specific markers. A significantly higher number of transcripts was found in the MI and IR groups compared to the two Sham groups (Fig. 2J, Mann-Whitney test p = 0.0286), including 19,258 unique transcripts in the MI group (of which 2,111 are novel isoforms); and 13,501 unique transcripts in the IR group (of which 1,667 are novel isoforms; Fig. 2G,H).Gene and transcript quantificationFor the alignment of short-read data, we applied Hisat2 (version 2.2.1)21 and Samtools (version 1.7)22 was used to convert sam files to bam files. Next, we used featureCounts (version 2.0.1)23 to quantify gene expression (with parameter “-g gene_id”) and excluded rare transcripts with the sum of the reads below 10. Transcripts (either annotated or novel) were quantified using kallisto (version 0.50.1)24.Differential expression analysis of genes and transcriptsTaking the gene count matrix as input, differential expression analysis was performed using DESeq 2 (version 1.34.0)25 to calculate the log2 fold change (log2FC) and adjusted p-value of each annotated gene and an adjusted p-value < 0.1 was considered statistically significant. In contrast, transcripts were analyzed using the Swish method (with fishpond version 2.4.1)26, and a p-value < 0.05 was significantly different.We then quantitatively analyzed differentially expressed genes (DEGs) between groups. Taking advantage of the extended reference, we quantified the transcripts that could be mapped to known exonic regions of genes in each group. A total of 11,378 genes had a significantly different expression in MI (|FC| > 1.5, adjusted p-value < 0.1) compared to MI-Sham, including 5,809 up-regulated genes (of which 492 were novelGene), and 5,569 down-regulated genes (of which 781 were novelGene; Fig. 3E). We also found that a total of 8,796 genes (of which 891 were novelGene) had a significantly different expression in the IR compared to IR-Sham (|FC| > 1.5, adjusted p-value < 0.1), including 4,402 up-regulated genes (of which 340 were novelGene) and 4,394 down-regulated genes (of which 551 were novelGene; Fig. 3H).Fig. 3Gene level expression differences and functional enrichment. (A) Principal Component Analysis (PCA) of samples based on top500 genes. (B) Distance matrix of samples. (C) The number of genes identified versus NGS reads data, it can be seen that IR/MI can identify the same number of genes with less number of reads relative to two Sham groups. (D) Genes up regulated by MI relative to MI-Sham and their GO enrichment to terms. (E) Volcano plot of differential genes in the MI relative to IR-Sham. (F) Results of GO enrichment analysis of differential genes in MI relative to MI-Sham. (G) Genes up regulated by IR relative to IR-Sham and their GO enrichment to terms. (H) Volcano plot of differential genes in the IR relative to IR-Sham. (I) Results of GO enrichment analysis of differential genes in IR relative to IR-Sham.We then performed transcript-level analysis with a special focus on differentially expressed isoforms. We found that, compared to MI-Sham, MI had 1,735 transcripts with significantly down-regulated isoforms (of which 346 were novel transcripts,) and 2,447 transcripts with significantly up-regulated expression (of which 249 were novel transcripts; Fig. 4A). In the case of IR, the corresponding numbers were 696, 155, 835, and 82, respectively (Fig. 4C)Fig. 4Differential expression analysis at the transcript level with a focus on MI- and IR-specific transcripts. (A) MA plot of differentially expressed transcripts of MI relative to MI-Sham, triangles represent novel transcripts. (B) GO enrichment results for genes corresponding to differentially expressed transcripts in MI relative to MI-Sham. (C) MA plot of differentially expressed transcripts of IR relative to IR-Sham, triangles represent novel transcripts. (D) GO enrichment results for genes corresponding to differentially expressed transcripts in IR relative to IR-Sham. (E) MA plot of differentially expressed novel transcripts of MI relative to MI-Sham, triangles represent novel protein-coding transcripts. (F) Changes in the proportion of types of differentially expressed transcripts in protein-coding novel transcripts and novel transcripts. (G) MA plot of differentially expressed novel transcripts of IR relative to IR-Sham, triangles represent novel protein-coding transcripts. (H) Type distribution of differentially expressed protein-coding novel transcripts with fold change. (I) IR and MI group upregulation of differentially expressed novel protein-coding transcripts.

Hot Topics

Related Articles