SOSTAR – long-read RNA isoform annotator


Understanding the structure of messenger RNA (mRNA) transcripts is crucial for both scientific research and disease diagnosis. mRNA serves as the blueprint for protein production, and its structure can reveal important information about how genes function. However, current methods for studying mRNA, such as short-read RNA sequencing and RT-PCR (reverse transcription polymerase chain reaction), each cover only a fragment of a transcript at a time, so they often fail to capture its complete structure.
The Challenge with Traditional Methods
Traditional techniques are like trying to read a book from only a few of its pages. They provide valuable information, but they cannot tell the full story, especially for genes expressed at low levels. This is a significant limitation for researchers trying to understand how certain genes contribute to diseases, including hereditary breast and ovarian cancer.
Enter Long-Read Sequencing
To address these challenges, scientists have developed third-generation long-read sequencing technologies, such as those from Pacific Biosciences and Oxford Nanopore Technologies. These methods can sequence a full-length transcript in a single read, which can lead to a better understanding of its structure and function.
A New Method for Better Insights
To make the most of these long-read technologies, researchers at the François Baclesse Center have introduced a new targeted enrichment approach. They designed a set of probes, short pieces of DNA that bind to specific transcripts. The probe set targets a panel of genes known to be involved in hereditary breast and ovarian cancer syndrome. Capturing these transcripts before sequencing lets researchers study them in greater detail.
Introducing SOSTAR
The researchers developed a versatile tool called SOSTAR (iSofOrmS annoTAtoR). This tool helps in assembling, quantifying, and annotating the different forms of mRNA (known as isoforms) from long-read sequencing data. Thanks to the improved capture of transcripts, SOSTAR allowed the team to identify 1,231 unique transcripts in the gene panel from just eight patients. This is a significant leap forward in understanding the various forms of mRNA that can arise from these genes.
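To give a feel for what "quantifying isoforms" can mean in practice, here is a minimal sketch in Python. It is not SOSTAR's code: it simply assumes each long read has already been reduced to a list of exon coordinates, and it treats reads that share the same chain of splice junctions as one isoform. The function names and the toy coordinates are hypothetical.

```python
from collections import Counter

def intron_chain(exons):
    """Return the splice junctions (introns) implied by a list of exons.

    Each exon is a (start, end) tuple; the chain ignores the outer
    transcript ends, which tend to vary from read to read.
    """
    exons = sorted(exons)
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))

def count_isoforms(reads):
    """Count reads per isoform, where an isoform is a distinct intron chain."""
    return Counter(intron_chain(exons) for exons in reads)

# Toy data (hypothetical coordinates): three reads from one gene.
reads = [
    [(100, 200), (300, 400), (500, 600)],
    [(105, 200), (300, 400), (500, 590)],  # same junctions, ragged ends
    [(100, 200), (500, 600)],              # middle exon missing
]
counts = count_isoforms(reads)
for chain, n in counts.items():
    print(n, "read(s) with junctions", chain)
```

Grouping by junctions rather than by full coordinates is one common design choice, since read ends are often truncated; a real pipeline would also have to handle alignment errors and strand.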
Figure: Targeted long-read RNA sequencing workflow. (A) Overview of the sequencing protocol, from cell lines to isoform assembly. (B) Description of the SOSTAR pipeline.
Key Findings
Using this new approach, the researchers found:

Complete Annotation: They annotated the structure of each transcript at single-base-pair resolution relative to a reference transcript.
Alternative Splicing Events: All major alternative splicing events of the BRCA1 and BRCA2 genes were identified. Alternative splicing is a process that allows a single gene to produce multiple mRNA variants, which can lead to different proteins.
Identification of Abnormal Transcripts: The approach detected abnormal transcripts in the control samples, a capability that is critical for diagnosing genetic conditions.
Solving Genetic Mysteries: They resolved a case of unexplained inheritance in a family with a history of breast and ovarian cancer by detecting an SVA retrotransposon in the BRCA1 gene. This finding shows how such genetic variations can be linked to cancer.
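
The idea of annotating a transcript "relative to a reference transcript" can be illustrated with a small Python sketch. This is an assumption-laden toy, not the SOSTAR algorithm or its nomenclature: exons are plain (start, end) tuples on the forward strand, and the function name and event labels are made up for illustration. It reports reference exons that are skipped and internal exon boundaries that are shifted, in base pairs.

```python
def annotate_against_reference(ref_exons, tx_exons):
    """Describe how a transcript differs from a reference exon structure."""
    tx_exons = sorted(tx_exons)
    tx_start, tx_end = tx_exons[0][0], tx_exons[-1][1]
    events = []
    for number, (ref_start, ref_end) in enumerate(sorted(ref_exons), start=1):
        # First transcript exon overlapping this reference exon, if any.
        match = next(
            ((s, e) for s, e in tx_exons if s <= ref_end and e >= ref_start),
            None,
        )
        if match is None:
            # Only call a skip when the transcript spans the missing exon.
            if tx_start < ref_start and ref_end < tx_end:
                events.append(f"exon {number} skipped")
            continue
        start, end = match
        # Outer transcript ends are ignored; only splice sites are compared.
        if start != ref_start and start != tx_start:
            events.append(f"exon {number} start shifted by {start - ref_start} bp")
        if end != ref_end and end != tx_end:
            events.append(f"exon {number} end shifted by {end - ref_end} bp")
    return events

# Toy example (hypothetical coordinates).
reference = [(100, 200), (300, 400), (500, 600), (700, 800)]
transcript = [(100, 200), (500, 603), (700, 800)]
events = annotate_against_reference(reference, transcript)
print(events)  # ['exon 2 skipped', 'exon 3 end shifted by 3 bp']
```

Even this toy version shows why single-base resolution matters: a 3 bp shift at a splice site and a whole-exon skip are very different events, and both are only visible when the full transcript is read end to end.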

Conclusion
The new protocol developed for enriching specific RNA transcripts marks a significant advancement in the field of molecular biology. By using targeted probes with long-read sequencing technologies, researchers can now fully explore the complexities of mRNA structure in a single experiment. This breakthrough not only enhances our understanding of gene function but also opens up exciting possibilities for improving molecular diagnostics in hereditary diseases.
In summary, the development of targeted enrichment approaches and tools like SOSTAR is paving the way for better insights into RNA structure, which could lead to more effective ways to diagnose and understand genetic disorders in the future.
Availability – The code of the pipeline is publicly available on the SOSTAR GitHub repository (https://github.com/LBGC-CFB/SOSTAR).
