Aletsch – accurate assembly of multiple RNA-seq samples

High-throughput RNA sequencing (RNA-seq) has revolutionized our ability to decode the activities of genes within cells. However, one persistent challenge has been reconstructing full-length transcripts accurately, especially from single-cell RNA-seq data where transcripts are often fragmented. Traditional single-sample assemblers struggle with this, and while multiple-sample assemblers exist, they too face various limitations.
Introducing Aletsch: A Game-Changing Assembler
Now, researchers at The Pennsylvania State University have introduced Aletsch, a novel assembler designed for both bulk and single-cell RNA-seq samples. Aletsch incorporates cutting-edge algorithmic innovations to address the fragmentation issue inherent in single-sample assemblies.
Key Innovations of Aletsch

Bridging System: Aletsch utilizes a unique “bridging” system that integrates information from multiple samples to reconstruct missed junctions in individual samples. This approach helps stitch together fragmented transcripts, providing a more complete view of gene expression.
Graph-Decomposition Algorithm: Aletsch employs a sophisticated graph-decomposition algorithm that leverages supporting information across multiple samples. This method guides the decomposition of complex vertices in the transcript assembly process, improving accuracy and completeness.
Random Forest Model: Aletsch incorporates a random forest model equipped with 50 specially designed features for scoring transcripts. This model enhances the precision and reliability of assembled transcripts by evaluating multiple characteristics simultaneously.
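To make the bridging idea concrete, here is a toy Python sketch. It is not Aletsch's implementation: the exon indices, the junction sets, and the helper names `build_graph` and `bridge` are all invented for illustration, and a plain breadth-first search stands in for the real bridging procedure. It only shows why pooling junctions across samples can connect a read pair that a single sample cannot.

```python
from collections import deque

def build_graph(junctions):
    """Adjacency map from a set of (from_exon, to_exon) junctions."""
    graph = {}
    for src, dst in junctions:
        graph.setdefault(src, set()).add(dst)
    return graph

def bridge(graph, start, end):
    """Return one shortest exon path from start to end, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical junctions observed in two samples (exon indices are made up).
sample_a = {(1, 2), (3, 4)}   # junction 2->3 was missed in this sample
sample_b = {(1, 2), (2, 3)}
combined = sample_a | sample_b

# A read pair from sample A whose mates land in exon 1 and exon 4.
alone = bridge(build_graph(sample_a), 1, 4)    # no path within sample A
pooled = bridge(build_graph(combined), 1, 4)   # path found with pooled junctions
print(alone)
print(pooled)
```

In this toy case the pair cannot be bridged within sample A alone, while the pooled graph supplies the missing junction and yields a complete path.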

Workflow of Aletsch

(a) Constructing individual and combined splice graphs. Dashed green lines indicate junctions inferred from clipped reads. (b) Bridging paired-end reads to create enhanced phasing paths, shown as dashed green blocks. (c) Refining individual splice graphs. (d) Decomposing refined individual splice graphs and the combined graph, guided by enhanced phasing paths. (e) Grouping identical intron-chains (e.g. τ3 and τ4) and scoring candidate transcripts with a random forest model to generate the final meta-transcripts.
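The grouping in step (e) can be illustrated with a short Python sketch. This is an assumption-laden toy rather than Aletsch's code: transcripts are represented as sorted lists of (start, end) exons with invented coordinates, and `intron_chain` and `group_by_intron_chain` are hypothetical helper names. Two candidates fall into the same group when their introns match exactly, even if their first and last exon boundaries differ.

```python
from collections import defaultdict

def intron_chain(exons):
    """Intron chain of a transcript given its sorted (start, end) exons."""
    return tuple((exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1))

def group_by_intron_chain(candidates):
    """Map each intron chain to the set of samples whose candidates share it."""
    groups = defaultdict(set)
    for sample, exons in candidates:
        groups[intron_chain(exons)].add(sample)
    return dict(groups)

# Hypothetical candidates: (sample name, exon list); coordinates are invented.
candidates = [
    ("s1", [(100, 200), (300, 400), (500, 600)]),
    ("s2", [(120, 200), (300, 400), (500, 650)]),  # same introns, different ends
    ("s2", [(100, 200), (500, 600)]),              # exon-skipping isoform
]
groups = group_by_intron_chain(candidates)
for chain, samples in groups.items():
    print(chain, sorted(samples))
```

The number of samples supporting each chain is the kind of cross-sample evidence a scoring model could take as one input feature; which features Aletsch actually uses is not described here.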
Demonstrated Superiority of Aletsch
Researchers rigorously tested Aletsch on RNA-seq data from various protocols, chromosomes, datasets, and species.

Performance Metrics: Aletsch significantly outperformed existing meta-assemblers such as TransMeta and PsiCLASS. For instance, when evaluated using the partial area under the precision-recall curve (pAUC), Aletsch showed improvements ranging from 22.9% to an impressive 175.5% on human datasets.
Robust Adaptability: Aletsch demonstrated robust adaptability across different genomic contexts and species, highlighting its versatility and reliability in diverse biological settings.
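For readers unfamiliar with the pAUC metric, the following Python sketch shows one way to compute a partial area under a precision-recall curve and a relative improvement between two tools. The recall cutoff, the linear interpolation between points, and the curve values are assumptions made for illustration; the exact pAUC definition used in the paper may differ, and the numbers below are not results from the study.

```python
def partial_auc(points, max_recall):
    """Trapezoidal area under a precision-recall curve for recall up to max_recall.

    points: (recall, precision) pairs; linear interpolation between points
    is an assumption of this sketch.
    """
    pts = sorted(points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        if r0 >= max_recall:
            break
        if r1 > max_recall:
            # Clip the last segment at max_recall.
            p1 = p0 + (p1 - p0) * (max_recall - r0) / (r1 - r0)
            r1 = max_recall
        area += (r1 - r0) * (p0 + p1) / 2
    return area

def relative_improvement(new, old):
    """Percentage improvement of new over old."""
    return 100.0 * (new - old) / old

# Made-up curves for two hypothetical assemblers.
tool_a = [(0.0, 0.9), (0.2, 0.8), (0.4, 0.6)]
tool_b = [(0.0, 0.6), (0.2, 0.5), (0.4, 0.3)]
a = partial_auc(tool_a, 0.3)
b = partial_auc(tool_b, 0.3)
print(round(a, 4), round(b, 4), round(relative_improvement(a, b), 1))
```

A reported improvement such as 22.9% would correspond to `relative_improvement` applied to the pAUC values of two assemblers on the same dataset.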

Aletsch represents a significant leap forward in RNA sequencing assembly methodologies. Its innovative algorithms and robust performance across various datasets underscore its potential to reshape how we study gene activities at the transcript level. As Aletsch continues to evolve and be adopted by the scientific community, it promises to accelerate discoveries in genetics and molecular biology, ultimately paving the way for new insights into health and disease.
Availability – Aletsch is freely available at https://github.com/Shao-Group/aletsch.

Shi Q, Zhang Q, Shao M. (2024) Accurate assembly of multiple RNA-seq samples with Aletsch. Bioinformatics 40(Supplement_1):i307-i317.
