GLDADec: marker-gene guided LDA modeling for bulk gene expression deconvolution

Understanding the composition of cells within tissues and tumors is crucial in fields like immunology and oncology. Researchers at The University of Tokyo have introduced a powerful new tool called guided LDA deconvolution (GLDADec) to decipher cell type proportions from bulk transcriptome data.
Bulk transcriptome data measure gene expression aggregated over all cells in a sample, which obscures the contribution of individual cell types. GLDADec addresses this by using cell type-specific marker genes to estimate the proportion of each cell type in a sample.

Overview of GLDADec

The observed gene expression profile of each sample is treated as a bag of words, with genes playing the role of words and samples the role of documents. The standard LDA generative process is extended with semi-supervised learning: the marker gene names specific to each cell type (topic) serve as partial prior information that guides the topics. Running GLDADec yields θ, the per-sample topic distribution, which reflects the cell type proportions in each sample.
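The idea of guiding topics with marker genes can be illustrated with a small sketch. The code below is not the GLDADec implementation; it is a generic collapsed Gibbs sampler for LDA in which marker genes receive extra prior mass in their assigned topic, one common way to seed topics. The function name guided_lda_sketch, the parameters alpha, eta and seed_boost, and the toy count matrix are all assumptions made for illustration.

```python
import numpy as np

def guided_lda_sketch(counts, seed_genes, n_topics, n_iter=100,
                      alpha=0.1, eta=0.01, seed_boost=5.0, rng_seed=0):
    """Toy seeded LDA (hypothetical helper, not the GLDADec API).

    counts     : (n_samples, n_genes) array of non-negative integer counts
    seed_genes : dict mapping topic index -> list of marker gene indices
    Returns theta, an (n_samples, n_topics) array of topic proportions.
    """
    rng = np.random.default_rng(rng_seed)
    n_docs, n_genes = counts.shape

    # Topic-gene Dirichlet prior: marker genes get extra mass in their topic.
    beta = np.full((n_topics, n_genes), eta)
    for k, genes in seed_genes.items():
        beta[k, genes] += seed_boost
    beta_sum = beta.sum(axis=1)

    # Expand the count matrix into one (sample, gene) pair per token.
    d_idx, g_idx = np.nonzero(counts)
    reps = counts[d_idx, g_idx]
    docs = np.repeat(d_idx, reps)
    genes = np.repeat(g_idx, reps)

    # Initialise topic assignments from the seeded prior.
    z = np.empty(len(docs), dtype=int)
    for i, g in enumerate(genes):
        z[i] = rng.choice(n_topics, p=beta[:, g] / beta[:, g].sum())

    ndk = np.zeros((n_docs, n_topics), dtype=int)   # sample-topic counts
    nkw = np.zeros((n_topics, n_genes), dtype=int)  # topic-gene counts
    nk = np.zeros(n_topics, dtype=int)              # tokens per topic
    np.add.at(ndk, (docs, z), 1)
    np.add.at(nkw, (z, genes), 1)
    np.add.at(nk, z, 1)

    for _ in range(n_iter):
        for i in range(len(docs)):
            d, g, k = docs[i], genes[i], z[i]
            # Remove the current token from the counts.
            ndk[d, k] -= 1; nkw[k, g] -= 1; nk[k] -= 1
            # Conditional distribution over topics for this token.
            p = (ndk[d] + alpha) * (nkw[:, g] + beta[:, g]) / (nk + beta_sum)
            k = rng.choice(n_topics, p=p / p.sum())
            z[i] = k
            ndk[d, k] += 1; nkw[k, g] += 1; nk[k] += 1

    # Posterior mean estimate of the per-sample topic proportions.
    theta = ndk + alpha
    return theta / theta.sum(axis=1, keepdims=True)

# Toy data: 2 samples x 4 genes; gene 0 marks topic 0, gene 2 marks topic 1.
toy_counts = np.array([[30, 25, 2, 1],
                       [1, 3, 28, 30]])
theta = guided_lda_sketch(toy_counts, {0: [0], 1: [2]}, n_topics=2)
print(theta)
```

On this toy input, the first sample would be expected to place most of its mass on topic 0 and the second on topic 1, because the seeded prior ties each topic to its marker gene. Real expression data would need preprocessing (for example, conversion to integer counts) that this sketch does not cover.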

In benchmarks on blood-derived datasets, GLDADec showed superior performance in estimating cell type proportions and remained robust across varied conditions. In these benchmarks the method not only outperformed existing techniques but also improved biological interpretability by linking the estimated cell types to enriched biological processes.
Beyond blood samples, GLDADec can also analyze heterogeneous tissue bulk data, enabling comprehensive cell type analysis in a data-driven manner. By applying GLDADec to The Cancer Genome Atlas (TCGA) tumor samples, the researchers stratified tumor subtypes and performed survival analysis based on the estimated cell type proportions. These results point to GLDADec’s practical utility in clinical research, offering new avenues for understanding disease progression and therapeutic responses.
In summary, GLDADec gives researchers a more detailed view of cellular composition in complex biological samples from bulk transcriptome data. Applied in clinical research, it could support personalized medicine by providing deeper insights into disease mechanisms and treatment outcomes.
Availability – GLDADec is available as an open-source Python package at https://github.com/mizuno-group/GLDADec.

Azuma I, Mizuno T, Kusuhara H. (2024) GLDADec: marker-gene guided LDA modeling for bulk gene expression deconvolution. Brief Bioinform 25(4):bbae315.
