ZINBStein – shrinkage estimation of gene interaction networks in single-cell RNA sequencing data


Understanding how genes interact with one another is crucial for unraveling the complexities of biological processes. Think of gene interaction networks as complex webs where genes (the nodes) are connected by various interactions (the edges). These interactions can involve how genes regulate each other, how their protein products interact, or how they participate in metabolic pathways. To study these networks on a large scale, scientists often use a method called gene co-expression network analysis, which takes advantage of high-throughput gene expression data, such as that obtained from RNA sequencing.
The Power of Single-Cell RNA Sequencing
With advancements in sequencing technology, we now have the ability to analyze gene expression not just in bulk samples, but at the level of individual cells. This technique, known as single-cell RNA sequencing (scRNAseq), allows researchers to gain insights into how cells develop, differentiate, and function based on their unique transcriptomic profiles (the set of all RNA molecules in a cell). However, analyzing this type of data comes with its own set of challenges due to its high sparsity (a large fraction of the gene-by-cell counts are zero) and high dimensionality (thousands of genes are measured at once).
New Framework for Analysis
A recent study by researchers at the University of Surrey addresses these challenges by developing a new framework to analyze scRNAseq data. This framework focuses on estimating what is known as a sparse inverse covariance matrix, which helps researchers identify direct functional interactions between genes. In simpler terms, it helps to understand how genes communicate and interact with each other in a more direct way.
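To make the link between an inverse covariance matrix and direct interactions concrete, here is a minimal Python sketch. It is not the study's code: the function name and the toy three-gene chain are illustrative assumptions. It shows that two genes connected only through an intermediate gene are strongly correlated overall, yet have a partial correlation close to zero.

```python
import numpy as np

def partial_correlations(cov):
    """Convert a covariance matrix to partial correlations via its inverse."""
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy chain (hypothetical data): gene A drives B, and B drives C,
# so A and C are linked only indirectly through B.
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + rng.normal(size=5000)
c = b + rng.normal(size=5000)

cov = np.cov(np.vstack([a, b, c]))
cor = np.corrcoef(np.vstack([a, b, c]))
pcor = partial_correlations(cov)

print("correlation A-C:        ", round(cor[0, 2], 2))
print("partial correlation A-C:", round(pcor[0, 2], 2))
```

A near-zero entry in the inverse covariance matrix therefore suggests that two genes have no direct link once the other genes are accounted for, which is why a sparse estimate of this matrix can be read as a network.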
The study found that a specific method called Stein-type shrinkage improves performance when analyzing high-dimensional scRNAseq data. Shrinkage pulls the noisy sample estimate toward a simpler, more stable target, which reduces the impact of noise and keeps the analysis focused on the most relevant gene interactions. The researchers also explored data transformation techniques that enhance the effectiveness of these shrinkage methods, especially in cases where the data do not follow a Gaussian distribution (the symmetric, bell-shaped distribution that many covariance-based methods assume).
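The idea of shrinkage can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ZINBStein implementation: it shrinks the sample correlation matrix toward the identity, and it picks the shrinkage intensity with the analytic formula of Schäfer and Strimmer, which is one common Stein-type choice (the article does not say which estimator the study uses). The function name `shrink_correlation` and the simulated data are hypothetical.

```python
import numpy as np

def shrink_correlation(x, lam=None):
    """Shrink the sample correlation of x (cells x genes) toward the identity.

    Assumes every gene has non-zero variance across cells.
    """
    n, p = x.shape
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    r = (z.T @ z) / (n - 1)  # sample correlation matrix
    if lam is None:
        # Analytic intensity: summed variance of the off-diagonal correlation
        # estimates divided by their summed squares, clipped to [0, 1].
        w = z[:, :, None] * z[:, None, :]  # n x p x p cross-products
        w_bar = w.mean(axis=0)
        var_r = n / (n - 1) ** 3 * ((w - w_bar) ** 2).sum(axis=0)
        off = ~np.eye(p, dtype=bool)
        lam = float(np.clip(var_r[off].sum() / (r[off] ** 2).sum(), 0.0, 1.0))
    r_shrunk = (1.0 - lam) * r  # off-diagonals are pulled toward zero
    np.fill_diagonal(r_shrunk, 1.0)
    return r_shrunk, lam

# Hypothetical high-dimensional setting: more genes than cells.
rng = np.random.default_rng(1)
n_cells, n_genes = 20, 50
x = rng.normal(size=(n_cells, n_genes))
x[:, 1] += x[:, 0]  # one genuinely correlated gene pair

r_sample = np.corrcoef(x, rowvar=False)
r_shrunk, lam = shrink_correlation(x)

print("rank of sample correlation:", np.linalg.matrix_rank(r_sample))
print("shrinkage intensity:", round(lam, 3))
precision = np.linalg.inv(r_shrunk)  # well defined after shrinkage
```

With fewer cells than genes the sample correlation matrix is rank-deficient and cannot be inverted, whereas the shrunk matrix is positive definite, so the inverse needed for partial correlations exists.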
Zero-inflated covariance matrix shrinkage workflow

To account for the excess of zero counts in scRNAseq data, UMI count data are transformed into zero-inflated z-scores for partial correlation matrix estimation. The zero-inflated z-score calculation step can be integrated into all shrinkage workflows: a zero-inflated negative binomial model is fitted to separate zero counts caused by dropout events from “biological” zero counts. After fitting, a non-detection rate (dij) is estimated for each gene in each cell. Thresholding (default t=0.5) is then applied, so that counts with dij≥t are treated as missing or zero-inflated values. Z-scores are calculated for all counts, and the scores of zero-inflated values are set to 0.
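The thresholding and z-score steps described in this caption can be sketched as follows. This is a hedged illustration, not the package's code: the non-detection rates are supplied by hand rather than fitted, and the details of the z-score (a log1p transform, with each gene's mean and standard deviation taken over its non-flagged entries) are assumptions, since the caption does not specify them. The function name and the cells-by-genes layout are also hypothetical.

```python
import numpy as np

def zero_inflated_z_scores(counts, d, t=0.5):
    """Z-scores per gene, with entries whose non-detection rate is >= t set to 0.

    counts, d: cells x genes arrays (layout is an assumption).
    """
    counts = np.asarray(counts, dtype=float)
    flagged = np.asarray(d, dtype=float) >= t  # treated as zero-inflated
    logged = np.log1p(counts)                  # assumption: log1p before scaling
    masked = np.where(flagged, np.nan, logged)
    mean = np.nanmean(masked, axis=0)          # per-gene mean over kept entries
    sd = np.nanstd(masked, axis=0, ddof=1)
    sd[~(sd > 0)] = 1.0                        # guard against zero or undefined spread
    z = (logged - mean) / sd
    z[flagged] = 0.0                           # zero-inflated values carry no signal
    return z

# Hypothetical toy input: 4 cells x 2 genes.
counts = np.array([[0, 5], [3, 0], [4, 7], [0, 6]])
d = np.array([[0.9, 0.0], [0.0, 0.2], [0.0, 0.0], [0.1, 0.0]])

z = zero_inflated_z_scores(counts, d)
print(z.round(2))
```

In this toy input the zero in the first cell of the first gene is flagged as a dropout and gets a score of 0, while the zero in the last cell of the same gene is kept as a "biological" zero and receives a negative z-score.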
Enhancing Analysis with Zero-Inflated Modeling
An interesting finding of the study is the value of zero-inflated modeling, which is particularly useful for scRNAseq data. These datasets typically contain many zeros, and a zero can mean either that a gene is truly not expressed in a cell or that its transcripts were missed during sequencing (a dropout). The researchers fitted a zero-inflated negative binomial model to handle these excess zeros without disrupting the interpretation of counts where gene expression is detected.
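As a rough illustration of how a zero-inflated negative binomial model can tell the two kinds of zeros apart, the Python sketch below computes the probability that an observed zero comes from the zero-inflation (dropout) component rather than from the negative binomial itself. The parameter values are invented for the example rather than fitted to data, and treating this posterior probability as the non-detection rate is an assumption, not a statement of how the study defines it.

```python
import numpy as np

def nb_zero_prob(mu, theta):
    """Probability of a zero count under a negative binomial (mean mu, dispersion theta)."""
    return (theta / (theta + mu)) ** theta

def dropout_posterior(pi, mu, theta):
    """P(zero came from the inflation component | count is zero) under a ZINB model."""
    p0 = nb_zero_prob(mu, theta)
    return pi / (pi + (1.0 - pi) * p0)

# Hypothetical parameters: same dropout weight, different expression levels.
pi, theta = 0.3, 2.0
d_low = dropout_posterior(pi, mu=0.1, theta=theta)    # weakly expressed gene
d_high = dropout_posterior(pi, mu=10.0, theta=theta)  # strongly expressed gene

print("weakly expressed gene:  ", round(d_low, 3))
print("strongly expressed gene:", round(d_high, 3))
```

Under these assumed parameters, a zero in a strongly expressed gene is far more likely to be a dropout than a zero in a weakly expressed gene, where a true count of zero is plausible on its own.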
Conclusion
Overall, the new framework introduced in this study enhances the ability to analyze gene interaction networks using scRNAseq data. By providing flexibility in handling sparse count data resulting from dropout events (where expressed transcripts go undetected), it allows for a more accurate understanding of the complex interactions between genes. This advancement not only broadens the application of graphical models in scRNAseq analysis but also gives researchers a practical tool for studying gene interactions in cellular biology.
Availability – The framework is implemented as a reproducible Snakemake workflow (https://github.com/calathea24/ZINBGraphicalModel) and as the R package ZINBStein (https://github.com/calathea24/ZINBStein).
