starTracer – an accelerated approach for precise marker gene identification in single-cell RNA-Seq analysis


Single-cell RNA sequencing (scRNA-seq) is a powerful tool that allows researchers to dive deep into the diversity of cells within tissues. By examining gene expression at the level of individual cells, scientists can uncover the distinct cell types and functions in ways that weren’t possible with older methods. A key part of this process is identifying “marker genes,” which are specific genes that help distinguish different cell types and tell us about the cell’s current status.
Traditionally, researchers have relied on software like Seurat and Monocle to identify these marker genes. These tools work by comparing one group of cells to all other groups, then picking out the genes that stand out. While useful, this approach becomes slow on large datasets and can sometimes report genes that are not truly specific to a single cell type.
starTracer, a new tool developed by researchers at Wuhan University, is designed to solve these challenges. It improves both the speed and accuracy of marker gene identification in single-cell RNA-seq analysis. Instead of comparing each cluster of cells against all the others in turn, starTracer uses a more efficient algorithm to prioritize the most important genes right from the start.
Schematic of starTracer

A The structure of the expression matrix and annotation matrix of single-cell sequencing. B starTracer provides two options: de novo “searchMarker” and in-conjunction “filterMarker”. “searchMarker” requires a cell annotation matrix and an expression matrix or averaged expression matrix from a single-cell experiment, or a Seurat object. “searchMarker” max-normalizes the average expression matrix, calculates the molecular index for each gene passing a threshold set by the user, and outputs a matrix with marker genes. C “filterMarker” takes an output matrix from the “FindAllMarkers” function, assigns genes to clusters, and re-arranges them by measuring the Ti for each gene. The time elapsed for “searchMarker” is much shorter than that for “FindAllMarkers” and “filterMarker”.
One of the major benefits of starTracer is its flexibility. It can take in a variety of data formats and still perform well. After running the analysis, starTracer produces a “marker matrix,” which is essentially a ranked list of genes. The best potential marker genes—those that most clearly define a specific cell type—are listed at the top, saving researchers time and effort.
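To make the idea of a ranked marker matrix more concrete, here is a minimal Python sketch of the general pattern the schematic describes: max-normalize each gene's average expression across clusters, drop genes below a threshold, and rank the rest within the cluster where they peak. This is not starTracer's code (the package itself is written in R). The function name `rank_markers`, the `min_expr` parameter, the gene and cluster names, and the specificity score (one minus the mean scaled expression in the other clusters) are all illustrative assumptions standing in for the tool's actual index.

```python
def rank_markers(avg_expr, clusters, min_expr=0.0):
    """Rank candidate marker genes per cluster (illustrative sketch only).

    avg_expr: dict mapping gene name -> list of mean expression values,
              one value per cluster, in the same order as `clusters`.
    clusters: list of cluster names (at least two).
    min_expr: hypothetical threshold; genes whose peak mean expression
              does not exceed it are skipped.
    """
    rows = []
    for gene, means in avg_expr.items():
        peak = max(means)
        if peak <= min_expr:
            continue  # gene is too weakly expressed everywhere
        top = means.index(peak)  # cluster where the gene peaks
        # Max-normalization: the peak cluster becomes 1.0, others fall in [0, 1].
        scaled = [m / peak for m in means]
        # Stand-in specificity score (an assumption, not starTracer's index):
        # 1 minus the mean scaled expression in all non-peak clusters.
        others = [s for i, s in enumerate(scaled) if i != top]
        score = 1.0 - sum(others) / len(others)
        rows.append((clusters[top], gene, score))
    # Group by cluster name, most specific genes first within each cluster.
    rows.sort(key=lambda r: (r[0], -r[2]))
    return rows


# Toy averaged expression matrix: 5 made-up genes across 3 made-up clusters.
example = {
    "GeneA": [8.0, 1.0, 1.0],
    "GeneB": [2.0, 2.0, 2.0],
    "GeneC": [0.0, 0.5, 5.0],
    "GeneD": [4.0, 3.0, 0.0],
    "GeneE": [0.2, 0.1, 0.0],
}
for cluster, gene, score in rank_markers(example, ["T", "B", "Mono"], min_expr=1.0):
    print(cluster, gene, round(score, 3))
```

Because the sketch works on one averaged value per gene per cluster rather than on individual cells, its cost depends on the number of genes and clusters, not the number of cells. That is one plausible reason an approach built on an averaged matrix could scale better than repeated cell-level comparisons.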
Compared to Seurat, starTracer is 100 to 1,000 times faster, as shown by tests on several datasets. This speed advantage becomes even more significant as the size of the dataset increases. Plus, starTracer excels in smaller, more specific groups of cells, which is important when studying rare or hard-to-detect cell types.
In summary, starTracer provides researchers with a faster, more accurate, and flexible way to identify key marker genes, making it an essential tool for advancing our understanding of complex tissues and diseases. Whether working with large datasets or rare cell types, starTracer’s performance ensures that scientists can continue to push the boundaries of what we know about the cellular landscape.
Availability – The source code of the R package starTracer is available for download from our GitHub repository (https://github.com/JerryZhang-1222/starTracer).
