McGill researchers introduce a scalable, cost-effective solution for single-cell profiling


An innovative computational tool, scSemiProfiler, makes powerful single-cell sequencing technology more accessible for health research
Single-cell sequencing is a breakthrough technology in biological research and personalized medicine. By providing information at the level of individual cells, it deepens our understanding of cellular complexity and supports the identification and characterization of cellular subpopulations in patient samples, biomarker discovery and personalized therapies. However, the prohibitive cost of traditional methods substantially limits the application of this technology in health research and personalized medicine, particularly in large-scale studies.
Researcher Jun Ding, PhD, who holds a prestigious FRQS award in Artificial Intelligence and Health, is changing this paradigm. With his team at the Research Institute of the McGill University Health Centre (RI-MUHC), he has led the development of an innovative computational tool known as scSemiProfiler. This new tool, described in a recent Nature Communications publication and highlighted by the editors as among the 50 best papers in the field, will make single-cell sequencing technology more accessible for research. It combines deep generative artificial intelligence (AI) and active learning to create detailed single-cell data profiles, achieving high-quality results at just a fraction of the cost of traditional single-cell sequencing.
Overview of the scSemiProfiler method

a, Initial Setup: Bulk sequencing is first performed on the entire cohort, with subsequent clustering analysis of this data to pinpoint representative samples, typically those closest to the cluster centroids.
b, Representative Profiling: The identified representatives are then subjected to single-cell sequencing. The data obtained from this sequencing is further processed to determine gene set scores and feature importance weights, enriching the subsequent analysis steps.
c, Deep Generative Inference: This phase uses a VAE-GAN-based model to estimate single-cell data for a target sample. In its three-stage training, the model first reconstructs the representative cells and then generates target cells based on the differences between the representative and target samples, as indicated by their bulk data.
d, Representative Selection Decision: Additional representatives are selected based on the budget and on how effective the current representatives are. An active learning algorithm, leveraging bulk data and insights from the generative model, identifies new optimal representatives. These are then sequenced (b) and integrated as new references in the single-cell inference process (c). This active learning step is optional if the user prefers the all-in-one “global mode”.
e, Comprehensive Downstream Analyses: This final panel highlights the extensive analyses possible with semi-profiled single-cell data. It underscores the model’s ability to yield deep, diverse insights, demonstrating the full potential and broad applicability of the semi-profiled data.
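
Panel a describes clustering the cohort's bulk profiles and picking the samples closest to the cluster centroids. The Python sketch below illustrates that idea with plain k-means on toy data; it is not the scSemiProfiler implementation. The squared Euclidean distance, the deterministic farthest-point seeding and the function names `kmeans` and `select_representatives` are assumptions made for illustration, and a real pipeline would likely normalize and reduce the dimensionality of the bulk data first.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length bulk profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(profiles, k):
    """Deterministic seeding (an illustrative choice): start from the first
    sample, then repeatedly add the sample farthest from all chosen centroids."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(profiles, k, iterations=100):
    """Plain Lloyd's k-means (hypothetical helper); returns (centroids, labels)."""
    centroids = farthest_point_init(profiles, k)
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in profiles]
        # Update step: each centroid moves to the mean of its member samples.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def select_representatives(profiles, k):
    """Hypothetical helper: index of the sample closest to each cluster centroid."""
    centroids, labels = kmeans(profiles, k)
    reps = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(min(members, key=lambda i: squared_distance(profiles[i], centroids[c])))
    return sorted(reps)
```

For example, with six toy bulk profiles forming two well-separated groups, `select_representatives(bulk, 2)` returns one sample index per group; those samples would be the ones sent for single-cell sequencing in panel b.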

“This breakthrough tool makes it feasible to extend single-cell sequencing to broader research applications and complex disease cohorts,” says Prof. Ding, a junior scientist in the Translational Research in Respiratory Diseases Program at the RI-MUHC and Assistant Professor in the Department of Medicine at McGill University. “According to 2023 estimates from the McGill University Health Centre, sequencing 20,000 cells can cost approximately $6,000, making it impractical for large-scale research projects. But with scSemiProfiler, we can change that.”
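
The quoted estimate works out to roughly $0.30 per cell ($6,000 / 20,000 cells). The back-of-the-envelope cost model below is a sketch, not a figure from the publication: it assumes each sample is profiled at about 20,000 cells, that semi-profiling bulk-sequences every sample but single-cell-sequences only the representatives (as in panels a and b), and it ignores computing costs. The per-sample bulk cost is a hypothetical parameter that the article does not give, and all function names are illustrative.

```python
# From the article: ~$6,000 to sequence 20,000 cells (2023 MUHC estimate).
SC_COST_PER_SAMPLE = 6000.0   # assumption: one sample is profiled at ~20,000 cells
CELLS_PER_SAMPLE = 20000


def cost_per_cell(sample_cost=SC_COST_PER_SAMPLE, cells=CELLS_PER_SAMPLE):
    """Single-cell sequencing cost per cell."""
    return sample_cost / cells


def full_profiling_cost(n_samples, sc_cost=SC_COST_PER_SAMPLE):
    """Every sample in the cohort is single-cell sequenced."""
    return n_samples * sc_cost


def semi_profiling_cost(n_samples, n_representatives, bulk_cost, sc_cost=SC_COST_PER_SAMPLE):
    """Every sample is bulk sequenced; only representatives are single-cell sequenced.
    `bulk_cost` is a hypothetical per-sample bulk sequencing price."""
    return n_samples * bulk_cost + n_representatives * sc_cost
```

Under these assumptions, fully profiling a 100-sample cohort would cost about $600,000, whereas semi-profiling it with, for example, 10 representatives would cost $60,000 in single-cell sequencing plus whatever 100 bulk runs cost.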

Developing a “semi-profiling” approach
Until now, researchers have used various methods to draw inferences from more affordable bulk sequencing data. While useful, these methods lacked the resolution needed for single-cell analyses, which are crucial for understanding complex diseases.
In contrast, the RI-MUHC team designed a method to “semi-profile” disease cohorts at the single-cell level accurately and efficiently. The scSemiProfiler tool combines bulk data with single-cell templates from representative samples. The researchers have shown that scSemiProfiler produces datasets that are highly accurate and consistent with real, fully profiled data, extracting single-cell-level information from bulk data at a much lower cost.

“We can broadly categorize the entire framework into two major parts, both of which are critical for delivering accurate and effective semi-profiling,” says Jingtao Wang, PhD candidate in the Computational Biology program in Experimental Medicine at McGill University’s Faculty of Medicine and Health Sciences and first author on the publication. “The first part is what we call ‘representative selection,’ in which we choose the best representative single-cell samples. The second part is called ‘in silico inference,’ which refers to the process of inferring the target single-cell data from the bulk data.”
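
Representative selection can also continue iteratively under a budget (panel d). The paper's active learning algorithm uses both bulk data and insights from the generative model; the sketch below is a deliberately simplified stand-in that uses bulk profiles only. It greedily adds the sample whose bulk profile is farthest from its nearest current representative, on the assumption that such a sample is the one the existing references cover worst. The function name, the distance measure and the idea of expressing the budget as a number of extra samples are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two bulk profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def add_representatives(profiles, representatives, budget):
    """Hypothetical greedy stand-in for active learning: add up to `budget`
    new representatives, each time choosing the sample farthest (in bulk
    space) from its nearest existing representative."""
    reps = list(representatives)
    for _ in range(budget):
        candidates = [i for i in range(len(profiles)) if i not in reps]
        if not candidates:
            break  # every sample is already a representative
        worst_covered = max(
            candidates,
            key=lambda i: min((squared_distance(profiles[i], profiles[r]) for r in reps),
                              default=float("inf")),
        )
        reps.append(worst_covered)
    return reps
```

Each call spends the budget one sample at a time, so a user could stop early once the newly added representatives no longer change the downstream results much.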

The deep generative learning model first reconstructs the single-cell reference data, then introduces the target sample’s information by requiring the generated single-cell data to have an average value similar to the target sample’s bulk data.
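
The bulk constraint can be pictured as a penalty on the gap between the average of the generated cells (a pseudobulk) and the target's bulk profile. The sketch below computes such a penalty on fixed numbers; it assumes both are on the same normalized expression scale and uses mean squared error as the penalty, which is an illustrative choice rather than the published loss. In the actual model this term would be minimized during training alongside the generative objectives, and the function names here are hypothetical.

```python
def pseudobulk(cells):
    """Average expression of each gene across generated cells (rows = cells)."""
    n = len(cells)
    return [sum(gene) / n for gene in zip(*cells)]


def bulk_matching_loss(generated_cells, bulk):
    """Hypothetical penalty: mean squared error between the pseudobulk of the
    generated cells and the target sample's bulk profile (same gene order)."""
    pb = pseudobulk(generated_cells)
    return sum((p - b) ** 2 for p, b in zip(pb, bulk)) / len(bulk)
```

For example, two generated cells `[1, 2]` and `[3, 4]` have a pseudobulk of `[2, 3]`, so the penalty is zero against a bulk profile of `[2, 3]` and grows as the generated cells drift away from it.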

“Single-cell sequencing is a crucial tool for dissecting the cellular intricacies of complex diseases,” says Prof. Ding. “We are pleased that our tool may circumvent its prohibitive cost, particularly for expansive biomedical studies.”

The next steps for the work are the improvement, maintenance and democratization of this computational tool, say the researchers. They have already made the scSemiProfiler tool accessible to the research community on the GitHub platform. Next, a cloud service will be established to facilitate adoption of the tool by researchers who lack access to extensive computational resources.

“We are really honoured that the editors at Nature Communications have decided to feature our publication about scSemiProfiler in their recent Editor’s highlights section,” adds Prof. Ding. “This recognition underscores the importance of cost-effective tools in advancing research on the cellular complexities of complex diseases.”

Availability – the scSemiProfiler tool is available at: https://github.com/mcgilldinglab/scSemiProfiler/tree/main
