Transformers in single-cell omics: a review and new perspectives
