Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Bell, E. L. et al. Biocatalysis. Nat. Rev. Methods Primers 1, 46 (2021).
Mesbahuddin, M. S., Ganesan, A. & Kalyaanamoorthy, S. Engineering stable carbonic anhydrases for CO₂ capture: a critical review. Protein Eng. Des. Sel. 34, gzab021 (2021).
Stourac, J. et al. FireProtDB: database of manually curated protein stability data. Nucleic Acids Res. 49, D319–D324 (2020).
Arnold, F. H. Design by directed evolution. Acc. Chem. Res. 31, 125–131 (1998).
Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).
Wu, Z., Kan, S. B. J., Lewis, R. D., Wittmann, B. J. & Arnold, F. H. Machine learning-assisted directed protein evolution with combinatorial libraries. Proc. Natl Acad. Sci. USA 116, 8852–8858 (2019).
Wittmann, B. J., Johnston, K. E., Wu, Z. & Arnold, F. H. Advances in machine learning for directed evolution. Curr. Opin. Struct. Biol. 69, 11–18 (2021).
Yang, Y. et al. ProTstab—predictor for cellular protein stability. BMC Genomics 20, 804 (2019).
Jung, F., Frey, K., Zimmer, D. & Mühlhaus, T. DeepSTABp: a deep learning approach for the prediction of thermal protein stability. Int. J. Mol. Sci. 24, 7444 (2023).
Tsuboyama, K. et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature 620, 434–444 (2023).
Broom, A., Trainor, K., Jacobi, Z. & Meiering, E. M. Computational modeling of protein stability: quantitative analysis reveals solutions to pervasive problems. Structure 28, 717–726.e3 (2020).
Broom, A., Jacobi, Z., Trainor, K. & Meiering, E. M. Computational tools help improve protein stability but with a solubility tradeoff. J. Biol. Chem. 292, 14349–14361 (2017).
Frenz, B. et al. Prediction of protein mutational free energy: benchmark and sampling improvements increase classification accuracy. Front. Bioeng. Biotechnol. 8, 55824 (2020).
Hernández, I. M., Dehouck, Y., Bastolla, U., López-Blanco, J. R. & Chacón, P. Predicting protein stability changes upon mutation using a simple orientational potential. Bioinformatics 39, btad011 (2023).
Fang, J. A critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation. Brief. Bioinform. 21, 1285–1292 (2019).
Sanavia, T. et al. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput. Struct. Biotechnol. J. 18, 1968–1979 (2020).
Rigoldi, F., Donini, S., Redaelli, A., Parisini, E. & Gautieri, A. Review: Engineering of thermostable enzymes for industrial applications. APL Bioeng. 2, 011501 (2018).
Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).
Diaz, D. J. et al. Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations. Nat. Commun. 15, 6170 (2024).
Jarzab, A. et al. Meltome atlas-thermal proteome stability across the tree of life. Nat. Methods 17, 495–503 (2020).
Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 34, 29287–29303 (2021).
Hsu, C. et al. Learning inverse folding from millions of predicted structures. In Proc. 39th International Conference on Machine Learning 8946–8970 (PMLR, 2022).
Yang, K. K., Zanichelli, N. & Yeh, H. Masked inverse folding with sequence transfer for protein representation learning. Protein Eng. Des. Sel. 36, gzad015 (2023).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
d’Oelsnitz, S. et al. Biosensor and machine learning-aided engineering of an amaryllidaceae enzyme. Nat. Commun. 15, 2084 (2024).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Elnaggar, A. et al. Ankh: optimized protein language model unlocks general-purpose modelling. Preprint at https://arxiv.org/abs/2301.06568 (2023).
Rao, R. M. et al. MSA Transformer. In Proc. 38th International Conference on Machine Learning 8844–8856 (PMLR, 2021).
Notin, P. et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. In Proc. 39th International Conference on Machine Learning 16990–17017 (PMLR, 2022).
Pucci, F., Bernaerts, K. V., Kwasigroch, J. M. & Rooman, M. Quantification of biases in predictions of protein stability changes upon mutations. Bioinformatics 34, 3659–3665 (2018).
Caldararu, O., Blundell, T. L. & Kepp, K. P. Three simple properties explain protein stability change upon mutation. J. Chem. Inf. Model. 61, 1981–1988 (2021).
Konopka, B. M., Marciniak, M. & Dyrka, W. Quantiprot—a Python package for quantitative analysis of protein sequences. BMC Bioinform. 18, 339 (2017).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Touw, W. G. et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 43, D364–D368 (2015).
Tokuriki, N. & Tawfik, D. S. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 19, 596–604 (2009).
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).
Fersht, A. in Structure and Mechanism in Protein Science 2nd edn 508–536 (W. H. Freeman and Company, 1999).
Hsu, C., Nisonoff, H., Fannjiang, C. & Listgarten, J. Learning protein fitness models from evolutionary and assay-labeled data. Nat. Biotechnol. 40, 1114–1122 (2022).
Laine, E., Karami, Y. & Carbone, A. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol. Biol. Evol. 36, 2604–2619 (2019).
Høie, M. H., Cagiada, M., Beck Frederiksen, A. H., Stein, A. & Lindorff-Larsen, K. Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation. Cell Rep. 38, 110207 (2022).
Biswas, S., Khimulya, G., Alley, E. C., Esvelt, K. M. & Church, G. M. Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021).
Wittmann, B. J., Yue, Y. & Arnold, F. H. Informed training set design enables efficient machine learning-assisted directed protein evolution. Cell Syst. 12, 1026–1045.e7 (2021).
Kirkpatrick, J. et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl Acad. Sci. USA 114, 3521–3526 (2017).
Eswar, N. et al. Comparative protein structure modeling using Modeller. Curr. Protoc. Bioinform. 5, 5–6 (2006).
PDBe-KB consortium. PDBe-KB: collaboratively defining the biological context of structural data. Nucleic Acids Res. 50, D534–D542 (2022).
Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinform. 20, 473 (2019).
Quan, L., Lv, Q. & Zhang, Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics 32, 2936–2946 (2016).
Pancotti, C. et al. Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset. Brief. Bioinform. 23, bbab555 (2022).
Dehouck, Y. et al. Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics 25, 2537–2543 (2009).
Ye, Y. & Godzik, A. FATCAT: a web server for flexible structure comparison and structure similarity searching. Nucleic Acids Res. 32, W582–W585 (2004).
Reeves, S. & Kalyaanamoorthy, S. skalyaanamoorthy/PSLMs: PSLMs for thermostability prediction full release. Zenodo https://doi.org/10.5281/zenodo.12702047 (2024).
Dehouck, Y., Kwasigroch, J. M., Gilis, D. & Rooman, M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinform. 12, 151 (2011).
