Zemla, A., Venclovas, C., Fidelis, K. & Rost, B. A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins 34, 220–223. https://doi.org/10.1002/(sici)1097-0134(19990201)34:2%3c220::aid-prot7%3e3.0.co;2-k (1999).Article
CAS
PubMed
Google Scholar
Rost, B. & Sander, C. Jury returns on structure prediction. Nature 360, 540–540. https://doi.org/10.1038/360540b0 (1992).Article
ADS
CAS
PubMed
Google Scholar
Rost, B. & Sander, C. Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584–599 (1993).Article
CAS
PubMed
Google Scholar
Rost, B. PHD: Predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol. 266, 525–539 (1996).Article
CAS
PubMed
Google Scholar
Jones, D. T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195–202 (1999).Article
CAS
PubMed
Google Scholar
Rost, B. & Sander, C. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins Struct. Funct. Genet. 19, 55–72 (1994).Article
CAS
PubMed
Google Scholar
Liu, J. & Rost, B. NORSp: Predictions of long regions without regular secondary structure. Nucleic Acids Res. 31, 3833–3835 (2003).Article
CAS
PubMed
PubMed Central
Google Scholar
Radivojac, P. et al. Protein flexibility and intrinsic disorder. Protein Sci. 13, 71–80 (2004).Article
CAS
PubMed
PubMed Central
Google Scholar
Schlessinger, A., Liu, J. & Rost, B. Natively unstructured loops differ from other loops. PLoS Comput. Biol. 3, e140 (2007).Article
ADS
PubMed
PubMed Central
Google Scholar
Schlessinger, A. & Rost, B. Protein flexibility and rigidity predicted from sequence. Proteins Struct. Funct. Bioinform. 61, 115–126 (2005).Article
CAS
Google Scholar
Punta, M. & Rost, B. PROFcon: Novel prediction of long-range contacts. Bioinformatics 21, 2960–2968 (2005).Article
CAS
PubMed
Google Scholar
Jones, D. T., Singh, T., Kosciolek, T. & Tetchner, S. MetaPSICOV: Combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 31, 999–1006. https://doi.org/10.1093/bioinformatics/btu791 (2015).Article
CAS
PubMed
Google Scholar
Marks, D. S. et al. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6, e28766. https://doi.org/10.1371/journal.pone.0028766 (2011).Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Michel, M. et al. PconsFold: Improved contact predictions improve protein models. Bioinformatics 30, i482-488. https://doi.org/10.1093/bioinformatics/btu458 (2014).Article
CAS
PubMed
PubMed Central
Google Scholar
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. https://doi.org/10.1038/s41586-021-03819-2 (2021).Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Heinzinger, M. et al. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinform. 20, 723. https://doi.org/10.1186/s12859-019-3220-8 (2019).Article
CAS
Google Scholar
Elnaggar, A. et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127. https://doi.org/10.1109/TPAMI.2021.3095381 (2022).Article
PubMed
Google Scholar
Alley, E. C., Khimulya, G., Biswas, S., AlQuraishi, M. & Church, G. M. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 1, 1–8 (2019).
Google Scholar
Bepler, T. & Berger, B. Learning protein sequence embeddings using information from structure. https://doi.org/10.48550/ARXIV.1902.08661 (2019).Madani, A. et al. ProGen: Language modeling for protein generation. http://arXiv.org/2004.03497, https://doi.org/10.1101/2020.03.07.982272 (2020).Rao, R. et al. Evaluating protein transfer learning with TAPE. http://arXiv.org/1906.08230 (2019).Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. 118, e2016239118. https://doi.org/10.1073/pnas.2016239118 (2021).Article
CAS
PubMed
PubMed Central
Google Scholar
Bernhofer, M. & Rost, B. TMbed: Transmembrane proteins predicted through language model embeddings. BMC Bioinform. 23, 326. https://doi.org/10.1186/s12859-022-04873-x (2022).Article
CAS
Google Scholar
Littmann, M., Heinzinger, M., Dallago, C., Weissenow, K. & Rost, B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci. Rep. 11, 23916. https://doi.org/10.1038/s41598-021-03431-4 (2021).Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Marquet, C. et al. Embeddings from protein language models predict conservation and variant effects. Hum. Genet. 141, 1629–1647. https://doi.org/10.1007/s00439-021-02411-y (2022).Article
CAS
PubMed
Google Scholar
Ilzhöfer, D., Heinzinger, M. & Rost, B. SETH predicts nuances of residue disorder from protein embeddings. Front. Bioinform. 2, 1 (2022).Article
Google Scholar
Stärk, H., Dallago, C., Heinzinger, M. & Rost, B. Light attention predicts protein location from the language of life. Bioinform. Adv. 1, 035. https://doi.org/10.1093/bioadv/vbab035 (2021).Article
Google Scholar
Weissenow, K., Heinzinger, M. & Rost, B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure 30, 1169–1177. https://doi.org/10.1016/j.str.2022.05.001 (2022).Article
CAS
PubMed
Google Scholar
Bernhofer, M. et al. PredictProtein—Predicting protein structure and function for 29 years. Nucleic Acids Res. https://doi.org/10.1093/nar/gkab354 (2021).Article
PubMed
PubMed Central
Google Scholar
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028. https://doi.org/10.1038/nbt.3988 (2017).Article
CAS
PubMed
Google Scholar
Dunker, A. K. et al. What’s in a name? Why these proteins are intrinsically disordered. Intrins. Disord. Proteins 1, e24157 (2013).Article
Google Scholar
Del Conte, A. et al. CAID prediction portal: A comprehensive service for predicting intrinsic disorder and binding regions in proteins. Nucleic Acids Res. https://doi.org/10.1093/nar/gkad430 (2023).Article
PubMed
PubMed Central
Google Scholar
Liu, J., Tan, H. & Rost, B. Loopy proteins appear conserved in evolution. J. Mol. Biol. 322, 53–64 (2002).Article
CAS
PubMed
Google Scholar
Schelling, M., Hopf, T. A. & Rost, B. Evolutionary couplings and sequence variation effect predict protein binding sites. Proteins 86, 1064–1074. https://doi.org/10.1002/prot.25585 (2018).Article
CAS
PubMed
Google Scholar
Tsirigos, K. D., Peters, C., Shu, N., Käll, L. & Elofsson, A. The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides. Nucleic Acids Res. 43, W401–W407. https://doi.org/10.1093/nar/gkv485 (2015).Article
CAS
PubMed
PubMed Central
Google Scholar
Hayat, S., Peters, C., Shu, N., Tsirigos, K. D. & Elofsson, A. Inclusion of dyad-repeat pattern improves topology prediction of transmembrane β-barrel proteins. Bioinformatics 32, 1571–1573. https://doi.org/10.1093/bioinformatics/btw025 (2016).Article
CAS
PubMed
Google Scholar
Hendrickson, W. A. Atomic-level analysis of membrane-protein structure. Nat. Struct. Mol. Biol. 23, 464–467. https://doi.org/10.1038/nsmb.3215 (2016).Article
CAS
PubMed
PubMed Central
Google Scholar
Newport, T. D., Sansom, M. S. P. & Stansfeld, P. J. The MemProtMD database: A resource for membrane-embedded protein structures and their lipid interactions. Nucleic Acids Res. 47, D390–D397. https://doi.org/10.1093/nar/gky1047 (2019).Article
CAS
PubMed
Google Scholar
Varga, J., Dobson, L., Reményi, I. & Tusnády, G. E. TSTMP: Target selection for structural genomics of human transmembrane proteins. Nucleic Acids Res. 45, D325–D330. https://doi.org/10.1093/nar/gkw939 (2017).Article
CAS
PubMed
Google Scholar
Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025. https://doi.org/10.1038/s41587-021-01156-3 (2022).Article
CAS
PubMed
PubMed Central
Google Scholar
Nallapareddy, V. et al. CATHe: Detection of remote homologues for CATH superfamilies using embeddings from protein language models. Bioinformatics 1, 029. https://doi.org/10.1093/bioinformatics/btad029 (2023).Article
CAS
Google Scholar
Bepler, T. & Berger, B. Learning the protein language: Evolution, structure, and function. Cell Syst. 12, 654–669. https://doi.org/10.1016/j.cels.2021.05.017 (2021).Article
CAS
PubMed
PubMed Central
Google Scholar
Dass, R., Mulder, F. A. A. & Nielsen, J. T. ODiNPred: Comprehensive prediction of protein order and disorder. Sci. Rep. 10, 14780. https://doi.org/10.1038/s41598-020-71716-1 (2020).Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Haas, J. et al. Continuous automated model evaluation (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins Struct. Funct. Bioinform. 86, 387–398. https://doi.org/10.1002/prot.25431 (2018).Article
CAS
Google Scholar
Weissenow, K., Heinzinger, M., Steinegger, M. & Rost, B. Ultra-fast protein structure prediction to capture effects of sequence variation in mutation movies. BioRxiv. https://doi.org/10.1101/2022.11.14.516473 (2022).Article
Google Scholar
Notin, P. et al. Tranception: Protein fitness prediction with autoregressive transformers and inference-time retrieval. https://doi.org/10.48550/ARXIV.2205.13760 (2022).Weile, J. & Roth, F. P. Multiplexed assays of variant effects contribute to a growing genotype–phenotype atlas. Hum. Genet. 137, 665–678. https://doi.org/10.1007/s00439-018-1916-x (2018).Article
CAS
PubMed
PubMed Central
Google Scholar
Fowler, D. M. & Fields, S. Deep mutational scanning: A new style of protein science. Nat. Methods 11, 801–807. https://doi.org/10.1038/nmeth.3027 (2014).Article
CAS
PubMed
PubMed Central
Google Scholar
Ashburner, M. et al. Gene Ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29. https://doi.org/10.1038/75556 (2000).Article
CAS
PubMed
PubMed Central
Google Scholar
Zhou, N. et al. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol. 20, 244. https://doi.org/10.1186/s13059-019-1835-8 (2019).Article
CAS
PubMed
PubMed Central
Google Scholar
Rojano, E. et al. Assigning protein function from domain-function associations using DomFun. BMC Bioinform. 23, 43. https://doi.org/10.1186/s12859-022-04565-6 (2022).Article
CAS
Google Scholar
Littmann, M., Heinzinger, M., Dallago, C., Olenyi, T. & Rost, B. Embeddings from deep learning transfer GO annotations beyond homology. Sci. Rep. 11, 1160. https://doi.org/10.1038/s41598-020-80786-0 (2021).Article
CAS
PubMed
PubMed Central
Google Scholar
You, R. et al. GOLabeler: Improving sequence-based large-scale protein function prediction by learning to rank. Bioinformatics 34, 2465–2473. https://doi.org/10.1093/bioinformatics/bty130 (2018).Article
CAS
PubMed
Google Scholar
Abriata, L. A., Tamò, G. E., Monastyrskyy, B., Kryshtafovych, A. & DalPeraro, M. Assessment of hard target modeling in CASP12 reveals an emerging role of alignment-based contact prediction methods. Proteins Struct. Funct. Bioinform. 86, 97–112. https://doi.org/10.1002/prot.25423 (2018).Article
CAS
Google Scholar
Klausen, M. S. et al. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. Proteins Struct. Funct. Bioinform. 87, 520–527. https://doi.org/10.1002/prot.25674 (2019).Article
CAS
Google Scholar
Elnaggar, A. et al. Ankh: Optimized modelling protein language model unlocks general-purpose (2023).Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins Struct. Funct. Bioinform. 89, 1607–1617. https://doi.org/10.1002/prot.26237 (2021).Article
CAS
Google Scholar
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130. https://doi.org/10.1126/science.ade2574 (2023).Article
ADS
MathSciNet
CAS
PubMed
Google Scholar
Almagro Armenteros, J. J. et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423. https://doi.org/10.1038/s41587-019-0036-z (2019).Article
CAS
PubMed
Google Scholar
Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822. https://doi.org/10.1038/s41592-018-0138-4 (2018).Article
CAS
PubMed
PubMed Central
Google Scholar
Laine, E., Karami, Y. & Carbone, A. GEMME: A simple and fast global epistatic model predicting mutational effects. Mol. Biol. Evol. 36, 2604–2619. https://doi.org/10.1093/molbev/msz179 (2019).Article
CAS
PubMed
PubMed Central
Google Scholar
Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. BioRxiv. https://doi.org/10.1101/2021.07.09.450648 (2021).Article
PubMed
PubMed Central
Google Scholar
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410. https://doi.org/10.1016/s0022-2836(05)80360-2 (1990).Article
CAS
PubMed
Google Scholar
Fox, N. K., Brenner, S. E. & Chandonia, J.-M. SCOPe: Structural classification of proteins—Extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res. 42, D304–D309. https://doi.org/10.1093/nar/gkt1240 (2014).Article
CAS
PubMed
Google Scholar
Zhang, Y. & Skolnick, J. TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309. https://doi.org/10.1093/nar/gki524 (2005).Article
CAS
PubMed
PubMed Central
Google Scholar
Almagro Armenteros, J. J., Sønderby, C. K., Sønderby, S. K., Nielsen, H. & Winther, O. DeepLoc: Prediction of protein subcellular localization using deep learning. Bioinformatics 33, 3387–3395. https://doi.org/10.1093/bioinformatics/btx431 (2017).Article
CAS
PubMed
Google Scholar
Xia, Y., Huang, E. S., Levitt, M. & Samudrala, R. Ab initio construction of protein tertiary structures using a hierarchical approach. J. Mol. Biol. 300, 171–185 (2000).Article
CAS
PubMed
Google Scholar
Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv. https://doi.org/10.1101/2022.07.20.500902 (2022).Article
PubMed
PubMed Central
Google Scholar
Steinegger, M., Mirdita, M. & Soding, J. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nat. Methods 16, 603–606. https://doi.org/10.1038/s41592-019-0437-4 (2019).Article
CAS
PubMed
Google Scholar
Devos, D. & Valencia, A. Practical limits of function prediction. Proteins Struct. Funct. Bioinform. 41, 98–107. https://doi.org/10.1002/1097-0134(20001001)41:1%3c98::AID-PROT120%3e3.0.CO;2-S (2000).Article
CAS
Google Scholar
Rost, B. Twilight zone of protein sequence alignments. Protein Eng. Des. Sel. 12, 85–94. https://doi.org/10.1093/protein/12.2.85 (1999).Article
CAS
Google Scholar
Peters, M. E. et al. Deep contextualized word representations. http://arXiv.org/1802.05365 (2018).Devlin, J., Chang, M., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding (2019).Raffel, C. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. https://doi.org/10.48550/ARXIV.1910.10683 (2020).Vaswani, A. et al. Proc. 31st International Conference on Neural Information Processing Systems 6000–6010 (Curran Associates Inc., Long Beach, 2017).Nielsen, J. T. & Mulder, F. A. A. There is diversity in disorder—“In all chaos there is a cosmos, in all disorder a secret order”. Front. Mol. Biosci. 3, 4. https://doi.org/10.3389/fmolb.2016.00004 (2016).Article
PubMed
PubMed Central
Google Scholar
Lange, J., Wyrwicz, L. S. & Vriend, G. KMAD: Knowledge-based multiple sequence alignment for intrinsically disordered proteins. Bioinformatics 32, 932–936. https://doi.org/10.1093/bioinformatics/btv663 (2016).Article
CAS
PubMed
Google Scholar
Radivojac, P., Obradovic, Z., Brown, C. J. & Dunker, A. K. Improving sequence alignments for intrinsically disordered proteins. Pac. Symp. Biocomput. 1, 589–600 (2002).
Google Scholar
Riley, A. C., Ashlock, D. A. & Graether, S. P. The difficulty of aligning intrinsically disordered protein sequences as assessed by conservation and phylogeny. PLoS ONE 18, e0288388. https://doi.org/10.1371/journal.pone.0288388 (2023).Article
CAS
PubMed
PubMed Central
Google Scholar
Brown, C. J. et al. Evolutionary rate heterogeneity in proteins with long disordered regions. J. Mol. Evol. 55, 104–110. https://doi.org/10.1007/s00239-001-2309-6 (2002).Article
ADS
CAS
PubMed
Google Scholar
Huang, H. & Sarai, A. Analysis of the relationships between evolvability, thermodynamics, and the functions of intrinsically disordered proteins/regions. Comput. Biol. Chem. 41, 51–57. https://doi.org/10.1016/j.compbiolchem.2012.10.001 (2012).Article
CAS
PubMed
Google Scholar
Ahnert, S. E., Marsh, J. A., Hernandez, H., Robinson, C. V. & Teichmann, S. A. Principles of assembly reveal a periodic table of protein complexes. Science 350, 2245. https://doi.org/10.1126/science.aaa2245 (2015).Article
CAS
Google Scholar
Ponting, C. P. & Russell, R. R. The natural history of protein domains. Annu. Rev. Biophys. Biomol. Struct. 31, 45–71. https://doi.org/10.1146/annurev.biophys.31.082901.134314 (2002).Article
CAS
PubMed
Google Scholar
Rey, F. A. One protein, many functions. Nature 468, 773–775. https://doi.org/10.1038/468773a (2010).Article
ADS
CAS
PubMed
Google Scholar
Wells, J., Hawkins-Hooker, A., Bordin, N., Paige, B. & Orengo, C. Chainsaw: Protein domain segmentation with fully convolutional neural networks. BioRxiv. https://doi.org/10.1101/2023.07.19.549732 (2023).Article
PubMed
PubMed Central
Google Scholar
Schütze, K., Heinzinger, M., Steinegger, M. & Rost, B. Nearest neighbor search on embeddings rapidly identifies distant protein relations. Front. Bioinform. https://doi.org/10.3389/fbinf.2022.1033775 (2022).Article
PubMed
PubMed Central
Google Scholar
Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242. https://doi.org/10.1093/nar/28.1.235 (2000).Article
ADS
CAS
PubMed
PubMed Central
Google Scholar
Joosten, R. P., Long, F., Murshudov, G. N. & Perrakis, A. ThePDB_REDOserver for macromolecular structure model optimization. IUCrJ 1, 213–220. https://doi.org/10.1107/s2052252514009324 (2014).Article
CAS
PubMed
PubMed Central
Google Scholar
Sillitoe, I. et al. CATH: Increased structural coverage of functional space. Nucleic Acids Res. 49, D266–D273. https://doi.org/10.1093/nar/gkaa1079 (2021).Article
CAS
PubMed
Google Scholar
Sander, C. & Schneider, R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins 9, 56–68. https://doi.org/10.1002/prot.340090107 (1991).Article
CAS
PubMed
Google Scholar
Mika, S. UniqueProt: Creating representative protein sequence sets. Nucleic Acids Res. 31, 3789–3791. https://doi.org/10.1093/nar/gkg620 (2003).Article
CAS
PubMed
PubMed Central
Google Scholar
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637. https://doi.org/10.1002/bip.360221211 (1983).Article
CAS
PubMed
Google Scholar
Howard, M. J. Protein NMR spectroscopy. Curr. Biol. 8, R331–R333. https://doi.org/10.1016/S0960-9822(98)70214-3 (1998).Article
CAS
PubMed
Google Scholar
Nielsen, J. T. & Mulder, F. A. A. In Intrinsically Disordered Proteins: Methods and Protocols (eds Kragelund, B. B. & Skriver, K.) 303–317 (Springer, 2020).Chapter
Google Scholar
Suzek, B. E. et al. UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932. https://doi.org/10.1093/bioinformatics/btu739 (2015).Article
CAS
PubMed
Google Scholar
Ben Chorin, A. et al. ConSurf-DB: An accessible repository for the evolutionary conservation patterns of the majority of PDB proteins. Protein Sci. 29, 258–267. https://doi.org/10.1002/pro.3779 (2020).Article
CAS
PubMed
Google Scholar
Katoh, K., Rozewicki, J. & Yamada, K. D. MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Brief. Bioinform. 20, 1160–1166. https://doi.org/10.1093/bib/bbx108 (2019).Article
CAS
PubMed
Google Scholar
Fukushima, K. Cognitron: A self-organizing multilayered neural network. Biol. Cybern. 20, 121–136. https://doi.org/10.1007/BF00342633 (1975).Article
CAS
PubMed
Google Scholar