Pre-training with fractional denoising to enhance molecular property prediction

Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
Wong, F. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2023).
Li, J. et al. AI applications through the whole life cycle of material discovery. Matter 3, 393–432 (2020).
Deng, J. et al. A systematic study of key elements underlying molecular property prediction. Nat. Commun. 14, 6395 (2023).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).
Dowden, H. & Munro, J. Trends in clinical success rates and therapeutic focus. Nat. Rev. Drug Discov. 18, 495–496 (2019).
Galson, S. et al. The failure to fail smartly. Nat. Rev. Drug Discov. 20, 259–260 (2021).
Pyzer-Knapp, E. O. et al. Accelerating materials discovery using artificial intelligence, high performance computing and robotics. npj Comput. Mater. 8, 84 (2022).
Schneider, G. Automating drug discovery. Nat. Rev. Drug Discov. 17, 97–113 (2018).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning 1597–1607 (PMLR, 2020).
He, K. et al. Masked autoencoders are scalable vision learners. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 16000–16009 (IEEE, 2022).
Dai, A. M. & Le, Q. V. Semi-supervised sequence learning. Adv. Neural Inf. Process. Syst. 28, 3079–3087 (2015).
Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) 4171–4186 (Association for Computational Linguistics, 2019).
Wang, Y., Wang, J., Cao, Z. & Barati Farimani, A. Molecular contrastive learning of representations via graph neural networks. Nat. Mach. Intell. 4, 279–287 (2022).
Moon, K., Im, H.-J. & Kwon, S. 3D graph contrastive learning for molecular property prediction. Bioinformatics 39, 371 (2023).
Fang, Y. et al. Knowledge graph-enhanced molecular contrastive learning with functional prompt. Nat. Mach. Intell. 5, 542–553 (2023).
Stärk, H. et al. 3D Infomax improves GNNs for molecular property prediction. In International Conference on Machine Learning 20479–20502 (PMLR, 2022).
Liu, S. et al. Pre-training molecular graph representation with 3D geometry. In International Conference on Learning Representations Workshop on Geometrical and Topological Representation Learning https://openreview.net/pdf?id=xQUe1pOKPam (ICLR, 2022).
Li, S., Zhou, J., Xu, T., Dou, D. & Xiong, H. GeomGCL: geometric graph contrastive learning for molecular property prediction. In Proc. AAAI Conference on Artificial Intelligence 4541–4549 (PKP Publishing Services, 2022).
Zeng, X. et al. Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework. Nat. Mach. Intell. 4, 1004–1016 (2022).
Zhang, X.-C. et al. MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction. Brief. Bioinf. 22, bbab152 (2021).
Ross, J. et al. Large-scale chemical language representations capture molecular structure and properties. Nat. Mach. Intell. 4, 1256–1264 (2022).
Xia, J. et al. Mole-BERT: rethinking pre-training graph neural networks for molecules. In The Eleventh International Conference on Learning Representations https://openreview.net/pdf/21b1918178090348ffb159460ee696cfe8360dd2.pdf (ICLR, 2023).
Rong, Y. et al. Self-supervised graph transformer on large-scale molecular data. Adv. Neural Inf. Process. Syst. 33, 12559–12571 (2020).
Fang, X. et al. Geometry-enhanced molecular representation learning for property prediction. Nat. Mach. Intell. 4, 127–134 (2022).
Zhou, G. et al. Uni-Mol: a universal 3D molecular representation learning framework. In The Eleventh International Conference on Learning Representations https://openreview.net/pdf?id=IfFZr1gl0b (ICLR, 2023).
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Zaidi, S. et al. Pre-training via denoising for molecular property prediction. In International Conference on Learning Representations https://openreview.net/pdf?id=tYIMtogyee (ICLR, 2023).
Luo, S. et al. One transformer can understand both 2D & 3D molecular data. In The Eleventh International Conference on Learning Representations https://openreview.net/pdf?id=vZTp1oPV3PC (ICLR, 2023).
Liu, S., Guo, H. & Tang, J. Molecular geometry pretraining with SE(3)-invariant denoising distance matching. In The Eleventh International Conference on Learning Representations https://openreview.net/pdf?id=CjTHVo1dvR (ICLR, 2023).
Jiao, R., Han, J., Huang, W., Rong, Y. & Liu, Y. Energy-motivated equivariant pretraining for 3D molecular graphs. Proc. AAAI Conference on Artificial Intelligence 37, 8096–8104 (2023).
Feng, R. et al. May the force be with you: unified force-centric pre-training for 3D molecular conformations. Adv. Neural Inf. Process. Syst. 36, 72750–72760 (2023).
Thölke, P. & De Fabritiis, G. Equivariant transformers for neural network based molecular potentials. In International Conference on Learning Representations https://openreview.net/pdf?id=zNHzqZ9wrRB (ICLR, 2022).
Boltzmann, L. Studien über das Gleichgewicht der lebendigen Kraft. Wissen. Abh. 1, 49–96 (1868).
Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, 1603015 (2017).
Schütt, K. et al. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. Adv. Neural Inf. Process. Syst. 30, 992–1002 (Curran Associates, 2017).
Chmiela, S. et al. Accurate global machine learning force fields for molecules with hundreds of atoms. Sci. Adv. 9, 0873 (2023).
Chmiela, S., Sauceda, H. E., Müller, K.-R. & Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 9, 3887 (2018).
Wang, Y., Xu, C., Li, Z. & Barati Farimani, A. Denoise pretraining on nonequilibrium molecules for accurate and transferable neural potentials. J. Chem. Theory Comput. 19, 5077–5087 (2023).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).
Ruddigkeit, L., Van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
Townshend, R. et al. ATOM3D: tasks on molecules in three dimensions. In Proc. Neural Information Processing Systems Track on Datasets and Benchmarks (2021).
Landrum, G. et al. RDKit: a software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum (2013).
Chmiela, S., Sauceda, H. E., Poltavsky, I., Müller, K.-R. & Tkatchenko, A. sGDML: constructing accurate and data efficient molecular force fields using machine learning. Comput. Phys. Commun. 240, 38–45 (2019).
Nakata, M. & Shimazaki, T. PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry. J. Chem. Inf. Model. 57, 1300–1308 (2017).
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet—a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In International Conference on Machine Learning 9323–9332 (PMLR, 2021).
Gasteiger, J., Groß, J. & Günnemann, S. Directional message passing for molecular graphs. In International Conference on Learning Representations https://openreview.net/pdf?id=B1eWbxStPH (ICLR, 2020).
Gasteiger, J., Giri, S., Margraf, J. T. & Günnemann, S. Fast and uncertainty-aware directional message passing for non-equilibrium molecules. In Machine Learning for Molecules Workshop, NeurIPS (2020).
Liu, Y. et al. Spherical message passing for 3D molecular graphs. In International Conference on Learning Representations https://openreview.net/pdf?id=givsRXsOt9r (ICLR, 2022).
Schütt, K., Unke, O. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning 9377–9388 (PMLR, 2021).
Öztürk, H., Özgür, A. & Ozkirimli, E. DeepDTA: deep drug–target binding affinity prediction. Bioinformatics 34, 821–829 (2018).
Rao, R. et al. Evaluating protein transfer learning with TAPE. Adv. Neural Inf. Process. Syst. 32, 9689–9701 (2019).
Elnaggar, A. et al. ProtTrans: toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2021).
Somnath, V. R., Bunne, C. & Krause, A. Multi-scale representation learning on proteins. Adv. Neural Inf. Process. Syst. 34, 25244–25255 (2021).
Wang, L., Liu, H., Liu, Y., Kurtin, J. & Ji, S. Learning hierarchical protein representations via complete 3D graph networks. In The Eleventh International Conference on Learning Representations https://openreview.net/forum?id=9X-hgLDLYkQ (ICLR, 2023).
Feng, S. MOL_LMDB. figshare https://doi.org/10.6084/m9.figshare.24961485.v1 (2024).
Ramakrishnan, R., Dral, P., Rupp, M. & Anatole von Lilienfeld, O. Quantum chemistry structures and properties of 134 kilo molecules. figshare https://doi.org/10.6084/m9.figshare.c.978904.v5 (2014).
Townshend, R. J. L. ATOM3D: ligand binding affinity (LBA) dataset. Zenodo https://doi.org/10.5281/zenodo.4914718 (2021).
Ni, Y. Source data for figures in ‘Pre-training with fractional denoising to enhance molecular property prediction’. figshare https://doi.org/10.6084/m9.figshare.25902679.v1 (2024).
Feng, S. Pre-training with fractional denoising to enhance molecular property prediction. Zenodo https://doi.org/10.5281/zenodo.12697467 (2024).