Hybrid fragment-SMILES tokenization for ADMET prediction in drug discovery | BMC Bioinformatics

