Kim, S. et al. PubChem 2023 update. Nucleic Acids Res. 51, D1373–D1380 (2023).
Fink, T., Bruggesser, H. & Reymond, J.-L. Virtual exploration of the small-molecule chemical universe below 160 daltons. Angew. Chem. Int. Ed. 44, 1504–1508 (2005).
Blum, L. C. & Reymond, J.-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 131, 8732–8733 (2009).
Fink, T. & Reymond, J.-L. Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physicochemical properties, compound classes, and drug discovery. J. Chem. Inf. Model. 47, 342–353 (2007).
Ruddigkeit, L., Van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).
Sterling, T. & Irwin, J. J. ZINC 15 – ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).
Tingle, B. I. et al. ZINC-22 – a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).
Zdrazil, B. et al. The ChEMBL database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 52, D1180–D1192 (2024).
Davies, M. et al. ChEMBL web services: streamlining access to drug discovery data and utilities. Nucleic Acids Res. 43, W612–W620 (2015).
Pence, H. & Williams, A. ChemSpider: an online chemical information resource. J. Chem. Educ. 87 (2010).
Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2018).
Cheng, T., Pan, Y., Hao, M., Wang, Y. & Bryant, S. H. PubChem applications in drug discovery: a bibliometric analysis. Drug Discov. Today 19, 1751–1756 (2014).
Miller, M. A. Chemical database techniques in drug discovery. Nat. Rev. Drug Discov. 1, 220–227 (2002).
Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996).
Himanen, L., Geurts, A., Foster, A. S. & Rinke, P. Data-driven materials science: status, challenges, and perspectives. Adv. Sci. 6, 1900808 (2019).
Tripathi, M. K., Kumar, R. & Tripathi, R. Big-data driven approaches in materials science: a survey. Mater. Today Proc. 26, 1245–1249 (2020). 10th International Conference of Materials Processing and Characterization.
Cai, J., Chu, X., Xu, K., Li, H. & Wei, J. Machine learning-driven new material discovery. Nanoscale Adv. 2, 3115–3130 (2020).
Zou, S.-J. et al. Recent advances in organic light-emitting diodes: toward smart lighting and displays. Mater. Chem. Front. 4, 788–820 (2020).
Salehi, A., Fu, X., Shin, D.-H. & So, F. Recent advances in OLED optical design. Adv. Funct. Mater. 29, 1808803 (2019).
Zhao, Q., Stalin, S., Zhao, C.-Z. & Archer, L. A. Designing solid-state electrolytes for safe, energy-dense batteries. Nat. Rev. Mater. 5, 229–252 (2020).
Bruno, I. J. & Groom, C. R. Crystallographic perspective on sharing data and knowledge. J. Comput. Aided Mol. Des. 28, 1015–1022 (2014).
Montavon, G. et al. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 15, 095003 (2013).
Kim, H., Park, J. Y. & Choi, S. Energy refinement and analysis of structures in the QM9 database via a highly accurate quantum chemical method. Sci. Data 6, 109 (2019).
Ramakrishnan, R., Hartmann, M., Tapavicza, E. & von Lilienfeld, O. A. Electronic spectra from TDDFT and machine learning in chemical space. J. Chem. Phys. 143, 084111 (2015).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
Nakata, M. & Maeda, T. PubChemQC B3LYP/6-31G*//PM6 data set: the electronic structures of 86 million molecules using B3LYP/6-31G* calculations. J. Chem. Inf. Model. 63, 5734–5754 (2023).
Nakata, M., Shimazaki, T., Hashimoto, M. & Maeda, T. PubChemQC PM6: a dataset of 221 million molecules with optimized molecular geometries and electronic properties. J. Chem. Inf. Model. 60, 5891–5899 (2020).
Nakata, M. & Shimazaki, T. PubChemQC project: a large-scale first-principles electronic structure database for data-driven chemistry. J. Chem. Inf. Model. 57, 1300–1308 (2017).
Chen, G. et al. Alchemy: a quantum chemistry dataset for benchmarking AI models. arXiv:1906.09427 (2019).
Pereira, F. et al. Machine learning methods to predict density functional theory B3LYP energies of HOMO and LUMO orbitals. J. Chem. Inf. Model. 57, 11–21 (2017).
Liang, J., Xu, Y., Liu, R. & Zhu, X. QM-sym, a symmetrized quantum chemistry database of 135 kilo molecules. Sci. Data 6, 213 (2019).
Liang, J. et al. QM-symex, update of the QM-sym database with excited state information for 173 kilo molecules. Sci. Data 7, 400 (2020).
Zou, Z. et al. A deep learning model for predicting selected organic molecular spectra. Nat. Comput. Sci. 3, 957–964 (2023).
Kayastha, P., Chakraborty, S. & Ramakrishnan, R. The resolution-vs.-accuracy dilemma in machine learning modeling of electronic excitation spectra. Digital Discovery 1, 689–702 (2022).
Pengmei, Z., Liu, J. & Shu, Y. Beyond MD17: the reactive xxMD dataset. Sci. Data 11, 1 (2024).
Vinod, V. & Zaspel, P. CheMFi: a multifidelity dataset of quantum chemical properties of diverse molecules. arXiv:2406.14149 (2024).
Glavatskikh, M., Leguy, J., Hunault, G., Cauchy, T. & Da Mota, B. Dataset’s chemical diversity limits the generalizability of machine learning predictions. J. Cheminform. 11, 69 (2019).
Isert, C., Atz, K., Jiménez-Luna, J. & Schneider, G. QMugs, quantum mechanical properties of drug-like molecules. Sci. Data 9, 273 (2022).
Kokkinos, I. UberNet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 6129–6138 (2017).
Zhang, D. et al. DPA-2: towards a universal large atomic model for molecular and material simulation. arXiv:2312.15492 (2023).
Grimme, S., Ehrlich, S. & Goerigk, L. Effect of the damping function in dispersion corrected density functional theory. J. Comput. Chem. 32, 1456–1465 (2011).
Sculley, D. Web-scale k-means clustering. In Proc. 19th International Conference on World Wide Web, WWW ’10, 1177–1178 (Association for Computing Machinery, New York, NY, USA, 2010).
O’Boyle, N. M., Morley, C. & Hutchison, G. R. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2, 1–7 (2008).
O’Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminform. 3, 1–14 (2011).
Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB – an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652–1671 (2019).
Bannwarth, C. et al. Extended tight-binding quantum chemistry methods. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1493 (2021).
Frisch, M. J. et al. Gaussian 16, Revision C.01. Gaussian Inc., Wallingford, CT (2016).
Heller, S. R., McNaught, A., Pletnev, I., Stein, S. & Tchekhovskoi, D. InChI, the IUPAC International Chemical Identifier. J. Cheminform. 7, 1–34 (2015).
Pulay, P. & Fogarasi, G. Geometry optimization in redundant internal coordinates. J. Chem. Phys. 96, 2856–2860 (1992).
Peng, C., Ayala, P. Y., Schlegel, H. B. & Frisch, M. J. Using redundant internal coordinates to optimize equilibrium geometries and transition states. J. Comput. Chem. 17, 49–56 (1996).
Zhu, Y., Li, M., Xu, C. & Lan, Z. QCDGE dataset. Figshare https://doi.org/10.6084/m9.figshare.c.7259125.v1 (2024).
The HDF Group, Koziol, Q. & USDOE Office of Science. HDF5 version 1.12.0. https://doi.org/10.11578/dc.20180330.1 (2020).
Ertl, P. An algorithm to identify functional groups in organic molecules. J. Cheminform. 9, 36 (2017).
Schaub, J. Development and implementation of in silico molecule fragmentation algorithms for the cheminformatics analysis of natural product spaces. PhD thesis, Friedrich-Schiller-Universität Jena. https://doi.org/10.22032/dbt.59051 (2023).
Haider, N. Functionality pattern matching as an efficient complementary structure/reaction search tool: an open-source approach. Molecules 15, 5079–5092 (2010).
ChemAxon. Marvin. http://www.chemaxon.com (2024).