Rubin, D. B. Statistical disclosure limitation. J. Off. Stat. 9, 461–468 (1993).
Google ScholarÂ
Yoon, J., Drumright, L. N. & Van Der Schaar, M. Anonymization through data synthesis using generative adversarial networks (ADS-GAN). IEEE J. Biomed. Health Inform. 24, 2378–2388 (2020).ArticleÂ
Google ScholarÂ
Bond-Taylor, S., Leach, A., Long, Y. & Willcocks, C. G. Deep generative modelling: a comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/TPAMI.2021.3116668 (2021).Xu, D., Yuan, S., Zhang, L. & Wu, X. Fairgan: Fairness-aware generative adversarial networks. In 2018 IEEE International Conference on Big Data 570–575 (IEEE, 2018).Xu, D., Wu, Y., Yuan, S., Zhang, L. & Wu, X. Achieving causal fairness through generative adversarial networks. In Proc. International Joint Conference on Artificial Intelligence 1452–1458 (IJCAI, 2019).van Breugel, B., Kyono, T., Berrevoets, J. & van der Schaar, M. DECAF: generating fair synthetic data using causally-aware generative networks. Adv. Neural Inform. Process. Syst. 34, 22221–22233 (2021).
Google ScholarÂ
Antoniou, A., Storkey, A. & Edwards, H. Data augmentation generative adversarial networks. Preprint at https://doi.org/10.48550/arXiv.1711.04340 (2017).Dina, A. S., Siddique, A. B. & Manivannan, D. Effect of balancing data using synthetic data on the performance of machine learning classifiers for intrusion detection in computer networks. IEEE Access. 10, 96731–96747 (2022).ArticleÂ
Google ScholarÂ
Das, H. P. et al. Conditional synthetic data generation for robust machine learning applications with limited pandemic data. In Proc. AAAI Conference on Artificial Intelligence 36, 11792–11800 (AAAI, 2021).Bing, S., Dittadi, A., Bauer, S. & Schwab, P. Conditional generation of medical time series for extrapolation to underrepresented populations. PLoS Digital Health 1, e0000074 (2022).ArticleÂ
Google ScholarÂ
van Breugel, B., Seedat, N., Imrie, F. & van der Schaar, M. Can you rely on your model evaluation? Improving model evaluation with synthetic test data. Adv. Neural Inform. Process. Syst. 36, 1889–1904 (2023).
Google ScholarÂ
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).ArticleÂ
Google ScholarÂ
Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug. Discov. 18, 463–477 (2019).ArticleÂ
Google ScholarÂ
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).ArticleÂ
Google ScholarÂ
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10684–10695 (IEEE, 2022).Wang, H. et al. Predicting the epidemics trend of COVID-19 using epidemiological-based generative adversarial networks. IEEE J. Sel. Top. Signal Process. 16, 276–288 (2022).ArticleÂ
Google ScholarÂ
Morbiducci, U. et al. Synthetic dataset generation for the analysis and the evaluation of image-based hemodynamics of the human aorta. Med. Biol. Eng. Comput. 50, 145–154 (2012).ArticleÂ
Google ScholarÂ
Frangi, A. F., Tsaftaris, S. A. & Prince, J. L. Simulation and synthesis in medical imaging. IEEE Trans. Med. Imaging 37, 673–679 (2018).ArticleÂ
Google ScholarÂ
Bray, A. et al. Pulse physiology engine: an open-source software platform for computational modeling of human medical simulation. SN Compr. Clin. Med. 1, 362–377 (2019).ArticleÂ
Google ScholarÂ
Webb, J. B. et al. Computational simulation to assess patient safety of uncompensated COVID-19 two-patient ventilator sharing using the Pulse Physiology Engine. PLOS ONE 15, e0242532 (2020).ArticleÂ
Google ScholarÂ
Patki, N., Wedge, R. & Veeramachaneni, K. The synthetic data vault. In Proc. 3rd IEEE International Conference on Data Science and Advanced Analytics (DSAA 2016) 399–410 (IEEE, 2016).Walonoski, J. et al. Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J. Am. Med. Inform. Assoc. 25, 230–238 (2018).ArticleÂ
Google ScholarÂ
von Platen, P. et al. Diffusers: state-of-the-art diffusion models. GitHub github.com/huggingface/diffusers (2022).Qian, Z., Davies, R. & van der Schaar, M. Synthcity: a benchmark framework for diverse use cases of tabular synthetic data. Adv. Neural Inform. Process. Syst. 36, 3173–3188 (2023).
Google ScholarÂ
Dwork, C. & Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 211–487 (2014).ArticleÂ
MathSciNetÂ
Google ScholarÂ
Qu, Y. et al. GAN-DP: generative adversarial net driven differentially privacy-preserving big data publishing. In 2019 IEEE International Conference on Communications (ICC) (IEEE, 2019).Nikolenko, S. I. Synthetic Data for Deep Learning SOIA Vol. 174 (Springer, 2021).Chen, R. J., Lu, M. Y., Chen, T. Y., Williamson, D. F. K. & Mahmood, F. Synthetic data in machine learning for medicine and healthcare. Nat. Biomed. Eng. 5, 493–497 (2021).ArticleÂ
Google ScholarÂ
Jordon, J. et al. Synthetic data — what, why and how? Preprint at https://doi.org/10.48550/arxiv.2205.03257 (2022).Alloza, C. et al. A case for synthetic data in regulatory decision-making in Europe. Clin. Pharmacol. Ther. 114, 795–801 (2023).ArticleÂ
Google ScholarÂ
Giuffrè, M. & Shung, D. L. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. npj Digital Med. 6, 186 (2023).ArticleÂ
Google ScholarÂ
Savage, N. Synthetic data could be better than real data. Nature https://doi.org/10.1038/d41586-023-01445-8 (2023).Rocher, L., Hendrickx, J. M. & de Montjoye, Y.-A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10, 3069 (2019).ArticleÂ
Google ScholarÂ
Hernandez, M., Epelde, G., Alberdi, A., Cilla, R. & Rankin, D. Synthetic data generation for tabular health records: a systematic review. Neurocomputing 493, 28–45 (2022).ArticleÂ
Google ScholarÂ
Li, J., Cairns, B. J., Li, J. & Zhu, T. Generating synthetic mixed-type longitudinal electronic health records for artificial intelligent applications. npj Digital Med. 6, 98 (2023).ArticleÂ
Google ScholarÂ
Theodorou, B., Xiao, C. & Sun, J. Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model. Nat. Commun. 14, 5305 (2023).ArticleÂ
Google ScholarÂ
Alaa, A. M., van Breugel, B., Saveliev, E. & van der Schaar, M. How faithful is your synthetic data? Sample-level metrics for evaluating and auditing generative models. In International Conference on Machine Learning (ICML) 290–306 (PMLR, 2022).Stadler, T., Oprisanu, B. & Troncoso, C. Synthetic data — anonymisation Groundhog Day. In 31st USENIX Security Symp. (USENIX, 2022).Dressel, J. & Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 4, eaao5580 (2018).ArticleÂ
Google ScholarÂ
Dastin, J. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters (11 October 2018).Lu, K., Mardziel, P., Wu, F., Amancharla, P. & Datta, A. Gender bias in neural natural language processing. In Logic, Language, and Security: Essays Dedicated to Andre Scedrov on the Occasion of his 65th Birthday 189–202 (Springer International Publishing, 2020).de Vassimon Manela, D., Errington, D., Fisher, T., van Breugel, B. & Minervini, P. Stereotype and skew: quantifying gender bias in pre-trained and fine-tuned language models. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics (ECACL) 2232–2242 (ACL, 2021).Kadambi, A. Achieving fairness in medical devices. Science 372, 30–31 (2021).ArticleÂ
Google ScholarÂ
Abid, A., Farooqi, M. & Zou, J. Persistent anti-Muslim bias in large language models. In Proc. 2021 AAAI/ACM Conference on AI, Ethics, and Society 9, 298–306 (ACM, 2021).Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv. (CSUR) 54 (ACM, 2021).Grgic-Hlaca, N., Zafar, M. B., Gummadi, K. P. & Weller, A. The case for process fairness in learning: feature selection for fair decision making. In Symposium on Machine Learning and the Law at the 29th Conference on Neural Information Processing Systems (NIPS, 2016).Barocas, S. & Selbst, A. D. Big data’s disparate impact. Calif. Law Rev. 104, 671 (2016).
Google ScholarÂ
Zemel, R., Wu, Y., Swersky, K., Pitassi, T. & Dwork, C. Learning fair representations. In International Conference on Machine Learning 325–333 (PMLR, 2013).Alessandra, A. M. When doctrines collide: disparate treatment, disparate impact, and Watson v. Fort Worth Bank & Trust. Univ. Pennsylvania Law Rev. 137, 1755 (1988).ArticleÂ
Google ScholarÂ
Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C. & Venkatasubramanian, S. Certifying and removing disparate impact. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 259–268 (ACM, 2015).Saxena, N. A. et al. How do fairness definitions fare? Testing public attitudes towards three algorithmic definitions of fairness in loan allocations. Artif. Intell. 283, 103238 (2020).ArticleÂ
MathSciNetÂ
Google ScholarÂ
Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P. & SMOTE Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).ArticleÂ
Google ScholarÂ
Draghi, B., Wang, Z., Myles, P. & Tucker, A. BayesBoost: identifying and handling bias using synthetic data generators. In Proc. 3rd Int. Worksh. on Learning with Imbalanced Domains: Theory and Applications 49–62 (PMLR, 2021).Waheed, A. et al. CovidGAN: data augmentation using auxiliary classifier GAN for improved Covid-19 detection. IEEE Access. 8, 91916–91923 (2020).ArticleÂ
Google ScholarÂ
Mahmood, F. et al. Deep adversarial training for multi-organ nuclei segmentation in histopathology images. IEEE Trans. Med. Imaging 39, 3257–3267 (2020).ArticleÂ
Google ScholarÂ
Shen, T., Hao, K., Gou, C. & Wang, F. Y. Mass image synthesis in mammogram with contextual information based on GANs. Comput. Meth. Prog. Biomed. 202, 106019 (2021).ArticleÂ
Google ScholarÂ
Tang, Y., Tang, Y., Zhu, Y., Xiao, J. & Summers, R. M. A disentangled generative model for disease decomposition in chest X-rays via normal image synthesis. Med. Image Anal. 67, 101839 (2021).ArticleÂ
Google ScholarÂ
van Breugel, B., Qian, Z. & van der Schaar, M. Synthetic data, real errors: how (not) to publish and use synthetic data. In Proc. 40th International Conference on Machine Learning (PMLR, 2023).Manousakas, D. & Aydöre, S. On the usefulness of synthetic tabular data generation. Preprint at https://doi.org/10.48550/arXiv.2306.15636 (2023).Liu, M. Y. & Tuzel, O. Coupled generative adversarial networks. Adv. Neural Inform. Process. Syst. 469, 477 (2016).
Google ScholarÂ
Kim, T., Cha, M., Kim, H., Lee, J. K. & Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In 34th International Conference on Machine Learning 4, 2941–2949 (PMLR, 2017).Zhu, J. Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proc. IEEE International Conference on Computer Vision 2017, 2242–2251 (IEEE, 2017).Liu, M. Y., Breuel, T. & Kautz, J. Unsupervised image-to-image translation networks. Adv. Neural Inform. Process. Syst. 30, 701–709 (2017).
Google ScholarÂ
Choi, E. et al. Generating multi-label discrete patient records using generative adversarial networks. In Machine Learning for Healthcare 286–305 (PMLR, 2017).Yoon, J., Jordon, J., Van Der Schaar, M. & RadialGAN Leveraging multiple datasets to improve target-specific predictive models using generative adversarial networks. In 35th International Conference on Machine Learning 13, 9060–9068 (PMLR, 2018).Karras, T., Laine, S. & Aila, T. A style-based generator architecture for generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 43, 4217–4228 (2018).ArticleÂ
Google ScholarÂ
Ali, M. B. et al. Domain mapping and deep learning from multiple MRI clinical datasets for prediction of molecular subtypes in low grade gliomas. Brain Sci. 10, 463 (2020).ArticleÂ
Google ScholarÂ
Ge, C., Gu, I. Y.-H., Jakola, A. S. & Yang, J. Enlarged training dataset by pairwise GANs for molecular-based brain tumor classification. IEEE Access. 8, 22560–22570 (2020).ArticleÂ
Google ScholarÂ
Shwartz-Ziv, R. & Armon, A. Tabular data: deep learning is not all you need. Inf. Fusion 81, 84–90 (2022).ArticleÂ
Google ScholarÂ
Gebru, T. et al. Datasheets for datasets. Commun. ACM 64, 86–92 (2021).ArticleÂ
Google ScholarÂ
Tao, F. et al. Digital twin-driven product design, manufacturing and service with big data. Int. J. Adv. Manuf. Technol. 94, 3563–3576 (2018).ArticleÂ
Google ScholarÂ
Corral-Acero, J. et al. The ‘Digital Twin’ to enable the vision of precision cardiology. Eur. Heart J. 41, 4556–4564 (2020).ArticleÂ
Google ScholarÂ
Eddy, D. M. & Schlessinger, L. Validation of the Archimedes diabetes model. Diabetes Care 26, 3102–3110 (2003).ArticleÂ
Google ScholarÂ
Laubenbacher, R., Sluka, J. P. & Glazier, J. A. Using digital twins in viral infection. Science 371, 1105–1106 (2021).ArticleÂ
Google ScholarÂ
Chan, A., Bica, I., Hüyük, A., Jarrett, D. & van der Schaar, M. The medkit-learn(ing) environment: medical decision modelling through simulation. In Adv. Neural Inf. Process. Syst. Track on Datasets and Benchmarks 1 (Curran Associates, 2021).Berrevoets, J., Jarrett, D., Chan, A. J. & Schaar, M. van der. AllSim: Simulating and benchmarking resource allocation policies in multi-user systems. Adv. Neural Inf. Proces. Syst. 36, 851–866 (2023).
Google ScholarÂ
Zhang, J. et al. Combining mechanistic and machine learning models for predictive engineering and optimization of tryptophan metabolism. Nat. Commun. 11, 4880 (2020).ArticleÂ
Google ScholarÂ
Allen, A. et al. A digital twins machine learning model for forecasting disease progression in stroke patients. Appl. Sci. 11, 5576 (2021).ArticleÂ
Google ScholarÂ
Bertolini, D. et al. Forecasting progression of mild cognitive impairment (MCI) and Alzheimer’s disease with digital twins. Alzheimer’s Dement. 17, e054414 (2021).ArticleÂ
Google ScholarÂ
Tang, Y. et al. GANDA: a deep generative adversarial network conditionally generates intratumoral nanoparticles distribution pixels-to-pixels. J. Control. Rel. 336, 336–343 (2021).ArticleÂ
Google ScholarÂ
Du, P., Zhu, X. & Wang, J.-X. Deep learning-based surrogate model for three-dimensional patient-specific computational fluid dynamics. Phys. Fluids 34, 081906 (2022).ArticleÂ
Google ScholarÂ
Donovan-Maiye, R. M. et al. A deep generative model of 3D single-cell organization. PLoS Comput. Biol. 18, e1009155 (2022).ArticleÂ
Google ScholarÂ
Pearl, J. Causality (Cambridge Univ. Press, 2009).Yang, Y. & Perdikaris, P. Physics-informed deep generative models. Preprint at https://doi.org/10.48550/arXiv.1812.03511 (2018).Johansson, F., Shalit, U. & Sontag, D. Learning representations for counterfactual inference. In International Conference on Machine Learning 3020–3029 (PMLR, 2016).Hüllermeier, E. & Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110, 457–506 (2021).ArticleÂ
MathSciNetÂ
Google ScholarÂ
Tsialiamanis, G., Wagg, D. J., Dervilis, N. & Worden, K. On generative models as the basis for digital twins. Data Centric Eng. 2, e11 (2021).ArticleÂ
Google ScholarÂ
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical text-conditional image generation with CLIP latents. Preprint at https://doi.org/10.48550/arxiv.2204.06125 (2022).Chambon, P. et al. RoentGen: vision-language foundation model for chest X-ray generation. Preprint at https://doi.org/10.48550/arXiv.2211.12737 (2022).Pérez-GarcÃa, F. et al. Radedit: stress-testing biomedical vision models via diffusion image editing. In Eur. Conf. on Computer Vision (ECCV) (Springer Science, 2024).Singhal, K. et al. Towards expert-level medical question answering with large language models. Preprint at https://doi.org/10.48550/arXiv.2305.09617 (2023).Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172–180 (2023).ArticleÂ
Google ScholarÂ
Chen, Y. T. & Zou, J. GenePT: a simple but effective foundation model for genes and cells built from ChatGPT. Preprint at bioRxiv https://doi.org/10.1101/2023.10.16.562533 (2023).Naeem, M. F., Oh, S. J., Uh, Y., Choi, Y. & Yoo, J. Reliable fidelity and diversity metrics for generative models. In Proc. 37th International Conference Machine Learning Vol. 119, 7176–7185 (PMLR, 2020).Kahveci, Z. Ãœ. Attribution problem of generative AI: a view from US copyright law. J. Intellect. Property Law Pract. 18, 796–807 (2023).ArticleÂ
Google ScholarÂ
Thorp, H. H. ChatGPT is fun, but not an author. Science 379, 313–313 (2023).ArticleÂ
Google ScholarÂ
Susnjak, T. ChatGPT: the end of online exam integrity? Education Sciences 14, 656 (MDPI, 2024).van Dis, E. A. M., Bollen, J., Zuidema, W., van Rooij, R. & Bockting, C. L. ChatGPT: five priorities for research. Nature 614, 224–226 (2023).ArticleÂ
Google ScholarÂ
Gates, B. The age of AI has begun. Gates Notes https://www.gatesnotes.com/The-Age-of-AI-Has-Begun (21 March 2023).Kynkäänniemi, T., Karras, T., Laine, S., Lehtinen, J. & Aila, T. Improved precision and recall metric for assessing generative models. Adv. Neural Inf. Process. Syst. 32 (2019).Sajjadi, M. S. M. et al. Assessing generative model precision and recall. Adv. Neural Inf. Process. Syst. 31, 3927–3936 (2018).
Google ScholarÂ
Gretton, A. et al. A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012).MathSciNetÂ
Google ScholarÂ
Arora, S., Ge, R., Liang, Y., Ma, T. & Zhang, Y. Generalization and equilibrium in generative adversarial nets (GANs). In 34th International Conference on Machine Learning 1, 322–349 (PMLR, 2017).Arjovsky, M., Bottou, L., Gulrajani, I. & Lopez-Paz, D. Invariant risk minimization. Preprint at https://doi.org/10.48550/arXiv.1907.02893 (2019).Gulrajani, I., Raffel, C. & Metz, L. Towards GAN benchmarks which require generalization. In 7th International Conference on Learning Representations (ICLR, 2019).Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 30, 6627–6638 (2017).
Google ScholarÂ
Theis, L., Van Den Oord, A. & Bethge, M. A note on the evaluation of generative models. In 4th International Conference on Learning Representations (ICLR, 2016).Lee, J. & Clifton, C. How much is enough? Choosing ε for differential privacy. Lecture Notes Comput. Sci. 7001, 325–340 (2011).ArticleÂ
Google ScholarÂ
Hayes, J., Melis, L., Danezis, G., De Cristofaro, E. & LOGAN Membership inference attacks against generative models. Proc. Priv. Enhancing Technol. 2019, 133–152 (2019).ArticleÂ
Google ScholarÂ
Hilprecht, B., Härterich, M. & Bernau, D. Monte Carlo and reconstruction membership inference attacks against generative models. In Proc. Conference on Privacy Enhancing Technologies https://doi.org/10.2478/popets-2019-0067 (De Gruyter Open/Sciendo, 2019).Chen, D., Yu, N., Zhang, Y. & Fritz, M. GAN-leaks: a taxonomy of membership inference attacks against generative models. In Proc. ACM Conference on Computer and Communications Security 343–362 (ACM, 2019).Liu, K. S., Xiao, C., Li, B. & Gao, J. Performing co-membership attacks against deep generative models. In Proc. IEEE International Conference on Data Mining (ICDM) 459–467 (IEEE, 2019).Hu, H. & Pang, J. Membership inference attacks against GANs by leveraging over-representation regions. In Proc. ACM Conference on Computer and Communications Security 2387–2389 (ACM, 2021).van Breugel, B., Sun, H., Qian, Z. & van der Schaar, M. Membership inference attacks against synthetic data through overfitting detection. In Proc. 26th International Conference on Artificial Intelligence and Statistics (AISTATS) (PMLR, 2023).Sweeney, L. k-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzziness Knowledge-based Syst. 10, 557–570 (2002).ArticleÂ
MathSciNetÂ
Google ScholarÂ
Machanavajjhala, A., Gehrke, J., Kifer, D. & Venkitasubramaniam, M. â„“-diversity: privacy beyond k-anonymity. In Proc. International Conference on Data Engineering 2006, 24 (IEEE, 2006).Ninghui, L., Tiancheng, L. & Venkatasubramanian, S. t-closeness: privacy beyond k-anonymity and â„“-diversity. In Proc. International Conference on Data Engineering 106–115 https://doi.org/10.1109/ICDE.2007.367856 (IEEE, 2007).Rubin, D. B. & Schenker, N. Multiple imputation in health-care databases: an overview and some applications. Stat. Med. 10, 585–598 (1991).ArticleÂ
Google ScholarÂ
Räisä, O., Jälkö, J. & Honkela, A. On consistent Bayesian inference from synthetic data. In NeurIPS 2023 Workshop on Synthetic Data Generation with Generative AI (2023).Hansen, L., Seedat, N., van der Schaar, M. & Petrovic, A. Reimagining synthetic tabular data generation through data-centric AI: a comprehensive benchmark. Adv. Neural Inf. Process. Syst. 36, 33781–33823 (2023).
Google ScholarÂ
Franceschelli, G. & Musolesi, M. Copyright in generative deep learning. Data Policy 4, e17 (2022).ArticleÂ
Google ScholarÂ
Kasneci, E. et al. ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 103, 102274 (2023).ArticleÂ
Google ScholarÂ
Ji, Z. et al. Survey of hallucination in natural language generation. ACM Comput. Surveys 55, 248 (2023).ArticleÂ
Google ScholarÂ
Bohnet, B. et al. Attributed question answering: evaluation and modeling for attributed large language models. Preprint at https://doi.org/10.48550/arXiv.2212.08037 (2022).Gao, T., Yen, H., Yu, J. & Chen, D. Enabling large language models to generate text with citations. In The 2023 Conference on Empirical Methods in Natural Language Processing (ACL, 2023).Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://doi.org/10.48550/arXiv.2108.07258 (2022).OpenAI, R. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).Anil, R. et al. Palm 2 technical report. Preprint at https://doi.org/10.48550/arXiv.2305.10403 (2023).Jiang, Z., Zhang, Y., Liu, C., Zhao, J. & Liu, K. Generative calibration for in-context learning. In Findings of the Association for Computational Linguistics (EMNLP 2023) 2312–2333 (ACL, 2023).Gao, L. et al. The pile: an 800Gb dataset of diverse text for language modeling. Preprint at https://doi.org/10.48550/arXiv.2101.00027 (2020).Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 1290–1299 (IEEE, 2022).Oquab, M. et al. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (2024).Baevski, A., Zhou, Y., Mohamed, A. & Auli, M. wav2vec 2.0: a framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 33, 12449–12460 (2020).
Google ScholarÂ
Radford, A. et al. Robust speech recognition via large-scale weak supervision. In International Conference on Machine Learning 28492–28518 (PMLR, 2023).Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning Vol. 139, 8748–8763 (PMLR, 2021).Driess, D. et al. Palm-e: an embodied multimodal language model. Preprint at https://doi.org/10.48550/arXiv.2303.03378 (2023).Brown, T. B. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Google ScholarÂ
van Breugel, B. & van der Schaar, M. Why tabular foundation models should be a research priority. In International Conference on Machine Learning (PMLR, 2024).Ye, C. et al. Towards cross-table masked pretraining for web data mining. In Proc. ACM Web Conference 2024 (WWW ’24) (ACM, 2023).Borisov, V., Seßler, K., Leemann, T., Pawelczyk, M. & Kasneci, G. Language models are realistic tabular data generators. In 11th International Conference on Learning Representations (ICLR, 2023).Eggert, G., Huo, K., Biven, M. & Waugh, J. TabLib: A dataset of 627M tables with context. Preprint at https://doi.org/10.48550/arXiv.2310.07875 (2023).Schneider, G. & Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug. Discov. 4, 649–663 (2005).ArticleÂ
Google ScholarÂ
Shervashidze, N., Schweitzer, P., van Leeuwen, E. J., Mehlhorn, K. & Borgwardt, K. M. Weisfeiler–Lehman graph kernels. J. Mach. Learn. Res. 12, 2539–2561 (2011).MathSciNetÂ
Google ScholarÂ
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Model. 28, 31–36 (1988).
Google ScholarÂ
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).ArticleÂ
Google ScholarÂ
Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model. 60, 5918–5922 (2020).ArticleÂ
Google ScholarÂ
Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet — a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).ArticleÂ
Google ScholarÂ
Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).ArticleÂ
Google ScholarÂ
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proc. 38th International Conference on Machine Learning Vol. 139, 9323–9332 (PMLR, 2021).Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).ArticleÂ
Google ScholarÂ
Kayala, M. A., Azencott, C.-A., Chen, J. H. & Baldi, P. Learning to predict chemical reactions. J. Chem. Inf. Model. 51, 2209–2222 (2011).ArticleÂ
Google ScholarÂ
Genheden, S. et al. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Chemoinf. 12, https://doi.org/10.1186/s13321-020-00472-1 (2020).Oglic, D., Garnett, R. & Gaertner, T. Active search in intensionally specified structured spaces. In Proc. AAAI Conference on Artificial Intelligence (AAAI, 2017).Schneider, G. & Böhm, H.-J. Virtual screening and fast automated docking methods. Drug. Discov. Today 7, 64–70 (2002).ArticleÂ
Google ScholarÂ
Hartenfeller, M. et al. DOGS: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 8, 1–12 (2012).ArticleÂ
Google ScholarÂ
Reker, D. & Schneider, G. Active-learning strategies in computer-assisted drug discovery. Drug. Discov. Today 20, 458–465 (2015).ArticleÂ
Google ScholarÂ
Oglic, D. et al. Active search for computer-aided drug design. Mol. Inform. 37, https://doi.org/10.1002/minf.201700130 (2018).Buterez, D., Janet, J. P., Kiddle, S. J., Oglic, D. & Lio, P. Transfer learning with graph neural networks for improved molecular property prediction in the multi-fidelity setting. Nat. Commun. 15, 1517 (2024).ArticleÂ
Google ScholarÂ
Ucar, T. et al. Improving antibody humanness prediction using patent data. In 41st International Conference on Machine Learning (PMLR, 2024).Jumper, J. M. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).ArticleÂ
Google ScholarÂ
Kovaltsuk, A. et al. Observed antibody space: a resource for data mining next-generation sequencing of antibody repertoires. J. Immunol. 201, 2502–2509 (2018).ArticleÂ
Google ScholarÂ
Dunbar, J. et al. SAbDab: the structural antibody database. Nucleic Acids Res. 42, D1140–D1146 (2013).ArticleÂ
Google ScholarÂ
Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021).ArticleÂ
Google ScholarÂ
Tang, L. Large models for genomics. Nat. Meth. 20, 1868 (2023).ArticleÂ
Google ScholarÂ
Yang, F. et al. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat. Mach. Intell. 4, 852–866 (2022).ArticleÂ
Google ScholarÂ
John, B. et al. Human microRNA targets. PLOS Biol. 2, e363 (2004).ArticleÂ
Google ScholarÂ
Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).ArticleÂ
Google ScholarÂ
Rohl, C. A., Strauss, C. E., Misura, K. M. & Baker, D. Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93 (2004).ArticleÂ
Google ScholarÂ
Baker, D. & Sali, A. Protein structure prediction and structural genomics. Science 294, 93–96 (2001).ArticleÂ
Google ScholarÂ
McKinney, B. A., Reif, D. M., Ritchie, M. D. & Moore, J. H. Machine learning for detecting gene–gene interactions. Appl. Bioinform. 5, 77–88 (2006).ArticleÂ
Google ScholarÂ
Van Steen, K. Travelling the world of gene–gene interactions. Brief. Bioinform. 13, 1–19 (2012).ArticleÂ
Google ScholarÂ
Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Meth. 17, 184–192 (2019).ArticleÂ
Google ScholarÂ
Martinkus, K. et al. AbDiffuser: full-atom generation of in-vitro functioning antibodies. Adv. Neural Inf. Process. Syst. 36, 40729–40759 (2023).
Google ScholarÂ
Raybould, M. & Deane, C. The therapeutic antibody profiler for computational developability assessment. Methods in Molecular Biology 13, 115–125 (2022).ArticleÂ
Google ScholarÂ
Abanades, B., Georges, G., Bujotzek, A. & Deane, C. M. ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics 38, 1877–1880 (2022).ArticleÂ
Google ScholarÂ
Gong, J. et al. xTrimoGene: an efficient and scalable representation learner for single-cell RNA-seq data. Adv. Neural Inf. Process. Syst. 36, 69391–69403 (2023).
Google ScholarÂ
Baldi, P. & Chauvin, Y. Neural networks for fingerprint recognition. Neural Comput. 5, 402–418 (1993).ArticleÂ
Google ScholarÂ
Ciresan, D., Giusti, A., Gambardella, L. & Schmidhuber, J. Deep neural networks segment neuronal membranes in electron microscopy images. Adv. Neural Inf. Process. Syst. 25, 2843–2851 (2012).
Google ScholarÂ
CireÅŸan, D. C., Giusti, A., Gambardella, L. M. & Schmidhuber, J. Mitosis detection in breast cancer histology images with deep neural networks. In Medical Image Computing and Computer-Assisted Intervention 411–418 (Springer, 2013).Wang, J. et al. Detecting cardiovascular disease from mammograms with deep learning. IEEE Trans. Medical Imaging 36, 1172–1181 (2017).ArticleÂ
Google ScholarÂ
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).ArticleÂ
Google ScholarÂ
Klang, E. et al. Deep learning algorithms for automated detection of Crohn’s disease ulcers by video capsule endoscopy. Gastrointest. Endosc. 91, 606–613.e2 (2020).ArticleÂ
Google ScholarÂ
Ackerman, M. J. The visible human project: a resource for education. Acad. Med. 74, 667–670 (1999).ArticleÂ
Google ScholarÂ
Lundervold, A. S. & Lundervold, A. An overview of deep learning in medical imaging focusing on MRI. Z. Med. Phys. 29, 102–127 (2019).ArticleÂ
Google ScholarÂ
Liu, S. et al. Deep learning in medical ultrasound analysis: a review. Engineering 5, 261–275 (2019).ArticleÂ
Google ScholarÂ
Brattain, L. J., Telfer, B. A., Dhyani, M., Grajo, J. R. & Samir, A. E. Machine learning for medical ultrasound: status, methods, and future opportunities. Abdom. Radiol. 43, 786–799 (2018).ArticleÂ
Google ScholarÂ
Ng, K. et al. PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records. J. Biomed. Inform. 48, 160–170 (2014).ArticleÂ
Google ScholarÂ
Steinhubl, S. R., Wolff-Hughes, D. L., Nilsen, W., Iturriaga, E. & Califf, R. M. Digital clinical trials: creating a vision for the future. npj Digit. Med. 2, 126 (2019).ArticleÂ
Google ScholarÂ
Dunn, J. et al. Wearable sensors enable personalized predictions of clinical laboratory measurements. Nat. Med. 27, 1105–1112 (2021).ArticleÂ
Google ScholarÂ
Steinhubl, S. R. et al. Effect of a home-based wearable continuous ECG monitoring patch on detection of undiagnosed atrial fibrillation. JAMA 320, 146–155 (2018).ArticleÂ
Google ScholarÂ
Pandit, J. A., Radin, J. M., Quer, G. & Topol, E. J. Smartphone apps in the COVID-19 pandemic. Nat. Biotechnol. 40, 1013–1022 (2022).ArticleÂ
Google ScholarÂ
Strain, T. et al. Wearable-device-measured physical activity and future health risk. Nat. Med. 26, 1385–1391 (2020).ArticleÂ
Google ScholarÂ
Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).ArticleÂ
Google ScholarÂ
Stahlschmidt, S. R., Ulfenborg, B. & Synnergren, J. Multimodal deep learning for biomedical data fusion: a review. Brief. Bioinform. 23, bbab569 (2022).ArticleÂ
Google ScholarÂ
Zhavoronkov, A. Artificial intelligence for drug discovery, biomarker development, and generation of novel chemistry. Mol. Pharm. 15, 4311–4313 (2018).ArticleÂ
Google ScholarÂ
Mann, M., Kumar, C., Zeng, W.-F. & Strauss, M. T. Artificial intelligence for proteomics and biomarker discovery. Cell Syst. 12, 759–770 (2021).ArticleÂ
Google ScholarÂ
Mandair, D., Reis-Filho, J. S. & Ashworth, A. Biological insights and novel biomarker discovery through deep learning approaches in breast cancer histopathology. npj Breast Cancer 9, 21 (2023).Lin, Q., Oglic, D., Lam, H.-K., Curtis, M. & Cvetkovic, Z. A Hybrid GCN-LSTM model for ventricular arrhythmia classification based on ECG pattern similarity. In 46th Annual International Conference IEEE Engineering in Medicine and Biology Society (EMBC 2024) (IEEE, 2024).Beaulieu-Jones, B. K., Greene, C. S. & Consortium, P. R. O.-A. A. C. T. Semi-supervised learning of the electronic health record for phenotype stratification. J. Biomed. Inform. 64, 168–178 (2016).ArticleÂ
Google ScholarÂ
Bent, B. et al. Non-invasive wearables for remote monitoring of HbA1c and glucose variability: proof of concept. BMJ Open Diabetes Res. Care 9, e002027 (2021).ArticleÂ
Google ScholarÂ
Smit, L. C., Dikken, J., Schuurmans, M. J., de Wit, N. J. & Bleijenberg, N. Value of social network analysis for developing and evaluating complex healthcare interventions: a scoping review. BMJ Open 10, e039681 (2020).ArticleÂ
Google ScholarÂ
Gupta, A. & Katarya, R. Social media based surveillance systems for healthcare using machine learning: a systematic review. J. Biomed. Inform. 108, 103500 (2020).ArticleÂ
Google ScholarÂ
Jensen, A. B. et al. Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nat. Commun. 5, 4022 (2014).ArticleÂ
Google ScholarÂ
Miotto, R., Kidd, B. A. & Dudley, J. T. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016).ArticleÂ
Google ScholarÂ
Lee, C. K., Hofer, I., Gabel, E., Baldi, P. & Cannesson, M. Development and validation of a deep neural network model for prediction of postoperative in-hospital mortality. Anesthesiology 129, 649–662 (2018).ArticleÂ
Google ScholarÂ
Pham, T., Tran, T., Phung, D. & Venkatesh, S. Predicting healthcare trajectories from medical records: a deep learning approach. J. Biomed. Inform. 69, 218–229 (2017).ArticleÂ
Google ScholarÂ
Van Der Schaar, M. & Alaa, A. M. Synthetic healthcare data generation and assessment: challenges, methods, and impact on machine learning. In International Conference on Machine Learning (PMLR, 2021).Weng, L. What are diffusion models? https://lilianweng.github.io/posts/2021-07-11-diffusion-models (2021).Rezende, D. J., Mohamed, S. & Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In 31st International Conference on Machine Learning 4, 3057–3070 (PMLR, 2014).Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR, 2014).Goodfellow, I. et al. Generative adversarial networks. Adv. Neural Inf. Process. Syst. 27, 2672–2680 (2014).
Google ScholarÂ
Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In 32nd International Conference on Machine Learning 3, 2246–2255 (PMLR, 2015).Song, Y. & Ermon, S. Generative modeling by estimating gradients of the data distribution. Adv. Neural Inf. Process. Syst. 32, 11918–11930 (2019).
Google ScholarÂ
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Google ScholarÂ
van den Oord, A., Kalchbrenner, N. & Kavukcuoglu, K. Pixel recurrent neural networks. Proc. 33rd International Conference on Machine Learning 48, 1747–1756 (PMLR, 2016).Liu, J. et al. Towards out-of-distribution generalization: a survey. Preprint at https://doi.org/10.48550/arXiv.2108.13624 (2021).Bayer, J. et al. Universal ventricular coordinates: a generic framework for describing position within the heart and transferring data. Med. Image Anal. 45, 83–93 (2018).ArticleÂ
Google ScholarÂ
Kovatchev, B. A century of diabetes technology: signals, models, and artificial pancreas control. Trends Endocrinol. Metab. 30, 432–444 (2019).ArticleÂ
Google ScholarÂ
Ghaffarizadeh, A., Heiland, R., Friedman, S. H., Mumenthaler, S. M. & Macklin, P. PhysiCell: An open source physics-based cell simulator for 3-D multicellular systems. PLOS Comput. Biol. 14, e1005991 (2018).ArticleÂ
Google ScholarÂ