Identifying and embedding transferability in data-driven representations of chemical space

Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine-learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable, and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles, one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.

This article is Open Access
