Cartesian atomic cluster expansion for machine learning interatomic potentials

The CACE architecture

Atomic graph

As illustrated in Fig. 1, each atomic structure is represented as a graph, with atoms as nodes and directed edges pointing from sender atoms to receiver atoms within a cutoff radius rcut. The state of an atom i is denoted by the tuple σi = (ri, θi, hi): ri is its Cartesian position, θi encodes its chemical element α, and hi are its learnable features. Later we will use a superscript, e.g., \(\sigma_i^{(t)}\), to denote the atomic state at message passing layer t, but for now we omit the layer index for simplicity.

Fig. 1: Schematic of the CACE potential. a–h show each step of the operation, and i illustrates the rule for making rotationally invariant features.

For both the sender and the receiver node of an edge, the element αi is first assigned a one-hot vector of length equal to the total number of elements, Nelement; this one-hot vector is then multiplied by a learnable weight matrix of size Nembedding × Nelement, yielding a learnable embedding θi of length Nembedding for each atom i. Typically, Nembedding is a small number between 1 and 4. This continuous element embedding eliminates the unfavorable scaling of using discrete chemical element types when representing atomic environments, and also enables alchemical learning.

Edge basis

The edge basis, shown in Fig. 1b, describes the spatial arrangement of an atom j around an atom i (\(\vec{\mathbf{r}}_{ji} \equiv \mathbf{r}_j - \mathbf{r}_i\), \(r_{ji} = |\vec{\mathbf{r}}_{ji}|\), and \(\hat{\mathbf{r}}_{ji} = \vec{\mathbf{r}}_{ji}/r_{ji}\)), and is formed by the product of a radial basis R, an angular basis L, and an edge-type basis T:
$$\chi_{cn\mathbf{l}}(\sigma_i, \sigma_j) = T_c(\sigma_i, \sigma_j)\, R_{n,cl}(r_{ji})\, L_{\mathbf{l}}(\hat{\mathbf{r}}_{ji}).$$
(1)
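As a concrete illustration of the element embedding and the tensor-product edge type T, consider the following minimal PyTorch sketch; the sizes and names are our own hypothetical choices, not those of the reference implementation:

```python
import torch

n_elements, n_embedding = 10, 3  # hypothetical sizes

# Learnable weight matrix of size n_embedding x n_elements acting on one-hot vectors
embed = torch.nn.Linear(n_elements, n_embedding, bias=False)

# Element indices of a receiver atom i and a sender atom j
alpha_i = torch.nn.functional.one_hot(torch.tensor(2), n_elements).float()
alpha_j = torch.nn.functional.one_hot(torch.tensor(5), n_elements).float()
theta_i, theta_j = embed(alpha_i), embed(alpha_j)

# Edge type in the initial layer: flattened tensor product theta_i (x) theta_j,
# giving c = n_embedding**2 edge channels
T_ij = torch.outer(theta_i, theta_j).flatten()
print(T_ij.shape)  # torch.Size([9])
```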
Eq. (1) is general and fundamental to many MLIPs, and a systematic discussion of the design choices for each term can be found in ref. 30. In CACE, the edge type T depends on the states of the two nodes it connects. In the initial layer, the edge states are determined solely by the chemical elements of the two atoms i and j. The edge is encoded as the flattened tensor product of the two node embedding vectors, T = θi ⊗ θj, resulting in an edge feature of length \(c = N_{\mathrm{embedding}}^2\). In a later message passing layer t, the edge features can also depend on the node hidden features \(\mathbf{h}_i^{(t)}\). This tensor-product edge encoding can be interpreted as an attention mechanism31 between the two atoms, with the embedding vectors of the sender and receiver nodes acting as the query and the key.

The orientation-dependent angular basis is
$$L_{\mathbf{l}}(\hat{\mathbf{r}}_{ji}) = (x_j - x_i)^{l_x}\, (y_j - y_i)^{l_y}\, (z_j - z_i)^{l_z},$$
(2)
where l ≡ (lx, ly, lz) are the angular momentum numbers satisfying lx + ly + lz = l, with l the total angular momentum.
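The Cartesian angular basis can be enumerated in a few lines; the following sketch (our own illustrative helper, not library code) lists all monomials up to a given lmax:

```python
import itertools
import numpy as np

def angular_basis(r_hat, l_max):
    """All Cartesian monomials x^lx * y^ly * z^lz with lx + ly + lz = l <= l_max,
    keyed by the angular momentum tuple (lx, ly, lz)."""
    x, y, z = r_hat
    feats = {}
    for l in range(l_max + 1):
        for lx, ly in itertools.product(range(l + 1), repeat=2):
            lz = l - lx - ly
            if lz >= 0:
                feats[(lx, ly, lz)] = x**lx * y**ly * z**lz
    return feats

r = np.array([1.0, 2.0, 2.0])
L = angular_basis(r / np.linalg.norm(r), l_max=3)
print(len(L))  # 1 + 3 + 6 + 10 = 20 angular features for l_max = 3
```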
This angular basis is simply the spherical harmonics expressed in a Cartesian basis. Indeed, functions of a given total angular momentum l can be converted between the Cartesian and spherical harmonic bases by a simple matrix multiplication32. As examples of operations in the Cartesian angular basis, one can add two vectors,
$$L_{\mathbf{l}_3}(\hat{\mathbf{r}}_{ji} + \hat{\mathbf{r}}_{ik}) = \sum_{\mathbf{l}_1 + \mathbf{l}_2 = \mathbf{l}_3} \mathcal{P}(\mathbf{l}_1, \mathbf{l}_2)\, L_{\mathbf{l}_1}(\hat{\mathbf{r}}_{ji})\, L_{\mathbf{l}_2}(\hat{\mathbf{r}}_{ik}),$$
(3)
where the prefactor is \(\mathcal{P}(\mathbf{l}_1, \mathbf{l}_2) = \frac{l_{3x}!}{l_{1x}!\, l_{2x}!} \frac{l_{3y}!}{l_{1y}!\, l_{2y}!} \frac{l_{3z}!}{l_{1z}!\, l_{2z}!}\) with l1 + l2 = l3. One can also obtain vector dot products,
$$(\hat{\mathbf{r}}_{ji} \cdot \hat{\mathbf{r}}_{ki})^{l} = \sum_{\mathbf{l}} \mathcal{C}(\mathbf{l})\, L_{\mathbf{l}}(\hat{\mathbf{r}}_{ji})\, L_{\mathbf{l}}(\hat{\mathbf{r}}_{ki}),$$
(4)
where \(\mathcal{C}(\mathbf{l}) = \frac{l!}{l_x!\, l_y!\, l_z!}\) is the multinomial coefficient and the sum runs over all l with lx + ly + lz = l. Both relationships, Eqs. (3) and (4), follow from the multinomial theorem.
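Both identities are easy to check numerically; the following self-contained sketch (helper names are ours) verifies Eqs. (3) and (4) for arbitrary unit vectors:

```python
from math import factorial
from itertools import product

import numpy as np

def monomial(v, t):
    return v[0]**t[0] * v[1]**t[1] * v[2]**t[2]

def tuples(l):
    """All (lx, ly, lz) with lx + ly + lz = l."""
    return [(lx, ly, l - lx - ly) for lx, ly in product(range(l + 1), repeat=2)
            if l - lx - ly >= 0]

def C(t):  # multinomial coefficient of Eq. (4)
    return factorial(sum(t)) // (factorial(t[0]) * factorial(t[1]) * factorial(t[2]))

def P(l1, l2):  # prefactor of Eq. (3)
    return np.prod([factorial(a + b) / (factorial(a) * factorial(b))
                    for a, b in zip(l1, l2)])

a, b, l = np.array([0.6, 0.0, 0.8]), np.array([0.0, 0.6, 0.8]), 3

# Eq. (4): (a . b)^l equals the C(l)-weighted sum of products of monomials
lhs4 = np.dot(a, b) ** l
rhs4 = sum(C(t) * monomial(a, t) * monomial(b, t) for t in tuples(l))

# Eq. (3): a monomial of the sum a + b expands with the prefactor P(l1, l2)
l3 = (1, 1, 1)
lhs3 = monomial(a + b, l3)
rhs3 = sum(P(l1, tuple(np.subtract(l3, l1))) * monomial(a, l1)
           * monomial(b, tuple(np.subtract(l3, l1)))
           for l1 in product(range(2), range(2), range(2)))

print(np.isclose(lhs4, rhs4), np.isclose(lhs3, rhs3))  # True True
```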
\(R_{n,cl}(r_{ji})\) is a set of n radial basis functions, coupled to the total angular momentum l and the edge channel c. CACE first uses a set of \(\tilde{n}\) raw radial functions \(R_{\tilde{n}}(r_{ji})\), which can be trainable Bessel functions with different frequencies15 multiplied by a smooth cutoff function. The raw radial basis is then mixed by a linear transformation to form \(R_{n,cl}\) for each c and l,
$$R_{n,cl}(r_{ji}) = \sum_{\tilde{n}} R_{\tilde{n}}(r_{ji})\, W_{\tilde{n}n,cl},$$
(5)
where \(W_{\tilde{n}n,cl}\) is an \(\tilde{n} \times n\) matrix indexed by c and l. This means that, for each combination of l and c, there is a distinct set of n radial functions.
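The following is a sketch of the raw Bessel basis and the linear mixing of Eq. (5); the specific cutoff function, shapes, and names are illustrative assumptions rather than the exact parameterization used in CACE (and, as noted next, the mixing is applied on the edge basis here for simplicity):

```python
import torch

n_raw, n_mix, n_chan, l_max = 8, 6, 4, 3
r_cut = 5.5

def raw_radial(r, n_raw, r_cut):
    # Bessel-type functions sin(n*pi*r/r_cut)/r, damped by a smooth cosine cutoff
    n = torch.arange(1, n_raw + 1, dtype=r.dtype)
    cutoff = 0.5 * (torch.cos(torch.pi * r / r_cut) + 1.0) * (r < r_cut)
    return torch.sin(n * torch.pi * r / r_cut) / r * cutoff

# Learnable mixing weights W indexed by raw channel, mixed channel, c, and l
W = torch.nn.Parameter(torch.randn(n_raw, n_mix, n_chan, l_max + 1))

r_ji = torch.tensor(2.3)
R_raw = raw_radial(r_ji, n_raw, r_cut)     # shape: (n_raw,)
R = torch.einsum('m,mncl->ncl', R_raw, W)  # shape: (n_mix, n_chan, l_max + 1)
```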
In practice, the mixing is performed on the atom-centered basis for efficiency, as described below.

Atom-centered basis (A-basis)

As shown in Fig. 1c, the atom-centered representation is formed by summing over all the edges of a node,
$$A_{i,cn\mathbf{l}} = \sum_{j \in \mathcal{N}(i)} \chi_{cn\mathbf{l}}(\sigma_i, \sigma_j).$$
(6)
This corresponds to a projection of the edge basis onto the atomic density, commonly referred to as the “density trick”, which was originally introduced to construct the SOAP and bispectrum descriptors8.
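In a graph implementation, this neighbor sum amounts to a scatter-add over edges, as in the following minimal sketch (sizes are hypothetical):

```python
import torch

n_atoms, n_edges, n_feat = 5, 12, 7               # hypothetical sizes
receiver = torch.randint(0, n_atoms, (n_edges,))  # index i of each edge j -> i
chi = torch.randn(n_edges, n_feat)                # edge features chi_{cnl}, flattened

A = torch.zeros(n_atoms, n_feat)
A.index_add_(0, receiver, chi)                    # A_i = sum over edges into i
```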
Symmetrized basis (B-basis)

The orientation-dependent A features are symmetrized to obtain the invariant B features of different body orders ν, as sketched in Fig. 1c. For l = 0, A is already rotationally invariant, so the ν = 1 features are \(B_{i,cn0}^{(1)} = A_{i,cn\mathbf{0}}\). For ν = 2 (ref. 33),
$$B_{i,cnl_1}^{(2)} = \sum_{\mathbf{l}_1} \mathcal{C}(\mathbf{l}_1)\, A_{i,cn\mathbf{l}_1}^2,$$
(7)
where the sum runs over all \(\mathbf{l}_1\) with \(l_{1x} + l_{1y} + l_{1z} = l_1\).
Carrying out the expansion explicitly and using Eq. (4), one can show that33
$$B_{i,cnl_1}^{(2)} = \sum_{j,k \in \mathcal{N}(i)} T_c(\sigma_i, \sigma_j)\, R_{n,cl_1}(r_{ji})\, T_c(\sigma_i, \sigma_k)\, R_{n,cl_1}(r_{ki}) \cos^{l_1}(\theta_{ijk}).$$
(8)
This term therefore includes three-body contributions of \(\vec{\mathbf{r}}_{ji}\) and \(\vec{\mathbf{r}}_{ki}\) together with their enclosed angle θijk.
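The density trick can be checked numerically for a single (c, n) channel. The following sketch (our own code, with the radial and edge-type factors set to one for brevity) confirms that the contraction of Eq. (7) reproduces the explicit double sum of Eq. (8):

```python
from math import factorial
from itertools import product

import numpy as np

def tuples(l):
    return [(lx, ly, l - lx - ly) for lx, ly in product(range(l + 1), repeat=2)
            if l - lx - ly >= 0]

def C(t):
    return factorial(sum(t)) // (factorial(t[0]) * factorial(t[1]) * factorial(t[2]))

def monomial(v, t):
    return v[0]**t[0] * v[1]**t[1] * v[2]**t[2]

rng = np.random.default_rng(0)
neigh = rng.normal(size=(4, 3))                  # neighbor vectors around atom i
neigh /= np.linalg.norm(neigh, axis=1, keepdims=True)

l1 = 2
# Eq. (7): contract the neighbor-summed A features
A = {t: sum(monomial(r, t) for r in neigh) for t in tuples(l1)}
B2 = sum(C(t) * A[t] ** 2 for t in tuples(l1))

# Eq. (8): explicit double sum over neighbor pairs of cos^l1(theta_ijk)
B2_explicit = sum(np.dot(rj, rk) ** l1 for rj in neigh for rk in neigh)

print(np.isclose(B2, B2_explicit))  # True
```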
For ν = 3,
$$B_{i,cnl_1l_2}^{(3)} = \sum_{\mathbf{l}_1, \mathbf{l}_2} \mathcal{C}(\mathbf{l}_1)\, \mathcal{C}(\mathbf{l}_2)\, A_{i,cn\mathbf{l}_1}\, A_{i,cn(\mathbf{l}_1 + \mathbf{l}_2)}\, A_{i,cn\mathbf{l}_2},$$
(9)
which includes four-body interaction terms of the form \(R_{n,cl_1}(r_{ji})\, R_{n,c(l_1+l_2)}(r_{ki})\, R_{n,cl_2}(r_{hi}) \cos^{l_1}(\theta_{ijk}) \cos^{l_2}(\theta_{ikh})\). For ν = 4,
$$B_{i,cnl_1l_2l_3}^{(4)} = \sum_{\mathbf{l}_1, \mathbf{l}_2, \mathbf{l}_3} \mathcal{C}(\mathbf{l}_1)\, \mathcal{C}(\mathbf{l}_2)\, \mathcal{C}(\mathbf{l}_3)\, A_{i,cn\mathbf{l}_1}\, A_{i,cn(\mathbf{l}_1+\mathbf{l}_2)}\, A_{i,cn(\mathbf{l}_2+\mathbf{l}_3)}\, A_{i,cn\mathbf{l}_3},$$
(10)
which includes five-body contributions of the form \(R_{n,cl_1}(r_{ji})\, R_{n,c(l_1+l_2)}(r_{ki})\, R_{n,c(l_2+l_3)}(r_{hi})\, R_{n,cl_3}(r_{gi}) \cos^{l_1}(\theta_{ijk}) \cos^{l_2}(\theta_{ikh}) \cos^{l_3}(\theta_{ihg})\).
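The higher-order contractions follow the same pattern; below is a self-contained sketch of B(3) from Eq. (9), again for a single channel with the radial and edge factors omitted (our own illustrative code):

```python
from math import factorial
from itertools import product

import numpy as np

# Helpers as in the previous sketch
tuples = lambda l: [(lx, ly, l - lx - ly)
                    for lx, ly in product(range(l + 1), repeat=2) if l - lx - ly >= 0]
C = lambda t: factorial(sum(t)) // (factorial(t[0]) * factorial(t[1]) * factorial(t[2]))
monomial = lambda v, t: v[0]**t[0] * v[1]**t[1] * v[2]**t[2]

rng = np.random.default_rng(0)
neigh = rng.normal(size=(4, 3))
neigh /= np.linalg.norm(neigh, axis=1, keepdims=True)

l1, l2 = 1, 2
# A features are needed up to total angular momentum l1 + l2
A = {t: sum(monomial(r, t) for r in neigh)
     for l in range(l1 + l2 + 1) for t in tuples(l)}

# Eq. (9): B^(3)_{l1 l2} = sum over l1, l2 tuples of C(l1) C(l2) A_{l1} A_{l1+l2} A_{l2}
add = lambda s, t: tuple(a + b for a, b in zip(s, t))
B3 = sum(C(t1) * C(t2) * A[t1] * A[add(t1, t2)] * A[t2]
         for t1 in tuples(l1) for t2 in tuples(l2))
print(B3)  # a rotationally invariant, four-body scalar
```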
The general rule for constructing the symmetrized basis is illustrated in Fig. 1i. The key idea is that pairs of angular features L need to share a common factor of l when summed over to form invariants. To eliminate linearly dependent terms, combinations of angular features that differ from others only by permutation are not considered. Moreover, the shared factor l between any pair of L needs to be greater than zero: if any l = 0, the resulting invariant can be constructed as a product of lower-order invariants and is thus eliminated as redundant. In other words, a valid combination in Fig. 1i must correspond to one connected graph, rather than two disconnected graphs of lower body orders. For instance, terms such as \(B_{i,cn012}^{(4)} = B_{i,cn0}^{(1)} B_{i,cn12}^{(3)}\) and \(B_{i,cn102}^{(4)} = B_{i,cn1}^{(2)} B_{i,cn2}^{(2)}\) are eliminated. The CACE framework thus produces polynomially independent invariants. Other radial or edge terms that do not depend on l, or depend only on the total angular momentum l, can be multiplied on top without breaking rotational symmetry.

Based on these expansions, one can see that the CACE B(ν) terms are similar to the B(ν) terms in the original ACE paper4 and to the invariant basis functions in MTP7, with two differences. First, many linearly or polynomially dependent features are eliminated during the symmetrization stage in CACE, resulting in a much more compact basis set than the original ACE4 or MTP7, whose basis functions span the space of all invariants including linearly dependent terms27. Second, since each radial channel \(R_{n,cl}\) contains all raw radial channels with trainable weights (Eq. (5)), it is not necessary to couple different radial channels when forming the basis B. This means that the number of radial channels of B stays at n, rather than growing as nν with increasing body order.

The B basis is a complete basis of permutation- and rotation-invariant functions of the atomic environment4. In practice, one truncates the series at maximum values \(l_{\max}\) and \(\nu_{\max}\) to keep the computation tractable. The dimension of the B basis is c × n × NL, where the factors are the numbers of edge channels, radial basis functions, and angular features, respectively. Fig. 2 shows the total number of angular features NL for given \(l_{\max}\) and \(\nu_{\max}\); NL remains small even for \(\nu_{\max} = 5\). For most practical applications, \(l_{\max}\) between 2 and 4 and \(\nu_{\max}\) between 2 and 4 are sufficient for good accuracy when fitting CACE potentials.

Fig. 2: The total number of angular features NL, for maximum values of \(l_{\max}\) and \(\nu_{\max}\).

Notice that in the CACE formulation each invariant B feature can be evaluated independently; there is no need to precompute or store the high-dimensional many-body product basis of ACE4 or the moment tensors of MTP7, even in sparse form. This efficiency can be understood as effectively exploiting the extreme sparsity of the Clebsch–Gordan coupling matrix when forming invariants.

Message passing

The framework above reproduces the original ACE, although formulated entirely in Cartesian space.
In ACE, the atomic energy of each atom is determined by neighboring atoms within its cutoff radius rcut. On top of this, one can incorporate message passing, which iteratively propagates information along the edges, communicating local information and effectively enlarging the receptive field of each atom. A systematic discussion of the design choices for message passing can be found in refs. 30,34.

CACE uses two types of message passing (MP) mechanisms. The first type, denoted by the green arrow in Fig. 1e, communicates the A information between neighboring atoms: the orientation-dependent message from atom j to atom i can be expressed as
$$m_{1,j \to i,cn\mathbf{l}}^{(t)} = F_{cnl}(r_{ji})\, A_{j,cn\mathbf{l}}^{(t)},$$
(11)
where \(F_{cnl}(r_{ji})\) is a filter function of the scalar distance rji, which can depend on the edge channel, radial channel, and total angular momentum. In practice, we use an exponential decay function with a trainable exponent, capturing the physics that farther atoms have less effect, although other choices are possible.
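A sketch of this first message type with an exponential-decay filter follows; the per-channel parameterization is our illustrative assumption:

```python
import torch

n_edges, n_feat = 12, 7                    # hypothetical sizes
r_ji = torch.rand(n_edges) * 5.0           # edge lengths
A_j = torch.randn(n_edges, n_feat)         # A features of the sender atoms

# Exponential-decay filter with one trainable exponent per feature channel
beta = torch.nn.Parameter(torch.ones(n_feat))
F = torch.exp(-beta * r_ji.unsqueeze(-1))  # shape: (n_edges, n_feat)

m1 = F * A_j                               # Eq. (11), per edge j -> i
```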
The second MP mechanism, denoted by the two blue arrows in Fig. 1e, uses a recursive edge embedding scheme analogous to the recursively embedded atom neural network (REANN)33 and multilayer ACE (ml-ACE)35:
$$m_{2,j \to i,cn\mathbf{l}}^{(t)} = H_{cl}\left(B_j^{(t)}\right) \chi_{cn\mathbf{l}},$$
(12)
where Hcl is a trainable scalar function that can optionally depend on the edge channel and total angular momentum. We simply use a linear neural network (NN) layer for H.
The aggregated message is
$$M_{i,cn\mathbf{l}}^{(t)} = \sum_{j \in \mathcal{N}(i)} m_{1,j \to i,cn\mathbf{l}}^{(t)} + m_{2,j \to i,cn\mathbf{l}}^{(t)}.$$
(13)
One can then update the A representation of each node,
$$A_{i,cn\mathbf{l}}^{(t+1)} = G\left(A_{i,cn\mathbf{l}}^{(t)}, M_{i,cn\mathbf{l}}^{(t)}\right),$$
(14)
and for G we use a simple linear function.
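Putting Eqs. (11) to (14) together, one MP step might look like the following sketch, where the scatter-add aggregation and the linear update G acting on concatenated inputs are our illustrative choices:

```python
import torch

n_atoms, n_edges, n_feat = 5, 12, 7
receiver = torch.randint(0, n_atoms, (n_edges,))
m1 = torch.randn(n_edges, n_feat)  # first message type, Eq. (11)
m2 = torch.randn(n_edges, n_feat)  # second message type, Eq. (12)
A_t = torch.randn(n_atoms, n_feat)

# Eq. (13): aggregate messages over the neighbors of each receiver atom
M = torch.zeros(n_atoms, n_feat)
M.index_add_(0, receiver, m1 + m2)

# Eq. (14): linear update G acting on the concatenated (A, M)
G = torch.nn.Linear(2 * n_feat, n_feat, bias=False)
A_next = G(torch.cat([A_t, M], dim=-1))
```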
After obtaining the updated A(t+1), we again form the symmetrized B-basis as described above. It is easy to verify that these symmetrized representations depend only on the scalar values of atomic distances and angles, and are thus rotationally invariant. Note that one can make various design choices for the F, H, and G functions: they can depend on c, n, and l without breaking the invariance at the symmetrization stage. In practice, only one MP layer or none is typically used in CACE, because the use of higher-body-order atomic features and messages reduces the need for many MP layers, as is the case for MACE18.

Readout and output

After T MP layers, the resulting B features from all layers are concatenated. These invariant features are compact: their total number is (T + 1) × c × n × NL, one set of B features per layer including the initial one. Then, a multilayer perceptron (MLP) maps these invariant features to the atomic energy of each atom i,
$$E_i = \mathrm{MLP}\left(B_i^{(0)} \oplus B_i^{(1)} \oplus \ldots \oplus B_i^{(T)}\right).$$
(15)
In practice, for the readout function we use the sum of a linear NN and a multilayer NN, as the former preserves the body-ordered contributions while the latter captures the higher-order terms beyond the truncated series.
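A sketch of such a readout head, with hypothetical layer sizes:

```python
import torch

n_B, hidden = 3 * 16 * 6 * 12, 32  # hypothetical concatenated feature size

linear = torch.nn.Linear(n_B, 1)   # preserves body-ordered contributions
mlp = torch.nn.Sequential(         # captures higher-order terms
    torch.nn.Linear(n_B, hidden), torch.nn.SiLU(), torch.nn.Linear(hidden, 1))

B_i = torch.randn(4, n_B)          # concatenated B features for 4 atoms
E_atoms = (linear(B_i) + mlp(B_i)).squeeze(-1)
E_total = E_atoms.sum()            # total potential energy
# forces would follow from -dE_total/dr via autograd on the atomic positions
```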
Finally, the total potential energy of the system is the sum of the atomic energies, and the forces can readily be computed by taking derivatives of the total energy with respect to the atomic coordinates.

Bulk water

As an example application to complex fluids, we demonstrate CACE on a dataset of 1593 liquid water configurations36. The details of the training are given in the “Methods” section. The root mean square errors (RMSE) of the energy per water molecule and of the forces on the validation set, for the CACE model and other MLIPs, are presented in Table 1. MLIPs based on two-body and three-body descriptors without message passing, including BPNN based on ACSF5, the embedded atom neural network (EANN)37, and DeepMD, have similar errors, which are among the highest in this comparison. Linear ACE uses higher-body-order features and can be directly compared to the CACE model trained with the same cutoff and without message passing (rcut = 5.5 Å, T = 0); the latter has lower error. The REANN potential33 with three message passing layers, the equivariant message passing model NequIP17, the high-body-order MACE18 model, and the CACE model with T = 1 have the lowest errors.

Table 1: Root mean square errors (RMSE) for energy (E) per water molecule (in meV/H2O) and force (in meV Å−1) on a liquid water dataset from ref. 36.

For running molecular dynamics (MD) simulations with MLIPs, a crucial requirement besides accuracy is stability: the ability to run sufficiently long trajectories during which the system remains in physically reasonable states and the simulation does not explode, even at conditions with few or no training data. To assess simulation stability, we performed independent simulations of 300 ps for water at 1 g mL−1 at 300 K, 500 K, 1000 K, 1500 K, and 2000 K using the CACE T = 1 model. The simulation cell contains 512 water molecules. The time step was 1 fs and the Nosé–Hoover thermostat was used. The upper panel of Fig. 3 shows the oxygen–oxygen (O–O) radial distribution function (RDF). At 300 K, the computed O–O RDF is in excellent agreement with X-ray diffraction measurements38. Note that nuclear quantum effects have a slight destructuring effect on the O–O RDF, but the effect is quite small36. The mean squared displacements (MSD) from these simulations are shown in the lower panel of Fig. 3. The diffusivity D, obtained by fitting a linear function to the MSD, agrees well at 300 K with the DFT MD result (DDFT = 2.67 ± 0.10 Å2 ps−1) from ref. 39. At high temperatures up to 2000 K, the MD simulations remained stable, demonstrating the stability of the CACE potential. Note that we do not expect the results at very high temperatures to be physically predictive; this example only illustrates the stability of CACE. Such stability is not only necessary for converging statistical properties, but also enables refinement of the potential: to iteratively improve its accuracy at given conditions, one can perform active learning, collecting snapshots from the corresponding MD trajectories and using them for retraining.

Fig. 3: Simulation results of water using the CACE T = 1 model. a Oxygen–oxygen radial distribution functions (RDF) at different temperatures and 1 g/mL, computed via classical MD simulations in the NVT ensemble. The experimental O–O RDF at ambient conditions was obtained from ref. 38. b Mean squared displacement (MSD) from the liquid water simulations. Diffusivities (D) are shown in the legends.

Small molecules: ethanol and 3BPA

To demonstrate CACE on small organic molecules, we fitted CACE to 1000 ethanol structures from the MD17 database40, which is often used for benchmarking. Table 2 compares different MLIPs trained on energies and forces for the same MD17-ethanol dataset. The energy accuracy of CACE is similar to that of the NequIP model with T = 5, although the CACE force error is larger. The stability (S) in Table 2 measures how long an MD simulation at 500 K using the MLIP remains physically sensible (no pathological behavior or entering physically prohibited states) in runs of at most 300 ps, so 300 ps is the maximum score41. The stability values for the other MLIPs are from ref. 41, which used 10,000 training structures and an MD time step of 0.5 fs. For CACE, we used 1000 training structures and a time step of 1 fs, which is a more stringent test, yet the MD nevertheless remained stable. As observed in refs. 41,42, force accuracy does not imply stability, and the latter is a key metric for MLIPs.

Table 2: Mean absolute error (MAE) of energy (E) and force (F) for ethanol in the MD17 dataset, in units of meV and meV Å−1, respectively, trained on 1000 reference configurations.

As another example, we fitted the 3-(benzyloxy)pyridin-2-amine (3BPA) dataset43. Table 3 shows the errors of different potentials when extrapolating to higher temperatures. NequIP and MACE perform best, while CACE lies between them and ACE. We then compared the dihedral potential energy surface of the molecule. Figure 4 shows the complex energy landscape of the α–γ plane predicted by DFT and by CACE for β = 120∘, the plane with the fewest training points. The CACE landscape nevertheless closely resembles the ground truth and is regular, more so than the other potentials benchmarked in ref. 43, even though those were trained on a more diverse dataset including high-temperature configurations.

Table 3: Root mean square errors (RMSE) for energy (E) and force (F) on the 3BPA temperature-transferability dataset, reported in units of meV and meV Å−1.

Fig. 4: The dihedral scan benchmark of the 3BPA molecule. a Molecular structure and the three dihedral angles. b The dihedral potential energy landscape for β = 120∘, as predicted by DFT and CACE. The white dots on the DFT potential energy surface correspond to all training configurations with β between 100∘ and 140∘.

25-element high-entropy alloys

High-entropy alloys (HEAs), composed of several metallic elements, have unique mechanical, thermodynamic, and catalytic properties44. They are challenging to model due to the large number of elements involved, with which many MLIPs scale poorly. Ref. 45 introduced an HEA dataset of distorted crystalline structures containing 25 d-block transition metals, and further performed alchemical learning (AL) that explicitly compresses chemical space, leading to a model that is both interpretable and very stable.

In Fig. 5a, the CACE learning curve is compared to those from ref. 45, including a 3B model with an atomic energy baseline and three-body terms, and a 3B+NN model that further incorporates a full set of pair potentials and a non-linear term built on top of contracted power-spectrum features. Overall, the CACE model achieves lower error and learns significantly more efficiently from less data, with only 91,232 trainable parameters, compared to more than 160,000 in the 3B+NN model45.

Fig. 5: Benchmark on the HEA25 dataset. a Learning curves for different models. Both alchemical learning (AL) models are from ref. 45. b The first two principal components (PCs) of the CACE embedding matrix. The periods are highlighted with orange, blue, and green lines. Interpolated positions for Re and Os are indicated with empty circles. c Energy comparison between DFT and CACE for a high-temperature (5000 K) and a low-temperature (300 K) trajectory. The inset shows example solid and melt configurations from the 300 K and 5000 K trajectories, respectively. d Comparison between the substitution energy Esub, the potential energy difference between the original structure and the structure substituted with Re and Os, per substituted atom, computed using DFT and CACE. The inset shows the parity plot for the force components computed on those structures.

The errors of the models are presented in Table 4. A very recent architecture, the Point Edge Transformer (PET), which utilizes local coordinate systems, has achieved state-of-the-art performance on several benchmark systems and also has impressively low test error on the HEA25 dataset46. However, as we discuss below, the PET potential has poor transferability.

Table 4: Mean absolute error (MAE) of energy (E) and force (F) for the HEA25 dataset, reported in units of meV and meV Å−1.

Alchemical learning capability not only makes learning more efficient, but is also essential for achieving a foundation model that is valid across the periodic table. CACE is able to perform alchemical learning: the learnable embedding θ for each element encodes its chemical information and can be visualized to gain insight into data-driven similarities. Since Nembedding = 4 in this case, we performed a principal component analysis (PCA) and plotted the first two principal components in Fig. 5b. The elements are arranged in a way strongly reminiscent of their placement in the d-block, showing that the element embedding scheme learns the structure of the periodic table in a data-driven way.

A stringent way to demonstrate alchemical learning is to extrapolate to unseen elements; here, Re and Os are absent from the training set. Ref. 45 provides a hold-out set of 61 pairs of structures, each with 36 or 48 atoms. Each pair consists of one fcc or bcc structure containing only the original 25 elements, and another structure with up to 11 atoms randomly substituted by Re and Os. We simply took the CACE model trained on the 25-element dataset and obtained the chemical embeddings of Re and Os by linear interpolation between W and Ir, as illustrated by the empty circles in Fig. 5b. We also added atomic-energy baselines for Re and Os, obtained by fitting the residual energy to a two-parameter linear model that depends only on the Re and Os content. In Fig. 5d we show the parity plot for the substitution energy, defined as the potential energy difference between the original and the substituted structure per substituted atom.
Impressively, the CACE error for the substituted structures containing unseen elements is similar to the test error on the 25-element training set, and is lower than that of the AL method in ref. 45. The force comparison, shown in the inset of Fig. 5d, follows the same trend. With the PET potential, it is not clear whether alchemical learning can be performed in a straightforward way. The most impressive validation of the AL model in ref. 45 is the extrapolation to configurations very far from the training set: at 5000 K, the systems generated from constant-pressure MD simulations with Monte Carlo element swaps have melted, yet the AL potential, trained exclusively on solid structures, is still able to describe the energetics of these structures. This is a test on which the PET model46 did not perform well. To verify the generalizability of CACE across a wide temperature range, we recomputed the energies and forces of the structures from the 5000 K trajectory, as well as from another trajectory at a low temperature of 300 K, and show the results in Fig. 5c and Table 4. The CACE errors are slightly smaller than those in ref. 45, and quite modest compared to the training error.
