E(n)-Equivariant Cartesian tensor message passing interatomic potential

Equivariant functions of Cartesian tensors

Cartesian tensors are tensors that transform in a simple way under rotations in Euclidean space. In other words, a Cartesian tensor is a tensor whose components transform as a product of vectors and covectors under rotations, without any additional factors that depend on the rotation matrix itself. Specifically, an n-th rank tensor transforms under a rotation as:
$${{\boldsymbol{T}}}_{i_{1}i_{2}\cdots i_{n}}\;\xrightarrow{\;{{\boldsymbol{R}}}\;}\;{{\boldsymbol{T}}}'_{i_{1}i_{2}\cdots i_{n}}={{\boldsymbol{R}}}_{i_{1}j_{1}}{{\boldsymbol{R}}}_{i_{2}j_{2}}\cdots {{\boldsymbol{R}}}_{i_{n}j_{n}}{{\boldsymbol{T}}}_{j_{1}j_{2}\cdots j_{n}}$$
(1)
where R is an orthogonal matrix. Under this definition it is easy to see that, since \({{\bf{v}}}_{i}\to {{\bf{v}}}'_{i}={{\boldsymbol{R}}}_{ij}{{\bf{v}}}_{j}\), vectors are first-order tensors, and the dyadic product of two vectors is a second-order tensor, since \({({{\bf{u}}}{{\bf{v}}})}_{i_{1}i_{2}}\xrightarrow{{{\boldsymbol{R}}}}{({{\bf{u}}}{{\bf{v}}})}'_{i_{1}i_{2}}={{\boldsymbol{R}}}_{i_{1}j_{1}}{{\boldsymbol{R}}}_{i_{2}j_{2}}{({{\bf{u}}}{{\bf{v}}})}_{j_{1}j_{2}}\).

Equivariance is a property of functions or transformations between two spaces, where the transformation preserves the relationships between the elements of those spaces. More formally, a function \(\phi:{{\bf{X}}}\to {{\bf{Y}}}\) is said to be equivariant with respect to a group G acting on two sets X and Y if for all \(g\in {{\bf{G}}}\) and \(x\in {{\bf{X}}}\) we have:
$$\phi\left(g\circ x\right)=g\circ \phi\left(x\right)$$
(2)
This means that applying a function ϕ to an object x and then applying a group element g to the resulting object gives the same result as first applying the group element to the object and then applying the function. The composition of equivariant maps is also equivariant:
$$\psi\left(\phi\left(g\circ x\right)\right)=\psi\left(g\circ \phi\left(x\right)\right)=g\circ \psi\left(\phi\left(x\right)\right)$$
(3)
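As a quick illustration of Eqs. (1) and (2), the short sketch below (our own example, not code from this work) numerically verifies that forming the dyadic product of two vectors is equivariant: rotating the inputs and then taking the product gives the same second-order tensor as taking the product and then transforming it with Eq. (1).

```python
import numpy as np

rng = np.random.default_rng(0)

# A random orthogonal matrix R (via QR decomposition) and two random vectors.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
u, v = rng.normal(size=3), rng.normal(size=3)

# Apply phi first (dyadic product), then transform the result with Eq. (1).
T = np.einsum("i,j->ij", u, v)
g_phi_x = np.einsum("ia,jb,ab->ij", R, R, T)

# Apply the rotation to the inputs first, then phi.
phi_g_x = np.einsum("i,j->ij", R @ u, R @ v)

assert np.allclose(g_phi_x, phi_g_x)   # phi(g ∘ x) == g ∘ phi(x), Eq. (2)
```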
Therefore, by providing some basic equivariant functions between Cartesian tensors and combining them, we can obtain an equivariant neural network. Here, we use the following three equivariant operations, whose equivariances are proven in Supplementary Note 1:

1. Linear combinations of tensors of the same order: \(f\left(T_{1},T_{2},\cdots,T_{m}\right)=\sum_{i} c_{i}T_{i}\).

2. Contraction of two tensors.

The contraction of tensors is a mathematical operation that reduces the rank of tensors by summing over one or more pairs of indices. For example, for a third-order tensor A and a second-order tensor B, one possible contraction is \({{\boldsymbol{C}}}_{ijk}={{\boldsymbol{A}}}_{ijl}{{\boldsymbol{B}}}_{kl}\), which reduces the total rank by 2. More generally, we can sum over z pairs of indices between an x-order tensor T1 and a y-order tensor T2:
$$\phi_{z}{\left({{\boldsymbol{T}}}_{1},{{\boldsymbol{T}}}_{2}\right)}_{a_{1}\cdots a_{x-z}b_{1}\cdots b_{y-z}}={{\boldsymbol{T}}}_{1,\,a_{1}\cdots a_{x-z}c_{1}\cdots c_{z}}\cdot {{\boldsymbol{T}}}_{2,\,b_{1}\cdots b_{y-z}c_{1}\cdots c_{z}}$$
(4)
This yields a new tensor of order \(x+y-2z\), where \(0\le z\le \min (x,y)\). When z = 0, no indices are contracted and Eq. (4) reduces to the tensor product \({{\boldsymbol{T}}}_{a_{1}\cdots a_{x}b_{1}\cdots b_{y}}={{\boldsymbol{T}}}_{1,\,a_{1}\cdots a_{x}}\cdot {{\boldsymbol{T}}}_{2,\,b_{1}\cdots b_{y}}\).

3. Partial derivative with respect to another Cartesian tensor: \(\frac{\partial }{\partial {{\boldsymbol{T}}}_{j_{1}j_{2}\cdots j_{n}}}\).

By combining these operations we can build many equivariant functions. Many operations frequently used in other equivariant neural networks that operate on vectors, such as scaling of vectors \(s\cdot \vec{v}\), scalar products \(\left\langle \vec{v}_{1},\vec{v}_{2}\right\rangle\), and vector products \(\vec{v}_{1}\times \vec{v}_{2}\) (the upper-triangular part of \(\vec{v}_{1}\otimes \vec{v}_{2}-\vec{v}_{2}\otimes \vec{v}_{1}\)), can all be viewed as special cases of these three operations. More complex descriptors, such as the MTP descriptors4,21, can also be obtained through combinations of these operations, as shown in Supplementary Note 3.

Equivariant message passing neural network

To obtain an end-to-end machine learning model for predicting material properties, the input should be the positions \(\{{{\bf{r}}}_{i}\}\) and chemical elements \(\{Z_{i}\}\) of all atoms; for periodic crystals, the lattice parameters must also be included. To apply graph neural networks, we first transform the structure into a graph \(\{n_{i},e_{ij}\}\), where each node \(n_{i}\) corresponds to an atom i in the unit cell, and every atom j within a given cutoff distance rcut is connected to \(n_{i}\) by an edge labeled with the relative position rij. For periodic structures, since atom j and its periodic image j' may both lie within the cutoff distance of atom i, \(n_{i}\) may have more than one edge connected to \(n_{j}\). To extract the information of the nodes, we use the message passing network (MPN) scheme. A generic MPN can be described as:
$$m_{i}^{t+1}={\bigoplus}_{j\in N(i)}{{\rm{M}}}_{t}\left(h_{i}^{t},h_{j}^{t},e_{ij}\right)$$
(5)
$${h}_{i}^{t+1}={{{\rm{U}}}}_{t}\left({h}_{i}^{t},{m}_{i}^{t+1}\right)$$
(6)
where \(h_{i}^{t}\) is the hidden feature of node \(n_{i}\) at layer t that captures its local information. Messages are passed between nodes along edges, with the message at each edge \(e_{ij}\) being a function \({{\rm{M}}}_{t}\) of the features of the nodes connected by that edge. The operator ⨁ is a differentiable, permutation-invariant function such as sum, mean, or max that aggregates the messages arriving at each node into an updated message \(m_{i}^{t+1}\), which in turn is used to update the hidden feature through the function \({{\rm{U}}}_{t}\) for the next iteration.

A concrete example illustrating the principles of MPN is presented in Fig. 1. We first determine the connectivity of a structure based on a given cutoff radius and convert it into a graph (a quotient graph for periodic structures). The messages on the nodes can then be passed through the edges by a two-body interaction. As message passing proceeds, information from atoms beyond the cutoff radius can also be conveyed to the central atom. As illustrated in Fig. 1b, the blue arrows represent the first message passing step and the yellow arrows the second. Note that each layer of message passing is performed simultaneously on all atoms; here, we focus on a specific atom in each layer for ease of analysis. During the first message passing step, information about atom 1 is encoded into the hidden feature of atom 2. Subsequently, in the second step, the information of atom 2, which now includes some information about atom 1, is transmitted to atom 3, thereby achieving non-local effects from atom 1 to atom 3. In addition, because the interaction between atom 4 and atom 2 in the second step also contains information from atom 1, the effective interaction is elevated from a two-body to a three-body interaction. These are the two main advantages of the message-passing architecture.

Fig. 1: The schematic diagram of message-passing networks. a The original structure; b the graph corresponding to (a). The blue arrows represent the first message passing step toward atom 2, while the yellow arrows denote the second step toward atom 3.

However, scalar hidden features, messages, and edge information (usually the relative distance between two atoms) limit the expressive capacity and may lead to incomplete atomic structure representations. As shown in Fig. 2a and d, if we only use the scalar information \(h_{i}\), \(h_{j}\), and \(d_{ij}\) to pass the message in Eq. (5) and update the feature in Eq. (6), all nodes will always produce the same embedding. As a result, the network cannot distinguish these two structures and assigns them the same total energy. Even if three-body messages that include the angles \(\alpha_{ijk}\) are taken into consideration, some structures with only 4 atoms cannot be distinguished26, as shown in Fig. 2b and e. Because the atomic environments within the cutoff radius are identical, no matter how many message passing iterations are performed, these two different structures yield the same result. To alleviate this problem, a series of models that use high-order geometric tensors during message passing have been proposed. For example, allowing vectors in the message passing process can differentiate Fig. 2b and e, but in the case of Fig. 2c and f, which have different α, the summation in Eq. (5) would cause the network to confuse the two structures (a more detailed explanation is given in Supplementary Note 5). It can be anticipated that increasing the order of the tensors used in message passing enhances the expressive power of the network. Previously, the order of high-order tensor networks based on Cartesian space was typically limited to 2 (refs. 33,34), while our method can work with Cartesian tensors of any order. In the following, we use \({}^{l}{{\boldsymbol{h}}}_{i}^{t}\) to represent the l-order Cartesian tensor features of node i in the t-th layer, and \({{\bf{r}}}^{\otimes n}\) to represent the tensor product of a vector r with itself n times: r⊗r⊗⋯⊗r. In particular, for n = 0 we define this to be a learnable function of \({||}{{\bf{r}}}{||}\): \({{\bf{r}}}^{\otimes 0}={{\rm{f}}}({||}{{\bf{r}}}{||})\).

Fig. 2: Some structures that cannot be distinguished by message passing networks that do not utilize high-order tensor information. a, d cannot be distinguished by using two-body scalar information; b, e cannot be distinguished by using three-body scalar information26; and c, f cannot be distinguished by using two-body vector information. c, f have different α and β, so they have the same center of mass but different moments of inertia.

Initialization of node features

The scalar features in the first layer \({}^{0}h_{i}^{0}\) should be invariant to rotation, translation, and permutation of atoms of the same chemical species. This is also the requirement for most descriptors used in machine learning potentials, so descriptors such as ACSF, SOAP, ACE, and MTP can be used directly to expedite feature extraction. Here, we use a trainable chemical embedding similar to SchNet23 to minimize human-designed elements as much as possible. Specifically, the atomic numbers are first one-hot encoded and then multiplied by a learnable weight matrix, resulting in a learnable embedding for each element Zi. The high-order features \({}^{l}{{\boldsymbol{h}}}_{i}^{0}\) with l > 0 are all initialized to 0.

Message and aggregate

To combine the information of neighboring nodes, we need to design a message passing function \({{\rm{M}}}_{t}\) in Eq. (5). The hidden features \({}^{l_i}{{\boldsymbol{h}}}_{i}^{t}\) and \({}^{l_j}{{\boldsymbol{h}}}_{j}^{t}\), the bond information \(e_{ij}\), and the target message \({}^{l_{\rm{out}}}{{\boldsymbol{m}}}_{ij}^{t+1}\) can all be tensors of arbitrary order. We therefore need an equivariant way to combine two tensors into a new tensor of a different order, and Eq. (4) provides exactly such an operation. In our model, we write \({{\rm{M}}}_{t}\) in Eq. (5) as:
$${}_{l_i,l_r}^{\;\,l_o}{{\boldsymbol{m}}}_{ij}^{t}={{\rm{M}}}_{t}\left({{\boldsymbol{h}}}_{i}^{t},{{\boldsymbol{h}}}_{j}^{t},{{\boldsymbol{e}}}_{ij}\right)={{\rm{f}}}_{l_r}^{t}\left(d_{ij}\right)\cdot {}^{l_i}{{\boldsymbol{h}}}_{j,\,a_{1}\cdots a_{l_i-l_c}c_{1}\cdots c_{l_c}}^{t}\cdot {\left({{\bf{u}}}_{ij}^{\otimes l_r}\right)}_{c_{1}\cdots c_{l_c}b_{1}\cdots b_{l_r-l_c}}$$
(7)
where \(d_{ij}={||}{{\bf{r}}}_{ij}{||}\) is the relative distance between atoms i and j, \({{\bf{u}}}_{ij}=\frac{{{\bf{r}}}_{ij}}{d_{ij}}\) is the direction vector, and \(0\le l_{c}\le \min (l_{i},l_{r})\) is the number of index pairs summed over in the contraction. \({{\rm{f}}}_{l_{r}}^{t}\left(d_{ij}\right)\) is the radial function, a learnable multi-layer perceptron acting on radial basis functions such as the Bessel or Chebyshev basis. The result is a Cartesian tensor of order \(l_{o}=|l_{i}+l_{r}-2l_{c}|\), which lies between \(|l_{i}-l_{r}|\) and \(l_{i}+l_{r}\). Since \(l_{r}\) can be chosen arbitrarily, we can obtain equivariant tensors of any order, starting from 0.

We use summation as the aggregation function for the messages in Eq. (5), that is, we directly add all the messages obtained from the neighboring nodes. Tensors of the same order obtained from different \((l_{i},l_{r})\) are added together with different coefficients. Because \(l_{o}\) and \(l_{r}\) are in principle unbounded, we specify their maximum values. With given \(l_{\rm{omax}}^{t}\) and \(l_{\rm{rmax}}^{t}\), we sum over all possible \((l_{i},l_{r})\):
$${}^{l_o}{{\boldsymbol{m}}}_{i}^{t+1}={\sum}_{l_{r}\le l_{\rm{rmax}}^{t}}{\sum}_{l_{i}}c_{l_{o},l_{r},l_{i}}{\sum}_{j\in N\left(i\right)}{{\rm{f}}}_{l_{r}}\left(d_{ij}\right)\cdot {}^{l_i}{{\boldsymbol{h}}}_{j,\,a_{1}\cdots a_{l_{i}-l_{c}}c_{1}\cdots c_{l_{c}}}^{t}\cdot {\left({{\bf{u}}}_{ij}^{\otimes l_{r}}\right)}_{c_{1}\cdots c_{l_{c}}b_{1}\cdots b_{l_{r}-l_{c}}}$$
(8)
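To make the message construction concrete, the following NumPy sketch implements a single term of Eq. (7) and the neighbour sum of Eq. (8) under simplifying assumptions: the learnable radial MLP is replaced by a fixed scalar weight and the mixing coefficients \(c_{l_{o},l_{r},l_{i}}\) are omitted. The helper names tensor_power and cartesian_message are our own and do not come from the HotPP code.

```python
import numpy as np

def tensor_power(u, n):
    """Return u ⊗ u ⊗ ... ⊗ u with n factors; n = 0 gives the scalar 1
    (in the paper, r^{⊗0} is instead defined as a learnable function of ||r||)."""
    out = np.array(1.0)
    for _ in range(n):
        out = np.multiply.outer(out, u)
    return out

def cartesian_message(h_j, u_ij, f_r, l_r, l_c):
    """One term of Eq. (7): f_r * (h_j contracted with u_ij^{⊗ l_r} over l_c index pairs).

    h_j  : neighbour feature of order l_i, shape (3,) * l_i
    u_ij : unit vector pointing from atom i to atom j, shape (3,)
    f_r  : scalar standing in for the learnable radial function f_{l_r}(d_ij)
    The result has order l_i + l_r - 2 * l_c.
    """
    u_pow = tensor_power(np.asarray(u_ij, dtype=float), l_r)
    # Contract the last l_c axes of h_j with the first l_c axes of u^{⊗ l_r};
    # l_c = 0 reduces to the plain tensor product.
    return f_r * np.tensordot(np.asarray(h_j, dtype=float), u_pow, axes=l_c)

# Aggregation in the spirit of Eq. (8): sum the messages over the neighbours of
# atom i (the mixing coefficients c_{l_o, l_r, l_i} are omitted here).
rng = np.random.default_rng(0)
neighbours = [(rng.normal(size=(3, 3)), rng.normal(size=3)) for _ in range(4)]
m_i = sum(
    cartesian_message(h_j, r / np.linalg.norm(r), f_r=1.0, l_r=1, l_c=1)
    for h_j, r in neighbours
)   # l_i = 2, l_r = 1, l_c = 1  ->  a first-order (vector) message for atom i
```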
Update

For the scalar message \({}^{0}m_{i}^{t+1}\), we feed it into a fully connected layer followed by a nonlinear activation function to extract the information, and update the hidden feature with a residual connection:
$${}^{0}h_{i}^{t+1}={}^{0}h_{i}^{t}+\sigma\left({}^{0}{\rm{W}}^{t}({}^{0}m_{i}^{t+1})\right)$$
(9)
where σ is the nonlinear activation function and \({}^{0}{\rm{W}}^{t}\) is a linear layer with bias for the scalar message in layer t. However, for tensors of order higher than 0, both the bias and the activation function would break the equivariance (Supplementary Note 2). Therefore, we only apply a bias when l = 0.

For the high-order activation function, as shown in Eq. (3), multiplying a tensor by an invariant scalar is equivariant. Hence, we need a mapping from an l-order tensor to a scalar. One simple choice is the squared norm of the tensor, \({{||}{{\boldsymbol{T}}}{||}}^{2}={\sum }_{i_{1}\cdots i_{l}}{{\boldsymbol{T}}}_{i_{1}\cdots i_{l}}^{2}\), since it is invariant by definition. Therefore, for \(l > 0\), we write the element-wise nonlinear function as:
$${}^{l}\sigma{\left({{\boldsymbol{T}}}\right)}_{i_{1}\cdots i_{l}}=\sigma'\left({}^{l}{\rm{W}}({{||}{{\boldsymbol{T}}}{||}}^{2})\right)\cdot {{\boldsymbol{T}}}_{i_{1}\cdots i_{l}}$$
(10)
It should be noted that different notations σ and σ′ are used for the activation functions in Eqs. (9) and (10), since the appropriate choice of activation function differs between scalars and high-order tensors. Take the SiLU function \(\sigma\left(x\right)=x\cdot {{\rm{sigmoid}}}(x)\) as an example and suppose \({}^{l}{\rm{W}}\) is the identity. For a scalar, SiLU maps x approximately to x itself when \(x\gg 0\). However, for higher-order tensors, Eq. (10) would map \({{\boldsymbol{T}}}_{i_{1}\cdots i_{l}}\) to \({{||}{{\boldsymbol{T}}}{||}}^{2}\cdot {{\boldsymbol{T}}}_{i_{1}\cdots i_{l}}\) rather than \({{\boldsymbol{T}}}_{i_{1}\cdots i_{l}}\) when \({{\boldsymbol{T}}}_{i_{1}\cdots i_{l}}\gg 0\). This is because applying the higher-order formula to a scalar, \(y_{i}=\sigma\left({{\rm{W}}}(x_{i})\right)\cdot x_{i}\), multiplies in an extra factor of \(x_{i}\). Therefore, if we use the SiLU function for σ, we should use the sigmoid function for σ′. Other activation functions can be handled analogously, and the update function for high-order tensors is:
$${}^{l}{{\boldsymbol{h}}}_{i}^{t+1}={}^{l}{{\boldsymbol{h}}}_{i}^{t}+{}^{l}\sigma\left({}^{l}{\rm{W}}^{t}({}^{l}{{\boldsymbol{m}}}_{i}^{t+1})\right)$$
(11)
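A simplified PyTorch sketch of this update rule is given below (our own illustration, not the released implementation): the linear layer \({}^{l}{\rm{W}}^{t}\) is taken as the identity, features carry an atom and a channel axis followed by `order` Cartesian axes, and the SiLU/sigmoid pairing follows the argument above.

```python
import torch

def gated_activation(T: torch.Tensor, order: int) -> torch.Tensor:
    """Eq. (10) with the linear map W taken as the identity: every component of an
    l-order tensor is scaled by sigmoid(||T||^2); for order = 0, SiLU is used directly."""
    if order == 0:
        return torch.nn.functional.silu(T)
    spatial_axes = tuple(range(T.dim() - order, T.dim()))      # trailing Cartesian axes
    sq_norm = (T ** 2).sum(dim=spatial_axes, keepdim=True)     # rotation-invariant gate input
    return torch.sigmoid(sq_norm) * T

def residual_update(h: torch.Tensor, m: torch.Tensor, order: int) -> torch.Tensor:
    """Eqs. (9) and (11): h^{t+1} = h^t + sigma(W(m^{t+1})), with W omitted here."""
    return h + gated_activation(m, order)

# Example: per-atom vector features (order 1) with shape (n_atoms, channels, 3).
h = torch.randn(8, 16, 3)
m = torch.randn(8, 16, 3)
h_new = residual_update(h, m, order=1)
```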
Readout

For a target n-order property, we apply a two-layer nonlinear MLP to the n-order tensor in the last hidden layer. For the same reason as above, bias and element-wise nonlinear functions cannot be used for the high-order tensors.
$${}^{l}{{\boldsymbol{o}}}_{i}={}^{l}{\rm{W}}_{2}\left({}^{l}\sigma\left({}^{l}{\rm{W}}_{1}({}^{l}{{\boldsymbol{h}}}_{i}^{t})\right)\right)$$
(12)
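As a sketch of how such a readout could look in code (again our own simplified illustration with hypothetical names and layer sizes, not the HotPP implementation), the module below mixes channels with bias-free linear layers for l > 0, applies the norm-gated activation of Eq. (10), and sums over atoms to produce an l-order property.

```python
import torch

class Readout(torch.nn.Module):
    """Two-layer readout in the spirit of Eq. (12); bias is only allowed for l = 0."""
    def __init__(self, channels: int, order: int):
        super().__init__()
        self.order = order
        use_bias = order == 0                      # bias would break equivariance for l > 0
        self.w1 = torch.nn.Linear(channels, channels, bias=use_bias)
        self.w2 = torch.nn.Linear(channels, 1, bias=use_bias)

    def _activate(self, x: torch.Tensor) -> torch.Tensor:
        if self.order == 0:
            return torch.nn.functional.silu(x)
        axes = tuple(range(x.dim() - self.order, x.dim()))
        gate = torch.sigmoid((x ** 2).sum(dim=axes, keepdim=True))
        return gate * x                            # norm-gated activation, cf. Eq. (10)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (n_atoms, channels, 3, ..., 3) with `order` trailing Cartesian axes.
        x = self.w1(h.movedim(1, -1)).movedim(-1, 1)   # mix channels only
        x = self._activate(x)
        x = self.w2(x.movedim(1, -1)).movedim(-1, 1)
        return x.sum(dim=0).squeeze(0)                 # sum over atoms -> l-order property

# e.g. order = 1 with h of shape (n_atoms, channels, 3) gives a 3-vector (dipole);
# order = 0 with h of shape (n_atoms, channels) gives a scalar (energy).
readout = Readout(channels=16, order=1)
dipole = readout(torch.randn(8, 16, 3))
```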
Performance of HotPP

We validate the accuracy of our method on a diverse range of systems, including small organic molecules, periodic structures, and predictions of dipole moments and polarizability tensors. For each system, we trained the HotPP model on a commonly used dataset and compared the results with other models. To demonstrate the robustness of our model, most models were trained using the same network architecture shown in Fig. 3 and similar hyperparameters. More training details can be found in the Methods section.

Fig. 3: The architecture of HotPP. After embedding atomic information into scalars, vectors, and tensors, 4 propagation layers are used to further extract information and output the target properties. For higher-order tensors or additional propagation layers, the framework remains similar. The inputs to the network are the atomic numbers Zi and the relative displacements between atoms rij. After the Embedding layer, the scalar node embeddings \({}^{0}h^{0}\) are initialized based on Zi and the higher-order embeddings are set to zero. In each Propagate layer, the inputs are the node embeddings \({}^{l_{in}}h\) and the filter tensors \({}^{l_r}f_{ij}\) obtained by the Filter layer as \({{\rm{f}}}_{l_{r}}\left(d_{ij}\right)\cdot {{\bf{u}}}_{ij}^{\otimes l_{r}}\), where \(d_{ij}\) is the relative distance and uij is the unit vector of rij. The output messages \({}^{l_{out}}m_{i}\) are transformed through a linear layer and activated by the nonlinear function \(\sigma\) or \(\sigma'\), depending on the order of the message tensors. In the Readout layer, the node embeddings are transformed and summed up to get the target properties \({}^{l}o\), such as \({}^{0}o\) for the energy and \({}^{1}o\) for the dipole.

Small organic molecules

We first test our model on molecular dynamics trajectories of small organic molecules. The ANI-1x dataset35,36 contains DFT calculations for approximately five million diverse molecular conformations obtained through an active learning algorithm. To evaluate the extrapolation capability of HotPP, we train our model on 70% of the ANI-1x dataset and test on the COmprehensive Machine-learning Potential (COMP6) benchmark35, which samples the chemical space of molecules larger than those included in the training set. The results are shown in Table 1. Compared to ANI-1x, our model demonstrates superior performance on the majority of prediction tasks.

Table 1 Test results on COMP6 dataset

Periodic systems

After testing HotPP on small-molecule datasets without periodicity, we evaluated its performance on periodic systems. We selected a carbon system with various phases as the example37. It is a complex dataset with a wide range of structures, containing structural snapshots from ab initio MD, iteratively extended from GAP-driven simulations, and randomly distorted unit cells of the crystalline allotropes diamond and graphite. The results are shown in Table 2. Our model, with about 150k parameters, demonstrates advantages in predicting forces and virials compared to most models and achieves accuracy close to the l = 3 NequIP model with around 2 million parameters.
When the number of parameters is increased to 600k, HotPP achieves the best results on this dataset.

Table 2 Test results on Carbon

Next, we verify the accuracy of our potential in calculating the phonon dispersion of diamond, which was not well predicted by some previous models for carbon38. We can obtain the force constant matrix of a structure directly through automatic differentiation, \(\frac{{\partial }^{2}E}{\partial r_{i\alpha }\partial r_{j\beta }}\). We used the Phonopy Python package39,40 to calculate the phonon spectrum of diamond and compared it with the DFT results in Fig. 4. The results show that HotPP describes the vibrational behavior well. Although there are relatively large errors in the high-frequency part at the gamma point, this could be attributed to the inaccuracy of the DFT calculations in the training dataset. We retrained the model using a more accurate dataset38, and the newly calculated phonon spectrum matches the DFT results almost perfectly, which demonstrates the reliability of our model.

Fig. 4: Phonon spectrum of diamond. Phonon dispersion relation for diamond as predicted by HotPP1 (blue dotted) and HotPP2 (red dotted) in comparison with DFT reference data (black solid). HotPP1 was trained on the GAP2017 dataset37, and HotPP2 on GAP202038. Source data are provided as a Source Data file.

Dipole moment and polarizability tensor

Since our model can directly output vectors and matrices, in this final section we predict the dipole moments and polarizability tensors of structures directly. We consider water systems including the water monomer, water dimer, Zundel cation, and bulk liquid water41. The dipoles and polarizabilities of the aperiodic systems were calculated with CCSD theory and those of liquid water with DFT. Each system contains 1000 structures, of which we use 70% as a training set and the rest as a testing set. We calculate the RMSEs relative to the standard deviation of the testing samples to compare with previous results obtained by other models41,42,43, as shown in Table 3. In most cases, HotPP obtains the best results, except for the polarizability tensor of the water monomer. Compared to T-EANN42 and REANN43, HotPP performs particularly well for the dipole moment of liquid water. This may be because they fit the dipole moment by learning \(q_{i}\) and calculating \({{\boldsymbol{\mu }}}={\sum }_{i=1}^{N}{q}_{i}{{\bf{r}}}_{i}\), which is inappropriate for periodic systems. In contrast, the outputs of our model are obtained entirely from relative coordinates, so no reference point needs to be chosen.

Table 3 Comparison of the results obtained by SA-GPR, T-EANN, REANN, and HotPP for different water systems

Since HotPP provides the dipole moment and polarizability, we can calculate the infrared (IR) absorption spectrum and the Raman spectrum of liquid water. We separately trained a machine learning potential for the energies, forces, and stresses of liquid water to drive the dynamical simulations. With this potential, we perform a classical MD simulation under ambient conditions (300 K, 1 bar) for 100 ps and record the dipole moment and polarizability tensor every 1 fs. We then compute the IR and Raman spectra by Fourier transforming their autocorrelation functions (ACFs); the results are compared with the experimental data44,45 in Fig. 5 (a schematic sketch of this post-processing step is given after the Fig. 5 caption below).
We can observe that both the HotPP model and the DeePMD model46,47 closely approximate the experimental IR spectrum. Our results accurately reproduce the first three peaks, corresponding to hindered translation, libration, and H-O-H bending, respectively, but there is a long tail relative to the experimental data for the O-H stretching mode. This discrepancy may arise from the neglect of quantum effects in our classical molecular dynamics simulation. For the Raman spectrum, our model also gives results in agreement with the experimental data.

Fig. 5: Experimental and simulated infrared absorption spectrum and reduced anisotropic Raman spectrum of liquid water under ambient conditions. a Comparison of the infrared absorption spectra. b Comparison of the reduced anisotropic Raman spectra. The black circles are the experimental data44,45, and the lines are calculated from molecular dynamics trajectories performed with HotPP, MB-pol56, DeePMD46,47, and T-EANN42. Source data are provided as a Source Data file.
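The post-processing referenced above can be sketched as follows; this is a schematic NumPy illustration only (prefactors, window functions, and quantum correction factors are omitted, and dipoles.npy is a hypothetical file containing the per-frame dipole vectors sampled every 1 fs). The Raman spectrum is obtained analogously from the autocorrelation of the polarizability anisotropy.

```python
import numpy as np

dt_fs = 1.0                               # trajectory sampling interval (1 fs)
mu = np.load("dipoles.npy")               # hypothetical file, shape (n_frames, 3)
mu = mu - mu.mean(axis=0)                 # remove the static dipole

# Dipole autocorrelation C(t) = <mu(0) . mu(t)>, averaged over time origins
# with a zero-padded FFT to avoid periodic wrap-around.
n = len(mu)
pad = 2 * n
acf = np.zeros(n)
for c in range(3):
    f = np.fft.rfft(mu[:, c], n=pad)
    acf += np.fft.irfft(f * np.conj(f), n=pad)[:n]
acf /= np.arange(n, 0, -1)                # number of overlapping time origins per lag

# IR lineshape ~ |Fourier transform of C(t)|; prefactors and corrections omitted.
lineshape = np.abs(np.fft.rfft(acf))
freq_thz = np.fft.rfftfreq(n, d=dt_fs * 1e-3)   # d in ps, so frequencies are in THz
```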
