A Euclidean transformer for fast and stable machine learned force fields

From equivariant message passing neural networks to separating invariant and equivariant structure: SO3krates
MPNNs34 carry over many of the properties of convolutions to unstructured input domains, such as sets of atomic positions in Euclidean space. This has made them a promising approach for the description of the PES12,17,24,36,50,51,52, where the potential energy is typically predicted as$${E}_{{\rm{pot}}}({\vec{r}}_{1},\ldots,{\vec{r}}_{n})=\mathop{\sum }\limits_{i=1}^{n}{E}_{i}.$$
(1)
The energy contributions \({E}_{i}\in {\mathbb{R}}\) are calculated from high-dimensional atomic representations \({f}_{i}^{[T]}\). They are constructed iteratively over T steps by aggregating pairwise messages mij over atomic neighborhoods \({\mathcal{N}}(i)\)$${f}_{i}^{[t+1]}={\rm{U}}{\rm{PD}}\left({f}_{i}^{[t]},\bigoplus _{j\in {\mathcal{N}}(i)}{m}_{ij}\right),$$
(2)
where UPD( ⋅ ) is an update function that mixes the representations from the prior iteration with the aggregated messages. One way of incorporating the rotational invariance of the PES is to build messages that are based on potentially incomplete sets of invariant inputs such as distances, angles or dihedral angles. An alternative is to use SO(3) equivariant representations36,38,39,52 within a basis that allows for systematic multipole expansion of the geometry to match the complexity of the modeled system. This requires generalizing the concept of invariant continuous convolutions12 to the SO(3) group of rotations. A message function performing an SO(3) convolution can be written as36,42$${m}_{ij}^{LM}=\sum _{{l}_{1}{l}_{2}{m}_{1}{m}_{2}}{C}_{{l}_{1}{l}_{2}L}^{{m}_{1}{m}_{2}M}{\phi }^{{l}_{1}{l}_{2}L}({r}_{ij}){Y}_{{l}_{1}}^{{m}_{1}}({\hat{r}}_{ij}){f}_{j}^{{l}_{2}{m}_{2}},$$
(3)
where \({C}_{{l}_{1}{l}_{2}L}^{{m}_{1}{m}_{2}M}\) are the Clebsch-Gordan coefficients, \({Y}_{l}^{m}\) is a spherical harmonic of degree l and order m, the function \({\phi }^{{l}_{1}{l}_{2}L}:{\mathbb{R}}\to {{\mathbb{R}}}^{F}\) modulates the radial part and \({f}_{j}^{{l}_{2}{m}_{2}}\in {{\mathbb{R}}}^{F}\) is an atomic feature vector. Thus, performing a single convolution scales as \({\mathcal{O}}({l}_{\max }^{6}\times F)\), where \({l}_{\max }\) is the largest degree in the network (Fig. 1b and Fig. 2). Here we propose two conceptual changes to Eq. (3) that we will denote as Euclidean self-attention: (1) we separate the message into an invariant and an equivariant part and (2) we replace the SO(3) convolution by an attention function on its invariant output. To do so, we start by initializing atomic features \({f}_{i}^{[t=0]}\in {{\mathbb{R}}}^{F}\) and Euclidean variables (EV) \({x}_{i,LM}^{[t=0]}\in {\mathbb{R}}\) from the atomic types and the atomic neighborhoods, respectively. Collecting all orders and degrees of the EV in a single vector gives \({({l}_{\max }+1)}^{2}\)-dimensional representations xi that transform equivariantly under rotation (“Methods” section IV A).

Fig. 1: Results overview. a Illustration of an invariant convolution. b Illustration of an SO(3) convolution. c Illustration of the Euclidean attention mechanism that underlies the SO3krates transformer. We decompose the representation of molecular structure into high-dimensional invariant features and equivariant Euclidean variables (EV), which interact via self-attention. d The combination of simulation stability and computational efficiency of SO3krates enables the analysis of a broad set of properties (power spectra, folding dynamics, minima analysis, radius of gyration) on different simulation timescales.

Fig. 2: Learning on invariants. SO(3) convolutions are constructed as triplet tensor products in the spherical harmonics basis, which are performed F times along the feature dimension. We replace SO(3) convolutions by a parametrized filter function on the invariants (red blocks), which effectively reduces the triplet tensor product to taking the partial (per-degree) trace of a simple tensor product. Colored volumes correspond to the non-zero entries in the Clebsch-Gordan coefficients \({C}_{{l}_{1}{l}_{2}L}^{{m}_{1}{m}_{2}M}\), which mask the tensor products; l denotes the degree and \({Y}_{l}^{m}\) is the spherical harmonics function.
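The generic message-passing scheme of Eqs. (1) and (2) can be sketched as follows; this is a minimal toy version in jax, where the update function is a plain residual sum and the messages are taken as given (the names `mpnn_step`, `aggregate_messages` and the linear readout are illustrative, not the actual SO3krates implementation):

```python
import jax.numpy as jnp

def aggregate_messages(messages, receivers, n_atoms):
    """Sum messages m_ij over each neighborhood N(i) (aggregation in Eq. 2)."""
    # scatter-add: row k of `messages` contributes to atom receivers[k]
    return jnp.zeros((n_atoms, messages.shape[-1])).at[receivers].add(messages)

def mpnn_step(f, messages, receivers):
    """One update f^[t] -> f^[t+1]; UPD is a residual sum here for brevity."""
    return f + aggregate_messages(messages, receivers, f.shape[0])

def potential_energy(f_final, readout_weights):
    """E_pot = sum_i E_i, with per-atom energies from a linear readout (Eq. 1)."""
    e_i = f_final @ readout_weights  # (n_atoms,)
    return jnp.sum(e_i)
```

A full architecture would compute the messages from learned pair functions and stack T such steps before the energy readout.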

1.

The message for the invariant components is expressed as$${m}_{ij}={\alpha }_{ij}{f}_{j},$$
(4)
whereas the equivariant parts propagate as$${m}_{ijLM}={\alpha }_{ij,L}{Y}_{L}^{M}({\hat{r}}_{ij}),$$
(5)
where \({\alpha }_{ij}\in {\mathbb{R}}\) and the per-degree \({\alpha }_{ij,L}\in {\mathbb{R}}\) are attention coefficients. Features and EV are updated with the aggregated messages to$${f}_{i}^{[t+1]}={f}_{i}^{[t]}+\sum _{j\in {\mathcal{N}}(i)}{m}_{ij},$$
(6)
for the features and$${x}_{iLM}^{[t+1]}={x}_{iLM}^{[t]}+\sum _{j\in {\mathcal{N}}(i)}{m}_{ijLM},$$
(7)
respectively. Due to this separation, the overall message calculation scales as \({\mathcal{O}}({l}_{\max }^{2}+F)\), since it replaces the multiplication of the feature dimension and \({l}_{\max }\) that appears in other equivariant architectures by an addition (Table 1). As shown in ref. 53, a separation between invariant and equivariant representations can also be achieved by adding an invariant latent space that is updated using iterated tensor products on an equivariant, edge-based feature space. In contrast, our approach is centered around atom-wise representations, and the a priori separation of both interaction spaces allows us to avoid the use of tensor products entirely. Both design choices benefit computational efficiency.

Table 1 Computational complexity
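The separated updates of Eqs. (4)-(7) can be sketched for a single atom with \({l}_{\max }=1\); this is a toy illustration in which the attention coefficients are taken as given and the real spherical harmonics for l ≤ 1 are written out explicitly (function names are illustrative):

```python
import jax.numpy as jnp

def real_sph_harm_l01(r_hat):
    """Real spherical harmonics up to degree l = 1 for a unit vector r_hat.
    Returns the (l_max+1)^2 = 4 components [Y_0^0, Y_1^{-1}, Y_1^0, Y_1^1]."""
    x, y, z = r_hat
    c0 = 0.5 * jnp.sqrt(1.0 / jnp.pi)
    c1 = jnp.sqrt(3.0 / (4.0 * jnp.pi))
    return jnp.array([c0, c1 * y, c1 * z, c1 * x])

def message_step(f_i, x_i, f_neighbors, r_hats, alpha, alpha_l):
    """One Euclidean self-attention message step for a single atom i.
    f_i: (F,) invariant features; x_i: (4,) EV for l_max = 1.
    alpha: (n_nb,) invariant attention; alpha_l: (n_nb, 2) per-degree attention."""
    # Eqs. (4) and (6): invariant messages m_ij = alpha_ij * f_j, residual sum
    f_new = f_i + jnp.sum(alpha[:, None] * f_neighbors, axis=0)
    # Eqs. (5) and (7): equivariant messages m_ijLM = alpha_ij,L * Y_L^M(r_hat_ij)
    y = jnp.stack([real_sph_harm_l01(r) for r in r_hats])  # (n_nb, 4)
    per_degree = jnp.concatenate(
        [alpha_l[:, :1], jnp.repeat(alpha_l[:, 1:2], 3, axis=1)], axis=1
    )  # expand per-degree weights to the 4 (l, m) channels
    x_new = x_i + jnp.sum(per_degree * y, axis=0)
    return f_new, x_new
```

Note how the feature update never touches the \({(l_{\max }+1)}^{2}\) channels and vice versa, which is the source of the additive \({\mathcal{O}}({l}_{\max }^{2}+F)\) cost.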

2.

Instead of performing full SO(3) convolutions, we move the learning of complex interaction patterns into an attention function$${\alpha }_{ij}=\alpha \left({f}_{i},\,{f}_{j},\,{r}_{ij},\,{\oplus }_{l=0}^{{l}_{\max }}{{\boldsymbol{x}}}_{ij,l\to 0}\right),$$
(8)
where \({\oplus }_{l=0}^{{l}_{\max }}{{\boldsymbol{x}}}_{ij,l\to 0}\) is the invariant output of the SO(3) convolution over the EV signals located on atoms i and j (“Methods” section IV B). Thus, Eq. (8) non-linearly incorporates information about the relative orientation of atomic neighborhoods. Since the Clebsch-Gordan coefficients are diagonal matrices along the l = 0 axis (Fig. 2), calculating the invariant projections requires taking per-degree traces of length (2l + 1) and can be computed efficiently in \({\mathcal{O}}({l}_{\max }^{2})\). Within SO3krates, atomic representations are refined iteratively as$$[{{\bf{f}}}_{i}^{[t+1]},\,{{\boldsymbol{x}}}_{i}^{[t+1]}\,]={\rm{ecTblock}}\left[{\{{{\bf{f}}}_{j}^{[t]},\,{{\boldsymbol{x}}}_{j}^{[t]},\,{\vec{r}}_{ij}\}}_{j\in {\mathcal{N}}(i)}\right],$$
(9)
where each Euclidean transformer block (ecTblock) consists of a self-attention block and an interaction block. The self-attention block implements the Euclidean self-attention mechanism described in the previous section. The interaction block gives additional freedom for parametrization by exchanging information between features and EV located at the same atom. After T message-passing steps, per-atom energies Ei are calculated from the final features \({f}_{i}^{[T]}\) using a two-layer neural network and are summed to give the total potential energy (Eq. (1)). Atomic forces are obtained using automatic differentiation, which ensures energy conservation8.
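Obtaining conservative forces by automatic differentiation can be sketched in jax as follows, using a toy pair potential as a stand-in for the learned energy model (`pair_energy` is illustrative only):

```python
import jax
import jax.numpy as jnp

def pair_energy(positions):
    """Toy stand-in for the learned E_pot: harmonic pair interactions."""
    diff = positions[:, None, :] - positions[None, :, :]
    # add the identity to the squared distances so sqrt is smooth on the diagonal
    dist = jnp.sqrt(jnp.sum(diff**2, axis=-1) + jnp.eye(positions.shape[0]))
    # sum over unique pairs (strict upper triangle) of (d_ij - 1)^2 / 2
    return 0.5 * jnp.sum(jnp.triu((dist - 1.0) ** 2, k=1))

# F_i = -dE/dr_i; differentiating a scalar energy guarantees a conservative
# (curl-free) force field, in contrast to direct force prediction
forces = jax.grad(lambda r: -pair_energy(r))
```

Any jax-traceable energy function can be substituted for `pair_energy` without changing the force computation.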

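Returning to the attention function of Eq. (8): the invariant projection \({\oplus }_{l=0}^{{l}_{\max }}{{\boldsymbol{x}}}_{ij,l\to 0}\) amounts to per-degree dot products of two EVs, since the l → 0 channel of the Clebsch-Gordan product is the per-degree trace. A minimal sketch, assuming EVs are stored as flat vectors of length \({({l}_{\max }+1)}^{2}\) and omitting normalization constants:

```python
import jax.numpy as jnp

def per_degree_invariants(x_i, x_j, l_max):
    """Contract two EVs to one invariant per degree l (the l -> 0 channel).
    For each l, sum over the 2l+1 orders m, an O(l_max^2) operation overall."""
    out = []
    start = 0
    for l in range(l_max + 1):
        width = 2 * l + 1  # number of orders m for degree l
        out.append(jnp.sum(x_i[start:start + width] * x_j[start:start + width]))
        start += width
    return jnp.stack(out)  # shape (l_max + 1,)
```

Each per-degree dot product is invariant under a joint rotation of both EVs, because every degree-l block transforms with the same orthogonal Wigner matrix.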
We remark that the outlined equivariant architecture does not preclude the modeling of vectorial and tensorial properties, such as atomic quadrupoles or octupoles, up to the set maximum degree \({l}_{\max }\). For example, molecular dipoles can be learned by combining invariant partial charge predictions with atomic dipoles extracted from the EVs of degree l = 124,37. A detailed outline of the architectural components and the proposed Euclidean self-attention framework is given in the “Methods” section and in Fig. 3.

Fig. 3: SO3krates architecture and building blocks. a The SO3krates transformer takes atomic types Z = {z1, …, zn} and atomic positions \(R=\{{\vec{r}}_{1},\ldots,{\vec{r}}_{n}\}\) and returns a corresponding energy Epot. Within the embedding block, Z and R are embedded into invariant features F and equivariant Euclidean variables (EV) X. They are refined via T Euclidean transformer blocks. Per-atom energies are predicted from the final features via a multilayer perceptron (MLP) and are summed up to yield the total energy. b A Euclidean transformer block consists of a Euclidean attention block and an interaction block. Both blocks are enveloped by skip connections, which allow information from prior layers to be carried over. c The Euclidean attention block aggregates atomic neighborhood information. It consists of two branches, one updating the features and one the EV. Within the feature branch, the invariant features and the filter vectors wij are split into h attention heads. Query, key and value vectors are constructed from the features via trainable matrices Q, K and V. Query and filter are multiplied point-wise before a dot product with the key vector yields attention coefficients αij, which weight the value vectors during neighborhood aggregation. The aggregated per-head features are stacked back together to yield a single attended feature \({f}_{i}^{{\rm{Att}}}\). The EV branch follows a similar design, with the number of heads determined by the minimal (\({l}_{\min }\)) and maximal degree (\({l}_{\max }\)) in the network. Instead of an invariant value vector, spherical harmonics vectors \({{\boldsymbol{Y}}}_{ij}^{l}\) from minimal to maximal degree are used for the different heads. d The interaction block exchanges per-atom information between features and EV. It contracts the EV to per-degree invariants and concatenates the result with the invariant features. The concatenated result is passed through an affine transformation, which gives updates for the features and the EV. e A pairwise difference vector between EV is contracted to per-degree invariants and passed through an MLP. From the atomic displacement vector, the pairwise distance is calculated and expanded in radial basis functions before being fed into an MLP. The MLP outputs are summed to yield the filter vector wij.

Overcoming accuracy-stability-speed trade-offs

We now demonstrate in the following numerical experiment that SO3krates can overcome the trade-offs between MD stability, accuracy and computational efficiency (Fig. 4).

Fig. 4: Computational efficiency and molecular dynamics (MD) stability. a Number of frames per second (FPS) vs. the averaged stability coefficient (upper panel) and FPS vs. the averaged force mean absolute error (MAE) (lower panel) for four small organic molecules from the MD17 data set as reported in ref. 32 for different state-of-the-art MPNN architectures12,36,37,43,50,94,95,96. b Since run times are sensitive to hardware and software implementation details, we re-implement two representative models along the trade-off lines under settings identical to the SO3krates MLFF (using jax), which yields framework-corrected FPS (dashed vs. solid line). We observe speed-ups between 28 (for NequIP) and 15 (for SchNet) in our re-implementations.
We find that SO3krates enables reliable MD simulations and high accuracies without sacrificing computational performance. The gray shaded area indicates the regime of sub-millisecond step times. c MD step time vs. the number of atoms in the system. The smaller pre-factor in the computational complexity compared to SO(3) convolutions (Table 1) results in computational speed-ups that grow with system size. d MD stability observed at temperatures 300 K and 500 K. The transition to higher temperatures results in a drop of stability for the invariant model, hinting towards less robustness and weaker extrapolation behavior. Flexible molecules such as DHA pose a challenge for the invariant model already at 300 K. Bar height is the mean stability over six MD runs and yellow dots denote the stability of individual MD runs. Error bars correspond to the 2σ confidence interval. Per-structure error distributions for an invariant and an equivariant SO3krates model with the same error on the test set. Spread and mean of the error distributions are given in Supplementary Table 1.

A recent study compared the stability of different state-of-the-art MLFFs in short MD simulations and found that only the SO(3) convolution-based architecture NequIP36 gave reliable results32. However, the excellent stability of such models comes at the substantial computational cost associated with this operation (Fig. 4a, top panel). This necessitates a trade-off between the stability and the computational efficiency of the MLFF, which SO3krates can now overcome (Fig. 4b). Our model allows the prediction of up to one order of magnitude more frames per second (FPS) (Fig. 4c), enabling step times at sub-millisecond speed, without sacrificing reliability or accuracy in MD simulations (Fig. 4b). We remark, however, that test accuracy and stability do not necessarily correlate (compare e.g., GemNet and SphereNet in Fig. 4a), but only simultaneously accurate and stable models are of practical interest.
We find that SO3krates yields accurate force predictions, thus overcoming this trade-off effectively (Fig. 4b). As for stability and speed, the models investigated in ref. 32 show an accuracy-speed trade-off (Fig. 4a, lower panel), in line with the findings reported in ref. 54. Any empirical runtime measurement depends on specific hardware and software conditions. The run times reported in ref. 32 were measured for MLFF models implemented in PyTorch+ase. To ensure comparability with SO3krates (implemented in jax), we re-implement two representative models along the trade-off lines (Fig. 4a) under SO3krates-identical settings. Since the study mentioned above reports that the fastest model (ForceNet) yields wrong observables in their benchmark, we instead chose the second fastest contender (SchNet) for our re-implementation. As the most stable and accurate model, we chose NequIP for re-implementation. This selection of architectures is also representative of invariant (SchNet) and equivariant SO(3)-convolution-based models (NequIP), constituting the lower and upper bounds in terms of computational complexity (Table 1). All models are re-implemented in jax55 using the e3x library56. MD step times are measured with the same MD code written in jax-md57 on the same physical device (Nvidia V100 GPU). The models follow the default MD17 hyperparameters as outlined in the original publications12,36. This ensures equal footing for the runtime comparisons that follow. Interestingly, the transition from PyTorch+ase (Fig. 4b, purple dashed line) to jax+jax-md (Fig. 4b, purple solid line) allows for a speed-up between 28 for NequIP and 15 for SchNet. This illustrates the importance of identical settings and the potential of the jax ecosystem. Notably, the step times are measured without the time required for IO operations, since this is highly dependent on the local HPC infrastructure. Thus, the wall times we report for full simulations have a constant offset w.r.t.
the reported step times. For small organic molecules with up to 21 atoms, SO3krates achieves an average speed-up by a factor of 5 compared to the NequIP architecture, whereas step times are slightly larger (by a factor of 1.4) than for the invariant SchNet model. The speed-up over SO(3) convolutions increases with the total number of atoms (Fig. 4c), in line with the smaller pre-factor in the theoretical scaling analysis (Table 1), such that for the double-walled nanotube (370 atoms), the speed-up compared to NequIP has grown to a factor of 30. Compared to invariant convolutions, we find our approach to yield a slightly slower prediction speed, which is in line with theoretical considerations. For the radial distribution functions (RDFs), we find consistent results across five simulation runs (Supplementary Fig. 3) for all four investigated structures, which are in agreement with the RDFs from DFT calculations. Interestingly, it has been found that some faster MLFF models can give inaccurate RDFs, with MAEs between 0.35 for salicylic acid and 1.02 for naphthalene32. In comparison, the accuracies achieved with SO3krates show that the seemingly contradictory requirements of high computational speed and accurate observables from MD trajectories can be reconciled. A recent work proposed a strictly local equivariant architecture called Allegro53. This allows for parallelization without additional communication, whereas parallelization of MPNNs with T layers requires T − 1 additional communication calls between computational nodes. Using the example of the Li3PO4 solid electrolyte, we compare accuracy and speed to the Allegro model for a unit cell with 192 atoms (Table 2). For the recommended hyperparameter settings, SO3krates achieves energy and force accuracies more than 50% better than the ones reported in ref. 53, even with only one tenth of the training data. At the same time, the timings in MD simulations are on par.
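A fair jax timing protocol must account for asynchronous dispatch and just-in-time compilation; a sketch for a generic jitted force function (`force_fn` is a placeholder, not part of the released code):

```python
import time
import jax
import jax.numpy as jnp

def measure_step_time(force_fn, positions, n_steps=100):
    """Average per-call wall time of a jitted function, excluding compilation."""
    fn = jax.jit(force_fn)
    fn(positions).block_until_ready()      # warm-up call triggers compilation
    t0 = time.perf_counter()
    for _ in range(n_steps):
        fn(positions).block_until_ready()  # sync: jax dispatch is asynchronous
    return (time.perf_counter() - t0) / n_steps
```

Without `block_until_ready`, the loop would only measure dispatch time, since jax returns control to Python before the accelerator finishes.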
Notably, the model settings in ref. 53 have been optimized for speed rather than accuracy in order to demonstrate scalability. This again expresses an accuracy-speed trade-off that we can improve upon using the SO3krates architecture. To further validate the physical correctness of the obtained MD trajectory, we compare the RDFs at 600 K to the ones obtained from DFT in the quenched phase of Li3PO4 (Supplementary Fig. 7). The results showcase the applicability of SO3krates to materials beyond molecular structures.

Table 2 Accuracy and speed for periodic systems

Data efficiency, stability and extrapolation

Data efficiency and MD stability play an important role for the applicability of MLFFs. High data efficiency allows one to obtain accurate PES approximations even when only little data is available, which is a common setting due to the computational cost of quantum mechanical ab-initio methods. Even when high accuracies can be achieved, without MD stability the calculation of physical observables from the trajectories becomes impossible. Here, we show that the data efficiency of SO3krates can be successively increased by raising the maximum degree \({l}_{\max }\) in the network (Supplementary Fig. 4). We further find that the stability and extrapolation to higher temperatures of the MLFF can be linked to the presence of equivariant representations, independent of the test error estimate (Fig. 4d). To understand the benefits of directional information, we use an equivariant (\({l}_{\max }=3\)) and an invariant model (\({l}_{\max }=0\)) in our analysis. Due to the use of multi-head attention, the change in the number of network parameters is negligible when going from \({l}_{\max }=0\) to \({l}_{\max }=3\) (“Methods” section IV H). All models were trained on 11k randomly sampled geometries, of which 1k are used for validation. This number of training samples was necessary to attain force errors close to 1 kcal mol−1 Å−1 for the invariant model.
Since equivariant representations increase the data efficiency of ML potentials24,36, we expect the equivariant model to have a smaller test error estimate given the same number of training samples. We confirm this expectation for the DHA molecule, where we compare the data efficiency for different degrees \({l}_{\max }\) (Supplementary Fig. 4). To make the comparison of the invariant and the equivariant model as fair as possible, we train the invariant model until the validation loss converges. Afterwards, we train the equivariant model towards the same validation error, which leads to identical errors on the unseen test set (Fig. 4d and Supplementary Table 1). Since the equivariant model makes more efficient use of the training data, it requires only  ~1/5 of the number of training steps of the invariant model to reach the same validation error (Supplementary Fig. 4d). After training, we compare the test error distributions, since identical mean statistics do not imply a similar distribution. We calculate per-atom force errors as \({\epsilon }_{i}=| | {\vec{F}}_{i}^{\text{\,pred}}-{\vec{F}}_{i}^{\,\text{ref}\,}| {| }_{2}\) and compare the resulting distributions of the invariant and the equivariant model. The observed distributions are identical in nature and only differ slightly in height and spread, without a clear trend (Supplementary Fig. 5b). In the distributions of the per-structure force error \({R}_{{\mathcal{S}}}=\frac{1}{| {\mathcal{S}}| }{\sum }_{i\in {\mathcal{S}}}{\epsilon }_{i}\) over a structure \({\mathcal{S}}\), however, one finds a consistently larger spread of the error for the invariant model (Fig. 4d). Thus, the invariant model performs particularly well (and even better than the equivariant model) on certain conformations, which comes at the price of worse performance on other conformations, a fact which is invisible to per-atom errors. The stability coefficients (Eq.
(28)) are determined from six 300 ps MD simulations with a time step of 0.5 fs at temperatures T = 300 K and T = 500 K (Fig. 4d). We find the invariant model to perform best on Ac-Ala3-NHMe, the smallest and least flexible of the three structures under investigation, although one still observes a noticeable decay in stability at the higher temperature. Due to the increase in temperature, configurations that have not been part of the training data are visited more frequently, which requires better extrapolation behavior. When going to flexible structures such as DHA (second row in Fig. 4d), the invariant model becomes unable to yield stable MD simulations. To exclude the possibility that the instabilities in the invariant case are due to the SO3krates model itself, we also trained a SchNet model, which yielded MD stabilities comparable to the invariant SO3krates model. Thus, directional information has effects on the learned energy manifold that go beyond accuracy and data efficiency. A subtle case is highlighted by the adenine-thymine complex (AT-AT). The MD simulations show one instability (in a total of six runs) for the equivariant model at 500 K, which illustrates that the stability improvement of an equivariant model should be considered a reduction of the chance of failure rather than a guarantee of stability. We remark that unexpected behaviors cannot be ruled out for any empirical model. We further observed dissociation of substructures (either A, T or AT) from the AT-AT complex during MD simulations (Supplementary Fig. 6). Such a behavior corresponds to the breaking of hydrogen bonds or π-π interactions, which highlights weak interactions as a challenge for MLFFs. Interestingly, for other supra-molecular structures the non-covalent interactions are described correctly (section II F and Fig. 5a).
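The per-atom errors \({\epsilon }_{i}\) and per-structure errors \({R}_{{\mathcal{S}}}\) compared above reduce to a few array operations; a sketch assuming forces are stored as (n_structures, n_atoms, 3) arrays:

```python
import numpy as np

def force_errors(f_pred, f_ref):
    """Per-atom errors eps_i = ||F_i^pred - F_i^ref||_2 and per-structure
    means R_S; f_pred and f_ref have shape (n_structures, n_atoms, 3)."""
    eps = np.linalg.norm(f_pred - f_ref, axis=-1)  # (n_structures, n_atoms)
    r_s = eps.mean(axis=-1)                        # (n_structures,)
    return eps, r_s
```

Comparing the histograms of `eps.ravel()` and `r_s` then reproduces the per-atom vs. per-structure view of the error distributions.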
The training data for AT-AT has been sampled from a 20 ps long ab-initio MD trajectory, which only covers a small subset of all possible conformations and makes it likely to leave the data manifold. As a consequence, we observe an increase in the rate of dissociation when increasing the simulation temperature, since it effectively extends the space of accessible conformations per unit simulation time.

Fig. 5: Stable, long-timescale molecular dynamics (MD) simulations and extrapolation to larger bio-molecules. a Stability and speed of SO3krates enable nanosecond-long MD simulations for supra-molecular structures within a few hours. For the buckyball catcher, the ball stays in the catcher over the full simulation time of 20 ns, illustrating that the model successfully picks up on weak, non-covalent bonding. b Distribution of the radius of gyration (ROG) for Ac-Ala3-NHMe as a function of MD simulation time in 20 ns steps. The distribution converges after 60–80 ns simulation time, underlining the importance of stable, but at the same time computationally efficient, simulations. In Supplementary Fig. 8 the ROG is shown as a function of simulation time. c Dynamics of Ala15 obtained from a SO3krates model trained on only 1k data points of a much smaller peptide (Ac-Ala3-NHMe). Analysis of the end-to-end difference shows rapid folding into a helical structure, illustrating the generalization capabilities of the learned local representations towards conformational changes on length-scales greatly exceeding the training data (dashed gray line).

Radius of gyration

The radius of gyration (ROG) is an important observable for determining the structural and dynamical behavior of polymers, as it allows one to estimate the structural compactness of proteins58 and is experimentally accessible59.
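In its minimal mass-weighted form, the ROG of a single frame can be sketched as follows (the production analysis may differ in details such as mass conventions):

```python
import numpy as np

def radius_of_gyration(positions, masses):
    """Mass-weighted radius of gyration for one MD frame.
    positions: (n_atoms, 3), masses: (n_atoms,)."""
    com = np.average(positions, axis=0, weights=masses)    # center of mass
    sq_dist = np.sum((positions - com) ** 2, axis=-1)      # squared distances to COM
    return np.sqrt(np.average(sq_dist, weights=masses))
```

Applying this per frame over a trajectory and histogramming the values yields the ROG distributions discussed below.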
The timescales of structural changes are often tens or even hundreds of nanoseconds60, which requires simulation durations of the same order of magnitude to observe a converged distribution of the ROG. Thus, the MLFF must be robust enough to yield stable dynamics for hundreds of nanoseconds while being computationally efficient at the same time. Here we showcase the potential of the proposed SO3krates model for such applications by using it to calculate a converged distribution of the ROG for Ac-Ala3-NHMe from 100 ns long MD simulations at 300 K (Fig. 5b). With a time step of 0.5 fs, such a simulation requires 200M force evaluations. The SO3krates model enables us to perform such simulations within 5 days on a single A100 GPU. By analyzing the ROG distribution as a function of simulation time (different colors in Fig. 5b), we find such timescales to be necessary for convergence. We find characteristic peaks in the distribution, which correspond to the folded and unfolded conformations, respectively. Details on the MD simulation can be found in the “Methods” section.

Generalization to larger peptides

Generalization to larger structures and unknown conformations is an essential requirement for scaling MLFFs to realistic simulations in biochemistry. Here, we showcase that using only 1k training points of a small peptide (42 atoms), SO3krates can generalize to a much larger peptide (151 atoms) without the need for any additional training data. Despite the locality of the model, we observe folding into a helical structure, illustrating the extrapolation capabilities of the learned representations to larger structures. We use the same model as for the ROG experiment in section II D. As already illustrated, the obtained model is able to perform long and stable dynamics for the structure it has been trained on.
To further increase the complexity of the task, we use the model without any modification to investigate the dynamics of Ala15, starting from the extended structure (leftmost structure in Fig. 5c). Our analysis of the end-to-end distance between the carbonyl carbons of the first and the last residue (green spheres in Fig. 5c) reveals that the peptide rapidly folds into a secondary, helical structure (rightmost structure in Fig. 5c). A comparison to the end-to-end distance in Ac-Ala3-NHMe (dashed horizontal line) reveals the generalization capabilities of SO3krates towards conformational changes on length-scales that go beyond the ones present in the training data (gray dashed line in Fig. 5c).

Power spectra

Power spectra of the atom velocities are an important tool to relate MD simulations to real-world experimental data. They are calculated as the Fourier transform of the velocity auto-correlation function for systems ranging from small peptides up to host-guest systems and nanostructures. To achieve a correct description of such systems, the model must describe both covalent and non-covalent bonding correctly. For the largest structure with 370 atoms, 5M MD steps with SO3krates take 20 h of simulation time ( ~15 ms per step). We train an individual model for each structure in the MD22 data set and compare it to the sGDML model (Table 3). A comparison of training times to other neural network architectures is given in the “Methods” section IV I. To that end, we decided to train the model on two different sets of training data sizes: (A) on structure-dependent sizes (600 to 8k) as reported in ref. 61, and (B) on structure-independent sizes of 1k training points per structure. Since some settings might require accurate predictions when trained on a smaller number of training data points, we chose to include setting (B) in our analysis.
The approximation accuracies achievable with SO3krates compare favorably to the ones observed with the sGDML model35,61 (Table 3). Even for setting (B), the force errors on the test set are below 1 kcal mol−1 Å−1. We use the SO3krates FF to run 1 ns long MD simulations, from which we calculate the power spectra, enabling a comparison to experimental data from IR spectroscopy. Although the frequencies in these systems do not require such simulation lengths, we chose them to illustrate computational feasibility as well as simulation stability. We start by analyzing two supra-molecular structures in the form of a host-guest system and a small nanomaterial. The former play an important role for a wide range of systems in chemistry and biology26,62, whereas the latter hold promise for the design of materials with so far unprecedented properties63. Here, we investigate the applicability of the SO3krates FF to such structures using the example of the buckyball catcher and the double-walled nanotube (Fig. 6a).

Table 3 Performance on MD22

Fig. 6: Power spectra and potential energy surface exploration via minima hopping. a Power spectra for the buckyball catcher (upper panel) and the double-walled nanotube (lower panel), computed as the Fourier transform of the velocity auto-correlation function. For the nanotube, the structure is shown from the side and from the front. b Power spectrum for DHA from an NVE simulation at the zero point energy (ZPE) (light blue) and at 500 K (dark blue), as well as the frequencies from the harmonic approximation (red dashed lines). c Results of a minima search for DHA. We ran the simulation until 10k minima had been visited, which corresponds to 20M molecular dynamics steps for the escape trials and to  ~10M PES evaluations for the subsequent structure relaxations. Next to it, the minima with the largest energy (top) and the lowest energy (bottom) as well as an example minimum with an intermediate energy value (middle) are depicted.
d Ramachandran density plots of the training conformations (upper, blue) and of the visited minima during minima hopping (lower, green) for two of the six backbone angles in Ac-Ala3-NHMe. Yellow dots correspond to the actually visited minima. Parts of the visited minima have not been in the training data, hinting at the capability of the model to find minima beyond the training data. e Ac-Ala3-NHMe structure with backbone angles as inset. f Relative energies for four minima, which have been selected from the regions in ψ − ϕ space visited most frequently during minima hopping (A–D in (d)). SO3krates energies are compared to a DFT single point calculation and to the conformation obtained from a full DFT relaxation starting from the minima obtained with SO3krates. g Location in the Ramachandran plot of the minima obtained with SO3krates and the relaxed DFT minima.

For both systems under investigation, one finds notable peaks for C-C vibrations (500 cm−1 and 1500 cm−1), C-H bending ( ~900 cm−1) and high-frequency C-H stretching ( ~3000 cm−1). Both systems exhibit covalent and non-covalent interactions62,64, where e.g., van-der-Waals interactions hold the inner tube within the outer one. Although small in magnitude, we find the MLFF to yield a correct description of both interaction classes, such that the largest degree of freedom for the double-walled nanotube corresponds to the rotation of the tubes w.r.t. each other, in line with the findings from ref. 61. For DHA, we further analyze the evolution of the power spectrum with temperature and find non-trivial shifts in the spectrum, hinting at the capability of the model to learn anharmonic contributions of the PES (Fig. 6b). As pointed out in ref. 65, FFs that only rely on (learned) harmonic bond and angle approximations fail to predict changing populations or temperature shifts in the middle to high frequency regime.
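A power spectrum as the Fourier transform of the velocity auto-correlation function can be sketched as follows, assuming equally spaced velocity frames and using the Wiener-Khinchin shortcut (windowing and mass-weighting, which a production analysis would likely add, are omitted):

```python
import numpy as np

def power_spectrum(velocities, dt):
    """velocities: (n_frames, n_atoms, 3) at spacing dt; returns (freqs, spectrum)."""
    n = velocities.shape[0]
    v = velocities.reshape(n, -1)  # flatten atom/xyz channels
    # Wiener-Khinchin: the channel-averaged |FFT|^2 equals the Fourier transform
    # of the auto-correlation, so the explicit lag sum can be skipped
    spec = np.mean(np.abs(np.fft.rfft(v, axis=0)) ** 2, axis=1)
    freqs = np.fft.rfftfreq(n, d=dt)
    return freqs, spec
```

With dt in seconds the frequency axis is in Hz; converting to wavenumbers (cm−1) requires dividing by the speed of light in cm s−1.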
To further showcase the anharmonicity of the spectra obtained with SO3krates, we first identify the global minimum of DHA (using the minima hopping results from section II G). For the found conformation, we calculate the harmonic frequencies as well as the zero point energy (ZPE) within the harmonic approximation. The ZPE is 12.979 eV, which corresponds to a temperature of roughly 930 K. We find that our method yields stable dynamics, even though the simulation temperature is almost twice as high as the one used for generating the training data (500 K). The shifts between the spectrum from the NVE simulation (blue curve) and the harmonic frequencies (dashed red lines) illustrate the non-trivial anharmonicity that our proposed method is able to model. Similar results are obtained for Ac-Ala3-NHMe (Supplementary Fig. 2).

Potential energy surface topology

The accurate description of conformational changes remains one of the hardest challenges in molecular biophysics. Every conformation is associated with a local minimum on the PES, and the number of these minima increases exponentially with system size. This limits the applicability of ab-initio methods or computationally expensive MLFFs, since even the sampling of sub-regions of the PES involves the calculation of thousands to millions of equilibrium structures. Here, we explore 10k minima for two small bio-molecules, which requires ~30 M FF evaluations per simulation. This analysis would take more than a year with DFT and more than a month with previous equivariant architectures, whereas we are able to perform it in ~2 days.

We employ the minima hopping algorithm46, which explores the PES through short MD simulations (escapes) that are followed by structure relaxations. The MD temperature is adjusted dynamically, based on the history of minima already found. In that way, low energy regions are explored first, and high energy (temperature) barriers can be crossed as soon as no new minima are found.
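The escape-and-relax loop just described can be illustrated on a toy one-dimensional double-well potential standing in for the real PES. This is only a schematic sketch: the actual minima hopping algorithm46 additionally adapts an energy-difference acceptance threshold, which is omitted here, and all numerical parameters below are arbitrary choices.

```python
import numpy as np

def energy(x):
    """Toy 1-D double-well PES standing in for the MLFF energy."""
    return x**4 - 2.0 * x**2 + 0.1 * x

def grad(x):
    return 4.0 * x**3 - 4.0 * x + 0.1

def relax(x, lr=0.01, steps=500):
    """'Structure relaxation': gradient descent to the nearest local minimum."""
    for _ in range(steps):
        x -= lr * np.clip(grad(x), -50.0, 50.0)
    return x

def minima_hopping(x0, n_hops=30, temp=0.5, beta=1.1, seed=0):
    """Schematic minima hopping: short MD 'escape' runs at an adaptive
    temperature, each followed by a relaxation. The temperature rises
    whenever a known minimum is revisited and falls when a new one is
    found, so low-energy regions are explored first and barriers are
    crossed once no new minima turn up."""
    rng = np.random.default_rng(seed)
    x = relax(x0)
    found = {round(float(x), 3)}
    for _ in range(n_hops):
        v = rng.normal(scale=np.sqrt(temp))  # escape velocity from current temperature
        xt, dt = float(x), 0.05
        for _ in range(40):                  # short velocity-Verlet escape trajectory
            v -= 0.5 * dt * grad(xt)
            xt += dt * v
            v -= 0.5 * dt * grad(xt)
        xm = relax(xt)                       # relax the escape end point
        key = round(float(xm), 3)
        if key in found:
            temp *= beta                     # known minimum: heat up
        else:
            temp /= beta                     # new minimum: cool down
            found.add(key)
            x = xm
    return sorted(found)
```

Starting from one well, the temperature grows geometrically while only known minima are revisited, until an escape trajectory carries the system over the barrier into the other well, after which the temperature is lowered again.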
This necessitates a fast MLFF, since each escape and structure relaxation process consists of up to a few thousand steps. At the same time, the adaptive nature of the MD temperature can result in temperatures larger than the training temperature (Supplementary Fig. 9a), which requires stability towards out-of-distribution geometries.

We start by exploring the PES of DHA and analyze the minima that are visited during the optimization (Fig. 6c). We find many minima close in energy that are associated with different foldings of the carbon chain due to van-der-Waals interactions. This is in contrast to other chain-like molecules such as Ac-Ala3-NHMe, where fewer local minima are found per energy unit (Supplementary Fig. 11a). The largest observed energy difference is 0.57 eV; the minima with the largest potential energy (top) and the lowest potential energy (bottom), as well as an example structure from the intermediate energy regime (middle), are shown in Fig. 6c. We find the observed geometries to be in line with the expectation that higher energy configurations promote an unfolding of the carbon chain.

Funnels are sets of local minima that are separated from other sets of local minima by large energy barriers. The detection of folding funnels plays an important role in protein folding and in finding native states, which determine the biological functioning and properties of proteins. The combinatorial explosion of the number of minima configurations makes funnel detection infeasible with ab-initio methods or computationally expensive MLFFs. We use the visited minima and the transition state energies, estimated from the MD between successive minima, to create a so-called disconnectivity graph66. It allows us to detect multiple funnels in the PES of DHA, which are separated by energy barriers of up to 3 eV (Supplementary Fig. 12).

Ac-Ala-NHMe is a popular example system for bio-molecular simulations, as its conformational changes are primarily determined by the Ramachandran dihedral angles. These dihedral angles also represent important degrees of freedom in significantly larger peptides and proteins67. Here, we go beyond this simple example and use the minima hopping algorithm to explore 10k minima of Ac-Ala3-NHMe and visualize their locations in a Ramachandran plot (green in Fig. 6d) for two selected backbone angles ϕ and ψ (Fig. 6e). By investigating high-density minima regions and comparing them to the training data (blue in Fig. 6d), we can show that SO3krates finds minima in PES regions that are not covered by the training data, which highlights the capability of the model to extrapolate beyond known conformations. Such extrapolation to unknown parts of the PES is unavoidable when applying MLFFs in bio-molecular simulations, since the computational cost of DFT only allows sampling sub-regions of the PES for increasingly large structures.

To confirm the physical validity of the found minima, we select one equilibrium geometry from each of the four highest-density regions in the Ramachandran plots (A–D in Fig. 6d). A comparison of the corresponding energies predicted by SO3krates with DFT single point calculations (Fig. 6f) shows excellent agreement, with a mean deviation of 3.45 meV for this set of four points. Remarkably, the minimum in the unsampled region of the PES (gray box in Fig. 6d) deviates by a mere 0.7 meV in energy. We further compare the SO3krates relaxed structures to structures obtained from a DFT relaxation initiated from the same starting points. For minima A and B, we again find excellent agreement, with energy errors of 2.38 meV and 3.57 meV, respectively. The extrapolated minimum D shows a slightly increased deviation (41.84 meV), which aligns with our expectation that the model performs best within the training data regime.
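The backbone angles ϕ and ψ shown in the Ramachandran plots are ordinary dihedral angles defined by four consecutive backbone atoms. A small, self-contained NumPy routine for computing such an angle from atomic positions (the atom ordering mentioned in the docstring is illustrative) is:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four atomic positions,
    e.g. four consecutive backbone atoms for a phi or psi angle.
    Returns 0 for the eclipsed (cis) and 180 for the trans arrangement."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # project b0 and b2 onto the plane perpendicular to the central bond b1
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))
```

Evaluating this for the two selected backbone dihedrals of every visited minimum yields exactly the ϕ-ψ coordinates plotted in Fig. 6d.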
Further, minima A, B, and D show good agreement with the backbone angles obtained from the DFT relaxations (Fig. 6g). For minimum C, we find the largest energy deviation with respect to both the DFT single point calculation and the DFT relaxation. When comparing the relaxed structures, we observe a rotation of one methyl group by 180°, an additional hydrogen bond, and stronger steric strain in the SO3krates prediction. These deviations coincide with a relatively large distance in the ϕ-ψ plane (Fig. 6g). To investigate the extent of minimum C, we generated random perturbations of the equilibrium geometry, from which additional relaxation runs were initiated. All optimizations returned to the original minimum (Supplementary Fig. 11b), confirming that it is not an artifact of a non-smooth or noisy PES representation.
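This perturb-and-relax consistency check can be sketched as follows, here on a toy two-dimensional PES in place of the MLFF; the perturbation magnitude, the number of trials, and the relaxation settings are arbitrary choices for illustration.

```python
import numpy as np

def energy(x):
    """Toy 2-D PES with several minima, standing in for the MLFF."""
    return float(np.sum(x**4 - 2.0 * x**2) + 0.3 * x[0] * x[1])

def gradient(x):
    return 4.0 * x**3 - 4.0 * x + 0.3 * x[::-1]

def relax(x, lr=0.01, steps=2000):
    """Gradient-descent relaxation to the nearest local minimum."""
    for _ in range(steps):
        x = x - lr * np.clip(gradient(x), -50.0, 50.0)
    return x

def basin_stability(x_min, n_trials=50, sigma=0.1, seed=0):
    """Fraction of randomly perturbed geometries that relax back to the
    given minimum. A fraction of 1.0 means the basin is robust, i.e.,
    the minimum is not an artifact of a noisy PES representation."""
    rng = np.random.default_rng(seed)
    returned = 0
    for _ in range(n_trials):
        x = relax(x_min + sigma * rng.normal(size=x_min.shape))
        if np.linalg.norm(x - x_min) < 1e-3:
            returned += 1
    return returned / n_trials
```

If some perturbed geometries relaxed into different minima instead, the original "minimum" would be suspect of being an artifact of PES noise rather than a genuine basin.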
