Modelling chemical processes in explicit solvents with machine learning potentials

Workflow

The active learning (AL) strategy employed in this work to train machine learning potentials (MLPs) is illustrated in Fig. 1a. The first step involves generating a small set of configurations labelled with reference energies and forces, which is used to train the initial version of the MLP. For a given reaction, we employ two different training sets, one containing the reacting substrates in the gas phase (or implicit solvent) and another including explicit solvent molecules. The latter is necessary to account for specific non-covalent interactions between the solute and solvent.

Fig. 1: Active learning (AL) workflow and structure selection strategies for training a machine learning potential (MLP). a In the gas phase, initial configurations are generated by random displacements (ΔD) on the atomic coordinates (x, y, z, red arrows). For the system in the condensed phase, cluster configurations with a minimum radius \(r_{\mathrm{cluster}}\) no smaller than the cut-off hyperparameter \(r_{\mathrm{cut}}\) of the MLP are used as initial configurations. b Selection methods in the AL process. Energy selector: selects data points where the energy difference (red) between the ground-truth density functional theory (DFT) value \(E^{\mathrm{DFT}}\) and the MLP-predicted value \(E^{\mathrm{MLP}}\) exceeds the threshold \(E_{\mathrm{T}}\). Similarity selector: collects configurations whose geometry is dissimilar to the original data set, i.e., configurations for which the maximum of the similarity vector K is lower than the similarity threshold \(k_{\mathrm{T}}\). Distance selector: uses the local outlier factor (LOF) method to identify outliers o from the original data set. The circle with radius k-distance centred at the outlier (o) is highlighted in red.

Different initial data generation strategies are used for each of the data sets mentioned above. For gas-phase or implicitly solvated molecules, initial training configurations are generated by randomly displacing the atomic coordinates. In the case of a chemical reaction, the training typically starts from the corresponding transition state (TS).
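The random-displacement step for the gas-phase set can be sketched as follows. This is only an illustration: the uniform distribution, the 0.1 Å default amplitude and the function names are assumptions, not the settings used in this work. The second helper simply encodes the cluster-size condition from Fig. 1a.

```python
import random


def random_displacements(coords, n_configs, max_disp=0.1, seed=None):
    """Return n_configs copies of coords with every Cartesian component
    shifted by a random amount in [-max_disp, max_disp].

    coords   : list of (x, y, z) tuples, e.g. a transition-state geometry
    max_disp : displacement amplitude in the units of coords
               (hypothetical default, not taken from the paper)
    """
    rng = random.Random(seed)
    return [
        [tuple(x + rng.uniform(-max_disp, max_disp) for x in atom) for atom in coords]
        for _ in range(n_configs)
    ]


def shell_is_large_enough(r_cluster, r_cut):
    """Cluster condition from Fig. 1a: the solvent-shell radius must not be
    smaller than the cut-off radius of the MLP."""
    return r_cluster >= r_cut


# Example: ten displaced copies of a two-atom toy geometry
toy_ts = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
initial_set = random_displacements(toy_ts, n_configs=10, seed=1)
```

Each generated configuration would then be labelled with reference energies and forces before the first MLP is trained.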
The initial dataset containing the substrate and the explicit solvent can be generated either from solvent molecules in a box under periodic boundary conditions (PBC) or from cluster models containing only a handful of solvent molecules placed at relevant positions. While PBC reproduces the structure of the bulk solvent well and includes long-range interactions, generating such training data, in particular with AIMD, is computationally expensive because thousands of configurations are required to accurately describe all the interactions, making it unfeasible for most chemically relevant systems. An alternative approach for generating PBC data is to use classical MM force fields; however, they are often inaccurate and unable to describe bond-breaking/forming processes. Moreover, MM configurations exhibit a weak overlap with the true potential energy surface (PES), even for non-reactive systems, making them unsuitable for the training of MLPs (ref. 42). In this regard, cluster data labelled with molecular energies and forces provide all the structural information needed by MLPs based on local descriptors and offer access to a large spectrum of electronic structure methods, including long-range corrected and double-hybrid DFT functionals. The minimum radius of the solvent shell around the substrate should be no less than the cut-off radius used for training the MLP to avoid artificial forces close to the solvent-vacuum interface in the cluster data.

As discussed in more detail below, we observed good transferability of cluster-based MLPs to systems with PBC.
Similar transferability has already been reported for an NNP applied to bulk water, where NNPs trained using both PBC and cluster training data demonstrate similar performance in predicting bulk properties, such as radial distribution functions (RDF), self-diffusion coefficients, and equilibrium densities (ref. 52).

After the initial MLP training, one structure from the initial training set is selected as the starting point to propagate molecular dynamics (MD) using the first version of the trained MLP. Several rounds of short MD simulations are then performed to assess the stability of the potential and generate new training structures. The simulation time is set to \((n^{3}+2)\) fs, where n is the index of the MD run, starting from 0. The time step of the dynamics is 0.5 fs. More details can be found in ref. 42 and SI § S6.1. From each MD trajectory, the last frame is evaluated by the selector to determine whether this structure should be added to the training set. If the structure is not selected, n is incremented, and MD runs are repeated until the maximum simulation time or the maximum number of MD iterations is reached.

Descriptor-based selectors

In previous studies, we have utilised an energy-based selector (hereafter referred to as energy) to determine whether a configuration should be added to the training set. This selector identifies structures whose error in predicted energy exceeds the threshold \(E_{\mathrm{T}}\), i.e., those satisfying \(\left| E^{\mathrm{DFT}}-E^{\mathrm{MLP}}\right| > E_{\mathrm{T}}\). Structures with prediction errors greater than 10 \(E_{\mathrm{T}}\) were excluded from the dataset as these were too distorted to provide meaningful information. Although reliable, this approach requires QM calculations for each selection step, which is computationally prohibitive for large systems.

In this work, we introduce selectors using descriptor-based selection criteria.
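The MD schedule and the energy criterion described above can be summarised in a short sketch. It reads the simulation time as n cubed plus two femtoseconds; the function names and the `is_selected` callback (standing in for an MD run followed by a selector call) are hypothetical and are not part of any published code.

```python
def md_time_fs(n):
    """Length of the n-th short MD run in fs, with n starting from 0."""
    return n ** 3 + 2


def md_steps(n, dt_fs=0.5):
    """Number of integration steps in the n-th run for a 0.5 fs time step."""
    return int(md_time_fs(n) / dt_fs)


def energy_select(e_dft, e_mlp, e_t):
    """Energy selector: keep the frame if the error exceeds E_T but is not
    larger than 10 E_T (such frames are discarded as too distorted)."""
    error = abs(e_dft - e_mlp)
    return e_t < error <= 10 * e_t


def run_until_selected(is_selected, max_time_fs, max_runs):
    """Increase n until the last frame of a run is selected.

    is_selected : callable taking n and returning True/False; a placeholder
                  for 'run MD for md_time_fs(n), test the last frame'.
    Returns the index of the run whose last frame was selected, or None if
    the time or iteration limit is reached first.
    """
    for n in range(max_runs):
        if md_time_fs(n) > max_time_fs:
            break
        if is_selected(n):
            return n
    return None
```

With this schedule the runs grow quickly (2 fs, 3 fs, 10 fs, 29 fs, ...), so an unstable early potential is probed with very short trajectories, while a converged one is tested over progressively longer times.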
Descriptors transform the molecular representation from Cartesian coordinates into a physically invariant description, encompassing both the geometrical and chemical information of molecular structures. Evaluating the SOAP descriptor over the training data thus provides information on how well the training set covers the relevant conformational and chemical space, enabling the identification of underrepresented data points.

During the AL loop, we apply so-called similarity- and distance-based selectors, referred to here as similarity and distance selectors, respectively. The former quantifies the similarity between a new data point p and existing configurations \(p^{\prime}\) using the kernel function \(k(p\cdot p^{\prime})\). The similarity vector of the data point is defined as:

$$\mathbf{K}=\left(\left| k\left(\mathbf{p}_{0}\cdot \mathbf{p}_{i}\right)\right|^{\zeta },\left| k\left(\mathbf{p}_{0}\cdot \mathbf{p}_{j}\right)\right|^{\zeta },\cdots \right)^{T}$$ (1)

where \(\mathbf{p}_{0}\) is the SOAP vector of the new structure, \(\mathbf{p}_{i}\) is the SOAP vector of the i-th configuration in the existing set, and ζ is a positive integer that increases the sensitivity of the kernel to changes in atomic positions (ref. 51). The kernel is computed between the new configuration and all other configurations in the training data set. The selector adds the structure to the training set if the maximum value of its similarity vector, K, is smaller than the threshold \(k_{\mathrm{T}}\), i.e., \(\max (\mathbf{K}) < k_{\mathrm{T}}\). Selecting an appropriate threshold is key: values that are too low (e.g., similarity below 0.9) can result in the selection of non-physical structures that fail to converge in the self-consistent field (SCF) computations, while values that are too high (e.g., 1) select structures that do not provide any additional information.

For the distance selector, we use the local outlier factor (LOF) method (ref. 53) to determine whether the SOAP vector of the new configuration is an outlier compared to the SOAP vectors of the existing training data. LOF is based on the local density of each point, which is calculated by measuring the Euclidean distance between the target point and its k-nearest neighbours (Fig. 1b). The local reachability density of an object o, denoted \(\mathrm{lrd}_{k}(o)\), is calculated as:

$$\mathrm{lrd}_{k}(o)=\left(\frac{\sum_{a\in N_{k}(o)}\mathrm{rd}_{k}(o,a)}{\left| N_{k}(o)\right| }\right)^{-1}$$ (2)

where \(\mathrm{rd}_{k}(o,a)\) is the reachability distance, defined as

$$\mathrm{rd}_{k}(o,a)=\max \left(k\text{-distance}(a),\,d(o,a)\right)$$ (3)

Here, k-distance(a) represents the radius of the smallest circle centred at a that includes the k-nearest neighbours of a (illustrated in dark blue in Fig. 1b), and d(o, a) is the Euclidean distance between points o and a, where o is the target point and a is one of its neighbours. \(N_{k}(o)\) is the set of k-nearest neighbours of o, illustrated by blue points. The local reachability density is thus expressed as the number of neighbours per distance unit. If the local reachability density of the target point is smaller than that of its neighbours, the point is considered an outlier and added to the training data set. The local densities are compared by computing the ratio of the average local density of the neighbours to the local density of the point, as follows:

$$\mathrm{LOF}_{k}(o)=\frac{\sum_{a\in N_{k}(o)}\frac{\mathrm{lrd}_{k}(a)}{\mathrm{lrd}_{k}(o)}}{\left| N_{k}(o)\right| }$$ (4)
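Equations (1) to (4) can be illustrated with a small pure-Python sketch. Several points are assumptions made for the example only: the kernel is taken to be the dot product of normalised descriptor vectors, generic vectors stand in for SOAP descriptors, ties among equidistant neighbours are broken by index, and the way a new point is compared with the training set (LOF of the point in the augmented set against a quantile of the training-set LOF values, with the quantile exposed as the `fraction` parameter) is one possible reading rather than the authors' implementation.

```python
import math


def similarity_vector(p_new, train, zeta=4):
    """Eq. (1): |k(p0 . pi)|^zeta for every training descriptor pi.
    Assumption: k is the dot product of L2-normalised vectors."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    p0 = unit(p_new)
    return [abs(sum(a * b for a, b in zip(p0, unit(p)))) ** zeta for p in train]


def similarity_select(p_new, train, k_t, zeta=4):
    """Select the structure if max(K) < k_T."""
    return max(similarity_vector(p_new, train, zeta)) < k_t


def lof_scores(points, k):
    """Eqs. (2)-(4): LOF_k of every point (requires k < len(points))."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    # N_k(o): indices of the k nearest neighbours of each point
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
            for i in range(n)]
    k_dist = [dist[i][nbrs[i][-1]] for i in range(n)]           # k-distance(a)
    # lrd_k(o) = |N_k(o)| / sum_a max(k-distance(a), d(o, a))
    lrd = [k / sum(max(k_dist[a], dist[i][a]) for a in nbrs[i]) for i in range(n)]
    return [sum(lrd[a] for a in nbrs[i]) / (k * lrd[i]) for i in range(n)]


def distance_select(p_new, train, k=5, fraction=0.8):
    """Select p_new if its LOF exceeds the training-set LOF value that is
    larger than `fraction` of all training-set LOF values."""
    train_lof = sorted(lof_scores(train, k))
    index = min(int(fraction * len(train_lof)), len(train_lof) - 1)
    lof_new = lof_scores(list(train) + [p_new], k)[-1]
    return lof_new > train_lof[index]
```

Neither selector needs a reference energy for the candidate structure, which is why no QM calculation is required at the selection step.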
An LOF value close to 1 indicates that a point is located in a region of similar density to its neighbours. An LOF value less than 1 represents an inlier, meaning that the point is situated in a denser region, while a value greater than 1 indicates an outlier. Since there is no definitive rule for selecting an LOF threshold to identify outliers, in this study we chose as the threshold the LOF value that is larger than 80% of the LOF values for the given training set. This approach ensures that the threshold value varies with the PES exploration during the AL iterations. More details of the distance selector are provided in SI § 3.2.

In contrast to the variance metric, descriptor-based selectors can be applied to any regression method without incurring additional computational costs for training multiple models.

As discussed in the next section, both descriptor-based selectors accelerate the training of MLP models compared to the energy selector. This acceleration arises from the reduction in QM calculations and the selectors' ability to explore the relevant chemical space more efficiently.

Performance of selectors – water models

We assessed the performance of three different selectors (energy, similarity, and distance) during the AL training of MLPs for water. In each case, we evaluated the accuracy of the generated potential by measuring the mean absolute deviation (MAD) of the total energy and atomic forces with respect to the ground-truth method, PBE0-D3BJ. We also considered the efficiency of training by analysing the number of configurations required for training. The AL process was considered complete when no configurations were selected within the maximum simulation time of 5 ps.

Our results demonstrate that, in general, all selectors provide accurate and stable potentials.
This is evident from the direct comparison of predicted and ground-truth energies for two systems: a small cluster system comprising 27 water molecules, from which we extracted 200 configurations from a 1 ps MD simulation, and a larger system consisting of 216 water molecules under PBC, for which we performed 50 ps of dynamics in the NVE ensemble (SI § S3.1). All MLPs achieved MAD errors in energy below 1 kcal mol⁻¹ and errors in force of less than 2 kcal mol⁻¹ Å⁻¹. Moreover, all MLPs remained stable during the NVE simulations, which lasted significantly longer than the maximum simulation time in the AL process.

The main difference between the selectors lies in the number of configurations selected during the training and in computational efficiency. Descriptor-based selectors require fewer configurations, with 40 and 52 data points for the similarity and distance selectors, respectively, compared to 281 data points for the energy selector. To explore the geometrical similarity among the data selected by the different selectors, we combined all the training sets (373 configurations in total, shown in grey in Fig. 2a) and analysed them with t-SNE maps based on the global SOAP representation. This analysis reveals three distinct clusters. The bottom cluster represents the initial structures in training, where water molecules are randomly placed and have relatively high energy. The middle cluster contains configurations with more structured arrangements due to the presence of hydrogen bonds (HBs). Finally, the upper cluster comprises configurations near equilibrium, as evidenced by their lower energies. Interestingly, the energy selector predominantly selects geometries near the equilibrium configurations (67%), with only a handful of configurations in the middle and bottom clusters. In contrast, the configurations selected by either the similarity or distance selector are more evenly distributed (Fig. 2b).
The size of the training set does not noticeably influence the MLP performance, as all MLPs reproduced the experimental radial distribution function (RDF) of water well (Fig. 2c). The small differences in the positions and amplitudes of the peaks for the first and second solvation shells are likely due to the level of theory used and the lack of nuclear quantum effects (ref. 54).

Fig. 2: Comparative analysis of selectors. a t-Distributed Stochastic Neighbour Embedding (t-SNE) maps of configurations generated during active learning using energy, similarity and distance selectors for a 27-water molecule system. Configurations generated by each selector are labelled by their energy relative to the lowest energy configuration in the dataset, while configurations obtained from other selectors are shown in grey. b Energy distribution of configurations obtained using the different selectors for a 27-water cluster system. c Oxygen-oxygen radial distribution function (RDF) from a 20 ps NVT Atomic Cluster Expansion (ACE) machine learning potential molecular dynamics (MLP-MD) simulation of 216 water molecules in an 18.65 Å cubic box under periodic boundary conditions. RDFs obtained from ACE MLPs trained with the energy (orange), similarity (blue), or distance (green) selector. The experimental RDF is shown in grey shading (ref. 88). Data associated with this figure is provided as a Source Data file.

This demonstrates that both similarity and distance selectors explore the chemical space more efficiently, using only about 14% and 19%, respectively, of the training data required by the energy selector. Furthermore, they reduce the structural correlation in the training sets, as illustrated in Fig. 2a. Instead of containing numerous points with similar geometries, these training sets encompass data points distributed across the space.
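For readers who wish to reproduce a comparison like the one in Fig. 2c, a radial distribution function for one frame of a cubic periodic box can be computed roughly as below. This is a generic textbook sketch using the minimum-image convention, not the analysis code used for the figure; the function name and arguments are illustrative, and in practice the histogram would be averaged over many trajectory frames.

```python
import math


def radial_distribution(positions, box, r_max, n_bins):
    """g(r) for one frame of identical particles (e.g. water oxygens)
    in a cubic box of edge `box` under periodic boundary conditions."""
    if r_max > box / 2:
        raise ValueError("r_max must not exceed half the box length")
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                delta = a - b
                delta -= box * round(delta / box)   # minimum-image convention
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                counts[int(r / dr)] += 2            # the pair counts for both atoms
    density = n / box ** 3
    g = []
    for b, c in enumerate(counts):
        shell = 4.0 / 3.0 * math.pi * (((b + 1) * dr) ** 3 - (b * dr) ** 3)
        g.append(c / (n * density * shell))         # normalise by ideal-gas count
    centres = [(b + 0.5) * dr for b in range(n_bins)]
    return centres, g
```

For the 216-water system the box edge would be 18.65 Å, so `r_max` could be at most about 9.3 Å.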
Descriptor-based selectors are also significantly faster than energy-based ones as they do not require QM computation at every selection step, making them suitable for larger and more complex systems.

Overall, the descriptor-based selectors outperform the energy selector in terms of training speed and the amount of data required, while all three selectors yield potentials with comparable accuracy and stability. However, the distance selector requires a more extensive initial data set to apply the selection criteria in the first iteration of the AL process, as it relies on neighbouring information. Throughout this paper, we will use the similarity selector for MLP training.

Training strategy – DA reaction of CP and MVK in explicit water

The ability of MLPs to accurately describe chemical processes in explicit solvents is critical for extending their application to more challenging systems. To this end, we investigated the solvent effects on the Diels-Alder (DA) reaction between cyclopentadiene (CP) and methyl vinyl ketone (MVK) in explicitly modelled water and methanol. While this reaction exhibits only minor solvent effects compared to charged systems, several reports have shown rate acceleration and selectivity enhancement when water or aqueous solvent mixtures are employed (refs. 55–58). The reaction is accelerated up to 58-fold in water compared to methanol (ref. 56), and the endo/exo selectivity is enhanced 8-fold in water over benzene (ref. 58). This behaviour is widely explained by the formation of stronger HBs between solvent and substrate at the TS compared to the reactant state (RS) and product state (PS) (refs. 59–65). In addition, solvent polarity and hydrophobic effects have also been suggested to contribute to this enhancement (refs. 55, 62, 66, 67).

We trained four ACE MLPs, covering the endo and exo DA reactions between CP and MVK in explicit water and in explicit methanol. For comparison, we also trained the models in implicit solvent, neglecting the presence of the explicit solvent molecules.
Here, we describe the strategy employed to obtain the ACE MLP for the endo reaction in explicit water. A similar approach was used for the other systems (see SI § S6 for further details). For the reaction in explicit solvents, the training set consisted of four subsets, each aimed at describing different types of interactions in the system. Subset 1 corresponds to the substrate complex (CP + MVK) and provides information about intramolecular interactions and intrinsic reactivity (Fig. 3a). Subsets 2 and 3 consist of the substrate with 2 and 33 water molecules, respectively, and aim at describing various solute-solvent interactions. Finally, subset 4 contains only water molecules, providing information about solvent-solvent interactions in bulk solvent. All subsets were generated independently using the AL scheme initiated from the transition state structure, except the pure water subset, which started from a random water configuration. This approach ensured that the training set contained reactants, products and connecting reaction paths for all studied environments. The combination of these sub-training sets yielded 600 training points, which we used to train the final ACE MLP. See SI § S6.1 for further details.

Fig. 3: Training approach and accuracy of Atomic Cluster Expansion (ACE) machine learning potential (MLP) for endo Diels-Alder reaction of cyclopentadiene (CP) and methyl vinyl ketone (MVK) in explicit water. a The training data consists of four subsets, each describing key interactions. b Comparison of ground-truth and ACE MLP energies and forces over a 500-fs ACE MLP-molecular dynamics (MD) downhill dynamics for a system containing substrate and 55 water molecules with a timestep of 0.5 fs at 300 K.

The accuracy of the resulting ACE MLP was assessed by conducting 500 fs ACE MLP-MD simulations starting from two configurations not included in the training set.
The first configuration, which is similar to subset 2, consisted of the gas phase TS and three water molecules forming HBs with the carbonyl group of MVK. The resulting dynamics are stable, with an energy error of 2 meV atom⁻¹, demonstrating the ability of the ACE MLP to represent reactions and specific HB interactions (Supplementary Fig. 14). The second one corresponds to a TS immersed in a box containing 55 water molecules (box size: 12.42 Å, Fig. 3b). The resulting accuracy confirms that the ACE MLP is reliable for investigating the reaction of CP and MVK in solution. Validation of the other systems is provided in SI § S6.

Application of ACE MLPs – DA reaction of CP and MVK in solvents

After obtaining accurate and stable ACE MLPs for the reaction of CP and MVK in two solvent environments, we utilised these potentials to investigate the reaction pathway in more detail. This was done by conducting a relaxed 2D scan along the two forming C-C bonds, r1 and r2, in both implicit and explicit solvents (Fig. 4a and Supplementary Fig. 26, respectively). Analysis of the endo 2D scan in both implicit and explicit water reveals the presence of a zwitterionic-like structure characterised by the formation of only one C-C bond in the region around r1 < 1.6 Å and 2.5 < r2 < 3.0 Å (an example is marked by a cross in Fig. 4a). Unrestricted DFT calculations on these geometries confirmed that they do not exhibit any diradical character (SI § S8). The cross-labelled zwitterionic species is slightly more stabilised in explicit solvent than in implicit solvent (ΔΔE = 2.2 kcal mol⁻¹).

Fig. 4: Diels-Alder reaction of cyclopentadiene (CP) and methyl vinyl ketone (MVK) with r1 and r2 representing distances of the two formed C-C bonds. The reactant state (RS), transition state, and product state (PS) are depicted. a 2D potential energy surface (PES) along the r1 and r2 distances, generated by the Atomic Cluster Expansion (ACE) machine learning potential (MLP) in explicit water (box size 18.5 Å). Solid and dashed lines indicate the reaction pathway from PES and ACE MLP-molecular dynamics (MD), respectively. b Free energy surfaces (FES) obtained from ACE MLP-MD/umbrella sampling (US) simulation along the reaction coordinate (r1 + r2)/2. The distribution of oxygen (red) and hydrogen (grey) atoms around the solute is shown for the reactants and intermediate state. c Downhill ACE MLP-MD dynamics in explicit water along r1 and r2 for 500 trajectories, including snapshots of forward/backward trajectories. d Number of hydrogen bonds per solvent molecule during explicit solvent uphill trajectories at the RS, intermediate (inter) if existent, and PS states as a function of distance (D) from the centre of mass (CoM) of the reactive molecules. Data associated with this figure is provided as a Source Data file.

Using ACE MLP-MD in conjunction with umbrella sampling (US, ACE MLP-MD/US), we then computed the activation free energy, ΔG‡, in implicit and explicit solvents (Fig. 4b and Supplementary Fig. 29). The ΔG‡ values in implicit solvent are 21.2 kcal mol⁻¹ and 23.6 kcal mol⁻¹ for the endo and exo pathways, respectively (ΔΔG‡ = 2.4 kcal mol⁻¹). Incorporating explicit solvent reduces these values to 18.8 kcal mol⁻¹ and 20.3 kcal mol⁻¹ (ΔΔG‡ = 1.5 kcal mol⁻¹), thereby improving the agreement with experimental data (19.2 kcal mol⁻¹ and 21.1 kcal mol⁻¹, respectively; refs. 56, 57, 65; Supplementary Table 5). These results also illustrate the significance of solute-solvent interactions in the reaction.

Furthermore, the presence of solvent molecules influences the synchronicity of the reaction. In explicit solvent, the reaction exhibits an earlier and more asynchronous TS compared to the implicit solvent or gas phase.
The difference in bond length between the forming C-C bonds, Δr = |r2 − r1|, at the TS increases from 30 pm in the gas phase to 37 pm in implicit solvent and 46 pm in explicit water. Notably, explicit water molecules also altered the reaction mechanism from a concerted asynchronous to a “pseudo” stepwise mechanism, as evidenced by the presence of a shallow local minimum in the FESs immediately after the TS (Fig. 4b). This intermediate state, which was observed for both endo and exo reactions, corresponds to a zwitterionic state. Interestingly, such an intermediate is absent in the PES, where the structure corresponds to a high energy state (marked by a cross in Fig. 4a). This behaviour suggests that the intermediate arises from an entropic rather than an enthalpic contribution. The formation of an entropic intermediate has been previously reported by Singleton et al. for the reaction of cis-2-butene with dichloroketene, in which the free energy surface illustrates the entropic barrier and the mechanism change from concerted to stepwise (ref. 68). In this study, we observed a similar phenomenon, highlighting the necessity of explicit solvent to capture the formation of an entropic intermediate.

The presence of this entropic intermediate is further confirmed by downhill dynamics initiated from the TS (Fig. 4c and Supplementary Table 13). For the endo reaction, the trajectories reveal a significantly more asynchronous reaction in explicit solvent compared to implicit solvent, in agreement with the ACE MLP-MD/US data. The asynchronicity in the downhill dynamics is evident in the increased average time gap between the formation of the two C-C bonds, from 19.9 fs in implicit water to 84.3 fs in explicit water. It is worth noting that although the average time gap observed in water exceeded 60 fs (the time gap criterion proposed by Houk et al.
to distinguish concerted and stepwise mechanisms; ref. 69), some trajectories displayed time gaps below this threshold, indicating that not all trajectories passed through the intermediate region and that some bypass the intermediate free energy well. The presence of this intermediate did not affect the product ratio, as the intermediate lifetime was shorter than the C-C bond rotation period, leaving no time to form alternative products by bond rotation.

A comparison of the average time gap for the reactions in explicit water and methanol reveals that the latter exhibits a more concerted mechanism, with a time gap of 24.8 fs. These distinct reaction mechanisms are in line with the differences in the synchronicity of this reaction in different solvents, where the TSs exhibit a Δr of 7 pm in methanol and 46 pm in water (listed in Supplementary Table 12). In addition, a shift in the stability of the intermediate species occurs in methanol (further discussed in SI § S10.1).

To assess the effect of solvent at the molecular level, we performed uphill trajectories propagated from the optimised RS through the TS to the PS (further details in SI § S10.2). In contrast to downhill dynamics propagated from the TS, uphill dynamics allow the solvent sufficient time to reorganise before the trajectory passes the free energy barrier, providing a more realistic view of solvent behaviour during the reaction.

To determine the importance of HB stabilisation throughout the reaction, we analysed the number of HBs as well as the distributions of their bond lengths (O(carbonyl)-H(water)) and angles (O(carbonyl)-H(water)-O(water)) at the RS, TS and PS in explicit water and methanol (Supplementary Table 15 and Supplementary Fig. 34). We also analysed the density distribution of water molecules surrounding the reactive species at the RS and intermediate states obtained from the corresponding US windows (Fig. 4b).
The latter was chosen over the TS due to their similar geometry and the fact that only a few configurations representative of the TS were obtained. In both reactions, the number of HBs and the density of water's oxygen and hydrogen atoms around the substrate remained practically constant. However, the reaction in water exhibited stronger HB interactions compared to methanol during the reaction process, as evidenced by the existence of HBs with shorter bond lengths at the TS in water and a larger decrease in the angle in methanol.

These observations challenge the hypothesis that the reaction is accelerated in water due to the stabilisation of the TS by enhanced HB interactions compared to the RS, as suggested in previous studies (refs. 59, 60, 70). These differences can be attributed to the differing dynamics employed in the studies. For example, Houk et al. investigated the reaction in water using downhill dynamics from the TS and observed shorter HB lengths and more linear HB angles at the TS compared with the RS and PS. In this approach, the TS with solvent molecules is fully optimised, and the solvent does not undergo complete reorganisation at the RS or PS, as the trajectory approaches the RS or PS at a faster rate than the reorganisation process due to the absence of energy barriers to overcome.

The reactions in solution are further affected by hydrophobic effects (ref. 66), which were investigated by analysing the change in cavity volume during uphill dynamics. The cavity volume was measured as the empty space formed by the solvent when the solute was extracted from the system. Supplementary Fig. 33 illustrates the reduction in cavity volume from the RS to the PS. The reduction in the cavity is more pronounced in water, with a change in cavity volume from RS to post-TS of −60 Å³, compared to methanol (−40 Å³). The decrease in cavity volume is entropically unfavourable as it implies the formation of a more ordered solvent structure.
However, at the same time, it is enthalpically favourable due to the formation of HBs among solvent molecules (rather than between solvent and solute), resulting in a decrease in ΔG‡. The importance of HB formation is supported by analysing the average number of HBs per solvent molecule at increasing distances from the solute. The distance was measured between the oxygen atom of the solvent and the centre of mass (CoM) of the RS, intermediate, and PS. Since the reaction passes the TS very quickly, the intermediate was chosen for analysis instead.

As the distance from the CoM of the substrate to the solvent molecules increases, the number of HBs per solvent molecule converges to the average bulk value, with two HBs for water molecules and one for methanol. Fig. 4d displays a peak in the number of HBs in water close to the substrate, demonstrating the higher number of HBs in the first solvation shell for the RS (2.13), intermediate (2.43), and PS (2.49) compared to bulk water. Such an increase suggests the organisation of water molecules around the substrate during the reaction. In contrast, no such peak is observed in methanol, indicating that the solvent HB network is not influenced by the presence of the substrate. The organisation of solvent molecules should lead to a reduction in free energy, as previously stated. Magsumov et al. demonstrated a linear relationship between the free energy of cavity formation and the volume of the cavity for various solvents through MD simulations utilising classical force fields (ref. 71). Applying their parameters for water and methanol, the change in cavity contribution from RS to post-TS was approximately −2.6 kcal mol⁻¹ for the endo reaction in water, compared to around −1.1 kcal mol⁻¹ in methanol, further evidencing that the reaction benefits more from the hydrophobic effect in water than in methanol. These observations thus further support the role of the hydrophobic effect in the acceleration of the DA reaction in water.
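The distance-resolved HB analysis described above amounts to binning solvent molecules by their distance from the solute centre of mass and averaging the HB count in each bin. A minimal sketch is given below. It assumes that the number of HBs per solvent molecule has already been determined (the geometric HB criterion is not specified in this section), that coordinates are unwrapped so that plain Euclidean distances apply, and it uses hypothetical function names.

```python
import math


def centre_of_mass(coords, masses):
    """Mass-weighted mean position of the solute atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )


def hb_profile(solvent_oxygens, hb_counts, com, bin_width, n_bins):
    """Average number of HBs per solvent molecule in distance bins.

    solvent_oxygens : (x, y, z) of the oxygen atom of each solvent molecule
    hb_counts       : precomputed number of HBs for each solvent molecule
    com             : centre of mass of the solute
    Returns one average per bin, or None for bins without molecules.
    """
    sums = [0.0] * n_bins
    members = [0] * n_bins
    for position, n_hb in zip(solvent_oxygens, hb_counts):
        index = int(math.dist(position, com) / bin_width)
        if index < n_bins:
            sums[index] += n_hb
            members[index] += 1
    return [s / m if m else None for s, m in zip(sums, members)]
```

Averaging such profiles over the frames assigned to the RS, intermediate and PS would give curves of the kind shown in Fig. 4d.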
