Sensing the structural and conformational properties of single-stranded nucleic acids using electrometry and molecular simulations

Molecular effective charge values for short chain length ssNA fragments

We first measured the escape dynamics of a series of short chain length ssDNA fragments (\({n}_{\text{b}}=\) 5–12) diffusing in a lattice of electrostatic fluidic traps (see Fig. 1). The oligomeric species displayed distinct average escape times, \({t}_{\text{esc}}\), with the smallest \({n}_{\text{b}}\) = 5 fragment exhibiting an escape time of \({t}_{\text{esc}}\approx\) 29 ms and the longest fragment with \({n}_{\text{b}}\) = 12 a value of \({t}_{\text{esc}}\approx\) 98 ms (Fig. 1E). The measured values of \({t}_{\text{esc}}\) were then converted to values of molecular effective charge, \({q}_{\text{eff}}\) (see “Methods” and Refs. 32,33,34,36,37,53). A calibration measurement using 60 basepair dsDNA, a species that has previously been thoroughly characterized in electrometry, was used to determine the value of the surface potential of the slit, \({\phi }_{\text{s}}\), which is a priori unknown (see “Methods”)32,34.

We then compared our measurements with effective charge values for simulated molecular structures, \({q}_{\text{calc}}\), calculated as described previously33,34,35,38,54. Note that each molecular species (apart from the \({n}_{\text{b}}\) = 5 case) is characterised by a value of \({q}_{\text{str}}=-\left({n}_{\text{b}}+3\right)e\), which results from accounting for the additional charges of the attached fluorescent dyes (see “Methods” and Figs. S1, S3). We found close agreement (within 5% on average) between the effective charge values computed for model ‘half-helix’ structures, \({q}_{\text{calc},\text{HH}}\) (orange data points), and the experimentally measured data (blue data points) (see Fig. 2A). 
Performing 0.5 \(\upmu\)s long MD simulations (employing the DES-Amber forcefield) on these short chain molecules permitted their configurations to depart from their initial helical structure, yielding a conformational ensemble of structures characterised by a spectrum of \({R}_{\text{g}}\) values and therefore a range of calculated \({q}_{\text{calc},\text{MD}}\) values. Calculations of the effective charge for conformations corresponding approximately to the minimum and maximum \({R}_{\text{g}}\) values obtained from MD simulations (see “Methods” and Fig. S2 for details) revealed a small amount of variation in their \({q}_{\text{calc},\text{MD}}\) values, reflecting the conformational heterogeneity (associated error bars of the MD structure data points in Fig. 2A). In order to enable a quantitative comparison of calculated values with experimental readouts, the effective charge contributed by the covalently attached fluorescent dye molecules was added a posteriori to the effective charge calculated for dye-free molecular conformations (see “Methods” and Fig. S3). Importantly, for the series of short nucleic acid molecules, we found that the measurements revealed \({|q}_{\text{eff}}|\approx {|q}_{\text{str}}|\). In the regime where a typical length scale describing the extent of the molecule, such as the polymer contour length \({l}_{\text{c}}\), is smaller than the Debye length, i.e., \({l}_{\text{c}}\approx 5\) nm \(<{\kappa }^{-1}\approx 10\) nm, the ‘far-field’ electrostatic potential distribution resembles that of a sphere and charge renormalization vanishes, as recognised in previous theoretical work39,40. Under our present experimental conditions (\(\kappa {l}_{\text{c}}<\) 1), the conformational properties of very short fragments of ssDNA are not expected to significantly impact the measured charge renormalization values, therefore implying \(\eta \approx\) 1. Indeed, for 5-base ssDNA, for example, we measure \(\eta\) = 0.94. 
Thus for short nucleic acids our present measurements are rather insensitive to the precise 3D arrangement of the charge in the molecule, since under the conditions of the measurement, the spatial extent of the molecular charge distribution under consideration is smaller than the Debye length. Figure 2B displays the contours of electrostatic potential around molecular structures corresponding approximately to the minimum and maximum \({R}_{\text{g}}\) for an \({n}_{\text{b}}=12\) polymer from MD, which can be seen to resemble one another a short distance away (\(\sim 0.5{\kappa }^{-1}\)) from the molecule, with the potential distribution appearing similar to that of a small sphere for smaller values of \({n}_{\text{b}}\). The two conformations are characterised by a difference in \({q}_{\text{calc}}\) of 0.7 e or less than 6% of the mean value of 12.6 e. For this reason, \({q}_{\text{calc}}\) values calculated for model structures consisting of only a half-helix, without any higher-order structural information, find good agreement both with MD-simulated conformations as well as with the experimental data. Thus, although little molecular structural or conformational insight can be gleaned in this electrostatic regime (\({l}_{\text{c}}< {\kappa }^{-1})\), the results demonstrate sensitivity of the measurement to the addition of just a single nucleobase. Note that the measurement of 100–1000 molecules per species in this study yields high-precision measurements of \({q}_{\text{eff}}\) (measurement imprecision \(\approx\) 0.1–1%), which should enable small structural differences between species to be discerned, as described later. 
Although the measurements are highly sensitive to relative differences between species, we expect that the absolute values reported here may carry an inaccuracy of up to 10%, which arises from a number of experimental sources such as sample purity and its influence on the fitting procedure used to extract \({t}_{\text{esc}}\) values, as well as accuracy limits on the value of \({q}_{\text{eff}}\) of the calibration molecule (see “Methods”). This can, however, be improved in future so that the accuracy of the method better reflects its precision. The inaccuracy of \(\approx\) 10% on \({q}_{\text{eff}}\) in this work nonetheless does not preclude comparisons with computational estimates of the same quantity under conditions where \({q}_{\text{eff}}\) and \({q}_{\text{calc}}\) are more sensitive to the 3D charge distribution in the molecule, i.e., \({l}_{\text{c}}> {\kappa }^{-1}\), as described in the following section.

Fig. 2 Molecular modelling of short chain ssNAs (\({n}_{\text{b}}=\) 5–12) and comparison of experimental measurements of molecular effective charge (\(|{q}_{\text{eff}}|\)) with calculations (\(|{q}_{\text{calc}}|\)). (A) Plot of the molecular effective charge \(|{q}_{\text{eff}}|\) as measured by ETe (blue symbols), calculated for model half-helix structures (\(|{q}_{\text{calc},\text{HH}}|\), orange symbols) and for model conformations taken from MD simulations (\(|{q}_{\text{calc},\text{MD}}|\), red symbols) vs. molecular structural charge \(|{q}_{\text{str}}|\). The simulated structures are free of dye molecules but the effective charge of two ATTO dye molecules is added post-calculation to enable comparison with the experimental measurements (see “Methods” and Figs. S1, S3). Error bars on \(|{q}_{\text{eff}}|\) arise predominantly from the statistical error on the fitted \({t}_{\text{esc}}\) values. 
The error bars for the MD data points represent extrema values of the effective charge of structures across the \({R}_{\text{g}}\) range of the molecule observed in simulations. The dashed grey line represents \(\eta =1\), presented for reference. Lower panel: Plot of the percentage difference, \(r\), between calculations of effective charge for model half-helix structures, \(|{q}_{\text{calc},\text{HH}}|\), and experimental measurements \(|{q}_{\text{eff}}|\). (B) Representative electrostatic potential distributions surrounding an \({n}_{\text{b}}=\) 12 oligomer (\(|{q}_{\text{str}}|=\) 15 e) obtained by solving the PB equation for 3D atomic charge distributions of conformations corresponding to extrema values of \({R}_{\text{g}}=\) 1.65 nm (top) and \({R}_{\text{g}}=\) 1.1 nm (bottom) taken from an MD simulation using the DES-Amber forcefield. Electrostatic potential contours are shown at \(\phi =\) −5, −2.5, −1.25 and −0.625 \({k}_{\text{B}}T/e\). The difference in \(|{q}_{\text{eff}}|\) between the two structures is \(\approx\) 0.7 e or \(<\) 6% of the mean value of 12.6 e.

Sequence dependence of effective charge values for longer chain ssNAs

Inspired by a range of X-ray scattering, smFRET and force extension studies that have reported structural differences between homopolymeric ssNAs composed of an identical number of bases, we turned our attention to examining sequence-dependent differences in homopolymeric ssNAs. The precision offered by ETe and the sensitivity of the underlying physics to the 3D charge distribution in the molecule may indeed offer the prospect of using effective charge to detect sequence-dependent differences in molecular structure and/or conformation. 
Note that when measurements on various molecular species are performed under identical conditions, as defined by the same measurement device and solution conditions, the measurement precision (\(\le\) 1% in this study) effectively becomes the relevant quantity in determining the ability of the approach to discriminate between different molecular states, conformations or species.

We performed experiments on ssNA species of two different lengths (\({n}_{\text{b}}\) = 30 and \({n}_{\text{b}}\) = 60) and explored the effect of nucleic acid sequence alone on the measured effective molecular charge. The measurement regime corresponds to \({l}_{\text{c}}>{\kappa }^{-1}\) where \(\kappa {l}_{\text{c}}\approx\) 1.7 for 30 base and \(\kappa {l}_{\text{c}}\approx\) 3.4 for 60 base ssDNA, assuming an approximate contour length per base value \({b}_{\text{c}}\) = 0.5 nm. In this regime we expect substantial charge renormalization (\(\eta <1\)), in contrast to the measurements performed on short ssNA fragments. Indeed we found significant charge renormalization corresponding to \(\eta \approx\) 0.6–0.8 in general (e.g., \(\eta \approx\) 0.8 and \(\eta \approx\) 0.7 for 30 and 60 base poly-dT respectively), and differences in measured \({q}_{\text{eff}}\) of approximately 5% between 30 base poly-dT and poly-dA, and 10% between 60 base poly-dT and poly-dA. For instance, the measured difference in \({q}_{\text{eff}}\) between poly-dT and poly-dA homopolymers is \(\approx\) 2 e and \(\approx\) 4.5 e for the \({n}_{\text{b}}\) = 30 and \({n}_{\text{b}}\) = 60 length chains respectively. These values reflect inter-species disparities in effective charge that are much larger than measurement precisions of approximately 0.5% and 1% for \({n}_{\text{b}}\) = 30 and \({n}_{\text{b}}\) = 60 respectively. 
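The values of \(\kappa {l}_{\text{c}}\) quoted for the 30 and 60 base fragments can be checked with a short calculation. The sketch below is illustrative only: it assumes a 1:1 electrolyte in water at roughly 25 °C, for which the Debye length is approximately 0.304 nm divided by the square root of the molar salt concentration, and it uses the 1.2 mM NaCl concentration stated in the Fig. 3 caption. The function names are hypothetical and not taken from the original work.

```python
import math

def debye_length_nm(salt_molar):
    """Approximate Debye length (nm) for a 1:1 electrolyte in water at ~25 C.

    Uses the standard rule of thumb 0.304 / sqrt(c), with c in mol/L
    (an assumption of this sketch, not a formula quoted in the text).
    """
    return 0.304 / math.sqrt(salt_molar)

def kappa_lc(n_bases, salt_molar, b_c_nm=0.5):
    """Ratio of polymer contour length to Debye length (dimensionless)."""
    contour_length_nm = n_bases * b_c_nm  # l_c = n_b * b_c
    return contour_length_nm / debye_length_nm(salt_molar)

salt = 1.2e-3  # mol/L NaCl, as stated in the Fig. 3 caption
for n_b in (30, 60):
    print(n_b, round(kappa_lc(n_b, salt), 2))
```

Under these assumptions the Debye length comes out near 9 nm, and the ratios land close to the quoted values of 1.7 and 3.4.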
Importantly, the measured order of \(\left|{q}_{\text{eff}}\right|\) follows poly-dT > mixed sequence > -rU \(\ge\) -dA and implies different average molecular conformations of the homopolymers (Fig. 3A,B). We attribute the sensitivity of the readout to molecular structural detail to the fact that in the regime \({l}_{\text{c}}>{\kappa }^{-1}\), characterising 60 base fragments, higher order multipoles of the molecular charge distribution make a greater contribution to the electrical potential in the far-field, i.e., at large distances from the molecule, \(h>3{\kappa }^{-1}\) (Refs. 55,56). This accentuates the sensitivity of the electrical free energy measurement to molecular 3D conformation and highlights the ability of the ETe technique to sense spatial features of molecular charge density distributions.

Fig. 3 ETe detects composition differences between ssNA oligos. (A) Probability density distributions \(P\left(\Delta t\right)\) of residence times for measurements on single stranded homopolymers poly-dT (blue), -dA (red), a mixed DNA sequence (purple) and RNA poly-rU (orange), shown here for the fragment length \({n}_{\text{b}}=60\), and for the calibrator molecule, 30dsDNA (grey) (see “Methods”). Atomic representations of the nucleotide bases are shown as insets. (B) Measured \({t}_{\text{esc}}\) values for the data presented in (A) (upper panel). Two datasets of \(|{q}_{\text{eff}}|\) values inferred from the measured data (solid and dashed vertical lines), for the \({n}_{\text{b}}=60\) ssNA polymers (lower panel). Widths of presented Gaussian distributions of \({t}_{\text{esc}}\) and error bars on \(|{q}_{\text{eff}}|\) arise predominantly from the statistical error on the fitted \({t}_{\text{esc}}\) values in (A). All measurements were performed in devices with slits of depth \(2h\) = 75 nm and surface potential value \({\phi }_{\text{s}}\) = −\(2.24\,{k}_{\text{B}}T/e\), under similar electrolyte conditions NaCl \(=\) 1.2 mM, pH \(=\) 7.5. 
The average molecular escape times, \({t}_{\text{esc}}\), measured effective charge values, \({q}_{\text{eff}}\), and number of escape events \(N\) for the given measurement dataset are as follows: poly-dT: \({t}_{\text{esc}}=\) \(372.01\pm 2.13\) ms (\({q}_{\text{eff}}=-44.3\pm 0.0923\,e, N=1.1\times {10}^{5}\)), poly-dA: \({t}_{\text{esc}}=\) \(265.79\pm 2.62\) ms (\({q}_{\text{eff}}=-40.7\pm 0.165\, e, N=6\times {10}^{4}\)), mixed sequence: \({t}_{\text{esc}}=\) \(323.73\pm 6.864\) ms (\({q}_{\text{eff}}=-42.3\pm 0.342\,e, N=5.2\times {10}^{3}\)), poly-rU: \({t}_{\text{esc}}=\) \(293.11\pm 2.94\) ms (\({q}_{\text{eff}}=-41.5\pm 0.165\,e, N=5.8\times {10}^{4}\)). (C) Gel electrophoresis of ssNA samples for \({n}_{\text{b}}\) = 30 and 60 (see Fig. S6 for details). The electrophoretic mobility follows poly-dA > mixed sequence > -dT > -rU.

We further noted good qualitative agreement in comparing trends in molecular effective charge and conformation deduced using electrometry with those inferred from other experimental techniques. The measured \({q}_{\text{eff}}\) values for poly-dA and poly-rU are significantly smaller than that of poly-dT, indicating higher density charge distributions that are subject to greater charge renormalization (smaller \(\eta\)) compared to poly-dT (e.g. \(\left|{q}_{\text{eff}}\right|\) = 44.1 e, 38.7 e and 40.7 e for poly-dT, -dA and -rU, corresponding to \(\eta\) values 0.70, 0.61, 0.65 respectively in a given measurement dataset)24. Techniques such as ASAXS and atomic emission spectroscopy (AES) directly quantify the number of excess ions in the ion atmosphere surrounding nucleic acids and therefore also report on a molecule’s effective charge in solution43,57,58,59. Plumridge et al. 
have conducted AES measurements of NA homopolymers and reported values for the excess number of Na+ counterions per phosphate to be \(\approx\) 0.68, 0.71 and 0.83 for poly-dT30, -dA30 and -rU30 respectively, implying a smaller magnitude of \({q}_{\text{eff}}\) for poly-dA and -rU compared to poly-dT (see Eq. (1) and Supplementary Information Sect. S7)15,60. In accordance with these results, SAXS measurements found that poly-dT polymers were associated with the fewest excess ions whereas other oligonucleotides considered, which had T bases substituted with A, G or C, were all associated with a larger number of excess counterions61. Furthermore, an AES study comparing 50 base poly-rU and -dT ssNA homopolymers implied a smaller magnitude of \({q}_{\text{eff}}\) for RNA compared to DNA, and attributed this disparity to an increased polymer charge density brought about by the smaller inter-charge spacing for RNA62. We also note here that the counterion excess \({\Gamma }_{+}\) as reported by AES measurements can be less sensitive to the charge density of a rigid rod model of ssDNA than \({q}_{\text{eff}}\), particularly at low ionic strengths and in relation to the measurement precision of each method (see Supplementary Information Sect. S7 for further details). For example, the magnitude of the difference in \({\Gamma }_{+}\) between poly-dA and poly-dT of around 4% in AES measurements can be seen to correspond to \(>\) 10% difference in \(\eta\) (see Fig. S7)60. Furthermore, our measured differences in \({q}_{\text{eff}}\) between molecular species are much larger than the typical measurement imprecision and immediately suggest differences in the 3D distribution of charge in the various homopolymers that are qualitatively consistent with expectations from ion counting studies. 
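The renormalization factors quoted above for the 60 base homopolymers can be reproduced as the ratio of effective to structural charge. The sketch below assumes that \(\eta ={q}_{\text{eff}}/{q}_{\text{str}}\) and that the rule \({q}_{\text{str}}=-\left({n}_{\text{b}}+3\right)e\), stated earlier for the dye-labelled short fragments, also holds at \({n}_{\text{b}}\) = 60. Both are assumptions that happen to be consistent with the quoted numbers; the function and parameter names are hypothetical.

```python
def structural_charge(n_bases, extra_charges=3):
    """Structural charge in units of e, assuming q_str = -(n_b + 3)."""
    return -(n_bases + extra_charges)

def renormalization(q_eff, n_bases):
    """Charge renormalization factor, assumed here to be q_eff / q_str."""
    return q_eff / structural_charge(n_bases)

# Effective charges (in e) quoted in the text for one n_b = 60 dataset
measured = {"poly-dT": -44.1, "poly-dA": -38.7, "poly-rU": -40.7}

eta = {name: round(renormalization(q, 60), 2) for name, q in measured.items()}
print(eta)
```

With these inputs the factors round to 0.70, 0.61 and 0.65, matching the values given in the text for poly-dT, -dA and -rU.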
In summary, ETe measurements of molecular effective charge in solution indicate that \(|{q}_{\text{eff}}|\) values follow the ordering poly-dT > mixed sequence > -rU \(\ge\) -dA and are in agreement with AES studies. The ssNAs under consideration also display clear but different qualitative trends in their electrophoretic mobilities in polyacrylamide gels, which may be more challenging to interpret and are discussed further in Supplementary Information Sect. S5 (Fig. 3C). Furthermore, inferences on molecular conformation from ETe in combination with molecular modelling analyses are in some cases supported by both experimental smFRET and SAXS measurements, as described in the following section24.

Comparing electrometry measurements with the results of computational molecular modelling

General considerations

Because the ETe readout is highly amenable to computational modelling, we embarked on an analysis that sought to quantitatively compare and relate the measurements with predictions from molecular modelling approaches. We therefore generated a host of representative ssNA structures from MD simulations employing various forcefield descriptions. The conformational free energy landscape of the molecule obtained from MD simulations was characterised by the radius of gyration \({R}_{\text{g}}\) and end-to-end distance parameter \(R\) (Fig. 4B). We then determined \({q}_{\text{calc}}\) values for a pool of representative structures using our PB electrostatics framework (see “Methods”). We typically worked with a subset of structures for each case that included the most likely conformation as well as structures corresponding to the maximum and minimum \(R\) values on the 25th percentile contour of structures in \(({R}_{\text{g}}, R)\) parameter space (black crosses in Fig. 4B), furnishing a range of possible effective charge values for a given molecular species. 
Although the measured sequence-dependent differences in \({q}_{\text{eff}}\) are more significant for \({n}_{\text{b}}=\) 60 compared to \({n}_{\text{b}}=\) 30, we restricted all-atom simulation studies to 30-base oligomers owing to the significantly lower computational cost.

Fig. 4 Molecular modelling and calculation of theoretical effective charge values for long chain ssNAs (fragment lengths \({n}_{\text{b}}\) = 30 and 60). (A) Molecular model of the initial ‘half-helix’ configuration used for MD simulations employing conventional all-atom forcefields Amber-bsc1 and CHARMM36, specialised ssNA DES-Amber and HB-CUFIX forcefields and the coarse grained oxDNA and oxRNA models. (B) Simulated conformational landscapes of ssNA species (poly-dT: blue, poly-dA: red, poly-rU: orange), parametrised in terms of the polymer radius of gyration, \({R}_{\text{g}}\), and end-to-end distance, \(R\). Each contour encloses a 12.5th percentile of structures. \(R\) vs. \({R}_{\text{g}}\) landscapes are shown for the DES-Amber (left) and oxDNA models (right) for \({n}_{\text{b}}=\) 30 and \({n}_{\text{b}}=\) 60 respectively (inset: representative structures with the most likely \(R\), \({R}_{\text{g}}\) combination). The \(R\) vs. \({R}_{\text{g}}\) landscapes for ssNAs modelled with the other forcefield descriptions are shown in Fig. S5. (C) Comparison of the effective charge values, \(|{q}_{\text{calc}}|\), for long chain ssNA structures (\({n}_{\text{b}}=\) 30 and 60) obtained via molecular modelling approaches, with experimentally measured \(|{q}_{\text{eff}}|\) data (open circles). Measurements (left column, circular symbols and associated measurement precision error bars) represent the average values of two measurement datasets. Horizontal shaded bands depict the range of \(\left|{q}_{\text{eff}}\right|\) values across two measurement datasets. 
\(\left|{q}_{\text{calc}}\right|\) values of representative model ssNA structures (right columns) carry upper and lower bounds that correspond to structures with the maximum and minimum \(R\) values contained within the most likely 25th percentile of structures (corresponding to black crosses in (B)). (D) Contour length per base, \({b}_{\text{c}}\), values calculated from molecular simulations employing different forcefield descriptions, where \({b}_{\text{c}}\) is calculated as the average inter-phosphate distance for all-atom models or as the average inter-backbone site distance for oxDNA models. Black symbols denote values for both poly-dT and poly-dA sequences that share the same \({b}_{\text{c}}\) value.

In keeping with qualitative expectations, we found in general that model structures with larger values of \(({R}_{\text{g}},R)\) gave rise to a larger magnitude of \({q}_{\text{calc}}\), and vice versa (error bars in Fig. 4C display the range of calculated values for each case). We are only able to report all-atom simulation results for \({n}_{\text{b}}=\) 30, where the sensitivity of effective charge to molecular properties is poorer under the present experimental conditions and measured disparities between the molecular species are therefore smaller. Even so, it must be noted that percentage changes in \({q}_{\text{calc}}\) are small compared to the corresponding percentage changes in \({R}_{\text{g}}\) and \(R\) (e.g. \(\approx\) 10% and 20% change in \({R}_{\text{g}}\) and \(R\) respectively resulted in a 1.5% difference in \({q}_{\text{calc}}\) for poly-dT modelled with DES-Amber). Overall, we found that model structures for different molecular species, e.g., poly-dT, -dA and -rU, that are characterised by very similar values of \(({R}_{\text{g}},R)\), also had a very similar \({q}_{\text{calc}}\). 
However, we also found that this is not always the case, and there were instances where we noted more substantial differences (\(\approx\) 2–3%) in \({q}_{\text{calc}}\) for model structures of different species characterised by the same \(({R}_{\text{g}},R)\). Therefore, the ETe readout appears to carry information reflecting both the global properties of the charge distribution, as captured for instance by \(({R}_{\text{g}},R)\), as well as local properties such as the charge density along the backbone of the molecule, and it is challenging to clearly quantify the relative magnitude of these contributions to the overall molecular effective charge.

Molecular dynamics simulations: the Amber and CHARMM forcefields

Of the available classical all-atom models for simulating nucleic acid dynamics, the parmbsc1 model of the Amber forcefields and the CHARMM36 forcefield represent the state of the art, and have undergone significant rounds of improvement since their inception63. However, these forcefields were originally parametrised for dsDNA and are therefore known to have shortcomings in modelling ssDNA21,64. Here we discuss theoretical molecular effective charge values obtained for ssNA structures taken from all-atom MD simulations performed with the Amber and CHARMM forcefields for \({n}_{\text{b}}\) = 30.

As noted previously, in simulations of poly-dT and -dA30 performed with the Amber-bsc1 forcefield, the generated structures did not depart from the initial half-helix configuration (Fig. S4)21. This outcome contradicts the SAXS and smFRET experimental data indicating that poly-dT is a somewhat disordered polymer chain, even under low salt conditions21,64. Overall, the simulations indicated very similar conformational properties for poly-dT and poly-dA as well as comparatively narrow \({R}_{\text{g}}\) vs \(R\) distributions for both (Figs. S4, S5). 
This is reflected in the similarity of their corresponding calculated molecular effective charge values, \(\left|{q}_{\text{calc}}\right|\approx\) 22.6 \(\pm\) 0.1 e (blue and red square symbols in Fig. 4C). This result is in obvious qualitative disagreement with our experimentally measured \({q}_{\text{eff}}\) values, which display clear differences of approximately 8% between poly-dT and poly-dA (open circles in Fig. 4C).

The CHARMM36 forcefield, on the other hand, appeared to perform much better at capturing the conformational properties of ssNAs in general. For instance, the results compare more favourably with the experimentally reported \({R}_{\text{g}}\) and \(R\) values for poly-dT30, as well as with inferences on the ‘local’ polymer chain properties from SAXS measurements at 20 mM NaCl (see Supplementary Information Sect. S4 for further discussion)15,23,43. Interestingly, simulations with the CHARMM forcefield revealed a marked difference in conformational ensembles for poly-dT and poly-dA sequences, with the poly-dA structures characterised by collapsed conformations that resided in metastable hairpin structures, as reflected in a small average end-to-end distance (\({R}_{\text{g}}\approx 2.3\) nm and \(R\approx 3\) nm), well below the reported SAXS values of \({R}_{\text{g}}=2.72\) nm and \(R=7\) nm for poly-dA30 (Fig. S5)23. However, such globular forms of poly-dA have not been observed in SAXS experiments up to salt concentrations as high as 1 M, which possibly points to a shortcoming of the CHARMM36 forcefield in modelling poly-dA28. Indeed we found that the simulated conformations for poly-dA, and to a lesser extent for poly-dT, gave calculated values of molecular effective charge that fell short of those measured by ETe by about 20% (blue and red triangle symbols in Fig. 4C). 
This is not surprising, given that conformational compactness generally entails higher charge renormalization (\(\eta <\) 1) and therefore a lower magnitude of effective charge. Nonetheless, despite the fact that the absolute \({q}_{\text{calc}}\) values were lower by about 20% than the measured \({q}_{\text{eff}}\) values (e.g., poly-dT30 \(|{q}_{\text{calc}}|\approx 21.5\, e\), \(|{q}_{\text{eff}}|\approx 27.1\,e\)), the qualitative trend indicated by simulations that poly-dT has a less compact conformation than poly-dA (\(\approx\) 7% disparity in \(|{q}_{\text{calc}}|\)) is indeed captured in experimental data that point to an \(\approx\) 8% disparity in \(|{q}_{\text{eff}}|\). However, for poly-rU RNA, the CHARMM36 forcefield resulted in more extended structures and thus higher magnitudes of \({q}_{\text{calc}}\) values than the DNA homopolymers (see Figs. 4, S5), a trend clearly not observed in the \({q}_{\text{eff}}\) values measured by ETe. Such disparities between measurement and structural modelling trends were a recurring theme for the forcefields examined in this study, except for DES-Amber discussed next.

The specialised DES-Amber and HB-CUFIX forcefields

Next we analysed results from DES-Amber, a next generation ssNA forcefield. We found that DES-Amber (star symbols in Fig. 4C) was able to better capture experimental sequence-dependent trends in \({q}_{\text{eff}}\) for ssNAs, i.e., \(\left|{q}_{\text{eff}}\right|\) of poly-dT > -rU \(\ge\) -dA. The DES-Amber forcefield overcomes some of the deficiencies seen with the Amber and CHARMM36 models, with the ssDNA exhibiting some degree of flexibility whilst not engaging in extensive internal base-pairing that results in the DNA collapsing into globular conformations. 
Furthermore, DES-Amber indicates minor differences in the conformational ensembles of poly-dT and poly-dA, with poly-dT exhibiting a higher degree of polymer flexibility (able to transiently access more extended and collapsed conformations) and poly-dA adopting more rigid and compact structures with a higher degree of inter-base stacking (Fig. 4B). These trends are reflected in the \(|{q}_{\text{calc}}|\) values, which are \(\approx\) 22.8 \(\pm\) 0.15 e and \(\approx\) 22.6 \(\pm\) 0.08 e for poly-dT and poly-dA respectively (Fig. 4C). Importantly however, this difference in effective charge between the two species of \(\approx\) 0.2 e (\(\approx\) 1%) remains substantially lower than the disparity in \({q}_{\text{eff}}\) of \(\approx\) 2 e (\(\approx\) 8%) measured by ETe. Such quantitative discrepancies between experimental readouts and DES-Amber modelling results are also evident in other studies. For example, whilst values of \({R}_{\text{g}}\) for both poly-dA and -dT modelled with DES-Amber are in good agreement with the reported experimental SAXS value of approximately \(3\) nm, the DES-Amber mean end-to-end length (\(\approx\) 9 nm) for both cases appears to be larger than the experimentally reported values of \(R\approx 7\) nm at the lowest salt concentrations probed23,60.

Finally we turn our attention to modelling results for poly-rU. The DES-Amber forcefield was introduced as being able to model RNA successfully, having been validated to capture the end-to-end length for RNA as measured by smFRET65. Importantly, we found that the DES-Amber forcefield generated structures for DNA and RNA whose calculated \(|{q}_{\text{calc}}|\) values were indeed significantly different (see Fig. 4C). The MD conformational ensemble for DES-RNA reflects a higher degree of polymer flexibility and a much shorter average end-to-end length (\({R}_{\text{g}}\approx\) 3 nm and \(R\approx\) 8 nm) than its DNA counterpart (see Fig. 4B). 
Furthermore, the average contour length of DES-RNA (calculated as the sum of inter-phosphate distances along the polymer chain) was calculated to be 17.2 nm for \({n}_{\text{b}}=30\), considerably shorter than poly-dT, which measured 19.5 nm. This ratio in contour lengths of poly-rU to -dT is 0.88, which is in close agreement with the value of \(\approx\) 0.87 reported using a combination of SAXS and smFRET in Ref.24, albeit for measurements performed under different conditions. We expect the suggested reduced spatial extent and inter-backbone charge spacing of the RNA polymer chain to result in a reduced molecular effective charge for RNA compared to DNA. Indeed, we found that representative configurations for 30 base poly-rU and -dT from DES-Amber have average \({|q}_{\text{calc}}|\) values of 22.1 and 22.8 e respectively, giving a ratio of 0.97 between -rU and -dT (Fig. 4C). Despite the fact that the absolute values are similar but not identical to the measured values (\({|q}_{\text{eff}}|\approx\) 26.2 \(\pm\) 0.12 e and 27.4 \(\pm\) 0.1 e), the qualitative trend obtained for the simulated molecular structures, as reflected in the ratio of the two values, is in fact captured in the electrometry measurements that indicate a ratio of 0.96. Thus, experimentally measured disparities between the two molecular species are indeed successfully captured in DES-Amber simulations. It is worth noting again in this context, that whilst the ETe measurements are of high precision (\(\le\) 1% in this work), which enables differences and relative trends in effective charge between molecular species to be measured reliably, the accuracy of the absolute values of \({q}_{\text{eff}}\) can display around 10% uncertainty in the present measurements which precludes rigorous testing against absolute \({q}_{\text{eff}}\) values for computed structures. 
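The ratio comparison in this passage is simple arithmetic on the quoted values, and can be laid out as in the sketch below. The dictionary layout and variable names are illustrative only; the numbers are those given in the text for 30 base poly-rU and poly-dT.

```python
# Quoted values for 30-base poly-rU and poly-dT (DES-Amber and ETe)
values = {
    "contour_length_nm": {"poly-rU": 17.2, "poly-dT": 19.5},  # DES-Amber
    "q_calc_e": {"poly-rU": 22.1, "poly-dT": 22.8},           # DES-Amber
    "q_eff_e": {"poly-rU": 26.2, "poly-dT": 27.4},            # ETe measurement
}

# Ratio of the RNA value to the DNA value for each quantity
ratios = {
    quantity: round(pair["poly-rU"] / pair["poly-dT"], 2)
    for quantity, pair in values.items()
}
print(ratios)
```

The three ratios round to 0.88, 0.97 and 0.96, so the simulated and measured effective charge ratios differ by about one percentage point even though the absolute values differ by several e.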
Furthermore, minor additional uncertainties on modelling parameters enter the picture on the computational side, carrying implications for quantitative comparisons of absolute values between experiment and simulation. One source of uncertainty in the electrostatic modelling of molecular structures concerns the width of the ion accessible region, \(w\), surrounding the molecular structure, which introduces a source of variation in the magnitude of the \({q}_{\text{calc}}\) values that may well account for about 5% of the presently noted disparity in absolute values. For instance, increasing the value of \(w\) from 0.2 to 0.4 nm raises the magnitude of \({q}_{\text{eff}}\) by \(\approx\) 2% per Å for 30 and 60 base poly-dT. Taken together, these considerations in both experiment and computation currently limit the ability of our methodology to clearly highlight shortcomings and gaps in molecular simulation models when disparities in absolute values of \({q}_{\text{eff}}\) between experiments and structural models lie in the range of \(\approx\) 10% or less. Despite the noted disparities between absolute values of effective charge from measurement and computation, inferences based on measured differences or relative trends in effective charge between species measured under identical conditions are less subject to experimental inaccuracy and model-parameter uncertainty in computation. Comparative trends in effective charge therefore nonetheless provide a suitable basis on which to compare experiments with computational indications, and may aid in the fine-tuning of molecular models.

Finally, we employed an alternative ssRNA model, HB-CUFIX, which has been demonstrated to better capture the persistence length and \({R}_{\text{g}}\) of short poly-rU chains compared to DES-Amber66. 
We found that the conformational landscape sampled by this molecular model was overall similar to that of DES-Amber, exhibiting a substantial exploration of the conformational parameter space, reflecting that of a flexible polymer chain (see Fig. S5). The effective charge value of a representative poly-rU structure using the HB-CUFIX model of ssRNA gave \({|q}_{\text{calc}}|\approx\) 21.3 e, similar to the DES-Amber value of \(\approx\) 22.3 e. Thus we found that under the conditions of this study, the specialised ssRNA forcefield descriptions generated effective charge values that were in good mutual agreement.

The coarse grained oxDNA model

Having examined and compared our experimental data with a range of atomistic MD models, we then compared the results of oxDNA2, a coarse-grained DNA model, with our experiments for both nucleic acid lengths, \({n}_{\text{b}}=\) 30 and 60 (Ref. 67). We proceeded along the same lines, relating computed effective charge values, \({|q}_{\text{calc}}|\), with experimental measurements as described previously. Since oxDNA was optimised to predict the self-assembly, structure and mechanical and thermodynamic properties of DNA nanostructures, e.g., DNA origami, it is not entirely clear whether oxDNA predictions may generally be relied upon to provide an accurate picture of ssNA molecular 3D conformation (Ref. 67). However, the oxDNA2 model incorporates a simple combination of both sequence-specific inter-base stacking strengths and a salt dependent non-bonded interaction term between backbone sites that could be sufficient to describe the molecular configuration of DNA.
In fact, we found that computed trends in \({|q}_{\text{calc}}|\) values for oxDNA structures converted to all atom structures (see “Methods” for details) compared favourably with experimental \({|q}_{\text{eff}}|\) trends (i.e., \(\left|{q}_{\text{eff}}\right|\) poly-dT \(>\) mixed sequence \(>\) poly-dA) determined with ETe for DNA, for both \({n}_{\text{b}}\) = 30 and \({n}_{\text{b}}\) = 60 (crosses in Fig. 4C). For example, the ratios in \({|q}_{\text{calc}}|\) values of poly-dT to poly-dA, modelled with oxDNA, were calculated to be \(\approx\) 1.03 and 1.06 for \({n}_{\text{b}}\) = 30 and \({n}_{\text{b}}\) = 60 respectively, in qualitative agreement with the ETe measured ratios of 1.08 and 1.11. At the same time, the mean end-to-end length, \(R\approx 10\) nm, implied by the oxDNA results for poly-dT30 at the salt concentrations of our measurements (ca. 1 mM) is similar to that from DES-Amber (\(R\approx\) 9 nm) but slightly larger than the value from SAXS measurements, implying some quantitative disparities in comparisons of oxDNA results with other experimental data (Refs. 23,43).

We note, however, that although successful for DNA, the oxRNA model, which incorporates the weakest inter-base stacking between adjacent U bases compared to A and T, generated structures with the largest values of \({R}_{\text{g}}\) and \(R\), and the largest \({|q}_{\text{calc}}|\) values of all sequences considered, similar to the results obtained with the CHARMM36 forcefield (see Fig. 4C). This trend, reflected in the ratio of \({|q}_{\text{calc}}|\) values for poly-dT to poly-rU of approximately 0.97, does not agree well with our measured ratio in \({|q}_{\text{eff}}|\) of 1.07 for \({n}_{\text{b}}=\) 60. Thus, of all the models tested for RNA, it appears that the DES-Amber forcefield better captures the trend in effective charge measurements.
This likely points to the importance of the incorporation of QM-level accurate base stacking and torsional energetics in determining RNA 3D conformation (Ref. 65).

In summary, ETe reveals a lower \(|{q}_{\text{eff}}|\) value for poly-dA DNA compared to poly-dT, which we may attribute to a more compact polymer structure by comparing our measured trends with those that are reflected in MD model structures. At the microscopic level, this compaction is believed to arise from an increased inter-base stacking strength of neighbouring adenine bases compared to thymine (see Fig. 4B), further experimental evidence for which can be found in AFM pulling measurements (Refs. 68,69,70). Many of the other inter-base stacking strengths (including G, C and other pairwise permutations) have been calculated to lie in between these two extrema, which is reflected in the intermediate value of both \({q}_{\text{eff}}\) and electrophoretic mobility of the mixed sequence case for \({n}_{\text{b}}\) = 60 (see Figs. 4C, 3C) (Ref. 71). It is interesting, however, that the ETe measurements place poly-rU as having an effective charge similar to that of poly-dA, whereas arguments based on base stacking strength alone would place poly-rU as having the highest \(|{q}_{\text{eff}}|\) of all homopolymers examined in this study, as also suggested by coarse grained oxRNA simulations. By contrast, poly-rU structures obtained with the DES-Amber forcefield clearly indicate a shorter average end-to-end distance \(R\) and contour length \({l}_{\text{c}}\) for RNA compared to DNA, and capture the ETe readout better.

The rigid-rod model of ssDNA

Although molecular simulations are growing increasingly sophisticated and powerful, field theoretical descriptions remain important in quantitative modelling of experimental data. This is particularly important in descriptions of many-body phenomena where computational expense grows rapidly with system size.
For example, in a solution phase electrostatics simulation a typical atomistic computation could involve at least 1010 ions and up to 1000 times as many water molecules.

Thus in order to glean further physical insight into the experimental trends observed for \({q}_{\text{eff}}\) as a function of ssNA sequence we sought to relate our measured effective charge values to simple rigid rod models of the ssNAs within a continuum electrostatics framework. Although for dsDNA the rigid rod electrostatic model is an excellent approximation of the atomic level reality, its appropriateness for describing the interactions of single stranded nucleic acids is less obvious (Refs. 38,40,62). Specifically, given the high conformational flexibility of the disordered polyelectrolyte chain, it is not clear that the charged groups in the molecule could be reasonably expected to lie on the surface of a rectilinear rod as suggested in the simplest models of polyelectrolyte theory (Refs. 40,72,73). Interestingly however, a rigid rod model of ssDNA has been shown to be sufficient in PB modelling of the ion excess as measured by AES experiments (Ref. 62).

We calculated the effective charge of uniformly charged rigid rods of radii \(r\) = 0.05 and 0.4 nm using our continuum electrostatics calculation framework (see “Methods”). At the lower limit the rod radius approximates a line charge, whilst the upper radius limit may be thought to incorporate the excluded volume due to the finite size of the ssNA backbone atoms as well as that of a hydrated cation (Fig. 5A). We have previously shown that accounting for the excluded volume due to ions and water of hydration at the molecular interface in this manner permits us to incorporate the role of finite ion size in a point-ion based PB model (Ref. 34).
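For orientation, the two rod radii can be compared with the electrostatic screening length that sets the length scale in a PB description. The following is a minimal sketch under stated assumptions: a 1:1 electrolyte, a temperature of 298.15 K and a relative permittivity of 78.4 for water are assumed here and are not taken from the text, the 1 mM concentration is the approximate value quoted for the measurements, and the function name `debye_length_nm` is hypothetical.

```python
import math

# Physical constants in SI units.
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol

def debye_length_nm(salt_mM, temperature_K=298.15, rel_permittivity=78.4):
    """Debye length (nm) of a 1:1 electrolyte at concentration salt_mM.

    Assumptions (not from the text): monovalent salt, water-like
    permittivity, room temperature.
    """
    # 1 mM equals 1 mol/m^3, so this is the number of ion pairs per m^3.
    pairs_per_m3 = salt_mM * N_A
    kappa_squared = (2.0 * pairs_per_m3 * E_CHARGE ** 2
                     / (EPS0 * rel_permittivity * KB * temperature_K))
    return 1e9 / math.sqrt(kappa_squared)

debye_1mM = debye_length_nm(1.0)

# Dimensionless rod radius kappa*r for the two radii considered.
for r_nm in (0.05, 0.4):
    print(f"r = {r_nm} nm: kappa*r = {r_nm / debye_1mM:.3f}")
print(f"Debye length at 1 mM: {debye_1mM:.1f} nm")
```

Under these assumptions both rod radii are much smaller than the screening length, so the two cases differ mainly in how closely counterions can approach the charged axis rather than in their far-field geometry.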
Previous studies on the generation of molecular surfaces for PB calculations suggested that a value of \(w\approx\) 0.2 nm produces an ion accessible surface for an atomistic structure that successfully incorporates the role of the finite size of a hydrated ion (Ref. 34). Here, \(w\) is a parameter that represents a thickness added to the molecular surface beyond that represented by the vdW surface. Together with the radius of the backbone atoms of \(\approx\) 0.2 nm we may therefore expect a rod of radius \(r\approx\) 0.4 nm to provide a reasonable description of a ssNA molecule within the rod model. We calculated the effective charge, \({q}_{\text{calc}}\), for rods of length \(l={n}_{\text{b}}b\) carrying a total charge of \({q}_{\text{str}}=-\left({n}_{\text{b}}+1\right) e\), where \(b\) is the axial charge spacing (Fig. 5A,C). We focused on rod lengths corresponding to \({n}_{\text{b}}=60\) as this represents the most highly charge renormalizing regime in our experiments. An estimate of the renormalized charge of two ATTO dye molecules was added to the effective charge value calculated for a rod representing a label-free 60 base DNA in order to facilitate comparison with experiment (see “Methods” and Fig. S3). We determined \({q}_{\text{calc}}\) values for rods of various axial spacings \(b\) ranging from 0.35 to 0.65 nm. Setting \({q}_{\text{calc}}={q}_{\text{eff}}\) in the obtained \({q}_{\text{calc}}\) vs \(b\) relationships then yielded an estimate of the value of \(b\) corresponding to each of the two values of rod radius for each molecular species (Fig. 5B).

Fig. 5. The rigid rod model for a single stranded nucleic acid. (A) Schematic depiction of a flexible ssNA backbone contour consisting of phosphate sites (pink spheres) occupying the center of a tube of finite diameter that represents the volume excluded to the center of mass of an ion.
A further thickness of \(w\approx\) 0.2 nm produces an ion accessible surface that successfully incorporates the role of the finite size of a hydrated cation. The equivalent rigid rod is depicted as a grey vertical cylinder with axial inter-charge spacing \(b\). (B) Calculated values of effective charge \({|q}_{\text{calc}}|\) vs axial inter-charge spacing \(b\) for the rigid rod models of radius \(r=\) 0.05 and 0.4 nm and length \(l={n}_{\text{b}}b\). ETe measurements for \({n}_{\text{b}}=\) 60 are shown as horizontal coloured bands for sequences poly-dT (blue), -dA (red), mixed DNA sequence (purple) and -rU (orange). Inferred \(b\) values are \({b}_{\text{T}}\approx 0.61-0.71\) nm and \({b}_{\text{A}}\approx 0.48-0.59\) nm for poly-dT and -dA based on the present range of \(r\) values (downward pointing arrows). Linear fit equations to the data corresponding to \(r=\) 0.05 and 0.4 nm in the ranges of interest are \({|q}_{\text{calc}}| =44.2 b+13.22\) and \({|q}_{\text{calc}}| =37.36 b+21.25\) respectively. (C) Electrostatic potential distribution obtained by solving the PB equation for a representative oxDNA poly-dT structure (\({n}_{\text{b}}\) = 60) converted into an all-atom model with a width parameter of \(w=\) 0.2 nm (left). The polymer is characterised by an end-to-end distance \(R=22.5\) nm and contour length \({l}_{\text{c}}=\) 36 nm. The effective charge of this polymer is matched by an equivalent rod model with dimensions \(r\) = 0.4 nm and \(b\) = 0.5 nm (right). Electrostatic potential contours are shown at \(\phi =\) − 5, − 2.5, − 1.25, − 0.625 and − 0.3125 \({k}_{\text{B}}T/e\). The average electrical potential on the surfaces of the poly-dT structure and rod model are − 5.295 and − 6.58 \({k}_{\text{B}}T/e\) respectively.

At the upper limit of rod radius, \(r=\) 0.4 nm, we obtained inferred values of \(b=\) 0.61 nm for poly-dT and \(b\approx\) 0.5 nm for poly-dA and poly-rU (Fig. 5B).
These values are comparable with the values of contour length per base, \({b}_{\text{c}}\), of about 0.6 and 0.5 nm obtained for poly-dT and poly-rU respectively in SAXS experiments on 40 base ssNAs (Ref. 24). Specifically, SAXS measurements report \({b}_{\text{c}}=\) 0.56 nm and 0.49 nm for poly-dT and poly-rU respectively (see Table 1), and this trend is in good agreement with our inferences of axial charge spacing based on the rod model. However, we in fact expect a quantitative disparity between \(b\) and \({b}_{\text{c}}\), since the rigid rod model effectively replaces a curvilinear chain representing a flexible polyelectrolyte by a rod with a single rectilinear axis and a projected axial charge spacing, \(b\) (Fig. 5A). For a disordered single-stranded polynucleotide structure, therefore, \(b\) is to be interpreted as the mean charge spacing along the rectilinear axis, which is expected to be smaller than \({b}_{\text{c}}\) (Refs. 72,73,74; Fig. 5A,C; see Supplementary Information Sect. S6 for further detail). The fact that our present results indicate \(b\approx {b}_{\text{c}}\) may in fact point to a current overestimate of measured \({q}_{\text{eff}}\), as encountered and previously discussed in the comparison of the experimental \({q}_{\text{eff}}\) with \({q}_{\text{calc}}\) from MD models.

Table 1. Left: tabulated values of the contour length per base, \({b}_{\text{c}}\), measured experimentally with techniques such as SAXS (*: Ref. 24), AFM (†: Ref. 98, ‡: Ref. 99), transient electric birefringence (TEB, Ref. 100), fluorescence correlation spectroscopy data in combination with a mean field theory (FCS, Ref. 75) and from X-ray diffraction of ssDNA-protein complexes (XRD, Ref. 26).
ETe (and ASAXS in one case, Δ: Ref. 43) measures a charge renormalization factor, \(\eta\) (bottom rows); right: \(\eta\) values for \({n}_{\text{b}}=60\), the most highly charge renormalizing regime, can be mapped onto a value of the axial base spacing, \(b\), using various polyelectrolyte models, such as the rigid rod model discussed in the main text, or by means of a theoretical model such as that in Ref. 40 (pink shaded rows).

For the simple line charge description (\(r=\) 0.05 nm) in turn, we inferred \(b\) values of 0.71, 0.65, 0.63 and 0.59 nm for poly-dT, mixed, -rU and -dA sequences respectively (Fig. 5B). These axial charge spacings are higher than most estimates of \({b}_{\text{c}}\) (see Table 1), but do appear to agree with some experimental values obtained for the contour length per base for short ssNA oligomers in ssNA-protein complexes as measured by X-ray diffraction (XRD, Ref. 26), and inferred for long ssNAs from fluorescence correlation spectroscopy data (FCS, Ref. 75). Thus it appears that the rod model may not only provide a meaningful description of the electrostatics, but in doing so may also quantitatively capture an indication of the interphosphate spacing along the backbone contour, reflecting the essential impact of charge density on the electrostatics problem for ssNAs. This simplified view of the problem may therefore furnish a relatively straightforward modelling framework within which to interpret and parametrise differences in local structure as reflected in charge spacing in disordered biomolecules in solution.
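The mapping from an effective charge to an axial charge spacing described above amounts to inverting the linear fits quoted in the caption of Fig. 5. The following is a minimal sketch of that inversion, assuming the fits hold over the range of \(b\) considered; the class and variable names are hypothetical, and the charge values used are generated from the quoted \(b\) values rather than taken from the measurements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinearFit:
    """Linear relation |q_calc| = slope * b + intercept (b in nm, q in e)."""
    slope: float
    intercept: float

    def q_from_b(self, b_nm: float) -> float:
        return self.slope * b_nm + self.intercept

    def b_from_q(self, q_e: float) -> float:
        return (q_e - self.intercept) / self.slope

# Fit coefficients as quoted in the Fig. 5 caption.
FIT_LINE = LinearFit(slope=44.2, intercept=13.22)   # r = 0.05 nm
FIT_ROD = LinearFit(slope=37.36, intercept=21.25)   # r = 0.4 nm

# Inferred spacings at r = 0.4 nm quoted in the caption (nm).
b_rod = {"poly-dT": 0.61, "poly-dA": 0.48}

b_line = {}
for species, b_nm in b_rod.items():
    # Effective charge implied by the r = 0.4 nm fit at the quoted spacing.
    q_e = FIT_ROD.q_from_b(b_nm)
    # Spacing that the line charge fit requires to give the same charge.
    b_line[species] = FIT_LINE.b_from_q(q_e)
    print(f"{species}: |q| = {q_e:.1f} e, "
          f"b(r=0.4) = {b_nm:.2f} nm, b(r=0.05) = {b_line[species]:.2f} nm")
```

The spacings recovered for the line charge case lie close to the upper ends of the ranges quoted in the caption, which illustrates that a thinner rod needs a larger charge spacing to reproduce the same effective charge.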
