Understanding epistatic networks in the B1 β-lactamases through coevolutionary statistical modeling and deep mutational scanning

Co-evolutionary model of the B1 family reveals sequence-specific mutational heterogeneitiesClass B1 metallo-β-lactamases are a highly diversified family of enzymes (as low as 20% identity within the family) that degrade β-lactam antibiotics1,41,42, a function for which there is likely a long evolutionary history36,43. The homologs in this family have become highly diversified whilst retaining common functions (e.g., resistance to the same antibiotics1), making this family an excellent model for epistatic networks, as the different homologs likely formed different sets of intramolecular interactions throughout their evolution. We collected all B1 MBL sequences (5516 total, 4046 non-redundant) (Supplementary Data 1), from Interpro44, BLDB45, and the Joint Genomics Institute IMG/M metagenomics database46 (methods). Then, we inferred a global statistical model of the family using DCA (methods) that captures the coevolutionary relationships (direct couplings) between all residue pairs, and the residue conservation (fields). The model can assign a probability P(sequence)~exp{-E(sequence)} to any sequence, proportional to the likelihood of being a functional MBL. To score single mutations, we take the difference in the model’s “energy” (E) between the mutant at a position and the wild-type (WT), as ΔE = −log[P(mut, pos)/P(WT)]47. Thus, the effect of any amino acid substitutions in any background is computable, with sequence context accounted for via coevolutionary couplings.To probe the effects of epistatic networks across the B1 family, we first calculated ΔE for all single mutants of 100 diverse homologs (~70% sequence divergence, Supplementary Fig. 1a, methods). Then, for each individual position inside each homolog, we predict its mutational tolerance as the Shannon Entropy of all amino acid probabilities ~exp(−ΔE), also called context-dependent entropy (CDE)48,49 (Supplementary Data 2). This mutational tolerance is interpretable as the base-2 logarithm of the effective number of tolerated amino acids at a position in a specific WT background, where 0 means one (20, the WT) residue is tolerated, and 4.3 means all twenty (24.3) amino acids are equally tolerated. Since the CDE is calculated from probabilities of all twenty amino acids at a position, the mutational tolerance is independent of the starting WT residue, and any difference between homologs in the CDE at any given position is necessarily a result of epistasis. The resulting mutational tolerance at each aligned position of the 100 homologs is shown Fig. 1b. When examining positions across all homologs there are highly constrained regions in terms of mutations, but also others that are highly tolerant. Between individual sequences, the relationship between sequence divergence and mutational tolerance across all positions can be quantified (Supplementary Fig. 1b). As expected there is a decreasing trend for similarity in sequence-level mutational tolerance with increased divergence, with a notable sharp drop around ~70% divergence. At a per position level, we can also observe the specific differences in mutational tolerance across the 100 homologs (Supplementary Fig. 1c). Hence, this method allows us to identify specific patterns of mutational behaviors in a large number of homologs.Fig. 1: Family-wide residue mutability by direct coupling analysis (DCA).a Schematic of computational and experimental workflow. The B1 β-lactamase enzyme family is isolated through sequence space exploration via a sequence similarity network. The entire set of sequences is used to generate a co-evolutionary model via DCA. Two highly diversified homologs within the family are selected for DMS to generate a large mutational dataset. b Each square in the heat map is colored by the predicted mutational tolerance (measured as context-dependent entropy, CDE) for each position of the 100 aligned homologs. Blank cells represent alignment gaps. The distribution of CDEs by position is presented as box plots (N=number of aligned homologs at position, exact count in source data) on top: 0–100% as whiskers, 25–75% (IQR) as bars. The bars are colored by the median value, with the same color scale as the heatmap. The secondary structure of a representative (VIM-2, PDB ID:5YD7), as well as the active site residues (circles), are depicted under the heat map;. The maximum likelihood phylogenetic tree of the 100 homologs is shown on the left of the heat map; arrows represent β-sheets, bars represent α-helices. c Distributions of the per position spread in mutational tolerance across homologs, measured as the distance (in bits) of the IQR or max-min mutational tolerance. Source data are provided as a Source Data file.There is substantial heterogeneity in behaviors across the 100 homologs, shown by the variability (interquartile range, IQR) in mutational tolerance across homologs at each position (Fig. 1b, c). The considerable spread in mutational tolerance highlights that positions allowing few mutations in one homolog may show substantial mutability in other homologs. A median IQR of 0.87 means that half the positions have a > 1.8-fold (20.87 = 1.83) difference in effective number of tolerated amino acids between homologs. IQR can be as high as 2.51, indicating a 5.7-fold difference in effective amino-acid number. Furthermore, the difference in minimum and maximum mutational tolerance across positions has a median of 2.69 and a max of 3.79 (Fig. 1c), meaning half of the positions in the protein have a 6.5–13.8 fold difference between the most and the least mutationally tolerant sequences. This strong mutational heterogeneity of equivalent positions across homologs is a hallmark of epistasis, and is captured by the coevolutionary couplings in the DCA model.Heterogeneity in mutational behavior is supported by DMS data of NDM-1 and VIM-2To experimentally observe differential mutational effects between homologs, we measured all single-mutational effects on NDM-1 and VIM-2 (homologs of ~30% sequence identity) by DMS. We previously published the DMS dataset of VIM-25,50, and we performed DMS on NDM-1 in an identical manner (Fig. 2a). Briefly, all single amino acid mutants generated for NDM-1 were selected with three different antibiotics: ampicillin (AMP), cefotaxime (CTX), and meropenem (MEM). Plasmid DNA isolated after selection was deep sequenced, and we calculated the fitness conferred by each mutant relative to wtNDM-1 in Eq. (1) (Fig. 2a, methods, Supplementary Data 3). The two replicates show good correlation (R2 of 0.77–0.95) (Fig. 2b, Supplementary Fig. 2). Overall, the three antibiotics behave similarly, and thus, for the best comparison, we use datasets between NDM-1 and VIM-2 with similar selection pressures in AMP50 (Supplementary Fig. 3).Fig. 2: Overview of DMS for NDM-1 and structural similarity to VIM-2.a Workflow for DMS of NDM-1. b Correlation between replicates of the NDM-1 library selected at 256 µg/mL AMP. The R2 and P-value of a linear regression are shown at the top. c Comparisons of mutational tolerance at each aligned position for the NDM-1 and VIM-2 experiments. The R2 and line of best fit for a linear regression are shown. d Distribution of differences in mutational tolerance at the same aligned position in DMS or DCA between NDM-1 and VIM-2, or the difference in DCA between 100 random pairs of homologs. e Comparison of mutational tolerance between DMS and DCA for NDM-1 and VIM-2. The R2 and line of best fit for a linear regression are shown. f Comparison of the mutational tolerance difference between NDM-1 and VIM-2 at each position between DMS and DCA. The data is colored by the IQR of mutational tolerance across the 102 homologs (100 + NDM-1 and VIM-2), with the colors scaled to the distribution (median is the center). The R2 and line of best fit for a linear regression are shown. Source data are provided as a Source Data file.Similar to DCA, the DMS mutational tolerance of each position is calculated as the Shannon entropy of the normalized fitness of all measured mutants (methods). Mutational behaviors in equivalent protein positions are varied between the two homologs, and certain positions can accept only the WT residue in one homolog (entropy near 0) and almost all amino acids in the other homolog (entropy near 4) (Fig. 2c). Comparing DMS and DCA, we find strikingly similar distributions in the entropy differences between NDM-1 and VIM-2 (Fig. 2d). The median difference between homologs across positions is 0.45 in DMS (0.39 in DCA), meaning half of the positions have a > 1.37-fold difference in entropies, up to a maximum of 4.22 (18.7-fold). As a baseline, we also computed the CDE difference between 100 random pairs of homologs (Fig. 2d). We again found a similar distribution, suggesting a common statistical trend where certain positions exhibit similar tolerance (such as conserved sites) while others can greatly vary in mutation tolerance.We check agreement of DMS and DCA data on each specific position within either NDM-1 or VIM-2 (Fig. 2e). There is a significant correlation (\({R}^{2}\) ~ 0.55) between the DMS and DCA values for each of NDM-1 and VIM-2, showing that DCA captures sequence-specific trends to a good degree. We also compare the DMS mutational tolerance to the context independent entropy (CIE), the entropy calculated using just single position information in the MSA without coupling to other positions (Supplementary Fig. 4), and find the DCA mutational tolerance (which uses sequence context) to perform better (R2 for CIE ranges from 0.45 to 0.48). It is important to note, using the sequence context not only improves prediction of mutational tolerance over CIE, but also allows for quantifying sequence specific behaviors that would not be possible with CIE which ignores sequence context. However, DCA predicts fewer variable sites (entropy >3.5) compared to DMS, shown by the points below the diagonal in Fig. 2e. Indeed, it has been observed that DCA predicts a Gaussian-like distribution of mutational effects48, and tends to underestimate the number of neutral mutations which are often one peak in a bimodal distribution3,5,7. The discrepancy may arise since the DMS data only captures a single selection antibiotic, while DCA reflects broader evolutionary constraints. We then find that DCA can also capture the specific differences between homologs in DMS (Fig. 2f); the abundance of correctly predicted small differences weakens the correlation. When the variability (IQR) in DCA mutational tolerance across all 102 homologs (100 + NDM-1 and VIM-2) is overlaid on the plot, we see that positions with the lowest spread in all homologs tend to show lower differences between NDM-1 and VIM-2 in both DMS and DCA, as expected (Fig. 2f). Meanwhile, positions with large differences across all homologs are spread throughout. For positions with large differences in DCA or DMS, the varied behaviors of NDM-1 and VIM-2 match the global variation. Interestingly, there are positions with small differences between NDM-1 and VIM-2, but having high variability across all homologs, underscoring the fact that mutational behaviors can vary strongly between different homologs. Hence, the combination of methods can reveal and reinforce patterns that would not be obvious from just a single approach.Structural basis of mutational tolerance and incompatibilitiesWe investigate the relationship between mutational heterogeneity and structure of the B1 family, using crystal structures of NDM-1 (PDB ID:3SPU) and VIM-2 (5YD7) as representatives. The emphasis placed on structure is based on previous linear modeling of mutational effects on VIM-2, where structural features explained the largest amount of variation in the data, while other factors such as residue size or charge had little explanatory power for this system50. We superimpose the global variability (IQR) and median mutational tolerance of the 102 homologs on the structure (Fig. 3a). We produce equivalent figures using the local DMS (Fig. 3b) and DCA (Fig. 3c) data of NDM-1 and VIM-2, using variability (difference) and mean mutational tolerances. We see similar tendencies in the three figures. Buried regions in the protein core, including the active-site metal-binding residues, tend to have both low mutational tolerance and spread; likely being critical to folding or activity. The most exposed positions generally have high entropy and very low spread in both DMS and DCA, as these positions are highly mutationally tolerant. Finally, some buried residues have higher entropy in DMS than in DCA, which could be due to different residue and spatial arrangements specific to the two homologs compared to the global distribution of the family.Fig. 3: Structural basis of tolerance classifications.a All homologs mutational tolerance data overlaid on the crystal structure of VIM-2 (5YD7), with the thickness of the backbone representing the IQR in the 102 homologs, and colored by the median. b DMS mutational tolerance data overlaid on the crystal structure of VIM-2, with the thickness of the backbone representing the absolute difference between NDM-1 and VIM-2, and colored by the average mutational tolerance. c DCA mutational tolerance of VIM-2 and NDM-1 overlaid on the crystal structure of VIM-2, with the thickness of the backbone representing the absolute difference, and colored by the average. d Scatter plot of the median mutational tolerance values of 102 homologs versus the average ASA of VIM-2 and NDM-1, with positions colored by the mutational tolerance IQR. e Scatter plot of the average DMS mutational tolerance of NDM-1 and VIM-2 DCA CDE versus the average ASA, with the positions colored by the difference in mutational tolerance between NDM-1 and VIM-2. f Same as panel e but for the DCA predictions: scatter plot of the average mutational tolerance of VIM-2 and NDM-1 versus their average ASA, colored by the mutational tolerance differences. In d–f, the ρ2 and p-value of a Spearman correlation between x and y variables are shown. Source data are provided as a Source Data file.To quantify the role of structure, we use the average accessible surface area (ASA) of NDM-1 and VIM-2. Absolute ASA from the NDM-1 and VIM-2 structures measured in PyMol are normalized to an ASA reference set51 before averaging. There is significant correlation between average ASA and the site-specific mutability, evident in both experimental data and model predictions (Fig. 3d–f). Correlation between ASA and mutability is commonly observed3,7,52,53,54, likely as internally situated residues have higher prevalence of structural interactions, amplifying the potential for mutations with adverse effects. Furthermore, we used the ASA as a structural variable to distinguish three classes of residues: very buried (ASA < 0.1), partially exposed (0.1 ≤ ASA < 0.7), and very exposed (ASA ≥ 0.7) (Supplementary Fig. 5). Very buried residues are more mutationally constrained, with low entropy and low spread, where some positions also exhibit large mutational variability between NDM-1 and VIM-2 (Fig. 3e, f). Meanwhile, the very exposed positions typically display high mutational entropy and low spread, displaying a near absence of mutational heterogeneity in VIM-2 and NDM-1. Most importantly, we find an intermediate region of partially exposed residues showing some residues with very high spread in mutational tolerance in both DMS and DCA (Fig. 3d–f). In terms of overall distribution, the partially exposed region tends to have greater mutational heterogeneity in DCA compared to the buried region (1-sided Brunner–Munzel test, p-value < 0.05), but the buried region tends to have greater mutational heterogeneity in DMS compared to the partially exposed region (Supplementary Fig. 5a, b). Again, this appears to be a difference between global and homolog specific effects, as the differences of DMS mutational tolerance between NDM-1 and VIM-2 specifically is much lower for many positions than with the 100 other homologs (also reflected in the DCA distributions comparing just NDM-1 and VIM-2, Supplementary Fig. 5c). Regardless, the positions with the strongest mutational heterogeneity are always in the partially buried regions.These observations are possibly due to more intramolecular interactions around buried residues, whereby a rich intramolecular network reduces the mutability of residues, but also leads to homolog-specific differences of such constraints. However, it is the partially exposed region that exhibits the largest variability in behaviors for all datasets. Possibly, this range corresponds to positions with more mutational freedom than fully buried positions, while still forming interactions with other residues. The outcome of a mutation is therefore strongly dependent on the sequence context, i.e., homolog-specific. The large mutational variability of the partially exposed region is common to all analyses (Fig. 3d–f), affirming the dependence between epistatic networks and the structure.Probing the influence of epistatic networks on specific mutationsTo characterize the epistatic networks and reveal the interdependencies between residues, we compare the effects of individual variants in the DMS of NDM-1 and VIM-2. Variants can be compared directly for positions where VIM-2 and NDM-1 have the same WT amino acid (81 positions), referred to as ‘shared’ positions. However, at ‘differing’ positions with different WT residues (136 positions), the difference in mutational starting point prevents a direct comparison; unlike mutational tolerance, which already accounts for all mutational states (including WT), individual mutational effects relative to the WT need to consider the specific WT residue. We hence analyze mutations in ‘shared’ and ‘differing’ positions using different schemes (Fig. 4a, methods). At differing positions, we first check the effect of swapping WT residues at each position between NDM-1 and VIM-2 (WT reversions, 132 comparisons and 4 reversions not observed), and if the swap is neutral in both homologs (classified as ‘compatible’, total of 50 positions) we treat them like the ‘shared’ positions (131 total including ‘compatible’ positions) (Supplementary Data 4). Between NDM-1 and VIM-2, individual mutational effects are very different (Fig. 4b). To determine if a mutation is epistatic between backgrounds, we conduct z-tests of the difference in fitness (NDM minus VIM) against a null distribution with the same standard deviation as the synonymous variants distribution of NDM-1 (greater standard deviation of the two backgrounds). We find ~28% of individual mutations in the shared positions are epistatic and that differences are not strongly biased in sign or effect size toward a single homolog (Fig. 4c). We compute the fraction of epistatic mutations per position as Nmut epistatic/Ntotal observed, revealing widespread epistasis across the entire structure (Fig. 4d). The median fraction of epistatic mutations at each position is 0.21, meaning that half of the positions have significant epistasis in at least ~4 mutations (0.21 ×19 missense a.a.). Notably, strongly epistatic positions are enriched in partially exposed positions (Fig. 4e), as previously noted (Fig. 3d–f). In the case of individual mutations, the partially exposed distribution tends to have the higher epistatic effects relative to both buried or exposed regions (1-sided Brunner–Munzel test, p-value < 0.05, Supplementary Fig. 6a), in addition to containing some of the highest epistatic effects.Fig. 4: Residue level epistasis between NDM-1 and VIM-2.a Flowchart of epistasis classification. b Correlation of DMS data between NDM-1 and VIM-2 at shared and compatible positions. Dashed lines highlight range of neutral fitness effects (1.96xstandard deviation (SD) of synonymous variants distribution of respective homolog, diagonal uses y-range), centered on 0 (axes) or the 1:1 line (diagonal). The fit line (linear correlation) is shown in black, with R2 and p-value displayed. c Distribution of fitness effect differences between NDM-1 and VIM-2 at shared and compatible positions. Dashed lines represents the range of neutral fitness around 0 (1.96 × SD of synonymous variants for NDM-1). d Fraction of epistatic mutations (Nmut epistatic/Ntotal observed) by position overlaid on the VIM-2 structure (shared and compatible positions, others transparent), colored according to the lower right (with distribution). Thickness of the structure corresponds to the average ASA of the NDM-1 and VIM-2 crystal structures. e Plot of the fraction of epistatic mutations at each position versus the ASA. Only positions highlighted in panel d are included. f Scatter plot of fitness effects for reverting VIM-2 WT to NDM-1 WT (x-axis) and NDM-1 WT to VIM-2 WT (y-axis) for equivalent positions. The vertical dashed line indicates the lower bound of the region of neutral effects for VIM-2 based on the synonymous variant distribution, and the horizontal dashed line shows the lower bound of the region of neutral effects for NDM-1. Quadrants with different behavioral classes are colored as in a. g Differing WT positions between NDM-1 and VIM-2, colored by entrenchment class. The structure thickness corresponds to average ASA and regions outside the classification are transparent. h Scatter plot of ΔE for reverting VIM-2 WT to NDM-1 WT (x-axis) and NDM-1 WT to VIM-2 WT (y-axis) for equivalent positions, colored as in a. The dashed line represents the expected behavior without epistasis. i Bar plot showing the relative fraction of points in each epistatic class at various distances from the diagonal of h. The total number of points in each distance bin is shown on top. Source data are provided as a Source Data file.For differing positions, swapping WT residues between NDM-1 and VIM-2 is deleterious at 82 positions. We classify the positions as ‘1-WT entrenched’ if the swap is incompatible in one background (43 positions) and ‘2-WT entrenched’ if the mutation is incompatible in both backgrounds (39 positions) (Fig. 4f). We use the term ‘entrenchment’ to describe the inability to access a certain amino acid in the presence of other mutations that have already taken place, when that same amino acid is viable in another context; i.e., a residue becomes entrenched after accruing a certain amount of mutations relative to the other background. We note that this is entrenchment based strictly on a hypothetical mutational path between NDM-1 and VIM-2, and is an aggregate of all final mutational interactions regardless of the exact path taken. Again, we see pervasive epistasis, where 42–49% of WT-swapping mutations (65/132 in NDM-1, 56/132 in VIM-2) lead to a significant loss in fitness. Structurally, entrenched positions are scattered across the whole protein (Fig. 4g) with a subtle bias for 1-WT entrenched positions to be less buried than 2-WT entrenched positions (Supplementary Fig 6b). To further explore the role and meaning of entrenched positions, we use DCA to calculate ΔE of swapping entrenched WT residues between NDM-1 and VIM-2 (Fig. 4h). Similar to DMS, mutations concentrate around the neutral center, or populate the lower half-plane defined by the diagonal. As expected, almost no points populate the upper half-plane where the swap of both WTs would result in higher fitness. Without epistasis, the mutations A − > B and B − > A in arbitrary backgrounds should have opposite signs, exhibiting a perfect anti-correlation along the diagonal in Fig. 4h. The distance away from the diagonal is the difference in mutational effect in the two sequence contexts, and thus a rigorous signature of epistasis. We verify coherence between DCA and DMS by checking the fraction of mutations from each entrenchment class at different distances from the diagonal (Fig. 4i), and we do see 2-WT entrenched mutations enriched further from the diagonal, while those closest to the diagonal are mostly compatible.Once again, the model and the experiment complement each other, both confirming epistatic interactions to be pervasive, and associated with structure. Moreover, DCA proves to be quite accurate in predicting epistasis of specific mutants, as all the most distant points from the diagonal are either 1- or 2-WT entrenched according to the experiments (Fig. 4i).Epistatic positions highlight complexity of interaction networksOur data provides the opportunity to dissect specific epistatic interactions in the homologs, as 2-WT entrenched positions may indicate the residues participate in specific interactions. We sought these interactions within the proteins through pairs of positions that are in close proximity, and are both 2-WT entrenched (Supplementary Fig. 7a). One example of shape complementarity can be found between NDM-1 positions 125 and 249 (VIM-2 positions 119 and 239) (Fig. 5a), under the active-site zinc ions. In VIM-2, the Arg at position 125 is paired with Gly at position 249. In contrast, NDM bears a smaller but still positively charged Lys at position 125, which is now neighbored by a larger Ser at position 249. To test if these positions possess significant interactions, we measured the phenotype (EC50 against AMP) of the single and double mutants of each position pair in the NDM-1 background, mutated to the VIM-2 WT residues (Fig. 4c). As VIM-2’s WT residues are deleterious in NDM-1 as single mutants, we expect that mutating both positions may lead to compensation, generating a less negative effect. We also test pairings in the L3 active site loop (NDM-1 positions 67 + 68), L10 active site loop (218 + 266, 211 + 229) and some buried positions beneath the L10 loop (197 + 204, and a triplet of 204 + 246 + 259) (Fig. 5c); the triplet was tested in pairs and in triplet. Combined, our selection of positions covers a variety of positions, with different biochemical properties (size, polarity, charge), different solvent accessibilities and different variability in mutational behaviors across the whole family (Fig. 5c, d).Fig. 5: Testing interactions of entrenched positions in NDM-1.a Example of potentially interacting entrenched WT positions in the crystal structures of NDM-1 and VIM-2. b Experimental scheme for testing single or combined mutational effects in the NDM-1 background. c Entrenched WT positions that were chosen for testing of epistatic interactions. Positions with the same color are mutated together to test for compensation of entrenchment; A204L overlaps 2 sets and is also tested with G192Y (red). d Plot of IQR in mutational tolerance across 102 homologs and the average ASA of NDM-1 and VIM-2 structures, with the tested positions highlighted. Tested combinations are shown as lines. e Scatterplot of DCA energy change of all selected double mutants, with the expected additive single mutant effects in the x-axis, and the observed double mutant effects in the y-axis. Dashed line indicates a 1:1 correlation. f Scatterplot of all tested double (1 triple, purple) mutants, with the expected additive single mutant effects in the x-axis, and the observed double mutant effects in the y-axis. Effects are calculated as fold change relative to wtNDM-1. The line of best fit for a linear correlation is shown in black, with the R2 and p-value displayed. Source data are provided as a Source Data file.The single mutant effects confirm the entrenchment observed from the DMS fitness scores, as all selected WT residues in VIM-2 are deleterious in the NDM-1 background, evenly spanning a wide range of deleterious effects (Supplementary Fig. 7b, Supplementary Data 5). The EC50 and the fitness score form a sigmoidal relationship, consistent with previous observations5. However, when we tested the mutants in combination, we did not observe any significant compensation (Fig. 5f). The log-additive effects of the single mutants (null model for no epistasis) show a distinctly linear correlation with the observed double mutant effects (R2 of 0.85). It appears that entrenched positions cannot be swapped simply by mutating other nearby entrenched residues. This scenario is confirmed by DCA, where larger epistatic effects can only arise as a cumulation of small contributions from many epistatically coupled positions (Fig. 5e). Previous experiments have also shown that direct compensation by nearby residues is expected to be rare13, and compensation often requires more than two mutational steps55. The double mutation effects are therefore additive in DCA (Fig. 5e) and in the experiment (Fig. 5f). Notably, the positions are not globally independent in their effect, as these mutations are fixed in VIM-2 and the deleterious effects have been compensated for during evolution. Thus, these experiments suggest the intramolecular network of residue interactions to be complex, and even entrenched residues are not themselves sufficient to dictate the epistatic networks.

Hot Topics

Related Articles