Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases

Conformational preferences predicted by FEP match Potts sequence-based predictions

Potts statistical energy models can be used to compare the relative stability of individual conformational basins that comprise the native ensemble for individual sequences15,40. The sequence-based statistical energy predictions of relative conformational stability can then be directly compared with thermodynamic observables calculated from all-atom MD free energy simulations. Here we describe and validate the consistency of these two distinct approaches, which we leverage to investigate the sequence and structural basis for the evolutionary divergence in an active → inactive conformational change for TKs versus STKs.

Choosing kinases for targeted analysis of conformational differences between TKs and STKs

The following analysis required experimental structures of kinases that have both an active structure in the PDB, in which the activation loop is extended, and an inactive structure in a classical DFG-out conformation with an autoinhibitory folded activation loop (Fig. 1). For conciseness, we refer to these conformational states as active and inactive, respectively.

Fig. 1: Overview of the kinase active and inactive conformations and sequence features that distinguish TKs (Tyr Kinases) from STKs (Ser/Thr Kinases).
(top) Illustrations of the active and inactive conformation of the activation loop for a representative TK, INSR. The active conformation is characterized by three main structural motifs: the N-terminal anchor, RD-pocket, and C-terminal anchor. In the inactive conformation, the C-terminal anchor of TKs remains intact while the rest of the activation loop is folded up, with the DFG + 10 residue Y154 mimicking the binding mode of Tyr substrates. (bottom) Sequence logo visualization of key motifs in our MSA of kinase catalytic domains, where the vertical axes represent the raw residue frequency for STKs (blue) and TKs (orange) in our MSA, calculated separately.
The residue numbering in this MSA is displayed on the horizontal axis. Only key motifs are displayed for clarity. The conserved triads HRD, DFG, and APE were used as reference points for a more general numbering scheme: for example, DFG + 10 refers to residue 154 in our alignment (written as 154DFG+10 in the main text) and is located ten residues C-terminal from the DFG motif, and the Gly of DFG is located at position 144 in our MSA. Sequence logos were plotted using Logomaker65.

To analyze the evolutionary divergence in the active → inactive free energy landscapes of TKs versus STKs, three kinases were chosen with structures in both conformations: INSR, which is a TK, and two STKs, BRAF and CDK6, which belong to the TKL and CDK families, respectively (Fig. 2, top). In addition to their locations on the phylogenetic tree, which are relevant to the evolutionary questions considered in this study, these kinases were chosen based on stringent structural criteria:

a. The PDB contains at least one apo x-ray crystal structure in the inactive conformation, in addition to one or more in the active conformation.

b. The activation loop has been solved without missing coordinates in both the active and inactive conformations.

c. The activation loop adopts an extended conformation in the active DFG-in state and a folded conformation in the inactive DFG-out state.

Fig. 2: Choosing kinase targets (top) to analyze differences in the conformational landscapes of TKs and STKs.
CDKs are STKs belonging to the CMGC group8, which are distantly related to TKs. BRAF, an STK from the Tyrosine Kinase-Like (TKL) group, is more closely related to TKs. INSR is a typical receptor TK, with active and inactive conformations that are representative of both cytoplasmic and receptor TKs. Icons are displayed to the left to mark the divergence of kinase-containing taxa (from top to bottom: H. sapiens8 and porifera66 (animals), choanoflagellates, bacteria and archaea3). The Potentials of Mean Force (PMFs) shown on the right are artistic illustrations to help visualize the free-energy landscapes of TKs and STKs described previously15; while the barrier heights are unknown, the relative depths of the two basins at the end-states are accurately depicted. (bottom) The Potts DFG-out penalties (ΔTs) of 90 human TKs (receptor and non-receptor families) and 58 STKs (41 TKL kinases and 21 CDKs) were estimated by threading over structural ensembles of the active and inactive (DFG-out) conformations (see Methods), showing a bias for human TKs (orange) towards inactive relative to STKs (blue). The ΔTs for each kinase can be found in the Source Data file.

The active, extended conformation (Fig. 1, top left) of the activation loop is characterized by three anchor points:

(i) N-terminal anchor13: a structural motif formed by a pair of antiparallel β-strands at the activation loop N-terminus (β9: residues 145DFG+1 through 147DFG+3) and the N-terminal region of the catalytic loop (β6: residues 119HRD-3 through 121HRD-1). TKs and STKs have notable sequence differences in the β6 strand.

(ii) RD-pocket13,14: a cluster of basic sidechains involving the activation loop (residues 147DFG+3, 154DFG+10 and/or 155DFG+11) and Arg of the conserved HRD motif (residue 123HRD). Residue 160 from the activation loop C-terminal region is tightly packed against R123HRD in STKs but decoupled from the RD-pocket in TKs.

(iii) C-terminal anchor13: a structural motif characterized by non-covalent interactions between the C-terminus of the activation loop and the catalytic loop. The C-terminal anchor forms a binding surface for Ser, Thr, or Tyr substrates.
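The motif-relative residue labels used in the list above (for example 154DFG+10) can be generated mechanically from MSA column numbers. The following is a minimal sketch, not part of the original analysis. The column ranges in `MOTIFS` and the helper `motif_label` are assumptions inferred from labels quoted in the text (R123HRD and D124HRD, G144 of DFG, 160APE-7 and 166APE-1), with negative offsets counted from the first residue of a triad and positive offsets from its last residue.

```python
# Assumed MSA columns (first, last) of the conserved triads, inferred from
# labels quoted in the text; these are illustrative, not taken from the Methods.
MOTIFS = {
    "HRD": (122, 124),  # R123 and D124 are quoted in the text
    "DFG": (142, 144),  # the Gly of DFG is stated to be column 144
    "APE": (167, 169),  # inferred from 160 = APE-7 and 166 = APE-1
}


def motif_label(position: int, motif: str) -> str:
    """Return a label such as '154DFG+10' for an MSA column and a reference triad."""
    first, last = MOTIFS[motif]
    if position < first:
        # Negative offsets are counted from the first residue of the triad.
        return f"{position}{motif}{position - first}"
    if position > last:
        # Positive offsets are counted from the last residue of the triad.
        return f"{position}{motif}+{position - last}"
    # Positions inside the triad carry no offset (e.g. 123HRD).
    return f"{position}{motif}"


if __name__ == "__main__":
    for pos, motif in [(154, "DFG"), (120, "HRD"), (160, "APE"), (126, "HRD")]:
        print(motif_label(pos, motif))
```

Which triad is used as the reference for a given position is a choice made by the authors, so the helper takes the motif name as an explicit argument rather than guessing it.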

All three of these motifs (i–iii) may be disassembled in the inactive conformation where the activation loop is folded. However, TKs are distinct in that they maintain an intact C-terminal anchor in the inactive conformation. This allows a Tyr residue from the middle segment of the activation loop (e.g., Y154DFG+10) to occupy the substrate binding site in cis17, a phenomenon we refer to as substrate mimicry (Fig. 1, right).

Potts threaded-energy prediction of conformational preference

Next we establish, based on Potts calculations, that TKs have an increased bias for the inactive state relative to STKs. For an aligned protein sequence (\(S\)), a corresponding protein family Potts model (in this case, the kinase superfamily) can be used to compare the stability of two conformational ensembles, e.g., active (labeled \({\mathbb{A}}\)) and inactive (labeled \({\mathbb{B}}\)), by threading that sequence’s statistical energy couplings \({J}_{{S}_{i}{S}_{j}}^{{ij}}\) over pairs of residues i and j which change their contact frequency \({c}^{{ij}}\) between the two ensembles,$$\begin{aligned}\Delta T(S)_{\mathbb{A}\to\mathbb{B}} &= \langle T(S)\rangle_{\mathbb{B}} - \langle T(S)\rangle_{\mathbb{A}}\\ &= -\sum_{i<j}^{L} J_{S_i S_j}^{ij}\left(c_{\mathbb{A}}^{ij} - c_{\mathbb{B}}^{ij}\right) = -\sum_{i<j}^{L} J_{S_i S_j}^{ij}\,\Delta c_{\mathbb{A}\to\mathbb{B}}^{ij},\end{aligned}$$
(1)
where \(S\) denotes a single kinase domain sequence aligned to an MSA of length L (number of columns). \(\Delta T\) can be considered a Potts statistical energy analog of the reorganization free-energy \(\Delta {G}_{{reorg}}\), a thermodynamic quantity that reflects the relative free-energy of the two conformational basins,$$\Delta G_{reorg}^{S}\left(\mathbb{A}\to\mathbb{B}\right) = \Delta G_{u\to f}^{o}(S,\, \mathbb{B}) - \Delta G_{u\to f}^{o}(S,\, \mathbb{A}).$$
(2)
In this expression \(\Delta {G}_{u\to f}^{o}(S,\, {\mathbb{A}})\) is the free energy change associated with the protein unfolded → protein folded transition, where the definition of folded protein is restricted to a particular conformational basin \({\mathbb{A}}\) (active DFG-in, activation loop extended), and likewise for \(\Delta {G}_{u\to f}^{o}(S,\, {\mathbb{B}})\), which instead restricts the folded state definition to basin \({\mathbb{B}}\) (inactive DFG-out, activation loop folded).

MD simulations of the three target kinases (CDK6, BRAF, and INSR) in their active and inactive states were used to calculate contact frequencies (see Methods), and these contact frequencies were used to thread homologous kinase sequences from the three families (human TKs, TKLs, and CDKs) using Eq. (1). Plotted in Fig. 2, the \(\Delta T\) distributions for these families recapitulate our previous observations over a much larger set of sequences15 that, on average, TKs are intrinsically more balanced between the active and inactive conformational basins due to common features of their sequences that distinguish them from STKs (including TKLs and CDKs, among other groups). Specifically, we found that the average difference in Potts threaded energy scores between TKs and STKs is approximately 4 kcal/mol when scaled to physical free-energy units15. At this scale, the most populated bin in the \(\Delta T\) histogram of TKs from Fig. 2 (\(\Delta T\, \approx \, 1\)) corresponds to approximately 1.3 kcal/mol. This in turn corresponds to a 1:10 ratio of inactive:active conformers in solution (10% inactive), consistent with experimental NMR populations for an isolated TK catalytic domain27 (7% inactive).

Potts-generated library of mutational effects on conformational bias

Proteins can explore functional space by accruing mutations that shift the conformational equilibrium of the folded protein between different native-like conformations12,41,42.
This can be caused by changes in non-covalent interactions involving the mutated residue and nearby sidechains in the two different protein environments (e.g., active vs inactive conformations) and changes in the physicochemical nature of these interactions upon mutation. In terms of Potts threaded energy, we define the net mutational effect on the conformational preference as \(\Delta \Delta T\),$$\begin{aligned}\Delta\Delta T_{\mathbb{A}\mathbb{B}}\left(w,\, m\right) &= \Delta T_{\mathbb{A}\mathbb{B}}\left(m\right) - \Delta T_{\mathbb{A}\mathbb{B}}\left(w\right)\\ &= \sum_{i<j}^{L}\left(J_{m_i m_j}^{ij} - J_{w_i w_j}^{ij}\right)\Delta c_{\mathbb{A}\mathbb{B}}^{ij},\end{aligned}$$
(3)
where w and m denote two distinct sequences of length L which differ at one or more positions. Only positions i or j which are mutated in sequence m relative to sequence \(w\) contribute to \(\Delta \Delta T\) (the net shift in \(\Delta T\)), depending on how the threaded couplings of these positions with other residues in the protein are changed upon mutation. In this way, the Potts model can be used to scan for mutations which have significant effects on the conformational free-energy landscape.

For our target TKs and STKs (CDK6, BRAF and INSR) we scanned for mutations in the Potts model that are predicted by Eq. (3) to induce the largest changes in conformational preference for the active vs inactive state (Fig. 3A) and found that the largest projected mutational effects are mostly observed in association with structural motifs that are important for distinguishing the functional landscapes of TKs vs STKs15. To predict mutations with the most significant effects on the active (\({\mathbb{A}}\)) \(\rightleftharpoons\) inactive (\({\mathbb{B}}\)) conformational equilibrium of TKs and STKs, \(\Delta \Delta T\) (Eq. (3)) was evaluated for \(\approx {10}^{4}\) mutations of STKs and TKs, using CDK6, BRAF and INSR as representative systems.

Fig. 3: Potts statistical energy ΔΔTs for the effects of mutations are consistent with corresponding ΔΔGreorg from FEP.
A Results of scanning double and single mutations in the Potts model, plotted as a histogram of raw mutant ΔTs (\(\Delta {T}_{{mut}}=\Delta \Delta T+\Delta {T}_{{wt}}\)) for \( > {10}^{4}\) mutations from each kinase (see Methods). Mutations chosen for FEP simulations are marked with triangles and scored with vertical lines (see Source Data for a full list of mutations chosen for FEP). B Thermodynamic cycle (left) which we used to calculate each of the 54 ΔΔGs in (A).
The vertical legs represent the alchemical transformations performed in FEP simulations in the active basin \({\mathbb{A}}\) and the inactive basin \({\mathbb{B}}\), while the horizontal legs represent the physical free energy of reorganization between the two basins for wildtype (top) and mutant (bottom). C Plot of \(\varDelta \varDelta T\) calculated from the Potts model vs \(\varDelta \varDelta {G}_{{reorg}}\) for 54 mutations calculated from 108 FEP simulations in the active and inactive conformations (see Methods). The data for this plot can be found in Source Data. \(\varDelta \varDelta {G}_{{reorg}}\) and \(\varDelta \varDelta T\) share a sign convention; positive values indicate a shift in conformational stability towards the active conformation, and negative values indicate a shift towards the inactive conformation. Hysteresis errors calculated from the cycle closure of the double mutant cycles are displayed as error bars. The Mean Unsigned Error (MUE) of 1.68, displayed as a gray error band, was calculated by taking the average absolute difference between \(\varDelta \varDelta {G}_{{reorg}}\) (y-axis values) and the linear fit of \(\varDelta \varDelta T\) (x-axis values). Calculated using SciPy67, the 2-tailed P-value of \({10}^{-15}\) indicates a very low probability of observing a Pearson coefficient of this magnitude (r = 0.81) under the null hypothesis that the data were sampled from uncorrelated distributions.

FEP predictions of mutational effects on conformational stability

To validate the Potts model predicted effects of these mutations, FEP simulations were employed using a thermodynamic cycle to calculate \(\Delta \Delta {G}_{{reorg}}\) (Fig. 3B), a structural thermodynamic observable for which \(\Delta \Delta T\) is a sequence-based analog15.
\(\Delta \Delta {G}_{{reorg}}\) represents the change in free-energy difference between active (\({\mathbb{A}}\)) and inactive (\({\mathbb{B}}\)) that occurs upon mutation, and can be calculated by alchemically mutating the target residue(s) in place within the two conformational basins:$$\begin{aligned}\Delta\Delta G_{reorg}^{w\to m}(\mathbb{A}\to\mathbb{B}) &= \Delta G_{reorg}^{m} - \Delta G_{reorg}^{w}\\ &= \Delta G_{w\to m}^{\mathbb{B}} - \Delta G_{w\to m}^{\mathbb{A}}.\end{aligned}$$
(4)
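As a small illustration of the second line of Eq. (4), the sketch below combines the two alchemical legs of the cycle into \(\Delta \Delta {G}_{{reorg}}\). The function name `ddg_reorg`, the input numbers, and the quadrature combination of per-leg uncertainties are assumptions made for illustration only; the errors reported in this work are hysteresis errors from cycle closure, which this sketch does not reproduce.

```python
import math


def ddg_reorg(dg_mut_active, dg_mut_inactive, err_active=0.0, err_inactive=0.0):
    """Second line of Eq. (4): dG(w->m) in basin B minus dG(w->m) in basin A.

    Inputs are alchemical w -> m free energies (kcal/mol) computed in the active
    basin (A) and the inactive basin (B). Combining the two uncertainties in
    quadrature is an assumption of this sketch, not the paper's error model.
    """
    value = dg_mut_inactive - dg_mut_active
    error = math.hypot(err_active, err_inactive)
    return value, error


if __name__ == "__main__":
    # Hypothetical leg values, chosen only to show the arithmetic.
    value, error = ddg_reorg(2.0, 5.0, err_active=0.3, err_inactive=0.4)
    # Following the sign convention of Fig. 3C, a positive value means the
    # mutation shifts conformational stability towards the active conformation.
    shift = "active" if value > 0 else "inactive"
    print(f"ddG_reorg = {value:.1f} +/- {error:.1f} kcal/mol (shift towards {shift})")
```

The point of the cycle is visible in the signature: only the two mutation legs are needed, and the hard-to-converge physical reorganization free energies never appear.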
In this notation, the subscript of ΔG indicates the free-energy change associated with an equilibrium process, e.g., protein conformational reorganization or mutation of a sidechain via alchemical FEP simulations, and the superscript identifies the protein sequence (wildtype or mutant) or protein conformation (basin \({\mathbb{A}}\) or \({\mathbb{B}}\)) that the process is performed on. It is technically possible to use the first line of Eq. (4) to calculate \(\Delta \Delta {G}_{{reorg}}\), but in practice the free-energy term \(\Delta {G}_{{reorg}}\) is very difficult (if not impossible) to converge from direct MD free-energy simulations of the activation loop reorganization along a physical path. When using the second line to calculate \(\Delta \Delta {G}_{{reorg}}\), only a structural perturbation of the sidechain in two different protein environments, basins \({\mathbb{A}}\) and \({\mathbb{B}}\), is required (Fig. 3B). Hence, the alchemical approach is more reliable and computationally efficient, provided the physical end states are stable at the simulated timescales or the appropriate restraint procedures are deployed37. Conformational restraints were not required for this study as the energy barriers separating active DFG-in with activation loop extended and inactive DFG-out with activation loop folded are estimated to be large27, preventing interconversion between the two states on the timescale of our simulations.

Correspondence of sequence- and structure-based methods

By calculating \(\Delta \Delta {G}_{{reorg}}\) for landscape-altering mutations and comparing with the sequence-based Potts model analog \(\Delta \Delta T\) (Fig. 3C), we were able to confirm that the structural basis for the divergent conformational landscapes of TKs and STKs originates from residues that are associated with their distinct substrate binding functionalities.
As described in Methods, 54 mutations with the largest magnitude \(\Delta \Delta T\)s were selected for comparison with \(\Delta \Delta {G}_{{reorg}}\) from FEP simulations, resulting in 108 free-energy simulations for mutations of TKs and STKs in the active and inactive conformations. As shown in Fig. 3C, results from the sequence-based (Potts) and structure-based (FEP) approaches are highly consistent (Pearson correlation of 0.81 with p value ≈ \({10}^{-15}\)).

The slope of the linear fit can be considered an approximate conversion factor from statistical energy units to kcal/mol, which from Fig. 3C is ≈1.3 kcal/mol, close to our previous estimate15. This suggests the Potts coupling terms in Eq. (3) are capturing physical information about the free-energy balance of sidechain interactions between two different conformational basins. This correspondence is noteworthy considering the Potts Hamiltonian is a strictly sequence-based information theoretic model, i.e., the potential function is not explicitly trained on structural information. The only structural information used in the Potts threading calculation is the definition of active and inactive macrostates from PDB or MD ensembles, and the determination of residue-residue contacts that break and form between the two ensembles (\(\Delta {c}^{{ij}}\), see Methods). It is the threaded energy terms from Eq. (3) that provide information about how the two distinct protein environments, active and inactive, respond differently to mutations.

Confirming the structural basis for the evolutionarily divergent conformational landscapes of TKs vs STKs

The mutations with the largest effects on the active → inactive free-energy landscape from Fig. 3C tend to cluster in structural motifs in the active and inactive conformations that were previously identified to shape the distinctive landscapes of TKs compared with STKs due to sequence differences15.
Mutations of these motifs can increase the bias in the free-energy landscape towards the active conformation, characteristic of STKs, or decrease the relative stability of the active conformation, a signature phenotype of TKs. We note that adoption of the active conformation is necessary but not sufficient for kinase activity, and the mutations identified to stabilize the structurally active conformation may not necessarily increase the catalytic activity of the kinase.

In the active conformation, we find there are three main structural motifs which, due to average sequence differences between the two clades, broadly contribute to the divergent conformational landscapes of TKs vs STKs: the N-terminal anchor, RD-pocket, and C-terminal anchor. Here, we describe the structural basis for these effects on the stability of the active conformational basin, focusing on prominent examples of mutations which introduce a TK-prevalent residue into a wildtype STK sequence (CDK6), and test the effect of mutations at these locations using the Potts model and FEP. By comparison of observed residue positioning and physical interactions in PDB structures of TKs versus STKs, we establish structural rationales for the predicted mutational effects.

The N-terminal anchor is destabilized in the active conformation in TKs relative to STKs

The N-terminal anchor (Fig. 4A) is a motif formed in the active state by a pair of antiparallel β-strands involving the activation loop N-terminus (β9) and the stretch of three residues located N-terminal to the HRD motif (β6)13. The N-terminal anchor can be identified by sequence positions 120 (HRD-2) and 121 (HRD-1) in the β6 strand, and 146 (DFG + 2) through 148 (DFG + 4) in the β9 strand in our MSA numbering (see Supplementary Data 1 for equivalent PDB residue numbering). This strand pairing only occurs when the activation loop is extended as in the active conformation13,43.

Fig. 4: Mutating STK residues to those found in TKs destabilizes the active conformation relative to inactive.
A Top: surface representation of wildtype CDK6 (PDB: 1XO2 chain B) in the active conformation (see Methods), with the N-terminal anchor highlighted. Peptides from TK (light green) and STK (dark green) holoenzymes are superimposed for reference; STK holoenzymes rely on interactions between peptides and the N-terminal anchor, in contrast to TKs which bind peptides further away. Bottom: van der Waals (vdW) space-filling models of the CDK6 wildtype (left) and mutant (right) N-terminal anchor residue 146DFG+2 and its interaction partner 120HRD-2, showing the loss of favorable vdW contacts between the Cβ atom of 146DFG+2 and Cγ of 120HRD-2. Backbone hydrogen bonding patterns that define the β-strands are shown with dashed lines. B Top: same as A but with the RD-pocket highlighted. As before, peptides from TK and STK co-crystal structures are superimposed for reference. Bottom: vdW space-filling models of residue 160APE-7 in the wildtype (left) and mutant structure (right), located in the activation loop C-terminus, which stabilizes the RD-pocket in STKs. Small aliphatic residues like V160APE-7 in CDK6 pack favorably against the HRD-Arg, while bulky sidechains, e.g., K160APE-7 (seen in TKs), decouple from the RD-pocket and flip out in our MD free-energy simulations to interact with solvent (or peptides in the TK holoenzyme only).

In STKs, the N-terminal anchor is located inside the cleft between the N-lobe and C-lobe where residues of the holoenzyme interact with bound peptides and protein substrates in co-crystal structures16 as illustrated in Fig. 4A. By contrast, substrates co-crystallized with TK holoenzymes adopt a different orientation which places residues of the bound substrate further away from the N-terminal anchor16.
Consistent with this paradigm, our results based on MD free energy simulations and Potts calculations described next suggest the N-terminal anchor (β6-β9 pairing) evolved under weaker stability constraints in TKs compared with STKs. The β6 and β9 strands contain divergent sequence features which, as shown next, make their pairing relatively unstable in TKs compared with STKs.

Cross-strand sidechain interactions between position 120HRD-2 in the β6 strand and 146DFG+2 in the β9 strand have diverged between TKs and STKs due to sequence differences15; 77% of STKs in our MSA have, simultaneously, a branched aliphatic sidechain at 120HRD-2 (Val or Ile) and a small unbranched sidechain at 146DFG+2 (Ala or Ser), whereas fewer than half of TKs (39%) contain these residue pairs. Many TKs (23%) instead have a large aromatic sidechain (Phe) at β6/HRD-2, which by contrast is very rare for STKs (4%) and predicted by the Potts model to destabilize the N-terminal anchor. Specifically, when using the Potts model to mutate the β6-strand from V120HRD-2 to Phe in CDK6, or I120HRD-2 to Phe in BRAF, there is a large predicted shift in conformational equilibrium that results from destabilizing the N-terminal anchor (\(\Delta \Delta T\approx -2\)). This appears to be a consequence of mutations abrogating favorable cross-strand contacts between the branched V120HRD-2 and unbranched A146DFG+2 or S146DFG+2 sidechains that are frequently present in wildtype STKs and display complementary packing between the Cβ atom of A146DFG+2 or S146DFG+2 and the branched Cγ atom of I120HRD-2 or V120HRD-2 (Fig. 4A). These complementary combinations of residue pairs are rare in TKs15, making their wildtype N-terminal anchors intrinsically less stable relative to STKs.
The Potts model predictions of these β6-β9 cross-strand interaction constraints were validated using FEP simulations; for CDK6 (an STK which contains A146DFG+2), mutating V120HRD-2 from a branched sidechain to Phe (V120F) abrogates favorable Cγ-Cβ sidechain interactions with A146DFG+2, destabilizing the N-terminal anchor in the active conformation by nearly 4 kcal/mol relative to the inactive conformation (\(\Delta \Delta {G}_{{reorg}}=-3.7\pm 0.1\) kcal/mol), consistent with the Potts threaded-energy predictions (ΔΔT ≈ −2). As a further validation of this selection rule, for INSR (a TK bearing F120HRD-2 and T146DFG+2 as the wildtype) we find that substituting F120HRD-2 with a branched sidechain, F120I, stabilizes the active conformation as observed for STKs (\(\Delta \Delta {G}_{{reorg}}=2.5\pm 0.2\) kcal/mol) only if the activation loop residue T146DFG+2 is first mutated to Ala (the STK-prevalent residue) to satisfy the interaction constraint described above; otherwise the Cγ atoms of wildtype T146DFG+2 and mutant F120I will clash, resulting in \(\Delta \Delta {G}_{{reorg}}\, \approx \, -0.8\pm 0.2\) kcal/mol.

The RD-pocket is destabilized in the active conformation in TKs relative to STKs

The RD-pocket (Fig. 4B) plays a distinctive functional role in STKs compared with TKs. Similar to what is seen for the N-terminal anchor in STKs, the RD-pocket in the active conformation of STKs directly interfaces with co-crystallized substrates, in contrast to TKs which bind peptides in a different binding mode16. The RD-pocket is a dynamically assembled and positively charged pocket formed by a cluster of basic and hydrophobic sidechains originating from the HRD-Arg (R123HRD), the activation loop N-terminus (145DFG+1 through 147DFG+3) and C-terminal anchor (159APE-8 through 161APE-6). The αC-helix also contributes basic residues to the RD-pocket in some kinases.
For both TKs and STKs, the net charge of this pocket places an additional contingency on the stability of the active conformation relative to inactive that is controlled by the biological environment: when Tyr, Ser, or Thr residues in the activation loop (153DFG+9 through 155DFG+11) are phosphorylated by another kinase, the acidic/phosphorylated sidechain buries itself into this pocket in cis and stabilizes the active conformation14. For unphosphorylated kinases, this pocket can also be stabilized in trans by anions from the surrounding buffer as observed in our simulations (Fig. 4B).

The apo, active conformation of STKs is stabilized by favorable packing between RD-pocket residues, namely those originating from the activation loop N-terminal and C-terminal residues (e.g., L145DFG+1 and V160APE-7 in CDK6: see Fig. 4B). This is enabled in part by residue 160APE-7 which, in STKs, has a small aliphatic sidechain, allowing it to pack tightly against the sidechain of residue 145DFG+1 and R123HRD, stabilizing the RD-pocket. In TKs, however, residue 160APE-7 located in the C-terminal anchor is usually a bulky and/or positively charged residue (Lys or Arg) with the sidechain flipped out from the RD-pocket and exposed to solvent, e.g., K400 in Abl kinase numbering (PDB: 2G2I). Using FEP simulations of CDK6 to perform the substitutions V160K and V160R at position 160APE-7, which introduce residues frequently observed in TKs, we find an (unfavorable) increase in the free energy of the active conformation relative to the inactive conformation, resulting in \(\Delta \Delta {G}_{{reorg}}=-1.6\pm 0.1\) and \(-2.6\pm 0.1\) kcal/mol, respectively. This appears to be due to the structural decoupling of residue 160APE-7 from the RD-pocket, in concordance with the Potts calculated \(\Delta \Delta T\)s (\(\Delta \Delta T=-2.3\) and \(-3.1\), respectively).
These results suggest the RD-pocket is relatively less stable when TKs are in the active conformation compared with STKs.

The C-terminal anchor stabilizes the active conformation in STKs

The third structural motif in the activation loop, the C-terminal anchor, is directly involved with the binding of substrates to the active conformation in trans for both STKs and TKs11,13 by providing a complementary surface that stabilizes catalytic interactions between the substrate hydroxyl and the conserved HRD-Asp residue (D124HRD) in the catalytic loop11. In STKs, this surface is formed in the active conformation by a conserved hydrogen bond between T162APE-5 (or S162APE-5) and the catalytic base K126HRD+2 (HRDxKxxN) (Fig. 5A).

Fig. 5: Divergent features of the active → inactive conformational change for TKs vs STKs.
Key residues are displayed with α-carbon spheres and sticks representation. A Active conformation of CDK6 (PDB: 1XO2), an STK. In the active state the activation loop (dark blue) is extended and the C-terminal anchor is formed by a hydrogen bond between K126HRD+2 in the catalytic loop and T162APE-5 in the activation loop C-terminus. The activation loop C-terminus is also anchored in place via the stacked residues 161APE-6 and 166APE-1. B Viewing the inactive DFG-out activation loop of CDK6 (PDB: 1G3N): a large rotation about T162 can be seen which distorts the C-terminal anchor and breaks the contact between 161 and 166. C Viewing the active holoenzyme of INSR (PDB: 3BU3) with a peptide substrate bound to the active kinase in the C-lobe binding mode. D Viewing the inactive activation loop of INSR (PDB: 3ETA). Unphosphorylated Y154 in the activation loops of TKs (light blue) acts as a pseudo-substrate, forming the same interactions as the substrate phosphoacceptor in (C). E Depiction of the active ↔ inactive landscape suggested by the Potts model and structural observations, for STKs (solid line) and TKs (dashed).
The barrier heights are unknown and were drawn for the sake of illustration, while the relative depths of the active and inactive free energy basins were drawn descriptively, based on estimates of \(\Delta {G}_{{reorg}}\) (see ref. 15) and \(\Delta \Delta {G}_{{reorg}}\).

When an STK is reorganized to the inactive DFG-out activation loop-folded conformation15 (Fig. 5B), this hydrogen bond is broken as in CDK6 (active PDB 1XO2:B, inactive PDB 1G3N:A). This is due to a large torsion of the activation loop backbone about 162APE-5, an effect which can be seen by comparing Fig. 5A and Fig. 5B. We previously found that breaking this hydrogen bond is suggested by the Potts model to come at a large free-energy cost15. Additionally, we found other sidechain interactions in the C-terminal anchor which are also broken upon reorganization of the activation loop in STKs; due to rotation of the backbone about 162APE-5, 161APE-6 is swung outwards in the inactive conformation (Fig. 5B) and breaks its contact with 166APE-1, which incurs an additional energetic penalty according to the Potts model15.

The C-terminal anchor stabilizes both the active and inactive conformation in TKs

The C-terminal anchor of TKs is strikingly different from STKs; TKs have a conserved Pro at position 162APE-5 which rigidifies the local backbone and simultaneously provides a complementary surface in the active conformation for the binding of bulky Tyr sidechains in trans (Fig. 5C). In TKs, the catalytic base is typically R128HRD+4 in place of the STK-conserved K126HRD+2.
Additionally, in TKs the sidechains of contact-pair 161APE-6 and 166APE-1 in the C-terminal anchor form part of the C-lobe binding site located underneath the activation loop where substrate peptides are found co-crystallized with TKs in the active conformation16.

Unlike STKs, the C-terminal anchors of experimentally solved TKs in the inactive conformation are generally seen intact while the activation loop N-terminus and middle region are folded over (Fig. 5D), likely due to the enhanced rigidity of the C-terminal anchor in TKs. This allows the flexible middle region of the activation loop to fold onto the C-terminal anchor in cis as a pseudosubstrate. The flexible middle region of TK activation loops typically contains one or more Tyr residues, e.g., Y154DFG+10 which, in the inactive conformation, stacks onto P162APE-5 in the C-terminal anchor and interacts with the kinase’s own catalytic machinery17. For inactive INSR structures (e.g., PDB: 3ETA), cis pseudosubstrate interactions are identified by hydrogen bonds between the Y154DFG+10 hydroxyl and D124HRD and R128HRD+4 in the catalytic loop (Fig. 5D). The stability of these interactions was corroborated at the sequence level using threaded energies from the Potts model which predict favorable interactions between Y154DFG+10, the TK catalytic base (R128HRD+4), and P162APE-5 in the C-terminal anchor. As described below, the sequence-based predictions were confirmed at the structural level using FEP simulations to mutate Y154DFG+10 to residues seen in STKs.

STK activation loops are rarely observed with Tyr at the 154DFG+10 position and instead have a variable residue, Leu being the most frequent, which appears in nearly 10% of STKs and only 3% of TKs. By contrast, Y154DFG+10 is highly prevalent in the activation loops of TKs (~30%).
The mutation Y154L in INSR was predicted by the Potts model to result in one of the largest possible single-mutant shifts in stability away from the inactive conformation and towards the active conformation. This predicted effect was confirmed by performing FEP simulations in both the active and inactive conformations, resulting in \(\Delta \Delta {G}_{{reorg}}=3.3\pm 0.5\) kcal/mol, which is consistent with the Potts-calculated \(\Delta \Delta T=1.1\) (Fig. 5E). A large fraction of this free-energy penalty to the inactive state may be due to the elimination of pseudosubstrate hydrogen bonds with D124HRD and R128HRD+4, as suggested by the result of mutating Y154DFG+10 to Phe (Y1162F in INSR numbering). Similar to Y154L, Y154F tilts the free energy balance of the active and inactive basins in favor of the active basin, consistent with experiments that report an increase in basal activity for this mutant17,39. These results validate the proposed regulatory role17 of unphosphorylated activation loop Tyr for optimizing the stability of the inactive substrate-mimetic conformation commonly seen in X-ray diffraction23 and NMR structures27 of TKs.

Mutating the regulatory spine alters the stability of the active conformation relative to inactive

The regulatory spine, or R-spine, is a structurally conserved column of (typically) nonpolar, stacked sidechains located inside the kinase hydrophobic core that connects the N-lobe and C-lobe when the kinase is active44,45 and is thought to orchestrate correlated dynamics of the kinase domain which are required for catalysis46,47. The R-spine is disassembled upon displacement of the DFG motif from DFG-in to DFG-out, or the αC-helix from αC-in to αC-out, and in this way the R-spine can be stabilized or destabilized by allosteric signals from the αC-helix and activation loop. The R-spine itself is structurally conserved throughout eukaryotic kinase families and plays a common functional role in both TKs and STKs.
Our results from the Potts model and free energy simulations described next confirm the regulatory role of this structural motif in both kinase families.

In the active conformation, the DFG motif participates in the R-spine via F143DFG located at the base of the R-spine stack. Moving up the spine, the next residue belongs to the αC-helix located at position 53 in our MSA numbering, and it is usually a bulky sidechain (e.g., M53αC or L53αC) that packs against the aromatic ring of F143DFG in the active conformation. Several mutations of residues located in and around the R-spine were identified in our Potts mutational scan to result in large shifts in conformational preference between active and inactive: namely residues F143DFG and M53αC in INSR, and L53αC in BRAF. We calculated \(\Delta \Delta T\) and \(\Delta \Delta {G}_{{reorg}}\) for these mutations; the results are presented in Fig. 3C and reported in Source Data.

DLG is the most commonly observed variation of the DFG motif, seen in ~10% of wildtype STKs and TKs, although its functional purpose is not well understood. The mutation DFG to DLG (F143LDFG) was predicted by the Potts model and confirmed with FEP simulations to have an activating effect in BRAF (\(\Delta \Delta {G}_{{reorg}}=0.9\pm 0.4\) kcal/mol) and a larger inactivating effect in INSR (\(\Delta \Delta {G}_{{reorg}}=-2.9\pm 0.4\) kcal/mol), whereby the active/assembled R-spine is stabilized in the former and destabilized in the latter. Analysis of Potts threaded energies in the wildtype and mutant (F143L) sequences provides a structural rationale for the differential effect of this mutation in INSR compared with BRAF in our FEP simulations (see Supplementary Fig. 2), which suggests the effect of the DLG substitution depends on the identity of residues surrounding the R-spine in the C-lobe.
For INSR in the active DFG-in basin, Potts threaded energies predict favorable interactions between wildtype F143DFG and residue 56αC at the C-terminus of the αC-helix. Structurally, INSR has a bulky hydrophobic sidechain at this position (F56αC) which packs closely with F143DFG in the R-spine (Supplementary Fig. 2), and the F143LDFG mutation is predicted by the Potts model to weaken this interaction, thereby destabilizing the active conformation. In contrast, BRAF has a small Thr sidechain (T56αC) which does not interact with F143DFG as closely, consistent with the very weak Potts threaded energy for this interaction in the active state. The F143LDFG mutation therefore destabilizes the active conformation of INSR but not BRAF.
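
The sign convention behind the \(\Delta \Delta {G}_{{reorg}}\) values quoted above can be made explicit with a small sketch. It assumes the usual thermodynamic-cycle bookkeeping, in which the same mutation is computed alchemically in the active and in the inactive conformation and the two legs are subtracted (inactive minus active), so that a positive value means the mutation penalizes the inactive basin more and shifts the balance towards active. The names `FepLeg`, `ddg_reorg` and `classify` are hypothetical, the per-leg free energies are not reported in this section, and the rule of calling a shift indeterminate when it lies within one reported uncertainty is an illustrative choice rather than a criterion used in the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FepLeg:
    """Alchemical free energy of one mutation in one conformation (kcal/mol)."""
    dg: float   # free energy change of the mutation in this conformation
    err: float  # statistical uncertainty of that estimate


def ddg_reorg(active: FepLeg, inactive: FepLeg) -> tuple[float, float]:
    """Assumed cycle: ddG_reorg = dG_mut(inactive) - dG_mut(active).

    Uncertainties of the two independent legs are combined in quadrature.
    """
    value = inactive.dg - active.dg
    error = math.hypot(active.err, inactive.err)
    return value, error


def classify(ddg: float, err: float) -> str:
    """Label the direction of the shift (illustrative threshold: one error bar)."""
    if abs(ddg) <= err:
        return "indeterminate"
    return "favors active" if ddg > 0 else "favors inactive"


# ddG_reorg values (kcal/mol) as quoted in the text above.
reported = {
    "INSR Y154L": (3.3, 0.5),
    "BRAF F143L": (0.9, 0.4),
    "INSR F143L": (-2.9, 0.4),
}

for mutation, (ddg, err) in reported.items():
    print(f"{mutation}: {ddg:+.1f} ± {err:.1f} kcal/mol -> {classify(ddg, err)}")
```

Under this convention the three quoted values sort the same way the text describes them: Y154L in INSR and F143L in BRAF shift the balance towards the active conformation, while F143L in INSR shifts it towards the inactive conformation.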
