Unmasking AlphaFold to integrate experiments and predictions in multimeric complexes

We test AF_unmasked on a series of cases derived from the PDB, on a dataset of challenging multimeric targets from CASP15, and on cryo-EM datasets of large protein complexes. First, we show that multimeric template information is particularly useful in cases where the standard version of AlphaFold is unable to build the true complex. Furthermore, we show that even when imperfect templates are used, e.g. multimeric templates with clashing interfaces or missing parts, AF_unmasked improves on these inputs by remodelling parts or filling in the gaps by structural inpainting.

Proof of concept

The first question is whether AlphaFold is at all capable of using cross-chain information derived from multimeric templates to build protein complexes. After all, the neural network was not trained to take distances across chains into account when building assemblies. Therefore, we perform a number of proof-of-concept tests on a common benchmark, i.e. the PDB benchmark set (see Methods).

This benchmark set is made of heterodimeric structures that have not been used in the training of the latest version of AlphaFold-Multimer (v2.3). First, we set a baseline by running the standard version of AlphaFold-Multimer with its default template strategy and all parameters set to their defaults, thus producing 25 predictions per target.

We then test AF_unmasked twice with different sets of templates. In this test, we use as templates in AF_unmasked the same deposited structures that we wish to predict (i.e. ideal templates). We perform the same prediction task twice, once with the default template strategy (Masked), once with the new template strategy (Unmasked). This simple test allows us to assess whether cross-chain distance constraints from ideal templates can inform the prediction task.
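The difference between the Masked and Unmasked strategies can be pictured with a small sketch. This is an illustration only, not the AF_unmasked implementation: it assumes that template pair information is gated by a residue-by-residue boolean mask derived from per-residue chain indices, and the function name `template_pair_mask` and the toy chain layout are hypothetical.

```python
import numpy as np


def template_pair_mask(chain_ids, unmasked):
    """Return an L x L boolean mask over residue pairs.

    chain_ids: one chain index per residue (hypothetical input format).
    unmasked=False keeps only same-chain pairs (block-diagonal mask).
    unmasked=True keeps every pair, including cross-chain ones.
    """
    chain_ids = np.asarray(chain_ids)
    same_chain = chain_ids[:, None] == chain_ids[None, :]
    if unmasked:
        return np.ones_like(same_chain, dtype=bool)
    return same_chain


# Toy heterodimer: 3 residues in chain 0, 2 residues in chain 1.
chains = [0, 0, 0, 1, 1]
masked = template_pair_mask(chains, unmasked=False)
unmasked = template_pair_mask(chains, unmasked=True)

print("pairs visible (Masked):  ", int(masked.sum()))
print("pairs visible (Unmasked):", int(unmasked.sum()))
print("cross-chain pairs gained:", int(unmasked.sum() - masked.sum()))
```

In this toy picture, the only difference between the two strategies is whether the off-diagonal (cross-chain) blocks of the template distance information are visible to the network.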
In order to assess the impact of including or excluding MSA information during the prediction task, we also run both strategies either with the complete MSA or with MSAs clipped to include only the target sequences (i.e. deleting all evolutionary inputs).

We score all predictions against the natives with DockQ [15]. In Fig. S1, we show the distributions of DockQ scores for all 251 targets in each test. For each target and test, we only score the prediction with the highest ranking confidence predicted by AlphaFold. The Masked predictions are generally better than the Standard predictions, both with and without evolutionary inputs (MSA). This is not surprising, as inputting ideal (monomeric) templates should help AlphaFold find the right arrangement of units. Interestingly though, disabling evolutionary inputs makes these predictions worse, as the cross-chain evolutionary coupling information is lost.

Conversely, enabling our proposed template mechanism improves predictions, making them almost perfect when no evolutionary information is used. This demonstrates that cross-chain distance constraints from templates can inform the prediction task, and should be used whenever possible.

When comparing Unmasked predictions made with and without evolutionary inputs, we notice an interesting pattern in the resulting scores, as seen in Fig. S2: while the predicted ranking confidence tends to be lower when the evolutionary inputs are missing, whenever the confidence increases the corresponding predictions are also of better quality. The fact that the confidence is lower in most cases is a hint that AlphaFold is not blindly trusting the templates, i.e. the predicted quality scores are not biased by the template inputs.
On the other hand, in those cases where the MSA input might be noisy, possibly due to a lack of cross-chain evolutionary signal, the predicted quality score jumps up as soon as this input is eliminated, resulting in higher quality predictions as AlphaFold relies more on the templates. This means that the change in the neural network does not preclude using AlphaFold’s predicted quality scores to sift out good predictions, even in comparison to those obtained from the standard implementation of AlphaFold-Multimer.

Homology modelling

Since ideal templates are often unavailable, we also assess AF_unmasked in the case where homologous templates that are at least somewhat informative are used instead. In this test we use the Homologous PDB set (see Methods) to produce predictions for 28 challenging targets where AlphaFold-Multimer cannot produce a correct top-ranked prediction. We follow the same testing protocol as described above, predicting from homologous templates with or without MSA information.

In Fig. 2 we compare predictions on these hard targets from homologous templates (Unmasked-Homologs) to those from the previous test. These predictions show a marked improvement over the Standard and Masked predictions from the previous test, underlining the usefulness of providing AlphaFold with cross-chain information from homologous templates. Moreover, predictions from homologous templates are only slightly worse than those made from ideal templates (Unmasked, Fig. 2).

Fig. 2: Box plot comparison of various template strategies when predicting a subset of the PDB set of heterodimeric complexes. Each box represents the inter-quartile range (IQR), with the median represented as a horizontal line. Whiskers extend up to 1.5 × IQR beyond the box. Diamonds represent outlier samples. The subset in this test is made of heterodimers (n = 28) where good homologous templates could be found in the PDB and the predictions by AlphaFold-Multimer (Standard) are incorrect.
We evaluate AF_unmasked on ideal, native templates without and with cross-chain restraints (Masked and Unmasked, respectively). Then we switch from ideal to homologous templates (Unmasked-Homologs). Only the top-ranked prediction by ranking confidence, out of 25, is evaluated for each heterodimer. Though results are slightly worse than when providing an ideal template, the cross-chain information from homologous templates helps make better predictions than the Standard and Masked predictors.

In Fig. S3 we compare the predictions generated with homologous templates against standard AlphaFold predictions and AF_unmasked predictions based on ideal templates. The figures show that there is no clear correlation between the quality of the predictions and the sequence identity between target and template sequences. This indicates that sequence similarity is not biasing the predictions, and that AF_unmasked is useful even when using remote homologs. Turning the evolutionary inputs off (no MSA predictions) on this dataset does not seem to have much of an impact on the quality of the predictions. We also test turning dropout on in this scenario; given the limited number of predictions generated (only 25 per target), we see only a small improvement in overall quality that is not statistically significant.

Using imperfect templates

Next, we test AF_unmasked on templates that are a coarse representation of a protein complex. This is a common scenario when performing molecular replacement or fitting densities in experiments, where users might generate predictions separately for unbound monomers and manually dock them according to the data. These are rough models that might include, e.g., clashes at the interface or loops that are incorrectly modelled. We want to test whether AlphaFold can generate a correct structure from such imperfect templates. We perform this test with CASP15 target H1142, an antigen-nanobody complex with stoichiometry A1B1.
This type of complex is a good test case, as AlphaFold is relatively weak in modelling antigen-antibody interactions [3], likely due to a lack of evolutionary coupling signal in the MSA [16].

We take AlphaFold predictions for each chain from incorrect models of the complex that were submitted by NBIS-AF2-multimer and position them in a roughly correct manner by superimposing each prediction onto the corresponding unit in the experimental structure. The template obtained this way has a DockQ score of 0.64 (medium quality) against the experimental structure. Since the chains were extracted from incorrect predictions, even when they are positioned correctly with respect to each other, the interface is incorrect: the RMSD of interfacial residues (iRMSD) is 2.1 Å and 12 interfacial residues are clashing according to DockQ (Fig. 3a). We input this imperfect template into AF_unmasked while clipping the MSAs to a single sequence and predict 500 structures for each test. Results show that, when using both intra-chain and cross-chain restraints (Unmasked), the clash is fixed but the interface is not perfect. In a second test, we turn off intra-chain restraints and keep only cross-chain restraints active. This should allow AlphaFold to rearrange each monomer wherever necessary while keeping the distances between the chains within the boundaries of the template, which results in a more diverse set of predictions (Fig. 3c) and a much-improved interface. Lastly, we generate a third set of predictions by letting AF_unmasked automatically detect and delete clashing sets of residues in the template structures during the template preparation step (by setting the appropriate flag, --inpaint_clashes). These deleted residues are regenerated (inpainted) during the prediction step. This seems to be the best approach, resulting in even more diverse predictions. Here, the prediction with the highest ranking confidence is also the best overall model (Fig. 3b).
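The idea of deleting clashing template residues so that they can be inpainted can be sketched as follows. This is a simplified illustration, not the criterion used by AF_unmasked or DockQ: it assumes one representative atom per residue and a hypothetical 2.0 Å cutoff, and the function names `clashing_residues` and `drop_residues` are made up for the example.

```python
import numpy as np


def clashing_residues(coords_a, coords_b, cutoff=2.0):
    """Find residues of each chain lying closer than `cutoff` (in Å) to the other chain.

    coords_a, coords_b: arrays of shape (N, 3), one representative atom
    per residue (an assumption of this sketch).
    Returns two index arrays, one per chain.
    """
    coords_a = np.asarray(coords_a, dtype=float)
    coords_b = np.asarray(coords_b, dtype=float)
    # All pairwise inter-chain distances, shape (Na, Nb).
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    close = dist < cutoff
    return np.flatnonzero(close.any(axis=1)), np.flatnonzero(close.any(axis=0))


def drop_residues(coords, indices):
    """Remove the given residues so that they are absent from the template."""
    coords = np.asarray(coords, dtype=float)
    keep = np.ones(len(coords), dtype=bool)
    keep[indices] = False
    return coords[keep]


# Toy template: the last residue of chain A sits 1 Å from the first of chain B.
chain_a = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [8.0, 0.0, 0.0]]
chain_b = [[9.0, 0.0, 0.0], [13.0, 0.0, 0.0], [17.0, 0.0, 0.0]]

clash_a, clash_b = clashing_residues(chain_a, chain_b)
template_a = drop_residues(chain_a, clash_a)
template_b = drop_residues(chain_b, clash_b)
print("clashing residues in A:", clash_a.tolist())
print("clashing residues in B:", clash_b.tolist())
print("residues kept:", len(template_a), "+", len(template_b))
```

The residues removed this way carry no template information, so the network is free to rebuild them during prediction.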
Results for the whole set of 1500 predictions are shown in supplementary Fig. S4.

Fig. 3: CASP15 target H1142 is an antibody-antigen complex. The template was obtained by superimposing unbound structures from CASP15 predictions onto the native to simulate an imperfect template. a In this case, some of the residues at the interface are clashing in the template. We test AF_unmasked either by feeding this imperfect template (a) or by deleting the clashing interfacial residues to let AF_unmasked inpaint them (b). Results show that (c) using both cross- and intra-chain restraints (Unmasked) from the imperfect template does not perform as well as using cross-chain restraints alone (Unmasked, cross-chain). The best overall strategy is to delete the clashes and perform inpainting (Unmasked, inpainting), which results in more extensive sampling of the space of conformations. Regardless of the strategy, the best model by ranking confidence was also the best model by DockQ.

The correlation between AlphaFold’s ranking confidence and the prediction quality measures, already noticeable when using the DockQ score in Fig. 3b, is almost perfect when evaluating the RMSD of the interfacial residues (iRMSD) as calculated in DockQ (Fig. S5). Since the change in the interface is rather subtle between the original clashing template and the desired configuration, observing the iRMSD better highlights how AlphaFold is able to recognise correct interfaces and rank them accordingly. Inpainting the interface allows AlphaFold to explore more configurations and find the best possible, both by ranking confidence and overall quality.

We also assess whether AF_unmasked can retrieve the correct conformation from imperfect templates obtained by perturbing the position of one chain with respect to the other.
The perturbation is done by taking the native complex and running RosettaDock [17] 300 times with a dock perturbation flag so that the two monomers are randomly roto-translated from the initial position following a normal distribution centered at zero and with standard deviations of 5 Å for the translation and 11 degrees for the rotation (-dock_pert 5 11 flag). This generates a set of templates of varying quality, depending on the magnitude of the random perturbation. We evaluate the initial quality of these artificially perturbed templates by DockQ score against the native conformation. We then run AF_unmasked 300 times, using a different perturbed template each time, and score the best prediction, as ranked by AlphaFold’s predicted ranking confidence, with DockQ. Since evolutionary information is not useful for making a good prediction of H1142, we clip the MSAs in this case as well and let AF_unmasked rely on the perturbed template alone. This gives an idea of how close to the correct conformation the template needs to be to get a good prediction. In Fig. S6a we show the results from this test by comparing the initial DockQ score of a perturbed template with that of the highest confidence model generated from that template. Each point is a perturbed template, colored by the quality class it would be assigned under the Critical Assessment of PRedicted Interactions (CAPRI) criteria [18]. According to these criteria, the perturbed templates have qualities ranging from Incorrect to High. Results show that AF_unmasked is always able to take a template of medium quality (initial DockQ score ≥ 0.49 [19]) or better and use it to generate a high quality (DockQ score ≥ 0.8) prediction. Predictably, as the template quality degrades, so does the quality of the predictions. Still, AF_unmasked generates high quality predictions for 129 out of 171 templates of Acceptable quality (initial DockQ score ≥ 0.23) and for 10 out of 60 templates of Incorrect quality.
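The quality classes used above can be written down as a small helper. This is a minimal sketch that uses only the DockQ thresholds quoted in the text (0.23, 0.49 and 0.8); the function name `capri_class` and the example scores are hypothetical.

```python
from collections import Counter


def capri_class(dockq):
    """Map a DockQ score in [0, 1] to a CAPRI-style quality class."""
    if not 0.0 <= dockq <= 1.0:
        raise ValueError("DockQ scores are expected to lie in [0, 1]")
    if dockq >= 0.80:
        return "High"
    if dockq >= 0.49:
        return "Medium"
    if dockq >= 0.23:
        return "Acceptable"
    return "Incorrect"


# Hypothetical template scores, for illustration only.
template_scores = [0.64, 0.87, 0.19, 0.30, 0.49]
classes = [capri_class(s) for s in template_scores]
counts = Counter(classes)

for score, label in zip(template_scores, classes):
    print(f"{score:.2f} -> {label}")
print(dict(counts))
```

Tallying templates per class, as `counts` does here, is the kind of bookkeeping behind statements such as "129 out of 171 templates of Acceptable quality".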
As perturbed templates get farther away from the right solution (template DockQ score < 0.19), AF_unmasked fails to generate good predictions. In Fig. S6b we show the template with the lowest initial quality (template DockQ score: 0.19) where AF_unmasked could still predict the correct conformation (prediction DockQ score: 0.87).

Inpainting of very large structures

A known limitation of AlphaFold is its limited capability to generate models for large proteins, mostly due to computational constraints in terms of GPU memory. This is, of course, a significant limitation, as many interesting protein complexes are large.

For example, CASP15 target H1111 is a 27-mer with stoichiometry A9B9C9 and 8460 amino acids in total. DeepMind, who modelled the complex post-CASP15 on a more efficient version (v2.3) of AlphaFold than was available to the public, could not perform the modelling of target H1111 in one go and instead assembled multiple structures with A3B3C3 stoichiometry by using a template (PDB ID: 7ALW) as a guide [20]. Here, we show that it is possible to overcome this limitation with AF_unmasked while limiting the depth of the final MSA to a maximum of 512 total sequences.

We again use the deposited structure itself (PDB ID: 7QIJ) as template. This is a hard prediction task, as the first 362 residues of the largest subunits (C9) from the membrane-bound domain in the complex are not in the deposited structure, so AlphaFold has to inpaint this gap by leveraging the evolutionary information coming from the MSA inputs. We generated 25 structures following this protocol (results shown in Fig. S7), which takes around 10 h of GPU time per predicted structure (NVIDIA A100, 80 GB RAM), and selected the top three models by ranking confidence. As we can see in Fig. 4, the portion of the structure that is covered by the template stays the same across the three models, while the inpainted membrane region (in green) appears in a variety of conformations, from closed to open.
This is, to the best of our knowledge, the largest structure ever generated in one shot using AlphaFold.

Fig. 4: CASP15 target H1111 is a very large complex (27 chains, 8460 amino acids) of a secretion export gate from Yersinia enterocolitica. We use the CASP15 native structure (PDB ID: 7QIJ) as partial template (bottom ring) to guide the assembly and let AlphaFold inpaint the trans-membrane region. The top three models by ranking confidence are all near-identical to the template in the area covered by it, while the trans-membrane region shows diverse and potentially biologically relevant conformations: closed (a), intermediate (b), open (c).

Predicting the impact of mutations

AlphaFold is not trained to predict the effect of mutations on the folding of a protein, and it cannot predict the impact of single-point mutations on protein stability [21]. This might be due to the fact that a few mutations in a target sequence result in virtually identical MSAs being used as input, which might mislead the neural network into inferring incorrect restraints.

For example, T1110o and T1109o are two closely related homodimeric CASP15 targets. They are, respectively, a wild-type and a mutant construct of isocyanide hydratase, where the single-point mutation D183A causes a rearrangement of the C-terminus loops at the interface, as shown in Fig. 5a/b. We test whether AF_unmasked is capable of correctly switching between the two loop configurations by encouraging sampling around the region of interest. In order to do this, we use a structural template, obtained by looking for structural homologs in the PDB, where 20 residues in the loops in question are missing (PDB ID: 4K2H). The RMSD between the template and the native, excluding the loops, is 2.1 Å. The sequence identity between the target and template sequences is below 20%, so we align target and template structurally with TM-align.
We clip the input MSAs to a single sequence, which means that AlphaFold should follow the template wherever possible and attempt to model the loops ab initio, since neither structural nor evolutionary inputs are given in that area of the structure.

Fig. 5: CASP15 target T1110o is a homodimer of the isocyanide hydratase. a Target T1109o is a mutant of T1110o where a single-point mutation causes a rearrangement of the C-termini (b). The template was obtained by homology against the PDB, and among a set of candidates we selected a template where the C-termini loops were missing entirely (a, b). We utilise this template as is; the mapping between target and template amino acid sequences was performed by structural superposition between unbound models and the template with TM-align. Using this incomplete template allows AF_unmasked to perform sampling of a number of different loop conformations through inpainting. The top-ranking structures by confidence score show the correct loop arrangement both in T1109o (c, Unmasked) and T1110o (d, Unmasked) for mutant and wildtype sequences, while the default template strategy (Masked) tends to assign to the mutant the same arrangement as in the wildtype.

Results show that for both T1110o (wildtype) and T1109o (mutant), AF_unmasked correctly arranges the loops in the model with the highest ranking confidence (Fig. 5). In Fig. 5d, we compare DockQ scores for the top 10 T1109o models by ranking confidence against the models submitted by Wallner at CASP15. The top-ranked AF_unmasked model for the mutant is the best overall (DockQ: 0.804). We also test the default intra-chain constraint setting (i.e. Masked) when using the same template and find that none of the top 5 models beat the new template strategy, while two correct models are in the top 10 (best DockQ: 0.803). Results for all predictions are shown in Figs. S8 and S9.

This suggests that including cross-template information puts AlphaFold closer to the correct solution, thus allowing for better sampling of the remaining space of configurations.

Since the changes in the structure are fairly subtle as the loops rearrange, we also show how AlphaFold’s average predicted LDDT scores (pLDDT) in the inpainted loop regions alone are highest in the models with the highest quality, both for T1109o and T1110o (Figs. S10 and S11). This confirms that in cases where the best loop arrangement is unclear, or where mutations might cause local variations in the structure, the inpainting procedure produces better models, provided that the increase in quality is reflected by the predicted quality scores in the areas of interest.

In Fig. 5c, we compare the models obtained for T1110o with the new template strategy against those submitted by Wallner in CASP15 and see that even in this case, the top selected model is also the one with the highest DockQ (Fig. 5a). Two of the top Wallner models have slightly higher DockQ.

Cryo-EM test cases

In each of the biological systems we used (Rubisco, ClpB and Neurofibromin), we aimed at inpainting missing regions and identifying areas of possible conformational variability. We interpreted the results based on previous biochemical, structural and biophysical knowledge. Tens to hundreds of models were generated with AF_unmasked for each of the cases analysed, but we only display some representative examples for ease of description. The full set of models obtained, along with those shown in the figures and discussion, is available in the supplementary material.

Rubisco

Rubisco plays a crucial role in CO2 fixation, making it responsible for the majority of organic carbon in the biosphere. Understanding the function and control of Rubisco remains a significant area of research, with the aim of enhancing photosynthesis efficiency in agriculture and green biotechnology.
The most prevalent form of Rubisco (Form I) comprises eight large and eight small subunits, and it exists in plants, algae, and other organisms. Although the active sites of Rubisco are situated in the large subunits, the expression of the small subunit regulates the size of the Rubisco pool in plants and can impact the overall catalytic efficiency of the Rubisco complex. For this reason, the small subunit is a potential target for bioengineering, and biochemical studies have been performed to generate chimeras of large and small Rubisco subunits that could enhance Rubisco’s performance [22].

We use AF_unmasked to predict a chimera of Rubisco composed of large subunits from Arabidopsis thaliana [23] and small subunits taken from another organism. We use a cryo-EM reconstruction obtained from our own data (resolution: 2.06 Å) for this chimera molecule to assess the quality of predictions from the standard version of AlphaFold-Multimer (v2.3) and from AF_unmasked. We are particularly interested in how well AlphaFold can predict the small subunits, as the inner loops of the subunits were challenging to reconstruct from the experimental density. The standard predictions do not agree with the experimental data in the area of interest, as the inner loops appear in a tighter conformation when compared to the experimental model density map (Fig. 6a, area circled in yellow). We therefore attempt a modelling step with AF_unmasked to improve this prediction. In this test, we provide AF_unmasked with the deposited structure from Arabidopsis (PDB ID: 5IU0) as template and let AF_unmasked transfer the homologous information from the template onto the chimeric sequence. We also delete a stretch of 20 amino acids from the inner loops in the PDB template in order to let AF_unmasked inpaint this region. Results (Fig. 6a, right) show that the best AF_unmasked inpainted model by ranking confidence is closer to the experimental structure in the area of interest when compared to the standard AlphaFold-Multimer predictions. The small subunits predicted by AF_unmasked fit better within the EM density, with a Q-score of 0.8 (comparable to that of the experimental structure: 0.81), which is higher than that of the standard AlphaFold-Multimer prediction (0.41), and are a closer match to the experimental structure in the inpainted loop region (Fig. 6b).

Fig. 6: Comparison of AF_unmasked and standard AlphaFold-Multimer predictions of chimeric rubisco protein. Flexible loops in the smaller subunit at the center of the complex have been inpainted with AF_unmasked. a Global superposition of best standard AlphaFold-Multimer (v2.3) prediction by ranking confidence on the experimental cryo-EM structure (left) and comparison with the best AF_unmasked prediction (right). The circled area highlights how the inner loops are predicted in a tighter and symmetrical conformation compared to the experimental model. The AF_unmasked model, where the same inner loops were inpainted, shows better agreement with the experimental model. b Comparison of predictions against the density obtained from cryo-EM data after optimisation of the superposition between predicted loops and one loop from the deposited model. The circled area shows a cross-section of one of the inner loops of interest. The resulting inpainted loop fits better within the density and is a closer match to the final refined model when compared to the standard prediction.

We also assess whether the average pLDDT for the stretch of inpainted residues in the loop correlates with the quality of the predicted small unit. In Fig. S12 we show that inpainted loops with higher pLDDT coincide with predictions that have lower RMSD when superimposed with the best matching chain from the experimental model.

ClpB

The bacterial chaperone ClpB is able to recover proteins from large aggregates and, together with the cognate DnaKJ system [24], to refold them into their active form. ClpB plays a pivotal role in protein homeostasis of bacterial cells (reviewed in [25,26]). ClpB works by using the energy from ATP hydrolysis to thread aggregates into a channel formed upon oligomerisation of six identical copies [27]. The ClpB hexamer is therefore a very dynamic complex [28,29] that needs to recognise and then move to unfold the aggregated substrate.

A wealth of cryo-EM structural information about the ClpB hexamer is available [30,31,32,33], but large and highly mobile domains are poorly defined. There are over 30 cryo-EM structures of the hexameric ClpB and its eukaryotic homologue Hsp104, plus several X-ray crystal structures of ClpB/Hsp104 monomers, deposited in the Protein Data Bank. The ClpB crystal structure with PDB ID 1QVR shows two different localisations of the N-terminal domain relative to the rest of the ClpB body, and this is in good agreement with a number of biochemical studies showing that the N-terminal domain is involved in engagement of the misfolded substrate that will then be threaded through the ClpB hexameric channel [26]. In a few cryo-EM maps, out of the six N-terminal domains, only two or three are visible, at a resolution lower than the rest of the ClpB body [33], thus indicating a high flexibility of this region. The dynamic nature of the N-terminal domains has also been shown for systems analogous to ClpB, such as ClpX, ClpC, ClpA, 26S proteasomes and VCP, among others.

We gave AF_unmasked a template derived from cryo-EM data (PDB ID: 5OG1) and performed inpainting of the missing regions. A total of 50 predictions were performed.
Results show that not only could AF_unmasked inpaint the missing N-terminal domains, but it also predicted them in multiple conformations within the same hexamer. This is different from AlphaFold-Multimer predictions, which are always highly symmetrical (Fig. S13a–b). Furthermore, the ranking confidence of AF_unmasked predictions correlates almost perfectly with the DockQ score of all interfaces in the hexamer (Fig. S14).

Another flexible ClpB region is the coiled-coil M-domain, known to be important in the regulation of ClpB and in its interaction with DnaKJ [34,35,36]. This is present in most ClpB structures, at least partially, and mutational studies [35,37] show that this domain can assume many different orientations and that such orientations are related to activation states of the ATPase. These orientations are reflected in the inpainted models, together with intermediate conformational steps, showing tilting of the long coiled-coil in ways that are plausible given the existing structural information, single molecule FRET-spectroscopy data [37,38] and coarse-grained Molecular Dynamics (MD) studies [35] (Fig. 7a–b).

Fig. 7: Analysis of ClpB hexamer using AF_unmasked. a Given template and inpainted N-termini. N-termini are shown as surfaces while other domains are shown as cartoon. Each subunit of the hexamer is coloured differently. b Inpainting on the M-domains, shown as surfaces. The arrow shows the possible motion of the M-domain. c Inpainting of the interaction between ClpB and casein. The asterisk shows the newly predicted interaction area. d View of the hydrophobic regions of ClpB termini interacting with casein. e Models of ClpB and casein and relative confidence scores.

To investigate the ability to predict the interaction between ClpB and the commonly used substrate casein, we used a PDB template (PDB ID: 6RN3) where a stretch of amino acids from casein is engaged inside the ClpB pore.
While most of the 50 predictions generated this way failed to dock the casein inside ClpB (Fig. 7e), the top models by ranking confidence do show casein engaged inside the pore, and in one model, casein is in contact with one of the six N-terminal domains (Fig. 7c–d). The interaction between casein and the N-terminal domain is predicted via hydrophobic patches (Fig. 7d), in good agreement with NMR data [39]. Predictions where ClpB interacts with casein have higher pLDDT scores than those where ClpB is not engaged with casein (Fig. 7e). Once more, the ranking confidence correlates well with the DockQ score of all interfaces (Fig. S15).

Neurofibromin

Neurofibromin (NF1) is a downregulator of the oncogenic protein RAS and is ubiquitously expressed in the central neural system [40]. Neurofibromin therefore plays a very important role in tumor growth, and its mutations cause the pleomorphic disease neurofibromatosis type 1 [41] and are found in up to 10% of all cancers [42].

NF1 is a homodimer of around 600 kDa and has a unique oligomeric arrangement made of a bi-lobate platform, composed of ARM and HEAT repeats [43,44,45,46]. The RAS-binding domain GRD and the membrane-binding domain Sec14-PH are anchored to this helical platform by long loops that allow large movements of these domains [43,45]. Cryo-EM structural studies [43,45,46] show two main conformations of the GRD and Sec14-PH domain that go from a so-called closed auto-inhibited conformation to an open conformation, which can bind RAS.

The standard version of AlphaFold-Multimer fails to find the right conformation of both monomeric and dimeric NF1 arrangements. In order to test whether AF_unmasked could reproduce the movements of GRD and Sec14-PH that are visible in the experimental data, we decided to remove these domains from the available deposited PDB structure (PDB ID: 7PGU) and perform inpainting on them.
Given the considerable size of the inpainted domains, as well as the considerable scale of the movements involving these domains, we perform additional sampling and produce roughly 1000 models. Results show that AF_unmasked, using the bilobate helical part of the complex as template, was able to inpaint the missing GRD and Sec14-PH domains (Fig. 8a) as well as all the loops missing in the cryo-EM models (Fig. 8b). The inpainted domains are in good agreement with the cryo-EM and the crystal structures of the GRD and Sec14-PH domains (PDB IDs: 6OB3, 1NF1, 3PEG; see also Supplementary Data). Results also show good correlation between ranking confidence and DockQ scores (Fig. S16).

Fig. 8: Analysis of the Neurofibromin (NF1) dimer using AF_unmasked. a Inpainting of the GRD and Sec14-PH domain of NF1 in an intermediate conformation in between the experimentally observed closed and open states. b Comparison of the closed experimental conformation of NF1 isoform 2 (on the left) with an AF_unmasked conformation. Important regions are highlighted. c Superimposition of several AF_unmasked predictions where the GRD and Sec14-PH domain were modelled in intermediate positions, suggesting a possible motion path for these domains. d Three different AF_unmasked predictions showing a bending of the helical NF1 platform, also represented with differently coloured curved lines.

However, possibly due to the complexity of this inpainting task and the fact that the closed, auto-inhibited conformation of NF1 is stabilised by a coordinated Zn atom that cannot be taken into consideration here (Fig. 8b), none of the predicted models captures the state with one monomer in the closed conformation and one in the open, as seen in one of the deposited structures. On the other hand, the predicted conformations of GRD-Sec14-PH could be intermediate states of transition from the closed to the open NF1 conformations found experimentally in the cryo-EM studies.
Indeed, the structures predicted this way can be used to aid in generating a morphing between the closed and open positions of GRD and Sec14-PH, which looks very different from, and structurally more likely than, the simple morphing from closed to open [43] (Supplementary Movies 1–2). These movements are only interpolations between predicted states that cannot be confirmed experimentally yet, but they represent solid hypotheses to be tested with further biochemical experiments and molecular dynamics simulations.

Interestingly, depending on the neural network model, different types of variability are picked up. For example, we were also able to model different bending states of the helical platform (Fig. 8d) that are in good agreement with the 3D variability analysis performed in cryo-EM (Supplementary Movie 3).
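Several of the results above rely on the agreement between AlphaFold’s ranking confidence and DockQ (e.g. Figs. S14 to S16), and on selecting the top-ranked model per target. The following sketch shows one way such an agreement could be quantified with a rank correlation. It is an illustration under stated assumptions: the scores are invented, the helper `spearman_no_ties` is hypothetical and valid only when there are no tied values, and the text does not state which correlation measure was used.

```python
import numpy as np


def spearman_no_ties(x, y):
    """Spearman rank correlation, assuming no tied values in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # argsort of argsort converts values to 0-based ranks (ties not handled).
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Invented scores for five predictions of one target (illustration only).
ranking_confidence = [0.91, 0.42, 0.77, 0.58, 0.35]
dockq = [0.85, 0.20, 0.66, 0.71, 0.10]

rho = spearman_no_ties(ranking_confidence, dockq)
top_ranked = int(np.argmax(ranking_confidence))

print(f"Spearman rho: {rho:.2f}")
print(f"top-ranked model: index {top_ranked}, DockQ {dockq[top_ranked]:.2f}")
```

A rank-based measure is used here because the question of interest is whether sorting models by confidence also sorts them by quality, rather than whether the two scores are linearly related.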
