Protein interactions in human pathogens revealed through deep learning

Computational pipeline for proteome-wide PPI identification

To screen through hundreds of millions of protein pairs for PPIs, we first sought to increase the computational efficiency of PPI identification without compromising accuracy. We previously developed a two-track RoseTTAFold (RF 2-track) network that is a simplified version of RoseTTAFold14, which predicts three-dimensional (3D) protein structure from amino acid sequence. Although RF 2-track was not trained to model protein complexes or distinguish interacting from non-interacting proteins, residue–residue distograms produced by this network enable the detection of PPIs on a proteome-wide scale at an accuracy that far exceeds statistical analysis of coevolution between proteins9. Similarly, we and others have used AlphaFold (AF)15 to evaluate interactions identified in lower-accuracy large-scale screens9,10,11,16,17; however, the computational cost of AF prohibits its application on a proteome-wide scale. AF-multimer (AFmm)18 was trained to model 3D structures of known protein complexes, and consequently, it tends to predict PPIs between non-interacting pairs, showing a worse performance than AF in distinguishing true PPIs from random pairs (Fig. 1b, top).

Fig. 1: PPI identification by coevolution and deep learning methods.
a, Overview of the RF2-Lite network architecture. FAPE, frame-aligned point error. b, Benchmark performance of PPI prediction methods. Top: precision and recall curves of DCA (grey), RF 2-track (black), RF2-Lite (blue), AF (green) and AF-multimer (purple) in distinguishing true PPIs from random protein pairs. For different methods, we used the pMSAs generated by our bioinformatic pipeline (Supplementary Methods). We applied each method on a benchmark set of 1,000 randomly selected positive control pairs and 10,000 negative control pairs (Supplementary Methods). The precision and recall curve for this benchmark is in Supplementary Fig. 6a.
The real signal-to-noise ratio for the PPI screen is on the order of 1:1,000 (ref. 1); to reflect the impact of a much larger set of non-interacting pairs, we upsampled the negative control set to 1,000,000 by randomly sampling 100 ‘pseudo’ interacting probabilities from a Gaussian distribution (standard deviation, 0.1) centred on each real interacting probability obtained for the negative controls. Bottom: runtime comparison of different PPI identification methods. c, Schematic overview of our PPI screen pipeline. d, Precision and recall curves at different stages in the pipeline. Top: DCA on PPI prediction; solid black vertical line represents the recall cut-off in this stage. Middle: RF2-Lite screen procedure on the ‘pilot set’; solid black vertical line indicates the recall cut-off at this stage. Bottom: AF screen procedure on the ‘pilot set’; dashed horizontal line shows the precision cut-off, that is, 0.95. e, Summary of predicted PPIs for the ‘pilot set’ that focuses on essential genes and virulence factors. Left: interactions between essential genes in the ‘pilot set’ based on different evidence: blue, green and orange circles represent our predicted pairs, functional interactions according to STRING (total score ≥900 and experimental score ≥400) and interacting pairs according to PDB (BLAST hit to complex in PDB e ≤ 0.00001, sequence identity ≥50% and coverage ≥50%), respectively. Right: PPIs involving virulence factors in the ‘pilot set’ supported by different evidence: red, purple and yellow circles represent our predictions, pairs according to STRING and pairs according to PDB.

We hypothesized that a dedicated lighter-weight network trained on both interacting and non-interacting protein pairs that balances accuracy with speed could assist proteome-wide PPI screens.
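The negative-set upsampling described in the Fig. 1b legend can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and clipping the sampled values to [0, 1] is an assumption, because the legend does not say how out-of-range draws were handled.

```python
import random


def upsample_negatives(neg_probs, copies=100, sd=0.1, seed=0):
    """Draw `copies` pseudo-probabilities around each negative-control score.

    Each draw comes from a Gaussian centred on the observed probability
    with standard deviation `sd` (0.1 in the legend).
    """
    rng = random.Random(seed)
    upsampled = []
    for p in neg_probs:
        for _ in range(copies):
            draw = rng.gauss(p, sd)
            # Assumption: clip to the valid probability range.
            upsampled.append(min(1.0, max(0.0, draw)))
    return upsampled


def precision_recall(pos_probs, neg_probs, cutoff):
    """Precision and recall when every pair scoring >= cutoff is called a PPI."""
    tp = sum(p >= cutoff for p in pos_probs)
    fp = sum(p >= cutoff for p in neg_probs)
    # Convention (assumption): precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / len(pos_probs)
    return precision, recall
```

With 10,000 negative controls and 100 draws each, `upsample_negatives` returns 1,000,000 scores, which gives the 1:1,000 ratio against the 1,000 positive controls.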
We revised the original RoseTTAFold network by introducing architectural improvements to increase accuracy while reducing the number of layers to enable the rapid computation necessary for large-scale screens (Fig. 1a and Supplementary Methods). We trained this network using a combination of (1) monomeric protein structures from the Protein Data Bank (PDB), (2) AF models of UniRef50 sequences, (3) pairwise protein complex structures extracted from PDB and (4) random non-interacting protein pairs. The four types of training data were mixed at a ratio of 1:3:2:2 (Supplementary Table 3). The model was trained using the masked language model loss, distogram prediction loss, frame-aligned point error loss, accuracy estimation loss, bond geometry loss and van der Waals energy loss. For the negative interaction examples, we ignored the inter-chain region for frame-aligned point error calculation and required the network to predict the distogram to be in the ‘non-interacting bin’ for the inter-chain region. We designate the resulting network as RoseTTAFold2-Lite (RF2-Lite) as it resembles the RoseTTAFold2 architecture but has many fewer parameters, which we achieved by chunking the number of blocks19. RF2-Lite distinguishes true PPIs better than the previous RF 2-track: at the same precision, the recall for true PPIs by RF2-Lite lies between those of RF 2-track and AF (Fig. 1b, top, and Supplementary Figs. 6 and 7). Despite this increase in accuracy, RF2-Lite’s speed is still comparable to that of RF 2-track, and it requires about 20-fold less compute time than AF (Fig. 1b, bottom).

We combined direct coupling analysis (DCA)20, RF2-Lite and AF (Fig. 1c and Supplementary Methods) to identify and model interacting proteins and applied this pipeline to the 19 human pathogens listed in Supplementary Table 2.
To monitor the performance of our pipeline, we assembled a set of positive controls and an ~700-fold larger negative set based on information from the STRING protein-protein interaction database (Supplementary Methods).We constructed a database of 44,871 representative bacterial proteomes/genomes (one per species) obtained from the National Center for Biotechnology Information (NCBI) and used the reciprocal best hit criteria21 to identify an orthologue for every protein in each proteome (Supplementary Fig. 1). We aligned these orthologous sequences22,23, and for each protein pair in each of the 19 pathogens (Supplementary Fig. 2), we concatenated their multiple sequence alignments (MSAs) by connecting sequences of the same species to generate pMSAs (Supplementary Fig. 3). We removed proteins whose monomeric structure could not be confidently modelled by AF (average predicted local distance difference (pLDDT) test <50 in AFDB) and filtered the pMSAs based on their depth and quality (Supplementary Figs. 4 and 5): of the total 140.2 million protein pairs, we selected 77.9 million (56%) with higher monomer structure and MSA quality.We assessed the residue–residue coevolution for the selected pairs using DCA and found that the 7.7 million (10%) high-scoring protein pairs by DCA contained 79% of the positive controls (Fig. 1d, top). Among these 7.7 million pairs, we initially focused on a ‘pilot set’ of 0.14 million pairs involving at least one virulence factor (according to VFDB) and 0.83 million pairs of essential genes (according to the Database of Essential Genes). We removed redundancy in this set by clustering proteins from the 19 species into orthologous groups using OrthoMCL v.6.10 (ref. 24). 
If the orthologues of a protein pair were present in multiple species, we selected only one pair with the highest DCA score, resulting in a total of 457,310 representative PPI candidates.We used RF2-Lite to identify confident PPIs from the ‘pilot set’ and observed that we could achieve a recall of 28% at a precision of 95% when an RF2-Lite contact probability cut-off of 0.74 was used (Fig. 1d, middle). We investigated whether using a loose RF2-Lite cut-off (contact probability 0.05) to select candidate PPIs (46,609, around 10% selected) for AF could improve recall. The RF2-Lite → AF pipeline only improved the recall to 29% at 95% precision (Fig. 1d, bottom) at the cost of using 3 times more computer resources than simply relying on RF2-Lite to detect PPIs (Supplementary Table 4). Thus, the contribution of AF in distinguishing true PPIs from random pairs is limited, but it remains essential for obtaining high-quality 3D structures for the predicted protein complexes.The successive use of DCA (selecting top 10%), RF2-Lite (cut-off, 0.05) and AF (cut-off, 0.9) collectively reduced the total number of random pairs by nearly 10,000-fold, resulting in 562 highly confident predictions from the ‘pilot set’. The identified binary protein complexes include 461 protein complexes involving essential genes (Fig. 1e, left) and 115 involving virulence factors (Fig. 1e, right). Further investigation of these interactions may be useful for understanding the mechanisms of pathogenicity and developing disease prevention and treatment strategies. 
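The redundancy removal and the staged RF2-Lite → AF filtering described above can be sketched as below. This is an illustrative sketch under stated assumptions: the `Candidate` fields and the two scoring callables are hypothetical stand-ins for the real networks, and treating the cut-offs (0.05 and 0.9) as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    group_a: str   # orthologous group of the first protein (hypothetical field)
    group_b: str   # orthologous group of the second protein (hypothetical field)
    species: str
    dca: float     # DCA score of this pair


def representatives(candidates):
    """Keep, for each pair of orthologous groups, the candidate with the highest DCA score."""
    best = {}
    for c in candidates:
        key = frozenset((c.group_a, c.group_b))  # order of the two proteins is irrelevant
        if key not in best or c.dca > best[key].dca:
            best[key] = c
    return list(best.values())


def cascade(candidates, rf2lite_score, af_score, rf_cut=0.05, af_cut=0.9):
    """Run the cheap filter first, and the expensive model only on its survivors."""
    survivors = [c for c in candidates if rf2lite_score(c) >= rf_cut]
    return [c for c in survivors if af_score(c) >= af_cut]
```

The ordering is what saves compute: the costly second scorer is called only on the roughly 10% of candidates that pass the loose first cut-off.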
The vast majority (19%) of predicted protein complexes from the ‘pilot set’ did not have experimental 3D structures in PDB (BLAST e ≤ 0.00001, identity ≥50% and coverage ≥50% for both proteins), and half do not have confident experimental support according to STRING25 (Supplementary Table 5).To gain more structural and functional insights into these pathogens, we applied the RF2-Lite to AF pipeline to an additional 3.82 million pairs involving essential proteins and biological processes of therapeutic interest, such as the outer-membrane machinery (Supplementary Table 6). This search resulted in an additional 3,051 predicted PPIs. To facilitate downstream studies, we deposited all confident models to ModelArchive (see the Data availability statement) and provided additional metadata in the Supplementary Data 1. Inspection of the predicted PPIs revealed a small number of proteins (in particular ferredoxin and rubredoxin) with predicted interactions between many random proteins, likely constituting small false-positive hubs. We removed 405 PPIs involving such potential false-positive hubs before deposition to ModelArchive.It is difficult to cover even a small fraction of the biological insights that can be revealed from these 3D structures of protein complexes in one paper. In the following sections, we first describe experimental validation for a subset of predictions and then highlight examples that illustrate some of the biological insights revealed by the identification of putative PPIs and computational modelling of protein complexes.Experimental validationTo corroborate our benchmarking analyses, which suggest that our predicted interactions should be quite accurate, we selected two sets of predicted interactions for experimental characterization. We biased these selections towards PPIs with no previous experimental evidence or strong functional associations because validating such interactions could provide new biological insights. 
The first set (Supplementary Table 7) was selected based on statistical methods (GREMLIN) for PPI detection, before the development and application of the deep learning methods. This set was used to probe the accuracy of statistical (DCA and GREMLIN26) versus deep learning methods for PPI detection. The second set (Supplementary Table 8) was selected from our final set of predicted 3,613 PPIs, with a goal of evaluating the accuracy of our current entire pipeline.We selected the first dataset using the following criteria: (1) at least 20 kb apart (with a minimum of 20 intervening genes), (2) not having homologous complexes in the PDB, (3) not predicted to have the same molecular function, (4) not annotated as part of the same biological pathway and (5) not strongly supported by STRING (combined score of <800). All 11 pairs show strong coevolution according to DCA and GREMLIN, but five pairs were not predicted to interact by RF2-Lite or AF (Supplementary Fig. 11). A bacterial two-hybrid (B2H) system27 coupled with a quantitative β-galactosidase assay28 was used to measure interactions for these 11 pairs (Supplementary Fig. 12).Despite the strong support by DCA and GREMLIN, the five pairs not predicted to interact by RF2-Lite or AF did not show evidence of interaction using the B2H assay (Supplementary Fig. 11). Among the six pairs supported by RF2-Lite or AF, reporter activation indicative of interaction was detected for two: one is between iron-sulfur cluster binding protein lpg2881 (Uniprot: Q5ZRK0) and uncharacterized protein lpg0371 (Uniprot: Q5ZYK1) from L. pneumophila; another is between ribosomal silencing factor RsfS (PA4005; Uniprot: Q9HX22) and PhoH-like protein domain-containing protein YbeZ (PA3981; Uniprot: Q9HX38) from P. aeruginosa (Fig. 2a). For one additional pair, nucleoid-associated protein lmo2703 (Uniprot: Q8Y3X6) and signal recognition particle protein Ffh (Uniprot: Q8Y695) from L. 
monocytogenes, we were unable to assess the interaction experimentally due to false-positive reporter activation when only one protein was expressed (Supplementary Fig. 12). The remaining three pairs failed to generate a positive reporter signal; however, false-negative results from B2H assays do not necessarily rule out the existence of a genuine interaction due to possible failures in protein expression and folding of the fusion proteins, and lack of sensitivity of the screen to weak and transient interactions.Fig. 2: Experimental validation of selected PPIs.a, Interactions assessed by B2H that measures β-galactosidase activity resulting from activation of the lacZ reporter gene due to the interaction between two tested proteins that are fused to two domains of a transcription activator. E. coli expressing T25-zip and T18-zip fusion proteins was used as a positive control (+ control), and E. coli harbouring empty T25 and T18 plasmids was used as a negative control (− control). m/m, mix-and-match control. RU, relative unit (luminescence per optical density at 600 nm per h). Error bars indicate ±s.d. (n = 2 biological replicates each with 2 technical replicates). Computed models of experimentally validated PPIs (‘lpg2881 + lpg0371’ and ‘RsfS + YbeZ’) are shown on the right—top: iron-sulfur cluster binding protein lpg2881 (Q5ZRK0) and uncharacterized protein lpg0371 (Q5ZYK1) from L. pneumophila; bottom: ribosomal silencing factor RsfS (Q9HX22) and PhoH-like protein domain-containing protein YbeZ (Q9HX38) from P. aeruginosa. b–e, Interactions validated by Co-IP/pull-down. Predicted interacting partners in each PPI pair are heterologously expressed and tagged (–H, hexahistidine; –V, VSV-G epitope). A random bait protein was included as a negative control for each experiment. Control lanes correspond to samples with prey proteins and beads added without any bait proteins. Each positive interaction is supported by two independent Co-IP/pull-down experiments. 
b, Ubiquinone biosynthesis C-methyltransferase UbiE (P0A887) and protein of unknown function YcaR (P0AAZ7) from E. coli. c, Uncharacterized protein PA4106 (Q9HWS2) and a putative transcriptional factor PA4105 (Q9HWS3) from P. aeruginosa. d, lpg2881 and lpg0371 from L. pneumophila, a pair that is tested positive by B2H as well. e, Putative imidazole glycerol phosphate synthase subunit hisF2 (P72139) and lipopolysaccharide biosynthesis protein WbpG (Q9HZ78) from P. aeruginosa. In all the panels, connecting green bars are between representative residue–residue contacts at the interfaces predicted from the summed AF probability for distance bins below 12 Å. Ni-NTA, nickel-nitrilotriacetic acid; VSV-G, vesicular stomatitis virus glycoprotein epitope.For both PPIs validated by our B2H assays, there are no published data directly supporting functional or physical interactions between the two proteins. However, in both cases, existing evidence indirectly suggests that the interactions could be biologically relevant. The pair of proteins from L. pneumophila (lpg2881–lpg0371; Q5ZRK0–Q5ZYK1) are homologous to proteins of the Rnf electron transport complex (RnfB with 53% sequence identity and RnfH with 36% sequence identity, respectively). The function of these proteins in L. pneumophila is unclear because this species appears to lack the other components of the complex, and one of the proteins, lpg0371, also shares homology with the antitoxin component of the RatAB toxin–antitoxin module. However, in species that encode the complete Rnf complex, RnfB and RnfH directly interact29. The interacting pair from P. aeruginosa consists of the ribosomal silencing factor RsfS and the PhoH-like protein domain-containing protein YbeZ. Under nutrient depletion or during stationary phase growth, RsfS binds to ribosomal protein L14, ultimately preventing the association of the 30S and 50S ribosomal subunits and repressing translation30. 
This facilitates adaptation to low-nutrient conditions and promotes survival during the stationary phase. The function of YbeZ is less well characterized, but it interacts with the RNase YbeY, and both proteins are required for processing and maturation of the 16S ribosomal RNA31. Our finding that YbeZ and RsfS interact suggests that the regulation of ribosome assembly and ribosome subunit processing may be linked in P. aeruginosa.The second validation set, selected using the deep learning methods, consists of six protein pairs (Supplementary Table 8) lacking homologous protein complexes in the PDB, with little support in STRING (only one pair STRING > 600) and distant in the genome (separated by >100 genes) in half the cases. We focused on proteins consisting primarily of globular domains (percentage of residues from non-globular domains, <20%), as such proteins are more amenable to heterologous expression-based assays. Using co-immunoprecipitation (Co-IP) assays, we detected an interaction between four of the six pairs (Fig. 2b–e). These include a pair we had previously validated by B2H, Q5ZRK0–Q5ZYK1, a distally encoded pair from Escherichia coli, and two proximally encoded pairs from P. aeruginosa. E. coli UbiE catalyses a carbon-methyl transfer reaction in the biosynthesis of ubiquinone (coenzyme Q) and menaquinone (vitamin K2)32, while YcaR is a small protein detected as differentially expressed in multiple proteomics studies but to which no function has been assigned33,34. P. aeruginosa PA4105–PA4106 (Q9HWS3–Q9HWS2) are uncharacterized proteins with no clear homologues of known functions based on primary sequence comparisons, but a FoldSeek v.8 search35 revealed structural similarity between these proteins and TglI and TglH from Pseudomonas syringae pv. maculicola (P. syringae) which form a complex that catalyses the removal of cysteine β-methylene (β-CH2) from TglA–Cys, a step in the biosynthesis of the natural product 3-thiaglutamate (3-thiaGlu)36,37. P. 
aeruginosa Q9HZ78–P72139 are an amidotransferase essential in B-band lipopolysaccharide biosynthesis (WbpG, Q9HZ78) and a predicted imidazole glycerol phosphate synthase subunit (HisF2, P72139). It was previously proposed that HisF2, together with HisH2, delivers ammonia to WbpG38, a hypothesis our interaction finding supports. The PtsH–PtsN (Q9HVV2-Q9HVV4) pair with the highest support by STRING (score = 959) failed to generate a positive Co-IP signal (Supplementary Fig. 13); PtsH is a histidine-phosphorylatable phosphocarrier protein encoded adjacent to PtsN, a nitrogen regulatory protein with a phosphotransferase component, and the interaction between these proteins may be transient and thus difficult to detect by Co-IP.These experimental data support the in silico benchmark in suggesting that the deep learning methods have greater accuracy than statistical methods in PPI discovery, identifying additional components for well-known biological pathways and accelerating the characterization of proteins of unknown function. In the following sections, we provide an overview of the much larger set of interactions predicted by the deep learning methods but not yet experimentally validated; to illustrate the insights that can be gained from these data, we provide biological context for selected interaction pairs and higher-order assemblies.Binary interactionsFrom the total set of 3,613 predicted binary PPIs, 1,686 (47%) have homologous complexes in PDB (BLAST e ≤ 0.00001 for both proteins), 1,862 (52%) are supported by strong functional association according to STRING (total score ≥900), and 1,284 (36%) are supported by both PDB and STRING; the remaining 1,349 (37%, 3,613 − (1,686 + 1,862 − 1,284)), to our knowledge, are unknown PPIs. Although such previously unsupported PPIs might contain a higher fraction of false predictions, the high precision on our benchmark sets suggests the majority of the new predictions are likely correct. 
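The count of previously unknown PPIs follows from inclusion–exclusion over the PDB-supported and STRING-supported sets. The short check below reproduces the arithmetic from the numbers quoted in the text; the variable names are illustrative only.

```python
total = 3613    # all predicted binary PPIs
pdb = 1686      # with homologous complexes in PDB
string = 1862   # with strong functional association in STRING
both = 1284     # supported by both PDB and STRING

# Pairs supported by at least one source (inclusion-exclusion).
supported = pdb + string - both
# Pairs supported by neither source.
unknown = total - supported


def percent(n):
    """Share of all predicted PPIs, rounded to the nearest whole per cent."""
    return round(100 * n / total)


print(supported, unknown)
print(percent(pdb), percent(string), percent(both), percent(unknown))
```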
We identify 166 putative interactions that involve uncharacterized proteins (all Pfam domains are uncharacterized; Supplementary Methods). The majority of these pairs (149) include an interaction partner with known functional domains, and 131 (117 with known partners) have not been well described previously (STRING combined score <900 and BLAST e value to PDB chains >0.00001).

Of the predicted PPIs, 1,923 include one or more essential genes. Examples of predicted interactions among essential genes without homologous complexes in the PDB are highlighted in Fig. 3a–j and Supplementary Table 9. In some cases, the predicted PPIs support previous findings from the literature. For example, we predict an interaction between glucose-6-phosphate 1-dehydrogenase 2 (G6PD2) and OxPP (oxidative pentose pathway) cycle protein OpcA (Fig. 3a). G6PD2 is an isozyme of G6PD, a member of the pentose phosphate pathway, catalysing the oxidation of G6P to 6-phosphogluconolactone while converting NADP+ to NADPH and protecting cells from oxidative stress39. OpcA has been implicated as an allosteric activator of G6PD40, but, to our knowledge, the binding site remains unknown. Our predicted interface places OpcA away from the active site of G6PD, consistent with allosteric modulation of activity (Supplementary Fig. 19). We predict an interaction between 30S ribosomal protein S11 (rpsK), a surface-exposed ribosomal protein that forms part of the messenger RNA binding cleft which recognizes the Shine–Dalgarno sequence41,42, and YbeY, a highly conserved endoribonuclease which has been linked to numerous processes such as 16S rRNA maturation, 70S ribosome quality control, and regulation of mRNA43 (Fig. 3f). In some bacteria, YbeY plays a key role in virulence and cell stress44. Our predicted structure of S11–YbeY, with an interface mediated by S11 β-strands, agrees with previous work that identified the S11–YbeY interaction by bacterial two-hybrid, Co-IP and mutational analyses45.
The 3D model of the S11–YbeY complex may lend further insights into how YbeY coordinates cleavage of the rRNA precursor during 16S maturation.Fig. 3: Computed models of binary protein complexes.a–j, Interactions involving essential genes. a, Interaction with an enzyme where the enzymatic site is highlighted in light green with an NAD moiety. b–d, Additional interactions involving essential genes. e,j Interactions involving transport pathways. f–i, Transcription and translation. k–t, Interactions involving virulence factors. u–y, Interactions with uncharacterized proteins. In all models, the first protein is in blue, and the second is in gold. Green bars are between representative residue–residue contacts at the interfaces predicted from the summed AF probability for distance bins below 12 Å. Additional information (organisms and UniProt annotations) is in Supplementary Table 9.Of the predicted PPIs, 256 contain virulence factors (according to VFDB and Uniprot Keywords) that participate in pathogen colonization, nutrient acquisition and evasion of host immunity46. Secreted virulence factors rarely interact with endogenous proteins of a pathogen; consistent with this, we did not detect many PPIs involving virulence factors, and those we did identify mostly involve structural components of flagella (considered virulence factors in many bacteria40) and bacterial secretion systems (Fig. 3k–t). We also identified other interactions related to flagella function, for example, between the anti-sigma factor FlgM, a negative regulator of flagellin synthesis, and flagellar secretion chaperone (FliS) (Fig. 3p), an interaction supported by a previous experimental study47 but without 3D structure information. 
Our 3D models, in agreement with previous observations47, revealed that FlgM can compete with flagellin (FliC, major structural component of the flagella) for the same interface on FliS; FlgM uses its C-terminal helices to interact with FliS, which could prevent its interaction with the flagellar sigma factor FliA. The FliS–FlgM interaction might provide a negative feedback mechanism to control the expression of flagellin: when intracellular flagellin is abundant, it outcompetes FlgM in binding FliS, and the released FlgM antagonizes the activity of sigma factor FliA, turning off the expression of late-stage flagellar genes, including flagellin (FliC).

We identify 149 putative interactions (Fig. 3u–y) between uncharacterized proteins (according to Pfam domains) and functionally annotated binding partners such as ketol-acid reductoisomerase IlvC, thiosulfate sulfurtransferase GlpE, ubiquinol-cytochrome c reductase, cell division protein FtsZ and bifunctional guanosine pentaphosphate [(p)ppGpp] synthase/hydrolase RelA. These predicted interaction partners provide contextual hypotheses about the function of these uncharacterized proteins, 72 of which are essential to pathogen survival, to guide further experimental studies aimed at elucidating their functions.

Multicomponent protein complexes

In many cases, the predicted binary interactions form larger sets, suggesting the formation of higher-order assemblies. For example, in our set of 3,613 predicted interactions, we found 206 trimeric protein complexes where each component is predicted to directly interact with the other two. Of the predicted binary interactions, 1,545 (43%) involve proteins that have multiple interacting partners, which allows us to build higher-order protein complexes by concatenating the MSAs of multiple proteins and modelling them together through AF.

Transfer RNA modification and sulfur transfers in the 2-thio modification complex of E. coli
Transfer RNAs (tRNAs) play critical roles in protein synthesis and are often decorated with post-transcriptional modifications that contribute to efficient protein synthesis48. Wobble positions are hotspots of such modifications. In glutamate, glutamine and lysine tRNAs, the wobble uridine is modified to 5-methylaminomethyl-2-thiouridine (mnm5s2U) by tRNA 2-thiouridine synthesizing proteins (Tus)49, which include TusA, TusB, TusC, TusD, TusE and tRNA-specific 2-thiouridylase (MnmA). Cysteine desulfurase (IscS) is essential for 2-thio modification in E. coli49. IscS transfers sulfur from cysteine to TusA; the sulfur is then passed to TusD of the TusBCD complex, on to TusE and finally to MnmA, which incorporates it into the tRNA49,50. The structures of the IscS–TusA dimer and of the sulfur-transfer-mediating heterohexameric complex, TusBCD, have been determined by crystallography50,51, but structural details for other components of this system are poorly understood. We predicted the structures of the TusE–MnmA and TusE–TusC complexes (Supplementary Fig. 20) and assembled a model of the full TusBCDE heterotetramer, which contextualizes the interaction of TusE with TusBCD (Fig. 4a and Supplementary Fig. 21). Our model places TusE close to TusC and TusD, with a confidently predicted TusC–TusE interface (Supplementary Fig. 21e). It is consistent with the hypothesis that Cys108 of TusE accepts sulfur from Cys78 of TusD49,50 and also suggests that TusC serves as a scaffold to bring TusD and TusE into close proximity. From the predicted TusE–MnmA structure, we find that TusE cannot interact with TusBCD and MnmA simultaneously because its interfaces with MnmA and TusD overlap (Supplementary Fig. 20a–f).

Fig. 4: Computed models for multi-component protein complexes.
a, E. coli tRNA 2-thiouridine synthesizing protein complex. Left: a model of the TusE(blue)–TusB(gold)–TusC(green)–TusD(pink) complex overlaid with the TusBCD PDB structure (2D1P, shown in semi-transparent grey).
Right: an alternative view of this complex. b, The UreAB–UreFGH complex (coloured in cyan, pink, blue, gold and green, respectively) in H. pylori assembled through multiple subcomplexes: UreFGH, UreAB and UreAH. c, Accessory components of the Sec translocon. Top: P. aeruginosa SecG(blue)–SecY(gold)–PpiD(green) complex. Bottom: M. tuberculosis SecY(blue)–SecG(gold)–SecE(green)–CrgA(pink) complex. d, Accessory components of the P. aeruginosa and S. typhimurium outer-membrane β-barrel assembly machinery. Left: interaction between SurA (yellow) and Bam proteins (BamA, blue; BamB, gold; BamE, green). Middle: BamA (blue) and PA1005 (gold), a putative BepA orthologue. Right: interaction between TolC (blue) and BamD (gold). In all schematics, green, red, yellow and magenta bars connect representative residue–residue contacts at the interfaces predicted from the summed AF probability for distance bins below 12 Å.

A two-step nickel transfer in H. pylori urease complex

Urease hydrolyses urea into ammonia and is broadly conserved in bacteria and eukaryotes. In H. pylori, urease neutralizes gastric acid and facilitates gut colonization52, and thus proteins in the urease complex are considered virulence factors. While most bacterial ureases have three chains (UreA, UreB and UreC), H. pylori urease has two due to the fusion of UreA and UreB orthologues53. The UreAB(C) system has four accessory proteins: UreE, UreF, UreG and UreH54. We predict a UreA–UreH interaction and use it to assemble a model of a UreAB–UreFGH pentamer (Fig. 4b and Supplementary Fig. 22e). The UreAB(C) and UreFGH substructures have been determined experimentally53,55, and our predictions are consistent with these (Supplementary Fig. 22a–d). During urease maturation, UreFGH receives nickel from UreE, but how this occurs remains poorly understood.
Two hypotheses are (a) that UreE transfers nickel to the UreFGH complex56 and (b) that upon binding guanosine-5′-triphosphate (GTP), UreG dissociates from UreFGH, receives nickel from UreE and subsequently interacts with the inactive UreFH to activate the complex55. Superimposing our UreE–UreG model onto the UreFGH complex shows that UreE clashes with UreF, indicating that UreE cannot directly interact with the UreFGH complex. Therefore, our observation supports the latter hypothesis wherein UreG likely receives nickel separately from UreFH (Supplementary Fig. 23)55.

Sec translocon interactors

The Sec translocon machinery transports proteins across the plasma membrane. The Sec translocon channel is a heterotrimeric complex composed of SecYEG, which operates in tandem with SecA, a RecA-like ATPase that moves peptides through the SecY channel in a process similar to that of the Sec61 translocon in eukaryotes57. We predict interactions between the Sec translocon and peptidyl-prolyl cis/trans isomerase D (PpiD) (Fig. 4c, top, and Supplementary Fig. 24), which has been identified as the most prominent interactor of SecYEG by affinity purification coupled with mass spectrometry58 and Co-IP59. In our model of the SecYG–PpiD trimer, PpiD primarily interacts with SecY through the transmembrane helices while coming close to SecG via a small loop. We also predict interactions between Sec and CrgA, a transmembrane protein and a component of the divisome (Fig. 4c, bottom). We find that the CrgA–SecY interface occurs near the lateral gate of SecY60 (Supplementary Fig. 25a), potentially occluding Sec translocation. We hypothesize that during bacterial division, CrgA binds Sec to regulate and recruit translocation machinery near the cell division site; this hypothesis is further supported by the predicted interaction between CrgA and SecE (Supplementary Fig.
25) and a less confident prediction of CrgA–SecG interaction that fell slightly below our cut-off.Outer-membrane β-barrel assembly machinery of P. aeruginosa and Vibrio cholerae
In Gram-negative bacteria, the β-barrel assembly machinery (BAM) is essential for the folding and insertion of outer-membrane β-barrel proteins61,62. BAM consists of an outer-membrane-spanning β-barrel, BamA, that interacts with four periplasmic lipoproteins, BamB, BamC, BamD and BamE, to form a five-component complex61,62,63,64,65; our computed interactions and structures agree with known experimental data (Supplementary Fig. 26). This complex has recently garnered increased attention as a potential therapeutic target, especially since the discovery of darobactin, a novel antimicrobial compound that binds along the lateral gate of BamA to inhibit outer-membrane protein (OMP) biogenesis66,67.

The function of BAM is assisted by several other proteins, including the chaperone survival factor A (SurA) and periplasmic chaperone 17 kDa protein (Skp). SurA plays an important role in facilitating the recruitment of unfolded OMPs from the periplasm to the BAM complex68. Both our BAM–SurA model and a recently published study using an orthogonal approach to ours69 place SurA in the same position to simultaneously interact with BamA, BamB and BamE (Fig. 4d, left). In addition, we predict an interaction between Skp and SurA (Supplementary Fig. 27); beyond their roles in maintaining the solubility of unfolded OMPs, the two chaperones may act in tandem to disassemble oligomeric OMPs that have aggregated70.

We also predict an interaction between BamA and PA1005 (UniProt: Q9I4W8) (Fig. 4d, middle), a possible orthologue of β-barrel assembly-enhancing protease (BepA) (Supplementary Fig. 28). E. coli BepA is a periplasmic zinc-metallopeptidase with an important role in outer-membrane homeostasis and is involved in the degradation of BamA in the absence of SurA71. BepA has been shown to interact with BAM72, and further cross-linking experiments suggest that the BepA C-terminal tetratricopeptide repeat (TPR) domain is inserted into the periplasmic region of BamA, below the β-barrel71.
Our computed model agrees with the proposed broad interface between BamA and BepA, provides structural detail on the BamA–BepA interaction and also suggests that when BepA is in complex with BamA, BAM is unable to assemble into its active form due to steric clashes between BepA and periplasmic Bam lipoproteins.

TolC is an OMP that homo-trimerizes to form a large outer-membrane export tunnel that interacts with inner-membrane translocases73,74. The catalytic β-barrel domain of BamA binds substrates along the β-barrel seam during OMP folding, and in this process, the N-terminal strand of the β-barrel likely swings outward75,76. The interaction between BamA and TolC has been recognized as an essential step in the assembly of TolC, which occurs in a SurA-independent manner77,78. We predict an interaction between BamD and TolC (Fig. 4d, right), which, when superimposed onto the BAM complex (Supplementary Fig. 29), depicts how the β-sheets of TolC interact with the N-terminal strand of the BamA β-barrel seam. Our computed model shows how TolC could be folded by the BAM complex and suggests that BamD may replace SurA to stabilize or recruit TolC to BAM.
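
The Fig. 4 legend defines interface contacts from the summed AF probability for distance bins below 12 Å. As a rough illustration of that calculation, the sketch below sums the probability mass of all distance bins whose upper edge lies at or below the cut-off and ranks inter-chain residue pairs by the result. The bin edges, the toy distogram, the `min_prob` threshold and both function names are assumptions made for this example; they are not taken from the paper's pipeline or from the AF code base.

```python
from typing import List, Sequence, Tuple


def contact_probability(bin_probs: Sequence[float],
                        bin_upper_edges: Sequence[float],
                        cutoff: float = 12.0) -> float:
    """Sum the probability of every distance bin whose upper edge is <= cutoff (in Å)."""
    if len(bin_probs) != len(bin_upper_edges):
        raise ValueError("one probability per distance bin is required")
    return sum(p for p, edge in zip(bin_probs, bin_upper_edges) if edge <= cutoff)


def top_interface_contacts(distogram: Sequence[Sequence[Sequence[float]]],
                           bin_upper_edges: Sequence[float],
                           cutoff: float = 12.0,
                           min_prob: float = 0.5) -> List[Tuple[int, int, float]]:
    """Rank inter-chain residue pairs by contact probability.

    distogram[i][j] holds the per-bin probabilities for residue i of one
    chain and residue j of the other chain (hypothetical layout).
    """
    contacts = []
    for i, row in enumerate(distogram):
        for j, probs in enumerate(row):
            p = contact_probability(probs, bin_upper_edges, cutoff)
            if p >= min_prob:  # min_prob is an illustrative threshold
                contacts.append((i, j, p))
    # Highest-probability contacts first
    return sorted(contacts, key=lambda c: c[2], reverse=True)


# Toy data: five hypothetical bins with upper edges at 4, 8, 12, 16 and 20 Å.
bin_upper_edges = [4.0, 8.0, 12.0, 16.0, 20.0]
toy_distogram = [
    [[0.5, 0.25, 0.125, 0.0625, 0.0625], [0.0, 0.0, 0.25, 0.25, 0.5]],
    [[0.0, 0.125, 0.125, 0.25, 0.5], [0.25, 0.25, 0.125, 0.125, 0.25]],
]
contacts = top_interface_contacts(toy_distogram, bin_upper_edges)
for i, j, p in contacts:
    print(f"residue {i} (chain A) - residue {j} (chain B): P(d < 12 Å) = {p}")
```

In a real analysis the per-bin probabilities would come from the predicted distogram of the modelled pair, and the representative contacts drawn in a figure would be a selection from the top of such a ranking.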
