Computational analysis of human gut microbial prolyl oligopeptidases (POPs) reveal candidate genes as therapeutics for celiac disease

Distribution of POP family (S9 family proteins) in gut microbiota across bacterial phylaThe cumulative dataset generated from the six databases contained around 4400 proteomic assemblies distributed among multiple bacterial phyla. Figure 2 shows the distribution of POPs that could be identified in the gut microbiome as present in our dataset. Studies have shown that gut microbiome consists of bacteria from four predominant phyla—namely Bacteroidetes, Actinobacteria, Firmicutes and Proteobacteria20. Our study also reports similar results (Fig. 2 and Additional file 3). Along with the above mentioned four phyla, bacteria from two other phyla (namely Fusobacteria and Verrucomicrobia) were also found to contain POP domain. However, no genes of the POP family (featuring the distinctive dual-domain architecture) were detected within these two phyla. Consequently, only genes coding for POP originating from the dominant four phyla were included in the analysis presented in Fig. 2. We observe that nearly 50% of the identified POP homologous sequences are from phylum Bacteroidetes. Proteobacteria and Firmicutes constitute almost the same number of POP family genes (~ 20%). Finally, the phylum Actinobacteria has around (13%) of POP family genes (Fig. 2A). Figure 2B represents the distribution of bacterial phyla when data is segregated per species. We notice most genomes from Firmicutes have a single gene coding for a protein belonging to the POP family. In contrast, for the phyla Actinobacteria and Proteobacteria, the number varies from 1 to 3. Finally, the phyla Bacteroidetes have many genomes that encode for multiple POPs, where the number of genes varies from 1 to 10; however, most of the genomes were observed to have more than four POP genes (Fig. 2B).Figure 2(A) Proportion of POP homologs in 4 dominant gut bacteria phyla. (B) The distribution of bacterial phyla when data is segregated per species.Gut microbiome genes containing POP domains show diverse domain architectureDiverse domain architectures were found in the genes of the gut microbiome that contain the sequence signature of POP domains (Fig. 3). The most common one constitutes sequences having only a catalytic domain. Sequences having domain architecture DPP_IV and catalytic domain (DPPIV_N-Peptidase_S9), as well as sequences having domain architecture as propeller domain and catalytic domain (Peptidase_S9_N-Peptidase_S9) are also common. Other frequently occurring domains are alpha–beta hydrolase 2, PD40, and esterase.Figure 3Domain architecture of the POP homologues. Numbers on the left indicate the number of sequences having that particular domain architecture.Alpha–beta hydrolase and esterase are protease sequences which share remote homology with S9 family proteins or act as intermediates. Hence, it is not uncommon that some of these domains are found in S9 family peptidases. In addition, many other domains were also present in very low frequency (present in one or two proteins) (details in Additional File 4). However, we chose candidate POPs which contain the classic two-domain architecture (N-terminal propeller domain and catalytic domain) for structural studies, which were fulfilled by 734 of the proteins.Modelled structure of POP domain shows stable conformationThe structure of POP from the bacteria Aeromonas caviae is known in both open and closed conformation. This domain (closed form: PDB ID 3IVM) showed nearly 30 to 40% identity to the selected sequences and was hence considered as a template for structure modelling of the sequences. From the 734 sequences, 10 of the protein sequences which were found in probiotic bacteria belonged to diverse phyla, had more than 30% identity, and more than 90% query coverage were taken for modelling. The alignment of the query and template shows the preservation of conserved residues, along with the catalytic triad, which is present towards the C-terminal of the sequence in the catalytic domain (Fig. 5). There was good agreement between the secondary structural elements. More than 98% of residues were in the core allowed region of the Ramachandran plot of the chosen models, indicating the better quality of models, while few residues were in the disallowed region (Fig. 4). A representative model from Prevotella stercorea is shown in Fig. 4B. The modelled catalytic domain is shown in blue while the modelled propeller domain is shown in red. The superimposed structure of the model with the template structure (Fig. 4A) shows an RMSD value of 0.18, which indicates that the modelled structure is quite similar to the template structure. The per residue energy of the modelled structure is negative (Fig. 4C), and the Ramachandran plot (Fig. 4D) shows only six residues in the disallowed region. The statistics of all the other modelled structures are available in Additional File 5.Figure 4(A) Superimposed structure of the template and model. (B) Modelled structure. (C) Per residue energy of the Modelled structure. (D) Ramachandran map of the modelled structure.The selected sequences are either probiotics or show probiotic potentialProbiotic products typically take the form of oral supplements or food-based products containing microorganisms, typically bacteria. We hypothesize that a gut bacterium could be regarded as a potential therapeutic in this context if it is generally used as a probiotic and also retain genes which have sequence signature of POP domains. These criteria could suggest that such enzymes are capable of degrading the harmful celiac disease-containing peptides. We evaluate the effectiveness of six potential bacterial sequences (Table 1) to degrade the known celiac disease epitopes21 (Fig. 6C). The list of species taken into consideration and the reasoning is mentioned in Additional file 9. The most commonly used strain for gram-negative probiotics is Escherichia coli Nissle 191722. Escherichia coli is also one of the common bacteria of the human gut Microbiome. We have evaluated the sequence WP_021564606 from E.coli.Table 1 The sequence IDs of the six selected POP sequences, species names and phylum of the bacteria chosen for study.Figure 5Conservation of active site residues and other stabilising residues: The multiple sequence alignment of selected POP sequences with crystal sequence 3IVM shows the equivalent residues. The active site residues are marked in red while the other three stabilising residues are marked in blue. Only the relevant section of the alignment from the catalytic domain encompassing these six residues are shown. All six residues are conserved in POP sequences apart from the sequence identified from Bacillus subtilis which is an oligopeptidase B.Figure 6(A) The representative docking showing hydrogen bonding with important residues. (B) The representative structure of docking. (C) Heatmap of binding energy between the peptides and modelled enzymes.Bacteria within Prevotella species are one of the most important members of the gut microbiome, especially in Indian populations23. Studies show that the genomes of Prevotella species recovered from human gut metagenomes show that the most prevalent and relatively abundant species are primarily related to P. copri and P. stercorea. These species possess distinct gene repertoires likely reflecting adaptations to metabolic niches24.We could not find any genes with potential POP domain in any of the strains of Prevotella copri through our sequence searches, but genes coding for POP domains with specific two-domain architecture were found in P. stercorea and other Prevotella species.Recent studies also suggest that Prevotella species could be used as next-generation probiotics25.We have evaluated the sequence WP_040596915 from Prevotella Stercorea and WP_081457517 from Prevotella salivae for our analysis.Growing attention is being directed towards Faecalibacterium prausnitzii, an abundantly prevalent bacterial species within the gut. This heightened interest arises from its potential significance in fostering gastrointestinal well-being. This species is shown to have anti-inflammatory properties as it can induce a tolerogenic cytokine profile26.Research shows that Faecalibacterium prausnitzii could be used as a next generation probiotic in gut disease improvement27. Hence, we have modelled the sequence VDR34335 of Faecalibacterium prausnitzii, corresponding to the POP domain, for our analysis. Clostridium species are a predominant cluster of commensal bacteria in our gut, and they exert considerable effects on our intestinal homeostasis. Clostridium cadaveris is usually considered non-pathogenic, and has been shown to have probiotic potential28. Likewise, we have modelled the POP domain in one sequence (WP_035771429) from Clostridium cadaveris for our analysis.The commercial Bacillus probiotic strains in use are B. cereus, B. clausii, B. coagulans, B. licheniformis, B. polyfermenticus, B. pumilus, and B. subtilis. These probiotic bacteria have demonstrated their ability to significantly improve digestive health, and they even have a role in preventive healthcare with their ability to lower lipid levels and improve immune response4.Almost all the POP sequences from Bacillus strains found in our analysis from gut microbiome consists of only the catalytic domain. However, we could discover POP sequence from B. subtilis, which contain the two-domain architecture and hence can be considered for protein modelling and subsequent study. This sequence belongs to S9A, and it is an oligopeptidase B, identified by the conserved sequence motif (GGSXGGLL) around the active site serine. Oligopeptidase B also has a similar domain architecture as POP but differs in the motif around the active site (as previously mentioned in the introduction). Studies show that Oligopeptidase B can also cleave proline-rich peptides29. Considering the widespread use of Bacillus strains as probiotics, we have included the following oligopeptidase B sequence from B. subtilis for docking and molecular dynamics study. Although the strain from which the sequence is selected is not found in the above six studies, other studies show that B. subtilis is a part of the gut microbiome30. We have included the POP domain in the sequence MCF7751424 from B. subtilis for our analysis.Inhibitor and substrate dockingFour of the potent celiac disease-causing epitopes (Proline containing sequence derived from Gluten subtype α and γ) which are recognised by HLA DQ2.5 were considered for peptide docking analysis.Z-Pro-Prolinal (ZPR) (detailed structure in Additional File 6) is known to be a natural inhibitor for POPs. We first analysed the crystal structure of the 3IVM (ligand (ZPR) bound form of POP). Apart from the active site residue, the closed form is shown to bind to three other stabilizing residues (Tyr 458, Trp 579 and Arg 624). It is known from the literature that Trp 579 is the critical residue that packs with the Proline ring of the substrate, and Arg 624 is known to be involved in hydrogen bonding to the substrate3. In the docking analysis, only those docking poses which show interaction with the catalytic site and the stabilizing residues were considered. ZPR and the four most potent celiac disease-causing peptides were considered for docking analysis. Due to the presence of slight insertions and deletions within the analysed POP sequences, there is no exact correspondence between the residue numbers of sequences and the crystal structure residues. To address this disparity, equivalent residues were determined through a multiple sequence alignment encompassing all selected sequences and the template (Fig. 5 and Table 1). Notably, all active site residues and stabilizing residues exhibit 100% conservation across the POP sequences. The only exception is in an Oligopeptidase B in Bacillus subtilis, where Arginine and Tryptophan are not conserved, although the catalytic triad remains preserved.The docking of all the selected sequences with above mentioned celiac disease-causing epitopes were performed using Schrodinger software (for details, please refer to Methods section). A representative 2D-docking diagram is shown in Fig. 6A, along with the hydrogen bonding patterns with the critical residues Arg, His, Tyr and Ser. The 3D-model of protein-peptide docking is shown in Fig. 6B. The heatmaps for docking score for all peptides and sequences are shown in Fig. 6C. Bacteria in Prevotella and Faecuilibacterium show the lowest average binding energy, considering all the peptides. All the peptides show similar average binding energy for all the sequences.Molecular dynamics simulation shows stable binding of peptidesNext, MD simulation was run for 100 ns on all six protein-peptide complexes in order to assess the stability of docking complexes. The peptide PQPELPYPQ, which is a part of the gluten, was selected as a representative peptide for MD analysis. A representative MD run is shown for a Prevotella POP in Fig. 7. The RMSD plot shows the stability of the complex over the 100 ns simulation period (Fig. 7A). It was also observed that the simulation converges, and the RMSD values stabilize around 20 ns and do not fluctuate much for a duration of 100 ns. Ligfit prot (Fig. 7A) indicated that the RMSD of the protein–ligand complex remains stable and is comparable to the protein RMSD, and hence, it indicates a stable protein–ligand complex for the whole duration of 100 ns.Figure 7(A) Protein and ligand RMSD values from the MD simulation of the identified POP from Prevotella. (B) Interactions between protein and peptide that persisted for more than 30% of the simulation time. (C) Bar graph showing protein–ligand contacts for every protein residue, coloured by time and scaled by fraction of simulation time. (D) Total number of protein–ligand contacts over the duration of the simulation time.The major interacting residues which form H-bonds with the ligand and remain for more than 50% of the simulation time are Lys 150, Lys 197, Asp 273, Phe 482, Asn 699 and Trp 600 (Fig. 7B,C). Trp 600 is one of the essential residues known to bind the ligand. Another essential residue mentioned earlier was Arg 645. A hydrogen bond with Arg 645 was observed, but not for more than 50% of the simulation time. However, Arg 645 was found to be generally interacting for almost 90% of the simulation time, including hydrophobic interactions. Few other residues interact through water bridges or hydrophobic interaction as well (Fig. 7C). At each point during simulation time, the protein remains in contact with the ligand (Fig. 7D), which suggests the stability of the complex.MD simulation for 100 ns was run for other protein-peptide sequences, and the complexes were found to be stable for the entire simulation time. The data is shown in Additional File 7. The equivalent residues of Tyr 479, Trp 600 and Arg 645 are found to be interacting in other selected POPs as well (details in Additional file 7). As mentioned earlier, these peptides are the actual epitopes which act as T-cell epitopes, which eventually cause CD.In our analysis, we observed that POPs from the above-mentioned sequences are able to bind to these epitopes. Additionally, our molecular dynamics simulation study indicates the stability of these interactions. Hence, these POP sequences could be potentially used to degrade CD-causing peptides. Additionally, since all of the selected sequences are from probiotics/gut bacteria, it would be easier to design experiments and should be safe for human consumption. Consequently, we propose that these POPs from these select probiotic bacteria could potentially act as therapeutics and offer relief from the symptoms of CD.MD simulation of Prevotella in acidic environmentOne requirement for a candidate for effective therapeutics for CD is that the protein should be able to show activity in both the acidic environment of the stomach and at neutral pH in the intestine. It is absolutely important that a candidate therapeutic for CD should be stable in both acidic (in the stomach) and in neutral pH (in the intestine). Therefore, we next evaluated the stability in an acidic environment and chose the POP identified from Prevotella. The MD simulation was redone at 2.5 pH, simulating the acidic environment of the stomach for 200 ns. The RMSD diagram reveals that the protein-peptide complex is stable for the whole duration of 200 ns (Fig. 8A). In addition, the percentage conservation of secondary structure shows that most of them remain stable during the simulation time (Fig. 8B). The protein–ligand interacting graph (Fig. 8C and D) shows that several important residues, such as Arg 645, Tyr 479, and Ser 559(Active site residue), retain hydrogen bond interaction and are in contact with the ligand for a significant amount of time (40–90% of the time).Figure 8(A) RMSD of protein and ligand complex, (B) Conservation of secondary structures throughout the simulation time (200 ns), (C,D) The interaction of peptide residues with POP sequence and interaction time as calculated as fraction of simulation time.

Hot Topics

Related Articles