Computational screening combined with well-tempered metadynamics simulations identifies potential TMPRSS2 inhibitors

The serine protease domain of TMPRSS2 is highly conserved and confirms the canonical chymotrypsin/trypsin fold harboring the catalytic Ser-His-Asp triad26 (Fig. 1A). Our study identifies the active site blockers of TMPRSS2 through virtual screening and validates their interaction strategies, and binding free energies through all-atom MD and well-tempered metadynamics simulations.Figure 1(A) Structure of TMPRSS2 ectodomain complex with nafamostat. The inset shows nafamostat interactions and the catalytic triad of TMPRSS2 (PDB ID: 7MEQ), (B) The schematic of the e-pharmacophore hypothesis (A1A2A3D4) and the view of the protein active site, (C) pharmacophore-based screening of ligand molecules, (D) ligand interactions after XP docking; (i) TMPRSS2-nafamostat complex; (ii) TMPRSS2-vicenin-2 complex; (iii) TMPRSS2-hesperidin complex; (iv) TMPRSS2-neohesperidin complex; (v) TMPRSS2-naringin complex; and (vi) TMPRSS2-rhoifolin complex.Homology modellingThe human TMPRSS2 ectodomain structure was homology-modeled, showing 98% sequence identity with the template PDB ID: 7MEQ. The Ramachandran plot revealed that 95% of amino acids are in the allowed region. A QMEAN global score of 0.9 ± 0.05 and a MolProbity score of 1.24 also ensured the quality of the model (Supplementary Figure 1). The Protein Reliability Report in Maestro was also used to assess the reliability of the model (Supplementary Figure 1). The model showed in-silico parametric stability suitable for further in-silico analysis.Generation of e-pharmacophore modelWe used the homology-modeled ectodomain structure of TMPRSS2 to generate an energy-optimized pharmacophore hypothesis (e-pharmacophore). The protein structure was pre-processed using the protein preparation wizard35 of the Schrodinger suite and subjected to restrained energy minimization using the OPLS436 force field. The e-pharmacophore model was developed using the ‘Develop Pharmacophore from protein–ligand complex’ option in the Phase module of the Schrodinger suite. The amino acids in the receptor binding site were analyzed, and a four-featured pharmacophore hypothesis, AAAD, was generated (Fig. 1B, C) and used as a 3-D search query to screen the molecules from the natural product database LOTUS37, ChEMBL (https://www.ebi.ac.uk/chembl/) database, and OTAVA chemicals (https://www.otavachemicals.com/) with similar pharmacophore features. The phase module analyzed the fitness of the compounds with the query hypothesis and ranked them based on the fitness scores, selecting compounds for virtual screening.All molecular docking experiments were carried out using the GLIDE38,39 module of Maestro 11.4 software. The catalytic centre amino acids of TMPRSS2 (Ser441-His296-Asp345) were centered to create a receptor grid with a grid box dimension of 15 × 15 × 15 Å. The ligand molecules were ranked based on their binding affinities using GLIDE scores (G-scores). High-throughput virtual screening (HTVS) was used for the initial screening, followed by standard precision (SP) docking and additional extra precision (XP) docking. Five molecules (vicenin-2, neohesperidin, naringin, rhoifolin, and hesperidin) selected from the XP docking had a more significant docking score than the well-known inhibitor of TMPRSS2, nafamostat (Table 1).
Table 1 Molecules with the highest binding energy after extra-precision (XP) docking.The results showed that vicenin-2, a flavonoid glycoside, has a higher glide score (− 10.314 kcal/mol) than nafamostat (− 6.815 kcal/mol). Nafamostat is a synthetic serine protease inhibitor that has been proven to be effective against systemic inflammation, certain types of cancer and pancreatitis. It has also been recently reported to be active against SARS CoV-2 by inhibiting the TMPRSS240. However, its easy hydrolysability and poor specificity reduce its efficacy. Here, we used nafamostat as a positive control. The docking studies showed that nafamostat interacts with TMPRSS2 via Ser436, Cys437, Pro471 and Cys297 through hydrogen bonding, and a salt bridge is formed between Asp435 and the guanidino group of nafamostat. 6-amidino-2-naphthol of nafamostat forms the pi-cationic and pi-pi stacking interaction with His296 (Fig. 1D i). Vicenin-2 forms hydrogen bond interactions with TMPRSS2 residues Lys390, Ser436, Cys437, and Gly462 through the glucopyranosyl moiety (Fig. 1D ii). Hesperidin forms hydrogen bond interactions with Ser436 and Gly464 (Fig. 1D iii). Neohesperidin forms hydrogen bond interactions with Ser460, His296, Gly464 and Glu389 (Fig. 1D iv). Naringin makes hydrogen bonding interactions with Gly462, Gly464, Gln438 and Lys340, and hydrophobic contacts with Cys437, Trp461 and Cys465. It also forms a pi-pi stacking interaction with His296 (Fig. 1D v). Rhoifolin forms hydrogen bonds with Lys390, His296 and Gly464 (Fig. 1D vi). The results show that all these compounds interact with the active site residues through hydrogen bonding, hindering TMPRSS2 activity. The binding free energy of these high-scored molecules was calculated through the MM-GBSA (Molecular mechanics with generalised Born and surface area solvation) method. Nafamostat showed binding energy of − 55.49 kcal/mol, while the docked ligand complexes showed binding energy values of − 69.36 kcal/mol (neohesperidin), − 50.84 kcal/mol (naringin), − 59.79 kcal/mol (vicenin-2), − 43.22 kcal/mol (hesperidin), and − 37.68 kcal/mol (rhoifolin).MD simulationsMD simulation is an important tool for studying the dynamic behavior of proteins. In this study, 300 ns MD simulations were conducted to investigate the stability of the TMPRSS2-ligand complexes that are identified through virtual screening. The binding stability of the ligand molecules (vicenin-2, hesperidin, naringin, neohesperidin, and rhoifolin) with TMPRSS2 was assessed through MD simulations. The conformational changes of the protein due to ligand binding can be expressed in terms of the root mean square deviation (RMSD). Comparison of the trajectories of apo-TMPRSS2 (without ligand) and ligand-bound TMPRSS2 simulations showed that apo-TMPRSS2 undergoes higher RMSD changes than the ligand-bound TMPRSS2, suggesting that these ligands are tightly binding to the TMPRSS2 active site restricting the motions of structural elements in the ectodomain. The RMSD changes for the Cα atoms were computed over a 300 ns MD simulation. The TMPRSS2-nafamostat, vicenin-2, neohesperidin, naringin, and rhoifolin complexes showed a mean RMSD profile below 2 Å (Fig. 2A–E), except for the TMPRSS2-hesperidin complex, which showed an RMSD profile above 4 Å (Fig. 2F), indicating that except for hesperidin, other molecules steadily bind to the ectodomain of TMPRSS2. It was observed that the apo-TMPRSS2 RMSD is below 3 Å, whereas other ligand-bound TMPRSS2 complexes have an RMSD below 2 Å. Apoprotein fluctuates up to 90 ns and then stabilizes after 100 ns at around 2.8 Å till the end of the simulation. In the ligand-bound TMPRSS2 complexes, except for the hesperidin-TMPRSS2 complex, the rest of the complexes didn’t show any significant deviations or fluctuations in RMSD, and the systems are fully stabilized up to 300 ns. It indicates that, except hesperidin, all other ligands make stable contacts with the active site of the TMPRSS2 ectodomain. Hence, the TMPRSS2-hesperidin complex is not considered for further analysis.Figure 2(A–F) Plot of RMSD distribution during 300 ns MD simulations, (G) The dynamic stability of the protein–ligand complex accessed by RMSF.We analyzed the dynamics of the TMPRSS2 ectodomain using Root Mean Square Fluctuation (RMSF) analysis of the residues in TMPRSS2 ligand-bound complexes and apo-TMPRSS2. The stability of protein–ligand complexes is largely determined by individual interacting amino acids and their flexibility can be calculated through RMSF analysis. The magnitude of the RMSF value implies either high or low flexibility. High RMSF was found in flexible protein regions, with the most fluctuating parts being the loop regions. The overall values of RMSF fall within 4.5 Å. It was observed that the catalytic residues Ser441, His296, and Asp345, as well as the residues Asp435, Ser460, and Gly462, were less fluctuating due to strong protein–ligand contacts, indicating the structural stability of TMPRSS2-ligand complexes. A low RMSF value of these active site residues indicates stable interactions of these amino acids with the ligands during MD simulations. It was also noted that the N-terminal region of the TMPRSS2 ectodomain had higher residual fluctuations than the C-terminal region (Fig. 2G). The RMSD and RMSF analyses suggest that the TMPRSS2-ligand complexes (vicenin-2, neohesperidin, naringin, and rhoifolin) remained stable throughout the 300 ns MD simulations.We also conducted an interaction analysis on TMPRSS2-ligand complexes and found that both hydrogen bonding and hydrophobic interactions are stable. The intermolecular hydrogen bonds played a crucial role in determining the strength of the ligands binding to the protein (Fig. 3). Along the simulation trajectory, the guanidino group of nafamostat forms hydrogen bonds with the amino acid residues His296, Glu299, Asp435, Ser436, and Gly464. It also forms a salt bridge interaction with Asp435 throughout the simulation (Fig. 3A, B). The interactions of vicenin-2 with catalytic residues of TMPRSS2 during the MD simulations were found to be persistent. The hydrogen bonds stabilizing the binding of vicenin-2 and the residues His296, Glu389, Asp435, Ser436, Cys437, Gln438, Asp440, and Ser460 were persistent during the entire course of the MD simulations (Fig. 3C, D). The hydrogen bond interactions with the catalytic triad remain stable at most instants of MD simulations indicating that vicenin-2 might potentially interfere with the TMPRSS2 activity. Naringin also forms stabilizing hydrogen bonds with the catalytic residues Glu389, Glu439, Ser441, Gly464, and Lys467 (Fig. 3E, F). The neohesperidin-TMPRSS2 interaction was stabilized by hydrogen bond interactions with Lys390, Gly462, Gly464, and Arg470 (Fig. 3G, H). Similarly, the rhoifolin-TMPRSS2 interaction was also stabilized by hydrogen bond interactions with His296, Asp435, Ser436, Ser463, Gly464 and Arg470 (Fig. 3I, J). The interactions fraction indicates the fraction for which a particular amino acid is forming an interaction with the ligand during the simulation. The stacked bar charts are normalized throughout the trajectory and a value of 0.7 implies that 70% of the simulation time the specific interaction is maintained. Values over 1.0 are possible as some protein residue may make multiple contacts of the same subtype with the ligand. In Fig. 3B, Ser436, Gly464; in Fig. 3D, Cys437, Gln438; in Fig. 3F, Ser463, Gly464, Arg470; in Fig. 3H, Glu389, Arg470 and in Fig. 3J, Asp435, Ser463, Gly464, Arg470 implies that the specific interaction is maintained throughout 70% (0.7 on the graph) of the simulation time. The highest value for any given interaction is 1. If the interaction fraction is more than 1, the amino acid is likely forming more than one interaction with the ligand. In such a case, the interaction fraction can go up to 2 or 3 depending on the number of interactions it is forming. In Fig. 3B interaction fraction between Asp435 and Cys464 are more than 1, which means Asp435 and Cys464 have multiple interactions.Figure 3Ligand interactions and the bar plots showing TMPRSS2-ligands contacts averaged over 300 ns simulation time (A–J). (A, B) TMPRSS2-nafamostat complex, (C, D) TMPRSS2-vicenin-2 complex, (E, F) TMPRSS2-neohesperidin complex, (G, H) TMPRSS2-naringin complex and (I, J) TMPRSS2-rhoifolin complex. Ligands are represented as yellow sticks, and the interacting residues are represented as white sticks. The green color in the bar diagram represents hydrogen bonds, and the blue color represents water bridges.Principal component analysisPrincipal component analysis (PCA) is a widely used method to study MD simulation trajectories. PCA is used to determine the meaningful global motions in the protein during simulations. Protein dynamics involve changes in molecular structure and conformation over time, which are best represented by a vector space that spans a large number of dimensions equal to the number of degrees of freedom (DOF). PCA extracts the most important elements in the trajectory using a covariance matrix or a correlation matrix constructed from atomic coordinates that describe the accessible DOF of the protein41,42,43.To analyze the crucial collective motions of TMPRSS2 with and without ligands, covariance matrices of the Cα atoms were built to calculate principal modes. The positive values indicated correlated movements, while the negative values indicated anti-correlated motions between Cα atoms. Principal component 1 (PC1) represents the highest variation in the protein internal motion, and principal component 2 (PC2) accounts for the second most variation in the protein dynamics data. The TMPRSS2-nafamostat occupied most of the apo-TMPRSS2 conformational spaces (Fig. 4A) whereas the TMPRSS2-vicenin-2 complex (Fig. 4B) was completely superimposed in apo-TMPRSS2 conformational spaces, causing no structural changes. This represents the concerted motions of different regions of TMPRSS2 protein both in the nafamostat and vicenin 2 protein complexes. However, the other three lead compounds occupied fewer regions, indicating conformational changes during the binding process (Fig. 4C–E).Figure 4Projection of the TMPRSS2 ectodomain motion along PC1 and PC2 for TMPRSS2-Ligand complexes. (A) TMPRSS2-nafamostat complex, (B) TMPRSS2-vicenin 2 complex, (C) TMPRSS2-noehesperidin complex, (D) TMPRSS2-naringin complex, (E) TMPRSS2-rhoifolin complex.Understanding the atomic motions and their collective behavior in proteins is crucial for understanding their biological function. This information can be extracted from MD trajectories. To determine the presence of correlated motions between the apo-TMPRSS2 and ligand-bound TMPRSS2 complexes, dynamic cross-correlation maps (DCCM) analysis was carried out on the Cα atoms in each simulation. DCCM maps illustrate the inter-residual movements calculated using MD trajectories. The blue and red coloured patches represent strongly correlated (positive) and anticorrelated (negative) motions between residues, respectively. Positive correlation demonstrates the strong stability of the protein–ligand binding (Fig. 5A–F). Positively correlated movements were particularly increased in the TMPRSS2-ligand bound complexes in the range of residues 150 to 350 (active site region), and a higher percentage of pairwise-connected residues indicated stable binding of ligands with the TMPRSS2.Figure 5DCCM maps of Apo-TMPRSS2 and ligand-bound TMPRSS2 complexes. (A) Apo-TMPRSS2, (B) TMPRSS2 with nafamostat, (C) TMPRSS2 with vicenin-2, (D) TMPRSS2 with neohesperidin, (E) TMPRSS2 with naringin, and (F) TMPRSS2 with rhoifolin.The blue and red coloured patches represent strongly correlated (positive) and anticorrelated (negative) motions between residues, respectively.Binding free energy analysisThe binding free energy of the ligands was analyzed through the MM-GBSA method using MD trajectories extracted at intervals of 10 ns. The binding energy, along with other binding energy components such as electrostatic, covalent, Van der Waals, hydrogen bonding, lipophilic, Generalized Born electrostatic solvation, and pi-pi packing energies were calculated (Table 2). The highest binding energy was obtained for neo-hesperidin during the MD simulation (− 73.19 kcal/mol), followed by naringin (− 70.29 kcal/mol) and rhoifolin (− 71.50 kcal/mol). Vicenin-2 had a binding free energy of − 59.95 kcal/mol and the known inhibitor nafamostat had a binding free energy of − 58.39 kcal/mol. The results indicated that these molecules show higher binding free energy and stable interactions with the active site residues of TMPRSS2, indicating that these molecules could be potent inhibitors of TMPRSS2.
Table 2 Binding energy obtained post-MD simulation and the different energy components that contributed to the total binding energy.TMPRSS2-spike interaction studiesWe also created the TMPRSS2-Spike protein complex using the ClusPro 2.0 protein–protein docking webserver and docked these ligands to the catalytic site of TMPRSS2. For ClusPro protein–protein docking, the atomic coordinates of spike protein (PDB ID: 6CS2)44 were downloaded, and missing residues were modeled using UCSF Chimera45. The protein–protein docking was carried out using the modeled TMPRSS2 ectodomain (Supplementary Figure 2). The docked complex was selected for further docking with vicenin-2, neohesperidin, naringin, and rhoifolin using the GLIDE module of the Schrodinger suite. The glide docking scores for the molecules were − 13.294 kcal/mol, − 9.71 kcal/mol, − 9.337 kcal/mol and − 8.531 kcal/mol for vicenin-2, neohesperidin, naringin, and rhoifolin, respectively. Each of these molecules showed high binding affinity, suggesting that these molecules could inhibit the activity of TMPRSS2, which can be further validated by in vitro assays.Well-tempered metadynamics simulationsTo understand the binding free energies of each of the ligands in the active site of the TMPRSS2, the ligand dissociation processes were carried out using well-tempered metadynamics simulations. The TMPRSS2-ligand complexes obtained after 300 ns MD simulations were subjected to well-tempered metadynamics simulations both in the case of nafamostat and the lead molecules vicenin-2, neohesperidin, naringin and rhoifolin using two collective variables: Distance, d and Number of Contacts, Nc. The schematic representation of collective variables is shown in Fig. 6A, B. The lower distance and higher number of contacts indicate the ligand’s proximity to the protein, while the higher distance and lower number of contacts represent the farness of the ligand from the TMPRSS2 active site.Figure 6Schematic representation of the collective variables used for the well-tempered metadynamics simulations. (A) Collective variable, Distance (d) represents the distance between the centre of mass (COM) of the amino acids in the active site (white-coloured surface representation) and the COM of the drug (yellow-coloured surface representation), (B) the collective variable, number of contacts (Nc) represents the number of contacts (blue dots) between the drug and interacting atoms of the amino acids within a 4 Å radius of the atoms of the drug, (C) Bar diagram showing the binding free energy values of different small molecules (light orange) in comparison to that for the positive inhibitor Nafamostat mesylate (blue).We compared the binding free energy of these ligands with respect to the positive control nafamostat (Fig. 6C). The results showed that the lead molecules exhibited a comparable binding affinity towards TMPRSS2. The binding free energy of vicenin-2 (-16.3 kcal/mol) is nearly identical to that of nafamostat. All other lead molecules also showed comparable binding free energy, like nafamostat. The one-dimensional (1D) free energy plot for all lead molecules compared to the TMPRSS2 inhibitor nafamostat is shown in Supplementary Figure 3.The two-dimensional (2D) binding free energy surface contour plot and different conformational states of nafamostat and vicenin-2 during the ligand dissociation process from the active site of TMPRSS2 are shown in Fig. 7A, B respectively. The binding free energy of nafamostat obtained through well-tempered metadynamics simulations is − 16.4 kcal/mol. The average structure of the ligand dissociation process is represented from states I to IV. Nafamostat binding mode at the active site is represented by the average structures of state I (d = 0.62 nm; Nc = 125), followed by state II (d = 1.20 nm; Nc = 125) and state III (d = 2.06 nm; Nc = 50) which represent the intermediate states of the ligand dissociation process and the State IV represents the completely dissociated state (Fig. 7C).Figure 7(A) 2D binding free energy surface of nafamostat, (B) 2D binding free energy surface of vicenin-2, (C) dissociation process of nafamostat with TMPRSS2, (D) dissociation process of vicenin-2 with TMPRSS2. Different states of the dissociation process of the nafamostat and vicenin-2 concerning the active site of TMPRSS2 (surface representation) along the free energy surface (depicted by the Roman letters I–IV). The contour lines depicts the binding free energy in kcal/mol.The binding free energy of vicenin-2 is − 16.3 kcal/mol, which is very close to that of nafamostat, indicating that vicenin-2 could be a promising lead molecule for the inhibition of TMPRSS2 activity. The different states of the dissociation process of vicenin-2 are also shown in Fig. 7D. State I is the average binding mode of vicenin-2 at the active site (d = 0.56 nm; Nc = 217). State II (d = 2.31 nm; Nc = 160) and state III (d = 3.3 nm; Nc = 110) represent the intermediate states during the ligand dissociation process and State IV represents the completely dissociated state (Fig. 7D). The 2D binding free energy surfaces of naringin, rhoifolin, and neohesperidin are shown in Supplementary Figure 4, and the different states of the ligand dissociation process are also mentioned, along with the collective variable values representing each state.Well-tempered metadynamics simulation suggests that vicenin-2 is a promising lead molecule that could be further evaluated for its inhibitory potency against SARS-CoV-2. The other leading phytocompounds, such as neohesperidin, naringin, and rhoifolin are also worth studying further to evaluate their antiviral efficacy. Our study provides in-silico evidence supporting the potential effectiveness of vicenin-2 targeting TMPRSS2. The proposed mechanism involves the inhibition of the TMPRSS2 activity of the receptor, which is critical for SARS-CoV-2 entry into host cells.

Computational screening combined with well-tempered metadynamics simulations identifies potential TMPRSS2 inhibitors

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery

Hot Topics

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Popular Articles

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis