Computational analysis of pathogen-host interactome for fast and low-risk in-silico drug repurposing in emerging viral threats like Mpox

The graph-based deep learning (DL) model we have developed for predicting potential PHIs identifies 3348 positive PHIs in the MPXV-Human PPIN. Potential candidates for treating Mpox may be found by analyzing the proteins implicated in these PHIs through a drug repurposing study followed by molecular docking. As with any computational model, the choice of inputs and the resulting outputs are critical, and this holds for our model as well.

Overview of the data sets

HIPPIE database

We apply our method to a PPIN obtained from the HIPPIE database31. The network comprises 5497 proteins linked by 13,163 interactions. The PPIN is represented as a graph in which proteins are the nodes and interactions between proteins are the edges. This database supplies both the positive and the negative data used as input to our graph-based DL model. All datasets are available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

MPXV-Human PPIN

The MPXV-Human PPIN is generated by combining the human–human interactions from the HIPPIE database with the predicted MPXV–human interaction network from32. The resulting graph has 7576 nodes and 17,647 edges: 10 nodes represent MPXV proteins32 and 7566 nodes represent human proteins. All datasets are available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

Potential Mpox FDA drugs

Based on the drug repurposing analysis using the DrugBank database, NADH23, Fostamatinib35, Glutamic acid36, Cannabidiol37, Copper38, and Zinc39 have been classified as Level I drugs. Further drugs are classified as Level II and Level III (see Table S1 in the supplementary document).
In this study, the top-ranked drugs are then validated using evidence from the literature and molecular docking.

Prediction of PPI using ensemble feature on HIPPIE dataset

This research centers on a novel graph-based DL approach that leverages ensemble features to build a robust classifier for predicting PPIs. The ensemble feature combines graphlet properties (graphlets of up to 5 nodes) with a range of amino acid composition-based features. The 13,163 interactions observed among the protein nodes in the HIPPIE PPIN are treated as positive data. Candidate negative edges are all protein pairs that have no interaction edge between them; from these, the negative sampling technique draws as many negative edges as there are positive edges to form the negative training set. The evaluation criteria used in this work for assessing PPI classification models are precision, recall (also referred to as sensitivity), Matthews correlation coefficient (MCC), F1-score, Area Under the Precision-Recall Curve (AUPRC), and Area Under the Receiver Operating Characteristic curve (AUC-ROC)28. We compare three widely used feature selection and dimensionality-reduction techniques: (1) Fast Independent Component Analysis (FastICA), (2) Principal Component Analysis (PCA), and (3) Variance Threshold with a threshold parameter of 0.13, in order to identify the most effective strategy. Because the Variance Threshold technique with a threshold of 0.13, which retains 269 features, performed better than other threshold values, the same number of features (269) is used for the other methods to allow a fair comparison.
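The negative-sampling and Variance Threshold steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Negatives are drawn uniformly from all undirected, non-self protein pairs that have no known interaction, because the text does not specify the sampling distribution. The variance filter keeps columns whose population variance is strictly greater than the threshold, which mirrors the behaviour of scikit-learn's `VarianceThreshold`. The function names and the toy data are hypothetical.

```python
import random
import numpy as np

def sample_negative_edges(nodes, positive_edges, n_samples, seed=0):
    """Draw n_samples protein pairs that have no known interaction.

    Pairs are undirected and never self-pairs; sampling is uniform over
    non-interacting pairs (an assumption; the paper does not state the scheme).
    """
    rng = random.Random(seed)
    node_list = sorted(nodes)
    positives = {frozenset(e) for e in positive_edges}
    n_pairs = len(node_list) * (len(node_list) - 1) // 2
    if n_samples > n_pairs - len(positives):
        raise ValueError("not enough non-interacting pairs to sample from")
    negatives = set()
    while len(negatives) < n_samples:
        pair = frozenset(rng.sample(node_list, 2))  # two distinct proteins
        if pair not in positives:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)

def variance_threshold(X, threshold=0.13):
    """Keep feature columns whose population variance is strictly above threshold."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0)  # ddof=0, as in scikit-learn's VarianceThreshold
    mask = variances > threshold
    return X[:, mask], mask

# Toy usage: 4 proteins and 2 known interactions, so 2 non-interactions are sampled
negatives = sample_negative_edges(["A", "B", "C", "D"], [("A", "B"), ("B", "C")], n_samples=2)
X_kept, kept_mask = variance_threshold([[0.0, 1.0, 5.0], [0.0, 1.2, -5.0], [0.0, 1.1, 5.0]])
```

In the toy matrix, the constant first column and the nearly constant second column both fall below 0.13 and are dropped. On the HIPPIE data, the study reports that the 0.13 threshold leaves 269 features.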
Although all approaches use the same number of features (269), Variance Threshold with a threshold of 0.13 showed higher discriminatory ability than FastICA and PCA, as seen in Table 1.

Table 1 Performance of our proposed model with FastICA, PCA, and variance threshold feature selection method on HIPPIE dataset31.

For a comparative analysis, our model is trained and evaluated using three distinct feature sets: first graphlet features alone, then sequence-based features alone, and finally the ensemble of graphlet and sequence features. Table 2a reports the performance of the proposed model with these different node attributes. The ensemble feature outperforms either individual feature type (graphlet or sequence-based). The primary objective of our experimental investigation is to compare our GraphSAGE26-based model with current state-of-the-art GNN24 variants, specifically GCN and GAT40,41. We evaluate these models using the same ensemble features and an experimental setup identical to that of our proposed model. Figure 2 shows the performance metrics for the three models GCN, GAT, and GraphSAGE26. The results indicate that our proposed GraphSAGE-based model performs better than the other GNN variants.
On the test set, the GraphSAGE-based model achieves a precision, recall, F1-score, AUC-ROC score, and MCC score of 0.8647, 0.7622, 0.8103, 0.8215, and 0.6476, respectively.

Table 2 (a) Performance scores of our proposed model with graphlet feature, sequence-based feature, and ensemble feature on HIPPIE dataset31; (b) performance scores of our proposed model with graphlet feature, sequence-based feature, and ensemble feature on MPXV-Human dataset32.

Figure 2 Performance metrics for three GNN variants on the HIPPIE dataset. Purple represents the GCN model, yellow the GAT model, and green the proposed GraphSAGE model. The proposed model outperforms the other variants across all evaluation metrics, including precision (PR), recall (RE), F1 Score (F1), AUC-ROC score, and MCC score, which highlights the effectiveness of the proposed approach.

Prediction of PHIs using ensemble feature on MPXV-Human PPIN

A comprehensive experiment is performed on the MPXV-Human PPIN data to evaluate several feature extraction techniques: graphlet-based features, sequence-based features, and the ensemble approach. Table 2b shows that the ensemble technique outperforms both graphlet-based and sequence-based features on all assessment measures, which indicates that it captures the essential patterns in the Mpox data more effectively. As Fig. 3a shows, the model correctly identifies positive instances (True Positive (TP): 3348) and negative instances (True Negative (TN): 3094). It also misclassifies some negative instances as positive (False Positive (FP): 436) and some positive instances as negative (False Negative (FN): 182).
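The sketch below plugs the confusion-matrix counts reported for Fig. 3a into the evaluation metrics listed earlier, using their standard formulas. It also includes a small rank-based AUC-ROC function that treats AUC as the probability that a randomly chosen positive outscores a randomly chosen negative. This is an illustration only. The function names are hypothetical, and the pairwise AUC loop is quadratic, so it suits toy inputs rather than the full test set.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc, "accuracy": accuracy}

def roc_auc(labels, scores):
    """AUC-ROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. O(P*N), so only suitable for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Counts from the MPXV-Human confusion matrix in Fig. 3a
mpxv_metrics = confusion_metrics(tp=3348, tn=3094, fp=436, fn=182)
```

An AUC-ROC near 0.5, like the 0.53 reported for graphlet features, means that positives and negatives are ranked almost at random.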
Moreover, Fig. 3b compares the ROC curves and AUC-ROC scores of our proposed model with three distinct features: the ensemble feature, the sequence-based feature, and the graphlet feature. Blue represents the ensemble feature, green the sequence-based feature, and red the graphlet feature. The ensemble feature achieves the highest AUC-ROC score (0.91). The sequence-based feature performs moderately (0.82), while the graphlet feature performs poorly (0.53, close to the 0.5 expected from random guessing). The positive MPXV-Human interactions predicted by the ensemble approach comprise 2069 edges, with 8 nodes representing MPXV proteins and 2069 nodes representing target human proteins. All datasets are available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

Figure 3 (a) Confusion matrix for the proposed model. (b) AUC-ROC plot of the proposed model with ensemble feature, sequence-based feature, and graphlet feature with the AUC-ROC score.

Drug repurposing study on MPXV-Human PPIN

The drug components identified by the ProcessDrugData algorithm (see Algorithm S1 in the supplementary document) are listed in Table S1 in the supplementary document along with their DrugBank Accession Number, Groups in DrugBank, and computed DCS. The drugs are ranked in descending order of DCS. NADH ranks first for MPXV with a score of 35. NADH is used as a nutrient in some supplement products. We categorize the repurposed drugs into three levels, namely Level I, Level II, and Level III, based on their DCS.
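The DCS-based ranking and three-level categorization can be sketched as follows. This is a minimal illustration. The cut-off values `level1_min` and `level2_min` are hypothetical because the actual boundaries between levels are not stated in this section. The drug names and scores in the usage line are also invented for illustration and are not taken from Table S1.

```python
def assign_levels(dcs_by_drug, level1_min, level2_min):
    """Rank drugs by DCS (highest first) and bucket them into Level I/II/III.

    level1_min and level2_min are hypothetical cut-offs chosen for illustration.
    """
    # Sort by descending score; ties are broken alphabetically for determinism
    ranked = sorted(dcs_by_drug.items(), key=lambda kv: (-kv[1], kv[0]))
    levels = {}
    for drug, score in ranked:
        if score >= level1_min:
            levels[drug] = "Level I"
        elif score >= level2_min:
            levels[drug] = "Level II"
        else:
            levels[drug] = "Level III"
    return ranked, levels

# Hypothetical drugs and scores (not taken from Table S1)
ranked, levels = assign_levels({"drug_A": 35, "drug_B": 12, "drug_C": 3, "drug_D": 20},
                               level1_min=30, level2_min=10)
```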
Drugs with the highest DCS for treating Mpox are classified as Level I. Repurposed drugs with lower DCS than the Level I drugs are categorized as Level II, and drugs with lower DCS than the Level II drugs are categorized as Level III. Literature evidence supporting these computational results is documented in Sects. S1, S2, and S3 of the supplementary document. The identified drugs are also compared with the results produced by SAveRUNNER42, an R-based program for predicting drug–disease associations. The overlap comprises 1, 8, and 56 drugs predicted to be repurposable for MPXV at Level I, Level II, and Level III, respectively (see Fig. S1 in the supplementary document for further details). The comprehensive drug names and their overlap can be accessed online here. SAveRUNNER relies solely on a network (topology-based) approach, whereas this work uses the network, the protein sequences, and an ensemble of both.

Computational docking of potential drugs for MPXV protein structures

A detailed molecular docking analysis has been performed using AutoDock Vina34 version 1.5.6. The ligand is NADH, which has the highest DCS (35), and the target is MPXV methyltransferase VP39 in complex with inhibitor TO427 (PDB ID: 8CEQ)43. The MPXV protein crystal structure is obtained from the Protein Data Bank (PDB) in .pdb format. Figure 4a presents the crystal structure of 8CEQ, and Fig. 4b shows the molecular structure of NADH obtained from DrugBank. Docking produces nine binding modes of the drug–protein interaction, each with a docking score that represents its binding energy.
The binding mode with the lowest binding energy is considered optimal because it corresponds to the most stable ligand pose. The best affinity score obtained is −9.7 kcal/mol. Root Mean Square Deviation (RMSD) values are computed relative to the best mode and involve only movable heavy atoms. Two RMSD metrics are reported, rmsd/lb (RMSD lower bound) and rmsd/ub (RMSD upper bound), which differ in how atoms are matched in the distance calculation34. The binding energies of the nine modes, together with their distances from the best mode, are summarized in Table S2 in the supplementary document. The amino acids in the protein's active site have been identified using Biovia Discovery Studio 4.544. The MPXV protein structure was prepared by removing water and other extraneous atoms and then adding polar hydrogens. Figure 4c shows a high-affinity interaction of NADH with the MPXV protein, with the ligand fitting inside the core pocket of the protein. This is further supported by the hydrogen bonding observed between NADH oxygen and residues TYR 189 and SER 141. Figure 4d details the various interactions with the active-site residues ASP 187, SER 141, LEU 154, LEU 221, SER 165, HIS 98, ARG 97, ALA 158, TYR 189, GLY 96, and PHE 188. The docking results therefore indicate a favorable binding interaction between NADH and the MPXV protein. This suggests that this widely available natural compound holds promise as a potential anti-Mpox medication that warrants further investigation and could benefit both patients and public health initiatives. The virtual interaction of 8CEQ with the other Level I repurposed drugs, i.e. Fostamatinib, Glutamic acid, and Cannabidiol, has also been evaluated using the AutoDock Vina suite.
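The difference between rmsd/ub and rmsd/lb can be illustrated with a short sketch that follows the definitions in the AutoDock Vina documentation. For rmsd/ub, each atom is matched with the same atom in the other pose, and molecular symmetry is ignored. For rmsd/lb, each atom is matched with the closest atom of the same element in the other pose, this is done in both directions, and the larger value is kept. No superposition is applied because docked poses share the receptor's coordinate frame. The atom tuples and helper names are hypothetical, and this is an illustration rather than Vina's own implementation.

```python
import math

# A heavy atom as (element, x, y, z); coordinates in angstroms
def _sq_dist(a, b):
    return (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2 + (a[3] - b[3]) ** 2

def rmsd_ub(c1, c2):
    """Upper bound: atom i in c1 is matched with atom i in c2 (symmetry ignored)."""
    if len(c1) != len(c2):
        raise ValueError("poses must list the same atoms in the same order")
    return math.sqrt(sum(_sq_dist(a, b) for a, b in zip(c1, c2)) / len(c1))

def _rmsd_nearest(c1, c2):
    """Match each atom in c1 with the closest atom of the same element in c2."""
    total = sum(min(_sq_dist(a, b) for b in c2 if b[0] == a[0]) for a in c1)
    return math.sqrt(total / len(c1))

def rmsd_lb(c1, c2):
    """Lower bound: the larger of the two nearest-same-element RMSDs."""
    return max(_rmsd_nearest(c1, c2), _rmsd_nearest(c2, c1))

# Two poses that differ only by swapping two chemically equivalent oxygens
pose_best = [("O", 0.0, 0.0, 0.0), ("O", 1.0, 0.0, 0.0), ("C", 0.0, 1.0, 0.0)]
pose_other = [("O", 1.0, 0.0, 0.0), ("O", 0.0, 0.0, 0.0), ("C", 0.0, 1.0, 0.0)]
```

For these two poses, rmsd/lb is 0 because the swapped oxygens occupy the same positions, while rmsd/ub is nonzero. This is why rmsd/lb never exceeds rmsd/ub.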
A docking study similar to that for NADH has been performed on the remaining three Level I drugs (Fostamatinib, Glutamic acid, and Cannabidiol). The resulting binding energies with 8CEQ are shown in Table S3 in the supplementary document. In addition to registering the highest DCS (35) in the drug repurposing study, NADH also ranks first among these repurposed drugs in the molecular docking study, with an affinity score of −9.7 kcal/mol. The docking scores of Fostamatinib, Glutamic acid, and Cannabidiol are −8.1, −4.9, and −8.7 kcal/mol, respectively. These results suggest that NADH, Cannabidiol, and Fostamatinib bind 8CEQ with high affinity and may therefore be able to inhibit MPXV, whereas Glutamic acid binds considerably more weakly. This creates opportunities for pharmaceutical companies to leverage existing drug libraries, potentially speeding up the introduction of new Mpox treatments.

Figure 4 (a) Crystal structure of MPXV methyltransferase VP39 in complex with inhibitor TO427 retrieved from PDB. (b) Molecular structure of NADH retrieved from DrugBank. (c) NADH docked in the MPXV methyltransferase VP39 in complex with inhibitor TO427 (PDB ID: 8CEQ): best binding mode in the protein pocket. (d) 8CEQ: the binding interaction of NADH with amino acids with various bonds (along with bond length).

Molecular dynamics simulation of three identified potential drugs for MPXV protein structures

The study employs GROMACS45 for molecular dynamics (MD) simulations46. The process consists of seven consecutive steps: (1) Create topology files for the protein and ligand molecules. (2) Define the dimensions of the simulation box and fill it with solvent molecules. (3) Add ions to the system. (4) Perform energy minimization to relax the molecular structure. (5) Run equilibration to stabilise the system. (6) Run production MD simulations.
(7) Analyse the obtained data. Here, "boxing" refers to enclosing a molecule or collection of molecules within a defined region of space, whereas "solvating" denotes surrounding a solute molecule with solvent molecules to form a solvation shell. The simulation is conducted on 8CEQ in complex with each of three repurposed drugs: Fostamatinib, Cannabidiol, and Glutamic acid. First, separate topology files are created for 8CEQ and for the best-pose structure of each of the three ligands, as obtained from docking with AutoDock Vina. Each ligand is then combined with 8CEQ, which gives three MPXV–ligand complexes. Next, the simulation box is defined and solvated for each complex, and ions are added. During energy minimization, the energy of each MPXV–ligand complex is reduced. Equilibration then stabilises the temperature and pressure of the system. Finally, a production MD simulation is run and the resulting data are analysed. All three drug complexes remain highly stable in terms of RMSD. The findings of the whole MD simulation, together with other graphical displays, are shown in the supplementary document S1.
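The seven GROMACS steps above can be sketched as a dry-run list of `gmx` command lines. This sketch relies on several assumptions that the text does not state: the CHARMM27 force field with TIP3P water, a cubic box with a 1.0 nm margin, NVT followed by NPT equilibration, Na+/Cl- counter-ions, and hypothetical file names and `.mdp` parameter files. GROMACS does not generate small-molecule topologies, so the ligand part of step (1) and the merging of protein and ligand coordinates into the complex appear only as comments. Interactive group selections are not handled here, for example choosing the solvent group for `genion` or the fit and RMSD groups for `gmx rms`.

```python
from typing import List

def gromacs_pipeline(protein_pdb: str, complex_gro: str) -> List[List[str]]:
    """Return gmx commands for steps (1)-(7); nothing is executed here.

    File names and .mdp files (ions, em, nvt, npt, md) are hypothetical.
    """
    return [
        # (1) protein topology; the ligand topology must come from an external
        #     tool, and protein + ligand coordinates are merged into complex_gro
        ["gmx", "pdb2gmx", "-f", protein_pdb, "-o", "protein.gro", "-ff", "charmm27", "-water", "tip3p"],
        # (2) box the complex with a 1.0 nm margin, then fill the box with water
        ["gmx", "editconf", "-f", complex_gro, "-o", "box.gro", "-bt", "cubic", "-d", "1.0", "-c"],
        ["gmx", "solvate", "-cp", "box.gro", "-cs", "spc216.gro", "-p", "topol.top", "-o", "solv.gro"],
        # (3) replace some solvent molecules with Na+/Cl- to neutralise the net charge
        ["gmx", "grompp", "-f", "ions.mdp", "-c", "solv.gro", "-p", "topol.top", "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", "solv_ions.gro", "-p", "topol.top",
         "-pname", "NA", "-nname", "CL", "-neutral"],
        # (4) energy minimisation
        ["gmx", "grompp", "-f", "em.mdp", "-c", "solv_ions.gro", "-p", "topol.top", "-o", "em.tpr"],
        ["gmx", "mdrun", "-deffnm", "em"],
        # (5) equilibration: NVT (temperature), then NPT (pressure), with position restraints
        ["gmx", "grompp", "-f", "nvt.mdp", "-c", "em.gro", "-r", "em.gro", "-p", "topol.top", "-o", "nvt.tpr"],
        ["gmx", "mdrun", "-deffnm", "nvt"],
        ["gmx", "grompp", "-f", "npt.mdp", "-c", "nvt.gro", "-r", "nvt.gro", "-t", "nvt.cpt",
         "-p", "topol.top", "-o", "npt.tpr"],
        ["gmx", "mdrun", "-deffnm", "npt"],
        # (6) production MD
        ["gmx", "grompp", "-f", "md.mdp", "-c", "npt.gro", "-t", "npt.cpt", "-p", "topol.top", "-o", "md.tpr"],
        ["gmx", "mdrun", "-deffnm", "md"],
        # (7) analysis: RMSD over the trajectory, time axis in ns
        ["gmx", "rms", "-s", "md.tpr", "-f", "md.xtc", "-o", "rmsd.xvg", "-tu", "ns"],
    ]

if __name__ == "__main__":
    # Dry run: print each command; pass it to subprocess.run(cmd, check=True) to execute
    for cmd in gromacs_pipeline("protein_clean.pdb", "complex.gro"):
        print(" ".join(cmd))
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems and makes each step easy to inspect or rerun on its own.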
