Accurate prediction of drug-target interactions in Chinese and Western medicine by the CWI-DTI model

Computational prediction of drug-target interactions (DTIs) plays a crucial role in drug discovery and development. Existing models have focused primarily on Western medicines, which involve single components and explicit relationships. Chinese medicines, in contrast, pose unique challenges: their components are numerous and complex, and few component-target relationships are known. There is therefore a pressing need for efficient prediction models that can extract latent features of drug components and accommodate diverse data distributions. Homogeneous drugs often share target proteins, which makes drug-drug and target-target similarity measures central to computational DTI prediction. In this study, we extracted multidimensional, complementary information for drugs and targets using multiple similarity measures (Table S1), including kernel functions over chemical structure fingerprints and protein sequences. We propose the CWI-DTI model, which uses a stacked hybrid autoencoder to automatically fuse these similarities and learn high-level features for predicting drug-target interactions. The model incorporates a fusion mechanism of three blocks (Fig. 1) that captures a joint representation of the multiple similarity measures. The workflow proceeds in three stages. First, to construct a computational model suitable for DTI prediction on homogeneous datasets, we collected and preprocessed data on the components and targets of both Chinese and Western medicines. Second, we employed a deep autoencoder to fuse the multiple drug (target) similarity matrices; by hybridizing denoising, sparse, and stacked blocks, the model extracts low-dimensional features that improve the accuracy and generalization of DTI prediction. Finally, fully connected layers compute binding scores that indicate the likelihood of interaction between a drug and a target.

Dataset collection and preparation

To evaluate the performance of the CWI-DTI method in predicting DTIs, we conducted evaluations on ten datasets. Each dataset consisted of three types of information: (1) drug-target interaction data, (2) multiple similarity data for drugs, and (3) multiple similarity data for targets. Table 1 summarizes the statistics of these ten datasets. Note that the ratio of known (positive) to nonexistent (unknown, negative) DTIs varied across the datasets, reflecting the reality that true DTIs are far fewer than non-interacting drug-target pairs. The datasets fall into three groups. The first comprises the Western medicine datasets DRUGBANK28, TTD29, and CHEMBL30. The second comprises the Traditional Chinese Medicine (TCM) datasets HERB31, TCMIO32, HIT33, and NPASS34. The third comprises the combined datasets TCM_ALL, WEST_ALL, and TW_ALL. The data were obtained from the respective database websites or collected through web crawlers and manual sorting. For instance, the HERB database (http://herb.ac.cn/) required manual collation of herb-ingredient-target association data, along with associated structural information such as chemical structure details, for 12,933 targets, 7,263 Chinese herbs, and 49,258 ingredients. As Table 1 shows, the datasets are unbalanced, with far more negative interactions than positive ones, which can degrade the predictive performance of classifiers. To address this, we applied the Synthetic Minority Oversampling Technique (SMOTE) to the unbalanced datasets: SMOTE generates synthetic positive samples to balance the minority class and improve the predictive efficiency of the classifier.
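As an illustration of this balancing step, the sketch below applies SMOTE from the imbalanced-learn package to a hypothetical feature matrix of drug-target pairs; the array shapes, positive rate, and random seeds are assumptions for demonstration rather than values from the CWI-DTI pipeline.

```python
# Minimal sketch of SMOTE balancing, assuming drug-target pairs are already
# encoded as a feature matrix X (one row per pair) with labels y
# (1 = known interaction, 0 = unknown). Variable names are illustrative,
# not taken from the CWI-DTI code.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.random((1000, 64))                  # hypothetical pair features
y = (rng.random(1000) < 0.05).astype(int)   # ~5% positives, mimicking imbalance

smote = SMOTE(random_state=42)
X_balanced, y_balanced = smote.fit_resample(X, y)
print(y.mean(), y_balanced.mean())          # class ratio before vs. after (0.5)
```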
Table 1 Summary of samples on the ten data sets.

Problem description

In this study, we address the problem of drug-target interaction prediction, which involves a set of drugs \(D=\{d_i, i=1,\ldots,n_d\}\) and a set of targets \(T=\{t_j, j=1,\ldots,n_t\}\), where \(n_d\) is the number of drugs and \(n_t\) the number of targets. We represent the interactions between D and T as a binary matrix \(P\in {\mathbb{R}}^{n_d\times n_t}\) whose elements take the values 0 or 1: the rows of P correspond to the \(n_d\) drugs and the columns to the \(n_t\) targets, with \(P_{ij}=1\) indicating an interaction between drug \(d_i\) and target \(t_j\), and \(P_{ij}=0\) indicating no known interaction. We additionally define the set of drug similarity matrices \(S_D\), each in \({\mathbb{R}}^{n_d\times n_d}\), and the set of target similarity matrices \(S_T\), each in \({\mathbb{R}}^{n_t\times n_t}\). The values in the similarity matrices reflect the degree of similarity between drugs or targets under various measures, and all elements lie in the range [0,1]. The objective of our study is to uncover the latent factors associated with drug-target pairs \([d_i, t_j]\) and to predict new (unknown) interactions in P from the similarity matrices in \(S_D\) and \(S_T\).

Preprocessing of multiple similarity measures

To account for the distinctive characteristics of Western medicine (single relationships, limited targets) and traditional Chinese medicine (diverse relationships, multiple targets), we employ molecular SMILES structures and protein amino acid sequences for preprocessing. By assessing their relative similarity across multiple dimensions, we derive kernel-based structural representations for the molecular fingerprints and targets of Chinese and Western medicines. These representations incorporate both local and global features, including atom characteristics and global topological information, and an efficient association strategy is developed on top of them. We obtain drug and target information from Chinese and Western drug databases through manual collation or crawling. The data undergo quality control, with missing or non-standard molecules and sequences corrected by searching additional databases. The structural similarity of drugs is calculated by hashing molecular paths and forming molecular fingerprints based on atom type, aromaticity, and bond type. The Tanimoto coefficient is then computed from the molecular fingerprints to determine the similarity between two drugs:

$$T_{d_i d_j}=\frac{c}{a+b-c}$$

where \(T_{d_i d_j}\) is the Tanimoto score of drugs \(d_i\) and \(d_j\), \(a\) and \(b\) are the numbers of bits set to 1 in the two fingerprints, respectively, and \(c\) is the number of bits set to 1 in both.
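As a concrete illustration of this step, the sketch below computes the Tanimoto similarity of two example molecules with RDKit; the SMILES strings are arbitrary examples, and the Morgan (ECFP4) fingerprint stands in for the eight fingerprint types described next.

```python
# Minimal sketch of fingerprint-based Tanimoto similarity, assuming drugs are
# given as SMILES strings. ECFP4 corresponds to a Morgan fingerprint of
# radius 2; the molecules below are illustrative examples only.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = ["CC(=O)Oc1ccccc1C(=O)O",   # aspirin
          "CC(=O)Nc1ccc(O)cc1"]      # paracetamol
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Morgan/ECFP4 bit vectors (one of several fingerprint types)
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

# Tanimoto T = c / (a + b - c) over the shared and individual "on" bits
sim = DataStructs.TanimotoSimilarity(fps[0], fps[1])
print(f"Tanimoto similarity: {sim:.3f}")
```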
Similarity matrices for eight structural fingerprints (RDK, MACCS, EC4, FC4, EC6, FC6, TOPTOR, and AP) are calculated using the fingerprint functions of the RDKit package in Python. For targets, similarity measures are extracted by comparing amino acid sequences and mapping them into a high-dimensional vector space: Mismatch and Spectrum kernels compute the similarity between sequences based on their k-mers and k-mer occurrence frequencies. The global structure of drugs and targets is then processed using Random Walk with Restart (RWR) and Positive Pointwise Mutual Information (PPMI)35 to calculate the topological similarity of drugs (targets) in each similarity network. This yields the global structural information of each similarity network, which serves as input to the model.

Feature learning with SDSAE

Feature learning for drugs and targets in the CWI-DTI model is performed by Stacked Denoising Sparse Autoencoders (SDSAE). The denoising block adds Gaussian noise to the input data to ensure robust feature learning: the model minimizes the reconstruction error, learning a mapping from the noisy data back to the original data. Sparsity constraints are incorporated through sparse blocks, which prevent overfitting and produce more interpretable and generalizable sparse representations (detailed descriptions are provided in the supplementary materials). Stack blocks combine multiple SDSAEs into a multi-layer deep neural network for abstract feature extraction. Combined with a CNN, the stacked denoising sparse autoencoder becomes part of an end-to-end deep learning framework for DTI prediction, in which the CNN performs local feature extraction, down-sampling, feature selection, and classification. The CWI-DTI model thus integrates denoising, sparse, and stack blocks: training denoises the noisy input, applies sparse blocks for encoding and reconstruction, and stacks blocks for subsequent layers. The loss function incorporates a logistic regression loss, weight decay, and an L1 norm that enforces sparsity in the features, contributing to effective DTI prediction.
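Below is a minimal sketch, in PyTorch, of a single denoising sparse autoencoder layer of the kind the SDSAE stacks; the layer sizes, noise level, sparsity target, and penalty weight are illustrative assumptions rather than the tuned CWI-DTI values. Stacking such layers, each trained on the codes of the previous one, yields the multi-layer SDSAE.

```python
# Minimal sketch of one denoising sparse autoencoder layer (illustrative
# dimensions and hyperparameters, not the published CWI-DTI configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoisingSparseAE(nn.Module):
    def __init__(self, in_dim=600, hid_dim=300, noise_std=0.8, rho=0.05):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hid_dim)
        self.decoder = nn.Linear(hid_dim, in_dim)
        self.noise_std, self.rho = noise_std, rho

    def forward(self, x):
        noisy = x + self.noise_std * torch.randn_like(x)  # denoising block
        h = torch.sigmoid(self.encoder(noisy))            # sparse code
        recon = self.decoder(h)
        return recon, h

    def loss(self, x, recon, h, beta=1e-3):
        # reconstruct the CLEAN input from the noisy one
        rec = F.mse_loss(recon, x)
        # KL sparsity penalty pulling mean activations toward rho
        rho_hat = h.mean(dim=0).clamp(1e-6, 1 - 1e-6)
        kl = (self.rho * torch.log(self.rho / rho_hat)
              + (1 - self.rho) * torch.log((1 - self.rho) / (1 - rho_hat))).sum()
        return rec + beta * kl

# Usage: greedy layer-wise pre-training on fused similarity features
ae = DenoisingSparseAE()
x = torch.rand(64, 600)                                   # a batch of drug features
recon, h = ae(x)
print(ae.loss(x, recon, h).item())
```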

Improved convolutional neural network for DTI prediction

To predict drug-target interactions, we constructed an end-to-end deep learning model by combining the stacked denoising sparse autoencoder with a Convolutional Neural Network (CNN). The learned feature matrices of drugs and targets are treated as two-dimensional images for convolutional feature extraction and classification. Combining the stacked denoising sparse autoencoder with the CNN enhances the local feature extraction ability of the CNN, improving the classification performance and generalization ability of the model.

CWI-DTI model training

CWI-DTI combines the stacked denoising sparse autoencoder with the improved CNN to predict DTIs. The training process can be summarized in the following steps:

1. Pre-training of the stacked denoising sparse autoencoder: The feature matrices of drugs and targets served as input data, and pre-training was conducted to obtain the encoder and decoder parameters of the stacked denoising sparse autoencoder.

2. Training of the CNN: The encoder obtained from pre-training was used as the input layer of the CNN, followed by several convolutional, pooling, and fully connected layers. The feature matrices of drugs and targets were used as input data, and the CNN was trained with the cross-entropy loss function and the backpropagation algorithm.

3. Fine-tuning of the overall model: The parameters of the encoder and the CNN obtained from pre-training were fine-tuned, again using the cross-entropy loss function and the backpropagation algorithm, to further enhance prediction performance.
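A minimal PyTorch sketch of these three stages is given below; the network dimensions, optimizer settings, and data are illustrative assumptions, and plain cross-entropy stands in for the full objective (with L2 and sparsity terms) defined next.

```python
# Minimal sketch of the three training stages, assuming the fused similarity
# features of a drug-target pair are concatenated into one vector. The
# architecture and hyperparameters are illustrative, not the published ones.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(600, 300), nn.Sigmoid())   # encoder
dec = nn.Linear(300, 600)                                 # decoder
cnn = nn.Sequential(                                      # CNN head on the codes
    nn.Unflatten(1, (1, 15, 20)),                         # view 300-d code as 2-D map
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(8 * 7 * 10, 2))

x = torch.rand(64, 600)                                   # pair features (hypothetical)
y = torch.randint(0, 2, (64,))                            # interaction labels

# 1) Pre-train encoder/decoder on reconstruction
opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()),
                      lr=1e-2, momentum=0.9)
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(enc(x)), x)
    loss.backward(); opt.step()

# 2) Train the CNN on frozen codes with cross-entropy
opt = torch.optim.SGD(cnn.parameters(), lr=1e-2, momentum=0.9)
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(cnn(enc(x).detach()), y)
    loss.backward(); opt.step()

# 3) Fine-tune encoder + CNN end to end
opt = torch.optim.SGD(list(enc.parameters()) + list(cnn.parameters()),
                      lr=1e-3, momentum=0.9)
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(cnn(enc(x)), y)
    loss.backward(); opt.step()
```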

The cross-entropy loss function for the overall model, combining the stacked denoising sparse autoencoder and the improved CNN, is defined as

$$\mathcal{L}\left(x,y\right)=-\sum_{k=1}^{K}{y}_{k}\log{\widehat{y}}_{k}+{\lambda}_{1}\sum_{l=1}^{L}{\Vert {W}_{l}\Vert}_{2}^{2}+{\lambda}_{2}\sum_{l=1}^{L}\sum_{j=1}^{{n}_{l}}KL\left({\rho}_{l}\Vert {\widehat{\rho}}_{lj}\right)$$

Here, \(y\) is the true (one-hot) label of sample \(x\), \(\widehat{y}\) is the model's prediction, and \(K\) is the number of categories. The second term is an L2 regularization term used to prevent overfitting: \(W_l\) is the weight matrix of layer \(l\), \({\Vert W_l\Vert}_{2}^{2}\) is its squared two-norm, and \({\lambda}_{1}\) balances the regularization term against the cross-entropy term. The third term is a sparsity regularizer that constrains the average activation \({\widehat{\rho}}_{lj}\) of each neuron \(j\) in layer \(l\) (which has \(n_l\) neurons) to stay close to the preset sparsity target \({\rho}_{l}\), generally a small value such as 0.05; \({\lambda}_{2}\) balances the sparsity term against the cross-entropy term. For the entire dataset, the loss is the average over all samples:

$$\mathcal{L}=\frac{1}{N}\sum_{i=1}^{N}\mathcal{L}\left({x}_{i},{y}_{i}\right)$$

where \(N\) is the number of samples in the dataset. During model training, momentum-based Stochastic Gradient Descent (SGD) was employed to minimize this loss, improving classification accuracy and generalization ability. Training involved multiple rounds of iteration and parameter tuning to achieve the best prediction performance.

Hyperparameter comparison

In our evaluation, we focused on six key hyperparameters: batch size (BS), noise factor (NF), hidden layer dimension of the autoencoder (HLD), sparsity distribution (SD), learning rate (LRA), and training epochs (TE). The performance of the CWI-DTI model showed no significant changes with varying hyperparameters (Figure S1). We ultimately selected a batch size of 1024, a noise factor of 0.8, a hidden layer dimension of 300, a sparsity distribution of 0.04, a learning rate of 1e-4, and 100 training epochs, based on their balanced performance and stability during training.

Experimental setup and model evaluation

We evaluated the performance of the CWI-DTI model using the area under the Receiver Operating Characteristic curve (AUC) and the area under the precision-recall curve (AUPR) as evaluation metrics. We employed 10-fold cross-validation (CV) with 5 repetitions to assess the DTI prediction method; AUC and AUPR were computed for each repetition, and the final scores are the means across the 5 repetitions.
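The following sketch shows how such repeated-CV AUC/AUPR scores can be computed with scikit-learn; the logistic regression classifier and synthetic data are placeholders for the CWI-DTI model and datasets.

```python
# Minimal sketch of AUC/AUPR evaluation over 5 repetitions of stratified
# 10-fold CV. Classifier and data are stand-ins, not the CWI-DTI model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
aucs, auprs = [], []
for rep in range(5):                                  # 5 repetitions
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], scores))
        auprs.append(average_precision_score(y[test_idx], scores))
print(f"AUC = {np.mean(aucs):.3f}, AUPR = {np.mean(auprs):.3f}")
```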
The drug-target interaction matrix Y consists of \(n_d\) rows (drugs) and \(n_t\) columns (targets). We performed CV under three different settings (Table S2):

CVS1: Testing on random entries (i.e., drug-target pairs) of Y.

CVS2: Blind testing with CV over random rows (i.e., drugs) of Y.

CVS3: Blind testing with CV over random columns (i.e., targets) of Y.

Under CVS1, we applied five repetitions of stratified 10-fold cross-validation, where each round used 90% of the entries of Y as training data and the remaining 10% as test data. Similarly, under CVS2 we used 90% of the rows of Y for training and the remaining 10% for testing, and under CVS3 we used 90% of the columns of Y for training and the remaining 10% for testing. CVS1, CVS2, and CVS3 thus correspond to predicting DTIs for (1) new (unknown) pairs, (2) new drugs, and (3) new targets, respectively; a sketch of the three splitting schemes is given below. To determine the optimal block configuration (number of layers and number of neurons per layer) of the CWI-DTI model on the Chinese and Western medicine datasets, we performed five repeated 10-fold cross-validations under CVS1, evaluating the model's performance under different layer configurations while keeping the category ratio in each fold consistent with the overall dataset. To compare the performance of the CWI-DTI model with other methods, including GADTI, AutoDTI++, MDADTI, NeoDTI, DDR, and DNILMF, we conducted 10-fold cross-validation over five replicates under the three settings, focusing on the differences between CWI-DTI and these methods.
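The sketch below illustrates how the three settings partition Y; the toy matrix and index handling are assumptions for demonstration, with CVS1 splitting entries, CVS2 splitting rows, and CVS3 splitting columns.

```python
# Minimal sketch of the three cross-validation settings, assuming an
# interaction matrix Y with drugs as rows and targets as columns.
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
Y = (rng.random((50, 40)) < 0.1).astype(int)       # toy interaction matrix
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# CVS1: hold out random drug-target pairs (entries of Y)
pairs = np.argwhere(np.ones_like(Y))               # all (row, col) index pairs
for train_idx, test_idx in kf.split(pairs):
    test_pairs = pairs[test_idx]                   # unseen (drug, target) pairs
    break

# CVS2: hold out whole rows (new drugs never seen in training)
for train_rows, test_rows in kf.split(np.arange(Y.shape[0])):
    Y_train, Y_test = Y[train_rows], Y[test_rows]
    break

# CVS3: hold out whole columns (new targets never seen in training)
for train_cols, test_cols in kf.split(np.arange(Y.shape[1])):
    Y_train, Y_test = Y[:, train_cols], Y[:, test_cols]
    break
```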
