ResisenseNet hybrid neural network model for predicting drug sensitivity and repurposing in breast cancer

As outlined in the methodology section, data retrieval, preprocessing, and model training were completed. We employed a hybrid neural network consisting of two modules: one learns features from the amino acid sequences of transcription factors and target proteins using a combination of a 1D-CNN and an LSTM, and the other learns features from transcription factors, genomic markers, and molecular descriptors using a DNN. The hyperparameters, tuned with Hyperopt, are listed in Table 1. Please refer to Fig. 1 for the model architecture.
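The way the two modules meet can be illustrated with a minimal, library-free sketch. It assumes that each module has already produced a fixed-length feature vector and that the fused vector feeds a single sigmoid unit. The vector sizes, the random weights, the 0.5 decision threshold, and the helper name `fuse_and_classify` are illustrative assumptions, not details taken from the paper; only the concatenation step and the 0/1 label convention come from the text.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse_and_classify(seq_features, num_features, weights, bias):
    """Concatenate both feature vectors and apply one sigmoid output unit.

    seq_features: stand-in for the 1D-CNN + LSTM module output (assumed shape)
    num_features: stand-in for the DNN module output (assumed shape)
    """
    fused = list(seq_features) + list(num_features)  # feature concatenation
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return sigmoid(logit)


rng = random.Random(0)
seq_features = [rng.uniform(-1, 1) for _ in range(8)]  # hypothetical sequence embedding
num_features = [rng.uniform(-1, 1) for _ in range(4)]  # hypothetical numeric embedding
weights = [rng.uniform(-0.5, 0.5) for _ in range(12)]  # untrained, illustrative weights

prob = fuse_and_classify(seq_features, num_features, weights, bias=0.0)
label = int(prob >= 0.5)  # 0 = sensitive, 1 = resistant (threshold is an assumption)
```

In a trained model the weights would be learned jointly with both modules; here they are random, so the output only demonstrates the data flow.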
Table 1 Hyperopt-based optimized hyperparameters used to build the ResisenseNet model.
Fig. 1 Architecture of the ResisenseNet model. The model integrates a 1D-CNN with an LSTM (Long Short-Term Memory) module to capture long-range and temporal patterns from amino acid sequences, alongside a DNN (Deep Neural Network) module designed to analyse patterns from numerical datasets. The learned feature representation vectors from both modules are concatenated and passed to a final output layer that makes binary classification predictions.

The ablation studies are summarized in Fig. 2. The model using only a 1D-CNN (Fig. 2A) yielded loss and accuracy values of 0.088 and 0.792, the weakest performance among the experiments. Integrating a 1D-CNN with an LSTM module (Fig. 2B) improved performance, with loss and accuracy values of 0.059 and 0.815. Concatenating a DNN with the 1D-CNN (Fig. 2C) gave a substantial further improvement, with loss and accuracy values of 0.048 and 0.942. Finally, combining the output vector representation from the DNN with both the LSTM and 1D-CNN modules achieved the best validation metrics of all experiments, with loss and accuracy values of 0.042 and 0.979 (Fig. 2D and Fig. 3).

The complete validation metric profiles for all ablation experiments are presented in Supplementary Table S5. Supplementary Figs. S2 and S4 depict the trends in loss and accuracy over the training epochs of the ResisenseNet model. Figure 4 displays the confusion matrix, which illustrates the predictive performance of the ResisenseNet model on both the test and validation datasets. The results of the 10-fold cross-validation are detailed in Supplementary Fig. S3 and Supplementary Table S10.
Fig. 2 Bar plots illustrating the results of the ResisenseNet model’s ablation studies. Panel A depicts the model using only the 1D-CNN module; Panel B shows the combination of the 1D-CNN and LSTM modules; Panel C presents the integration of the 1D-CNN and DNN modules, designed to capture patterns from both numerical and text-based data; and Panel D represents the final ResisenseNet model, incorporating all three modules: 1D-CNN, LSTM, and DNN.

Ablation studies were conducted to finalize the model complexity. Predictive performance was evaluated on test and validation sets, which were partitioned using an initial train-test split. To ensure reproducibility, experiments with 12 random seeds were run for both the test and validation sets. On the test set, the model yielded a loss of 0.042 and an accuracy of 0.9794. It achieved a recall of 0.968, an F1-score of 0.977, a sensitivity (true positive rate) of 0.9725, a precision (positive predictive value) of 0.9795, a Matthews correlation coefficient of 0.9721, and an AUC-ROC score of 0.9864, as derived from the confusion matrix, indicating a good fit. To further assess predictive performance and generalizability, an internal validation set was employed, on which the model achieved an accuracy of 0.9575 and a loss of 0.069, together with a recall of 0.948, an F1-score of 0.956, a sensitivity of 0.9677, a precision of 0.9219, a Matthews correlation coefficient of 0.9487, and an AUC-ROC score of 0.9667. Please refer to Fig. 3 and Supplementary Table S7. These findings suggest that the ResisenseNet framework generalizes well.
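For reference, the threshold-based metrics listed above can be computed from the four confusion-matrix counts as in the sketch below. The counts are hypothetical and are not the paper's confusion matrix; which class is treated as positive is also an assumption. In the standard binary definitions used here, recall and sensitivity are the same quantity; the source does not state how its separately reported values were computed. AUC-ROC is omitted because it requires predicted scores rather than counts.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and both
    predicted at least once).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)          # positive predictive value
    recall = tp / (tp + fn)             # sensitivity / true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (NOT taken from Fig. 4).
example = binary_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```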
Fig. 3 Graphical representation of the validation metrics used to evaluate the ResisenseNet model on both the test set and the validation set. The numerical values displayed above the bars indicate the standard deviation across the random seeds tested, which is also shown as error bars.
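The per-metric mean and standard deviation over random seeds, as plotted in such a figure, can be obtained along the following lines. The accuracies are made-up placeholders, and the use of the sample (n - 1) standard deviation is an assumption, since the source does not say which estimator was used.

```python
import statistics


def summarize_across_seeds(metric_by_seed):
    """Return (mean, sample standard deviation) of one metric over seeds."""
    values = list(metric_by_seed.values())
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical test-set accuracies keyed by random seed (NOT values from the paper).
accuracy_by_seed = {
    0: 0.978, 1: 0.981, 2: 0.979, 3: 0.977, 4: 0.982, 5: 0.980,
    6: 0.976, 7: 0.981, 8: 0.979, 9: 0.978, 10: 0.980, 11: 0.983,
}

mean_acc, std_acc = summarize_across_seeds(accuracy_by_seed)
print(f"accuracy = {mean_acc:.4f} +/- {std_acc:.4f} over {len(accuracy_by_seed)} seeds")
```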
Fig. 4 Heatmap depicting the confusion matrix for the test and validation sets. In the matrix, the “True Label” represents the actual nature of the drugs, while the “Predicted Label” reflects the predicted outcome. This allows true positives, true negatives, false positives, and false negatives to be counted; these are the key components for calculating the validation metrics used to evaluate the model’s performance. The value 0 represents sensitive data points, and 1 represents resistant data points.

The baseline logistic classifier, trained on the same dataset as the ResisenseNet model, yielded a loss of 0.625 and an accuracy of 0.642. Additional validation metrics are detailed in Supplementary Figs. S5 and S6, which also include the hyperparameters employed for this model. The comparison between the baseline and the ResisenseNet model was intended to assess the latter’s capacity to capture intricate features from proteins, transcription factor (TF) expression, genomic markers, and drug molecules, and it serves as a benchmark for the gains achieved by the more sophisticated model.

To evaluate the model in out-of-distribution scenarios, we used test sets from different cancer types, specifically colorectal adenocarcinoma (COREAD) and lung adenocarcinoma (LUAD), which differ from the breast cancer training data in transcription factor expression profiles, drugs used, molecular descriptors, genomic mutations, and targets. This variation allows an assessment of the model’s generalizability. The LUAD group yielded a loss of 0.062 and an accuracy of 0.9014, while the COREAD group yielded a loss of 0.058 and an accuracy of 0.924. Additional validation metrics are illustrated in Supplementary Table S8 and Fig. 5. These results support the ResisenseNet model’s generalizability and its stability in reproducing outcomes across diverse datasets.
Fig. 5 Bar plot illustrating the validation metrics across different cancer groups (COREAD, colorectal adenocarcinoma; LUAD, lung adenocarcinoma) to assess the ResisenseNet model’s performance in out-of-distribution scenarios, highlighting its generalizability and resilience to variations. The numerical values displayed above the bars indicate the standard deviation across the random seeds tested, which is also shown as error bars.

A character-level CNN + DNN model, used as the state-of-the-art (SOTA) comparison, was trained on the same dataset used to train the ResisenseNet model. Both models produced comparable results; the mean validation metrics over 12 random seeds and their standard deviations are presented in Fig. 6 and Supplementary Table S9. The SOTA model achieved loss and accuracy values of 0.0356 and 0.988, respectively, slightly better than the ResisenseNet model’s 0.042 and 0.9794. Other validation metrics were similarly close. A Wilcoxon signed-rank test across random seeds found no statistically significant difference in performance between the two models. However, the SOTA model exhibited markedly larger standard deviations across all validation metrics, indicating less stable and less reproducible predictions, whereas the ResisenseNet model was more stable across random seeds. This comparison provides a fuller view of the two models’ predictive capabilities.
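A paired comparison of this kind can be sketched with an exact two-sided Wilcoxon signed-rank test, which is feasible by enumeration for 12 seeds (2^12 sign patterns). The implementation below is a self-contained illustration, not the authors' code: it drops zero differences, assigns average ranks to ties, and the paired accuracies are hypothetical placeholders. The source does not state which variant or software was used.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided paired Wilcoxon signed-rank test.

    Zero differences are dropped and tied |differences| get average ranks.
    Returns (W, p) with W = min(W+, W-). Intended for small samples only,
    since the null distribution is enumerated over all 2**n sign patterns.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis each rank is positive or negative with prob. 1/2.
    total = sum(ranks)
    at_least_as_extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            at_least_as_extreme += 1
    return w, at_least_as_extreme / 2 ** n


# Hypothetical per-seed accuracies for two models (NOT values from the paper).
model_a = [0.979, 0.981, 0.978, 0.980, 0.977, 0.982, 0.979, 0.980, 0.978, 0.981, 0.979, 0.980]
model_b = [0.990, 0.970, 0.995, 0.965, 0.992, 0.985, 0.960, 0.993, 0.975, 0.991, 0.968, 0.994]

w_stat, p_value = wilcoxon_signed_rank(model_a, model_b)
print(f"W = {w_stat}, p = {p_value:.4f}")
```

With floating-point inputs, ties are detected by exact equality, so rounding the metrics to a fixed number of decimals before testing may be advisable.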
Fig. 6 Validation metrics of the state-of-the-art (SOTA) model used for comparison with the ResisenseNet model, highlighting performance differences across the evaluation criteria. The numerical values displayed above the bars indicate the standard deviation across the random seeds tested, which is also shown as error bars.

The ResisenseNet model was used to screen drugs currently administered for 14 other cancers (Supplementary Table S4) to identify those that are sensitive or resistant in breast cancer, given the interaction coefficients among genomic markers, transcription factors, targets, and drugs (Supplementary Tables S1, S2, and S3). Among the 14 malignancies examined, drugs targeting low-grade glioma (LGG) and lung adenocarcinoma (LUAD) displayed heightened sensitivity toward breast cancer when assessed with the model. Conversely, drugs specific to colorectal adenocarcinoma (COREAD) demonstrated greater resistance against breast cancer, followed by those for bladder urothelial carcinoma (BLCA), lung squamous cell carcinoma (LUSC), and LGG, which showed similar trends of resistance. Please refer to Fig. 7.
Fig. 7 Horizontal bar plot representing, for each of the 14 malignancies, the number of drugs showing resistance or sensitivity towards breast cancer based on the ResisenseNet model predictions.

We conducted a SHAP (SHapley Additive exPlanations) analysis to elucidate the contributions of the molecular descriptors to the predictive model. SHAP values quantify how strongly each feature influences the model’s output, which helps explain how different molecular characteristics affect drug behaviour. The horizontal bar plot (Supplementary Fig. S7) shows the mean absolute SHAP values for the most impactful descriptors: those related to molecular size, shape, and electrostatic properties had the highest influence, each with a SHAP value of 3. These findings indicate that variations in these features strongly affect the predicted activity or efficacy of the drugs. Descriptors such as aromaticity and hydrophobicity also contributed notably, with SHAP values of 2.5 and 2, respectively, while descriptors such as atomic connectivity and bond types were less influential, with SHAP values ranging from 1 to 1.5, but still play a meaningful role in the model.

A comprehensive analysis of genomic markers was conducted to assess the efficacy of drugs used for various malignancies, with a particular focus on breast cancer. This involved screening drugs tailored to specific cancer types. Each drug was associated with distinct genomic markers that could account for its effectiveness or resistance across cancer types. For instance, a drug might show sensitivity in breast cancer while showing resistance in other cancers, or vice versa. Tables 2 and 4 detail the specific malignancies, corresponding drugs, and their respective targets, along with the genomic markers indicating sensitivity or resistance in breast cancer. Supplementary Fig. S6 provides a heatmap encompassing all considered malignancies, which highlights the varying responses of drugs across cancer types and depicts both sensitivity and resistance patterns.

Table 2 presents an overview of the sensitive drugs along with their corresponding mutations and targets across cancer types. The identified sensitive drugs act through a diverse range of mechanisms targeting specific genomic alterations, highlighting potential avenues for precision medicine in breast cancer treatment. For instance, ABT-263, identified as effective in COREAD, targets the BCL2 family proteins influenced by gain MET mutations, suggesting its potential to induce apoptosis in cancer cells harbouring these alterations. Similarly, Nutlin-3a, effective in GBM and SKCM, interacts with the MDM2 protein, offering a promising strategy for reactivating the p53 pathway in tumours with TP53 mutations. Furthermore, Dabrafenib, identified as efficacious in THCA and BLCA, targets BRAF mutations, which underscores its relevance in MAPK pathway-driven cancers and the importance of personalized treatment strategies based on mutation profiles.
Table 2 Sensitive anticancer drugs identified for breast cancer following screening by the ResisenseNet model, alongside their associated malignancy types, genomic markers, and targeted mechanisms.

Drugs identified as sensitive in breast adenocarcinoma were further assessed for their prior involvement in breast cancer research, their FDA approval status, and whether they are novel compounds lacking prior research. To this end, an extensive literature search was conducted using PubMed, DrugBank, and the FDA database; the findings are summarized in Table 3, which categorizes the drugs into three sets. Set A comprises established, annotated FDA-approved drugs currently used for breast cancer treatment. Set B includes drugs identified through ResisenseNet predictions that have previously been investigated as potential anticancer agents against breast cancer, drugs that enhance the efficacy of other drugs when administered in combination, and drugs currently in clinical trials that have not yet received complete FDA annotation as breast cancer-specific treatments; references are provided for each drug in Set B for cross-verification. Set C contains drugs that are neither listed in DrugBank nor associated with any prior research on breast cancer, and are therefore categorized as novel compounds identified through the repurposing studies conducted with the ResisenseNet model.
Table 3 Research history of the sensitive drugs identified as anticancer agents for breast adenocarcinoma. Set A includes drugs that have already been recognized and annotated as anticancer agents for BRCA; Set B consists of drugs with prior research on their efficacy as anticancer agents against BRCA; Set C encompasses drugs that lack any previous research history and are not specifically annotated for breast cancer, identified as novel candidates through the predictions of the ResisenseNet model.

In contrast, Table 4 lists the resistant drugs by cancer type, shedding light on mutation patterns associated with drug ineffectiveness. The resistant drugs identified in this analysis exhibit distinct mutation profiles that confer resistance to their respective targets, posing challenges to therapeutic efficacy. For instance, Dabrafenib, deemed resistant in COREAD due to BRAF mutations, reflects the limitations of targeting the MAPK pathway in tumours with activated BRAF mutations. Similarly, GSK690693, ineffective in COREAD, illustrates the impact of PTEN mutations on AKT signalling, leading to reduced sensitivity to AKT inhibitors. These findings underscore the interplay between genomic alterations and drug response, emphasizing the need for tailored treatment approaches to overcome resistance mechanisms in breast cancer and other malignancies.
Table 4 Resistant anticancer drugs identified for breast cancer following screening by the ResisenseNet model, alongside their associated malignancy types, genomic markers, and targeted mechanisms.

With this approach, we can efficiently identify drugs that exhibit sensitivity or resistance in treating breast cancer, supporting drug repurposing efforts. This is made possible by a model trained on extensive datasets containing activity profiles of transcription factors and genomic markers in patients, along with the drugs administered to them. By capturing the interaction coefficients among these variables (Supplementary Table S11), the ResisenseNet model can predict potentially sensitive and resistant drugs for breast cancer.
