Accelerating Drug Discovery with Machine Learning for Targeted Protein Degraders

Introduction
Targeted protein degraders (TPDs) have emerged as a promising new modality in drug discovery. These molecules selectively degrade disease-causing proteins and have the potential to tackle diseases that were previously considered “undruggable”. However, the use of machine learning (ML) models to predict TPD properties has been limited, and their applicability has been questioned. In our recent paper, “Application of machine learning models for property prediction to targeted protein degraders”, we demonstrate that ML-based quantitative structure-property relationship (QSPR) models are suitable for predicting a range of properties of TPD molecules. Our work sheds light on the potential of ML to accelerate drug discovery for TPDs.
 
Comprehensive Evaluation of ML for TPDs
Our study evaluated ML models for predicting ADME (absorption, distribution, metabolism, and excretion) and physicochemical properties of TPDs, covering both submodalities: molecular glues and heterobifunctionals. We developed and tested the models on existing experimental data and found that they accurately predict relevant TPD properties such as passive permeability, metabolic clearance, and lipophilicity. Surprisingly, model performance on TPDs was comparable to that on other modalities, with molecular glues showing the lowest prediction errors.

ML models’ performance on TPDs and other modalities. Model results are shown for fifteen ADME and physicochemical properties (described in the paper). Reported are the mean absolute error (MAE) values for glues (blue), heterobifunctionals (orange) and all other compounds (green). Models are compared to a baseline prediction (gray), i.e. the mean of the training set.
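To make the comparison in the figure concrete, the following is a minimal sketch of how an MAE and a training-set-mean baseline can be computed for one property. The function names and the numeric values are illustrative assumptions, not code or data from the paper.

```python
from statistics import mean

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def baseline_mae(y_train, y_test):
    """MAE of a baseline that always predicts the training-set mean."""
    train_mean = mean(y_train)
    return mean_absolute_error(y_test, [train_mean] * len(y_test))

# Hypothetical values for a single property (illustration only).
y_train = [1.0, 2.0, 3.0, 4.0]   # measured values in the training set
y_test = [2.0, 3.0, 5.0]         # measured values in the test set
y_pred = [2.5, 2.5, 4.0]         # model predictions for the test set

model_mae = mean_absolute_error(y_test, y_pred)
base_mae = baseline_mae(y_train, y_test)
print(f"model MAE = {model_mae:.3f}, baseline MAE = {base_mae:.3f}")
```

A model is only informative for a property if its MAE is clearly below that of the baseline, which is why the baseline appears alongside each modality in the figure.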

  
Challenges and Refinements
Our results revealed that predictions for heterobifunctional TPDs remain more challenging. However, transfer learning strategies, such as fine-tuning models on heterobifunctional TPD data, improved predictive performance across different ADME endpoints. This suggests that the models can be refined further as more data become available.
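The idea behind fine-tuning can be illustrated with a deliberately small toy example. The sketch below assumes a one-feature linear model trained by gradient descent on synthetic data. It is a stand-in chosen for brevity, not the model architecture or training procedure used in the paper. The model is first trained on a large "source" set and then trained further, starting from the learned parameters, on a small "target" set whose relationship is shifted.

```python
def predict(params, x):
    w, b = params
    return w * x + b

def mae(params, xs, ys):
    """Mean absolute error of the model on (xs, ys)."""
    return sum(abs(predict(params, x) - y) for x, y in zip(xs, ys)) / len(xs)

def train(params, xs, ys, lr, steps):
    """Gradient descent on mean squared error, starting from `params`."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

# Synthetic data (assumption): the target set follows a shifted relationship,
# mimicking a submodality that the original model has rarely seen.
source_x = [0.0, 0.25, 0.5, 0.75, 1.0]
source_y = [2 * x + 1 for x in source_x]
target_x = [0.0, 0.5, 1.0]
target_y = [2 * x + 2 for x in target_x]

# Original model: trained from scratch on the source data only.
original = train((0.0, 0.0), source_x, source_y, lr=0.1, steps=2000)
# Fine-tuned model: continues from the original parameters on target data.
fine_tuned = train(original, target_x, target_y, lr=0.1, steps=300)

mae_original = mae(original, target_x, target_y)
mae_fine_tuned = mae(fine_tuned, target_x, target_y)
print(f"MAE on target: original = {mae_original:.3f}, "
      f"fine-tuned = {mae_fine_tuned:.3f}")
```

In this toy setting the fine-tuned parameters fit the target data much more closely than the original ones. In practice, the error should be measured on held-out compounds rather than on the data used for fine-tuning.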

Performance of original and refined models (transfer learning) on heterobifunctional TPDs. Reported are mean absolute error (MAE) values for two fine-tuning strategies: (i) on new data (yellow) and (ii) only heterobifunctional data (purple), as well as the original (red) ML models. Shown are bootstrapping results (n = 1000) for heterobifunctional TPD compounds, and five ADME assays (described in the paper).
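For readers unfamiliar with bootstrapping, the following is a minimal sketch of how a distribution of MAE values can be obtained by resampling compounds with replacement. The resampling scheme, the percentile interval, and the example values are assumptions made for illustration. The paper's exact procedure may differ.

```python
import random
from statistics import mean

def bootstrap_mae(y_true, y_pred, n_resamples=1000, seed=0):
    """Return MAE values computed on bootstrap resamples of the compounds."""
    rng = random.Random(seed)
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    maes = []
    for _ in range(n_resamples):
        # Draw n compounds with replacement and average their absolute errors.
        sample = [errors[rng.randrange(n)] for _ in range(n)]
        maes.append(mean(sample))
    return maes

def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted bootstrap values."""
    ordered = sorted(values)
    lo = ordered[int(lower * (len(ordered) - 1))]
    hi = ordered[int(upper * (len(ordered) - 1))]
    return lo, hi

# Hypothetical measured and predicted values (illustration only).
y_true = [1.2, 0.8, 2.5, 3.1, 1.9, 2.2]
y_pred = [1.0, 1.1, 2.0, 3.5, 2.1, 2.0]

maes = bootstrap_mae(y_true, y_pred, n_resamples=1000)
low, high = percentile_interval(maes)
print(f"bootstrap MAE: mean = {mean(maes):.3f}, "
      f"95% interval = [{low:.3f}, {high:.3f}]")
```

Comparing the bootstrap distributions of two models, rather than two single MAE values, gives a sense of whether an observed improvement is larger than the variation due to the choice of test compounds.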

 
The Potential of Surrogate Datasets
Our study provides a surrogate dataset of over 270,000 structures, annotated with in-house model predictions for twenty-five molecular properties. We showcased the potential of using ML-based QSPR models with surrogate data, and the dataset therefore offers exciting prospects for advancing ML models in the public domain.

Scheme of surrogate dataset generation. Public compound structures were extracted from ChEMBL, ZINC, and PROTAC-DB, and annotated with our in-house ML predictions. The surrogate dataset contains ~274,000 compounds with predicted data for twenty-five properties.

 
Implications for Pharmaceutical Research
The integration of ML models into the design-make-test-analyze (DMTA) cycle of drug discovery has already proven beneficial for prioritizing compound ideas and experiments. However, ML models have been used far less for TPDs than for traditional modalities. Our findings shed light on the applicability of ML to TPDs and have implications for pharmaceutical research. Specifically, ML models show promise for accelerating the design of TPDs with favorable ADME properties, and their use in TPD programs should therefore be encouraged.
 
Conclusion
The ability of ML models to predict ADME properties for TPDs, combined with the potential for improvement through transfer learning, could enable more efficient drug design and advance the field of TPDs in drug discovery. As more data for TPDs become available, including through surrogate datasets, additional modeling strategies can be explored, opening up opportunities to predict TPD properties from molecular structure more accurately.
  
