Application of machine learning models for property prediction to targeted protein degraders

Assays’ descriptionLE-MDCKFor passive permeability determination, 96-well plate permeable inserts were plated with Madin-Darby Canine Kidney (MDCK) cells and cultured for three days. The test article in dimethyl sulfoxide (DMSO) stock solution (10 mM) was added to Hanks’ balanced salt solution (HBSS) to result in a final concentration of 10 μM. The HBSS buffer contained 0.02% bovine serum albumin (BSA) and 10 mM HEPES. The acceptor compartment was HBSS with 5% BSA and 10 mM HEPES. The assay was run for 120 min, determining the donor concentration at time zero and after 120 min the donor and acceptor concentrations. The difference between version 1 (v1) and version 2 (v2) of the assays consisted in the addition of BSA36.CYP metabolic stability in liver microsomesMicrosomal incubations were performed in 384-well PCR plates at 37 °C. Test articles at a concentration of 10 mM in pure DMSO were dispensed by an acoustic dispenser to 25 µL 100 mM phosphate buffer (pH 7.4) containing 2 mM NADPH. This solution (12.5 μL, equilibrated for 10 min at 37 °C) was added to 12.5 μL liver microsomes (1 mg/mL) suspended in 100 mM phosphate buffer. At 0.5, 5, 15, and 30 min, the reactions were terminated by the addition of 10 µL acetonitrile/formic acid (93:7) containing the analytical internal standards (1 μM alprenolol and 1 µM warfarin) and transferred to a new 384-well plate containing 15 µL acetonitrile/formic acid (93:7). The stopped incubations were centrifuged at 5000 x g for 15 min at 4 °C and the supernatants were analyzed by high-performance liquid chromatography–tandem mass spectrometry (LC-MS) to measure the percentage of test article remaining relative to time zero-minute incubation and determine the in vitro elimination-rate constant (kmic). Intrinsic clearance (CLint) was calculated by dividing kmic by the concentration of microsomal protein.PPBPlasma protein binding (PPB) values were mainly determined through equilibrium dialysis. Binding to proteins was measured using rapid equilibrium dialysis (RED device from ThermoFisher). Test articles were dissolved in matrix (plasma, human liver microsomes or brain homogenate). 300 µL of the matrix solutions were dispensed to the red chamber of a RED device and 500 µL 100 mM phosphate buffer to the white chamber. The RED device was sealed with a gas permeable membrane and incubated for 4 h on an orbital shaker (750 rpm) at 37 °C under 5% CO2. 50 µL aliquots from both compartments were transferred to 600 µL acetonitrile containing the analytical internal standard (0.2 µM glyburide) and 50 µL buffer or matrix for a matrix match. The samples were centrifuged at 5000 x g for 15 min at 4 °C and the supernatant was analyzed by LC-MS analysis for measuring test article and internal standard. The free fraction (fu) was calculated by dividing the area ratio of the receiver compartment to the area ratio of the donor compartment. For large molecular weight compounds, PPB was measured using ultracentrifugation (UC). Test articles (5 µM) were added to 1000 µL plasma and incubated for 10 min at 37 °C in a glass vial. For the determination of the total concentration, 3 times 50 µL were added to a 96 deep-well-plate pre-filled with 600 µL acetonitrile/water (9/1) containing the analytical internal standard (0.2 µM glyburide) and 50 µL phosphate buffer. For the free fraction, an aliquot of 700 µL was centrifuged (Beckman UC Optima Max-XP) at 436,000 x g for 5 h at 37 °C. At the end of the centrifugation, 3 times 50 µL of the supernatant were carefully removed and added to the 96 deep-well-plate pre-filled with acetonitrile containing the internal standard and 50 µL blank plasma for a matrix match. The 96 deep-well plate was shaked for 10 min at 300 rpm and stored over-night in a freezer at −20C° to help protein precipitation. The next day the 96 deep-well plate was centrifuged at 4500 rpm for 1 h at 4 °C. Supernatant (50 µL) was transferred in a 384 well plate with 30 µL water. The samples were analyzed by LC-MS for the measurement of test article and internal standard. High throughput dialysis (HTD) was used as a second alternative method for strong binders (% bound >99). Test articles (5 µM) were added to 700 µL plasma and incubated for 10 min at 37 °C. For the determination of the total concentration, 3 times 50 µL were added to a 96 deep-well-plate pre-filled with 600 µL acetonitrile containing the analytical internal standard (0.2 µM glyburide) and 50 µL phosphate buffer. For the free fraction, an aliquot of 100 µL was dialyzed against 100 mM phosphate buffer at pH 7.4 for 6 h in the HTD96b device (HTDialysis LLC). At the end of the incubation, 3 times 50 µL of the plasma (buffer) compartment were removed and added to the 96 deep-well-plate pre-filled with acetonitrile containing the internal standard and 50 µL blank buffer (plasma) for a matrix match. The 96 deep-well plate was shaked for 10 min at 300 rpm and stored over-night in a freezer at −20C° to help protein precipitation. The next day the 96 deep-well plate was centrifuged at 4500 rpm for 1 h at 4 °C. Supernatant (50 µL) was transferred in a 384 well plate with 30 µL water. The samples were analyzed by LC-MS for the measurement of test article and internal standard. A calibration curve was used to define the LLOQ.Most of the utilized data comes from RED devices but, when available, HTD or UC data was utilized instead (i.e., some >99% qualifiers were replaced). Specifically, 3–6% and 1-3% of the data was generated with HTD and UC, respectively.LipophilicityThe 1-octanol/water partitioning coefficient (LogP) was determined using a miniaturized Shake-Flask equilibrium method. Prior to start the experiment the two phases were pre-saturated, so “water-saturated 1-octanol” and “1-octanol-saturated water” were used. Samples were initially dissolved in DMSO as a 10 mM stock concentration. The samples and an internal standard were dispensed in a 1 ml deepwell plate and DMSO is evaporated prior to be dissolved in 1-octanol at a target concentration of 150 µM by shaking at 1000 rpm for 8 h. The pH 7.4 buffer was added with a phase ratio K of 1 (where K = Vwater/Voctanol) and then the samples were shaken four hours on a shaker at 1000 rpm. The deepwell plate was centrifuged at 3000 rpm prior to phase separation. A x10 dilution for the aqueous phase and a x1000 dilution for the octanol phase are prepared and quantified by LC-HRMS against an internal standard (Dexamethasone) with a known logD = 1.9 with the following equation:$$\log D=\log \left(\frac{{{{{{{\rm{Analyte}}}}}}\; {{{{{\rm{peak}}}}}}\; {{{{{\rm{area}}}}}}\; {{{{{\rm{in}}}}}}\; {{{{{\rm{octanol}}}}}}} * 1000/\frac{{{{{{\rm{IS}}}}}}\; {{{{{\rm{peak}}}}}}\; {{{{{\rm{area}}}}}}\; {{{{{\rm{in}}}}}}\; {{{{{\rm{octanol}}}}}}}{0.794}}{{{{{{\rm{Analyte}}}}}}\; {{{{{\rm{peak}}}}}}\; {{{{{\rm{area}}}}}}\; {{{{{\rm{in}}}}}}\; {{{{{\rm{aqueous}}}}}} * 10/{{{{{\rm{IS}}}}}}\; {{{{{\rm{peak}}}}}}\; {{{{{\rm{area}}}}}}\; {{{{{\rm{in}}}}}}\; {{{{{\rm{aqueous}}}}}}}\right)$$This protocol was adapted from Low et al.37.CYP inhibition
CYP3A4 time-dependent inhibition (TDI)
The TDI assay was utilized to determine the first order inactivation rate (kobs) values. Test articles were dispensed to 96-well plates, and human liver microsomes supplemented with NADPH were added to initiate the pre-incubation. Residual CYP3A4 activity was determined after 0, 7, 16 and 32 min by the addition of midazolam (including d4-1-hydroxy-midazolam as internal standard) and incubated for six additional minutes before adding acetonitrile. Supernatants were analyzed for the CYP3A4 selective metabolite 1-hydroxymidazolam and d4-1-hydroxymidazolam using LC-MS. TDI CYP3A enzyme activity was calculated using normalized area ratios of 1-hydroxymidazolam to internal standard and plotted over the pre-incubation time. A one parameter fit using a range of 80% and a background of 20% was utilized to determine kobs. The percentage of reversible inhibition was calculated by the area ratio at 0 min (pre-incubation) in relation to the area ratio of the control with DMSO only. In cases of strong reversible inhibition ( > 50%), kobs values were not calculated.

Reversible inhibition of CYP3A4, CYP2C9, and CYP2D6
Formation rate of enzyme-specific metabolites from midazolam (CYP3A4), diclofenac (CYP2C9) and bufuralol (CYP2D6) in human liver microsomes was utilized. Substrates, internals standards and test compounds were dispensed by acoustic dispensing to a 384-well microplate. Human liver microsomes supplemented with NADPH were added to start the incubation. Plates were immediately transferred to an incubator. Incubations were stopped by the addition of acetonitrile/formic acid (93:7) and the supernatant was analyzed by LC-MS for the enzyme-specific metabolites and internal standards. The area ratios of test compounds were normalized to the average area ratio of DMSO (100% activity) and an inhibitor cocktail (0% activity) to determine the IC50 (test compound concentration causing an inhibition of 50%) using a dose-response-model with a two-parameter fit in which 100% and 0% activity were constrained.
Data qualityTo deliver the best possible data quality, different processes and acceptance criteria were considered. For instance, in LE-MDCK and PPB assays, data were rejected if the recovery rate was too low ( < 60%). Moreover, to minimize unspecific binding issues, enzymatic incubations are done using one-well-per-data. Therefore, it is expected to extract the (bound) compound completely once the incubation is stopped (high content of organic solvent) in contrast to serial sampling approaches. This does not regard the free fraction in the incubation but increases the probability to get consistent data. The LE-MDCK assay protocol was adapted for outside rule-of-five molecules36. In CLint and CYP inhibition assays, non-specific binding is less problematic since protein is present in the incubation medium. Moreover, Fu,mic is measured to correct CLint for microsomal binding. For CYP inhibition, the presence of protein decreases the non-specific binding to labware. If non-specific binding interferes too much with the assay, the data does not fit the model and no IC50 is reported.Data sets for modelingADME data from twenty-five assays were extracted from Novartis database and pre-processed, including apparent permeability (Papp) from two versions of the low-efflux Madin-Darby canine kidney cell line (MDCK) permeability assay, parallel artificial membrane permeability assay (PAMPA), Caco-2 permeability assay, efflux ratio from MDCK-multidrug resistance protein 1 (MDCK-MDR1) permeability assay, intrinsic clearance (CLint) from CYP metabolic stability in liver microsomes assays for rat, human, mouse, dog, cynomolgus monkey, and minipig, plasma protein binding (PPB) for rat, human, mouse, dog and cynomolgus monkey, human serum albumin (HSA) binding, microsomal binding, brain binding, octanol-water partition (LogP) and distribution coefficients (LogP), time-dependent inhibition (TDI) of CYP3A4 (inactivation rate, kobs), and reversible inhibition of CYP3A4, CYP2C9, and CYP2D6 (half-inhibitory constant, IC50). Experimental data were aggregated (geometric mean) when multiple measurements were available for the same compound and assay’s endpoint. Moreover, values outside the dynamic range of the assays were excluded, and qualified values (‘<’/‘>’) were discarded for permeability, PPB, LogP, and LogD. PPB values were transformed to fraction unbound (Fu) and IC50 values from CYP reversible inhibition assays were converted to pIC50 with a negative logarithmic transformation. Logarithmic transformations were applied to the rest of the assay endpoints, except to LogP and LogD. All above-mentioned assays were utilized for model training, but only a fraction of them were used for model evaluation due to data availability. For instance, some assays are requested less often (e.g. monkey CLint compared to rat CLint) or were deprecated in favor of newer version (e.g. LE-MDCK version 2 assay) or other technologies (e.g. Caco-2 was deprecated). Data set statistics are discussed and reported below.Global QSPR models’ descriptionFour multi-task graph neural networks (MT-GNN) global models were generated and evaluated herein: Permeability (5-task model), Clearance (6-task model), Binding/Lipophilicity (10-task model), and CYP inhibition (4-task model). The models were ensembles of a message-passing neural network (MPNN) followed by a feed-forward deep neural network (DNN)25,26. These DNNs facilitate MT through the consideration of multiple output neurons (one per task). To enable MT model training with sparse labels, a masked loss function was utilized, and missing values were not considered for backpropagation22. Previous investigations indicated that especially for sparse experimental data, MT learning can provide an advantage compared to single-task models38. For all models, rectified linear unit (ReLU) was the activation function, a batch size of 50 was considered, and the models were trained with the consideration of early stopping (with a validation set of 10% with scaffold split). Learning rate was varied from an initial value of 0.0001 to a maximum of 0.001 linearly and decreased until 0.0001 exponentially. Mean aggregation was applied to convert atomic vectors into molecular vectors. Supplementary Table 2 reports details about each model’s architecture. Models evaluated herein are available for ADME assays’ predictions internally at Novartis. Prior to selecting MT-GNN as a modeling approach, a variety of molecular features, ML methods, and hyperparameters were benchmarked2,20.Prediction tasksTable 1 reports the fifteen prediction tasks that were evaluated, including the assay, measured property, and MT-GNN model where they were included. Numerical assay thresholds were used to categorize experimental read-outs into ‘high risk’ and ‘low risk’ classes1. Compound optimization accounts for multiple properties, which are measured with assays of varying resolution (different experimental errors)7. Therefore, this assay risk categorization defined by assay experts accounts for experimental variability20 and facilitates decisions during multiparameter optimization. Following this three-class concept, property predictions were also converted to a three-output classification (‘low risk’, ‘medium’, ‘high risk’) using the same thresholds utilized for experimental read-outs, which are reported in Table 1. The percentage of compounds belonging to each category is reported in Supplementary Table 1. When applying such assay thresholds to ML outputs, predictions in the ‘medium’ category can be disregarded. By only considering ‘high risk’ and ‘low risk’ predictions for decision-making, ML models’ precision improves. As in other works20,21,39, predictions in the medium range can also be termed ‘inconclusive (medium)’ predictions, which ideally are not a large fraction. This regression-based classification approach was previously proposed20,40.Data splittingML models were evaluated prospectively (with newly registered and measured molecules)2. Such evaluation scenario is commonly referred to as temporal validation or time-split41. All MT-GNN global models were trained with compounds registered until the end of 2021 and evaluated with compounds registered from 1st January 2022 until 13th July 2023. Global models were evaluated on glue and heterobifunctional TPDs, and predictive performance on these two TPD modalities was compared to performance across all modalities. For the prediction tasks under evaluation, Fig. 1 reports the number of training and test set compounds per each property and modality. The Permeability model was trained on 206,347 compounds, which included 2,732 heterobifunctionals and 2673 glues. From those compounds, 20,041 had LE-MDCK v2 Papp measurements, including 1,608 heterobifunctionals and 1404 glues. Clearance, Binding/Lipophilicity, and CYP inhibition models were generated with 223,025, 92,464, and 65,701 compounds, respectively. Supplementary Table 3 reports the number of training compounds for all the tasks included in global models.Performance metricsFor regression models, performance was estimated with the mean absolute error (MAE).$${{{{{\rm{MAE}}}}}}=\,\frac{1}{n}{\sum }_{i=1}^{n}{||}{y}_{i}-{\hat{y}}_{i}{||}$$
(1)
where \(y\): experimental value, \(\hat{y}\): prediction, and n: number of compounds.Classification models were evaluated with the average precision on low and high classes. Precision is the percentage of compounds that were predicted ‘high’ (‘low’) and indeed had a ‘high’ (‘low’) property value.$${{{{{\rm{Precision}}}}}}=\frac{{{{{{\rm{TP}}}}}}}{{{{{{\rm{TP}}}}}}+{{{{{\rm{FP}}}}}}}$$
(2)
where TP: true positives, and FP: false positives.Misclassification or error rates were also calculated, as the percentage of compounds that were predicted to have a ‘high’ (‘low’) property value and had a ‘low’ (‘high’) measured value. Finally, the percentage of inconclusive (medium) predictions is computed as the fraction of molecules with a predicted property value in the ‘medium’ range. For those compounds, no ‘high’ or ‘low’ property prediction is given.Models’ refinementSome prediction tasks showing margin for improvement were identified and, when data availability allowed, model refinement was carried out. Specifically, three MT-GNN models were optimized with new data attempting at improving LE-MDCK Papp (permeability model); RLM CLint and HLM CLint (clearance model); and CYP3A4 kobs and CYP3A4 IC50 (CYP model) predictions.Transfer learning was adopted with the purpose of optimizing model parameters for heterobifunctional TPD compounds. Instead of focusing on transferring knowledge to previously unseen tasks, which is more common in the field42,43,44, transfer learning was applied under the paradigm of domain adaptation29,45. The transfer learning approach utilized was model fine-tuning with weights initialization, where the weights of the original model (pre-trained model) were adjusted with new data28,46. Two fine-tuning strategies were investigated and are illustrated in Fig. 5.

Strategy 1 (New data). Global ML models were fine-tuned with all new data generated during the previous year. Here, the model was fine-tuned with compounds registered and measured in 2022.

Strategy 2 (TPD-specific data). The original global ML model was fine-tuned with heterobifunctional TPDs’ data (registered and measured before 2023).

Both strategies were evaluated on 328 heterobifunctional TPDs synthesized and measured in 2023. Supplementary Table 4 reports the number of compounds in the training, fine-tuning, and test sets.Public structures and surrogate data setA data set of public structures was gathered from ChEMBL33, ZINC34, and PROTAC-DB35. ChEMBL and ZINC data sets were prepared according to Rodriguez-Perez and Bajorath47. Structures from the three data sources were standardized, including hydrogen bonds removal, metal disconnection, molecule normalization, reionization, and stereochemistry assignation, with the RDKit module rdMolStandardize, and canonicalized48. After such pre-processing, the data set was composed by 273,706 molecules from ChEMBL (70,465), ZINC (199,972) and PROTAC-DB (3,269). To get physicochemical and ADME property values for this large set of molecules, the internal Novartis ML models were utilized. Twenty-five compound properties were predicted for each of the molecules to generate a surrogate data set. This surrogate data set is provided as Supplementary Data.Reporting summaryFurther information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Hot Topics

Related Articles