Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery

Dose–responseDrug response data were extracted from the Genomics of Drug Sensitivity in Cancer (GDSC), which contained two pharmacogenomic screens; GDSC1 and GDSC2. GDSC1 includes data from 2010 to 2015 whilst GDSC2 is newer and includes screens conducted post-2015. Hundreds of compounds have been screened across approximately 1000 cell-lines with two key studies published using GDSC1. We used version 8.4 of the datasets released July 2022 in this study. In short, 403 compounds have been screened across 970 cell-lines to produce 333292 IC50s in GDSC1 and 297 compounds were screened across 969 cell-lines giving 243,466 IC50s in GDSC2.The two datasets also use different assays for screening with GDSC1 using Syto60 (Resazurin) and GDSC2 using CellTitreGlo. The duration for both these assays is 72 h.In the GDSC1 dataset, dose–response data was generated through drug screening with two different concentration protocols: one involving 9-dose (2-fold dilutions) and the other with 5-dose (4-fold dilutions) concentrations. For the dataset used in our study, only dose responses of SKCM cell lines treated with three BRAF-targeted drugs (Dabrafenib, PLX-4720 and SB590885) were included, primarily with 9-dose concentrations. If a drug was screened at both 5- and 9-dose concentrations, dose responses with 5-dose concentrations were excluded. Also, dose responses in GDSC1 were excluded from the dose-specific drug responses prediction experiment in ten different cancer types. In contrast, GDSC2, which was used specifically for the dose-specific drug responses prediction in these ten cancer types, utilised a 7-dose concentration protocol. This protocol included a 7-point dose curve with a half-log dilution step covering a 1000-fold range and another 7-point dose curve with 2 × 2-fold dilutions followed by 4 × 4-fold dilutions, covering a 1024-fold range. However, only the 7-point dose curves with the 1000-fold range drug treatment is included in our study. The raw screening data from both GDSC1 and GDSC2 were processed using the R package, gdscIC50. This R package was used to process raw data in the GDSC database and fit dose–response curves for obtaining the AUC and IC50 values presented on the GDSC website (http://www.cancerrxgene.org). In our study, this R package was only used for several data pre-processing steps: data filtering, viability data normalisation, and normalisation of the drug treatment concentration scale. Briefly, screening data with any missing drug information or having failed internal quality control processes indicated by a “FAIL” tag, were first removed. The remaining raw viability data were then normalised with the negative (viable cells with media in GDSC1 or media and compound vehicle in GDSC2) and positive (no viable cells) controls of each tested plate, identified using tags (e.g. neg_control = “NC-0” or “NC-1”, pos_control = “B”) present in the dataset. Distinct drug treatment concentration ranges used in the screening of different individual drugs were then normalised on a single scale in each dataset to make them comparable across all drug concentrations used.Drug chemistryIn addition to the genomic features, a total of 27 chemical features relating to drug molecules such as molecular weight, hydrogen bond donor and acceptor counts and formal charge were initially used. These were extracted from PubChem using PubchemPy and the definitions for each feature can be found under “Computed Properties” for any compound on PubChem. Additionally, the presence and absence of elements in each compound has been encoded and used as binary features. These include boron, nitrogen, phosphorous, platinum, iodine, chlorine, oxygen, sulphur, bromine and fluorine. The final set of chemical features comprises 12 features, as shown in Supplementary Table 2.These chemical features or molecular descriptors can vary in dimensionality (1D, 2D, 3D and so on) and quantitatively illustrate molecular properties of a chemical compound49. For the purposes of this project, only 2D descriptors were used alongside encoded elements. These 2D descriptors represent information gathered from the structural formula of the compound and include many subdivisions such as topological indices and simple counts50. The latter, as the name suggests, refers to features in a compound that can be described by simple counts such as hydrogen bond donors and molecular weight. The former refers to descriptors derived from 2D graphs of the molecules or compounds and includes features such as complexity.Molecular dataThe status of three molecular features: i) genetic variations in high confidence cancer genes, ii) CNA status of RACSs and iii) DNA methylation status of iCpGs for cancer cell lines in the GDSC database were identified using cancer functional events (CFEs) found in the primary tumour samples by combining data across all tumours (pan-cancer) or for each specific cancer type.For comparative analysis between ANOVA and KL-relevance: i) analysis of variance (ANOVA) and ii) KL-Relevance used to determine the contributions of specific cancer features to the variability in drug responses in skin cutaneous melanoma cell lines, a multi-omics binary event matrix (MOBEM) containing the status of a total of 24 melanoma-specific CFEs (genetic variations and CNAs) detected in the associated cell lines were extracted, which were used as input for ANOVA statistical analysis to determine associations between cancer features and drug sensitivity in the GDSC database (Supplementary Data 1). Among the complete set of 24 melanoma-specific CFEs, only those that occurred in at least 3 cell lines in GDSC1 or GDSC2 were taken into account for the ANOVA analysis. Details can be found in the Data Availability section.For dose-specific prediction of drug responses across different cancer cell lines representing 10 cancer types, genomic annotation data of a panel of 1001 human cancer cell lines were obtained from the GDSC1000 resource site https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources///Data/BEMs/CellLines/CellLines_Mo_BEMs.zip (Supplementary Data 2). The set of pan-cancer CFEs in a binary event matrix consisted of 1073 molecular features in total (genomic variants: N = 310, CNAs: N = 425, DNA methylation: N = 338). Briefly, genetic variants, CNAs and DNA methylation status of the selected cancer cell lines were identified as described31. Individual genetic variants found within a coding region of a set of 470 high confidence cancer genes, based on their frequent occurrence in COSMIC v68 (http://cancer.sanger.ac.uk/cosmic/), were identified. The mutational status of these genetic features in the cancer cell lines were then determined. For CNAs, focal amplification/deletion of chromosomal regions were identified for all cell lines in the panel. Only amplified regions spanning at least a gene or deletion events of an exon were included. Identification of iCpGs, regions where beta signals are distributed bimodally, was carried out in each specific cancer type separately due to the highly tissue specificity of the DNA methylation profiles. DNA methylation status of these iCpGs were then determined and only hypermethylated sites in the gene promoter region were included. The presence or absence of these different features in the selected cell lines were encoded with binary values. These binary features of cell lines were then used in the MOGP modelling for drug response prediction. The cancer features of the 279 cancer cell lines were first extracted from this set of pan-cancer CFEs for use in the aforementioned analysis (Supplementary Data 3). Similar dose–response predictions were further expanded to include an additional 436 cell lines from the GDSC2 dataset representing additional cancer types (N = 10), and a wider range of drug compounds (N = 10) targeting 5 biological pathways. The molecular features of the additional cell lines used in the analysis are presented in Supplementary Data 4).Multi-output Gaussian process regression modelA Gaussian process (GP)51 is a distribution over functions generally used as a non-parametric prior for a probabilistic model. Here, we apply a probabilistic regression model based on GPs to predict the dose–response of different drugs applied to a particular type of cancer. We have a set of \(N\) input data observations \(X={[{x}_{1}^{T},…,{x}_{N}^{T}]}^{T}\), where each data observation \({x}_{n}\) is a multi-modal vector of \(P\) features; they are a combination of genomics and drugs features. We apply the Multi-output GP (MOGP) model where each output is associated with a dose concentration52, i.e., there is a particular \(d\)-th output vector, \({y}_{d}={[{y}_{d,1},…,{y}_{d,N}]}^{T}\), that stacks \(N\) data observations of the dose–response for the \(d\)-th drug concentration. We can express a joint distribution for a MOGP regression models as:$$p(y,f{{|}}X)\,=\mathop{\prod}\limits_{d=1}^{D}\mathop{\prod}\limits_{n=1}^{N}G({y}_{d}{\rm{|}}{f}_{d}({x}_{n}),{\sigma }^{2})p(f),$$
(1)
where \(G({y}_{d}|{f}_{d}({x}_{n}),{\sigma }^{2})\) is a Gaussian distribution with mean \({f}_{d}({x}_{n})\) and variance \({\sigma }^{2}\); \(f={[{f}_{1}^{T},…,{f}_{D}^{T}]}^{T}\) is a vector built with \({f}_{d}={[{f}_{d}({x}_{1}),…,{f}_{d}({x}_{N})]}^{T}\), where each latent function is derived from an intrinsic coregionalization model53, \({f}_{d}(x)={\sum }_{i=1}^{R}{a}_{d,i}{u}^{i}(x)\), for which each \({u}^{i}\) is an independent and identically distributed (IID) sample taken from a GP, i.e., \(u\sim {GP}(0,k(.,.))\), with \(k(.,.)\) accounting for a kernel covariance; \(R\) represents the number of IID samples taken from the GP; \({a}_{d,i}\) is a linear combination coefficient; and \({\sigma }^{2}\) is a noise parameter that aims to model the uncertainty of each dose–response \({y}_{d}\). The noise parameter \({\sigma }^{2}\) accounts for the noise variance of the dose–responses \({y}_{d}\). Such a noise parameter emerges from the construction of the regression model from a probabilistic perspective. We can think of the dose–response \({y}_{d}\) as an output generated by a function of the inputs \(x\) and corrupted by noise: \(y=f(x)+\epsilon\), where \(\epsilon \sim N(0,{\sigma }^{2})\) is a Gaussian noise, with noise parameter \({\sigma }^{2}\), that corrupts the output \(y\). Therefore, instead of modelling the outputs \({y}_{d}\) as a point estimate, it is more realistic to allow the output \({y}_{d}\) to be modelled with some uncertainty encoded in the parameter \({\sigma }^{2}\).It is worth mentioning that the kernel covariance, \(k(.,.)\), generally depends on a set of hyper-parameters, for instance an Exponentiated Quadratic (EQ) kernel, \(k(x,x^{\prime} )={\sigma }_{{EQ}}^{2}\exp \left(-\frac{{|x}-x^{\prime} {|}^{2}}{{2l}^{2}}\right)\), relies on the length-scale \(l\) and variance \({\sigma }_{{EQ}}^{2}\).Training a MOGP regression model consists on maximising the log marginal likelihood:$${\mathrm{log}}\,{\rm{p}}(y{\rm{|}}X)=-\frac{{ND}}{2}\log (2\pi )-\frac{1}{2}\log {\rm{|}}K+{\sigma }^{2}I{\rm{|}}-\frac{1}{2}{y}^{T}{(K+{\sigma }^{2}{I}_{N})}^{-1}y,$$
(2)
w.r.t to the kernel hyper-parameters, the coefficients \({a}_{d,i}\) and the noise parameter \({\sigma }^{2}\); here \(y={[{y}_{1}^{T},…,{y}_{D}^{T}]}^{T}\) is the output vector that stacks all \(D\) drug concentration outputs; \({I}_{N}\) is an identity matrix of size \(N\times N\); and \(K\) is a covariance matrix which entries are built with evaluations of the covariance, \({k}_{{MO}}(x,x^{\prime} )=\mathrm{cov}[{f}_{d}(x),{f}_{d^{\prime} }(x^{\prime} )]\), for all pairs of input data observations \(X\), and all pairs of outputs \(D\). In order to make predictions for a new set of \({N}_{* }\) input data observations \({X}_{* }\), we just need to evaluate the predictive distribution, \({G(y}_{* }|\mu ({X}_{* }),S({X}_{* }))\), with mean and covariance:$$\mu ({X}_{* })={K}_{* }{(K+{\sigma }^{2}I)}^{-1}y,$$
(3)
$$S({X}_{* })={K}_{* * }-{K}_{* }{(K+{\sigma }^{2}{I}_{N})}^{-1}{K}_{* }^{T}+{\sigma }^{2}{I}_{* },$$
(4)
where \({K}_{* }\) is a matrix built with evaluations of the \({k}_{{MO}}(.,.)\) between the \({X}_{* }\) and \(X\); \({K}_{* * }\) is a matrix built with evaluations of the \({k}_{{MO}}(.,.)\) between all pairs of observations in \({X}_{* }\); and \({I}_{* }\) is an identity matrix of size \({N}_{* }\times {N}_{* }\). The equations above provide us with a mean prediction \(\mu\) and covariance matrix \(S\) to quantify the uncertainty of our prediction.Kullback-Leibler relevance determinationIn order to build a measure of predictive relevance of a \(p\)-th input feature is necessary to compute the MOGP predictive distribution from Eqs. (3) and (4) over a training data observations \({x}_{n}.\) Then, one compares how such a distribution differs when is recomputed with a subtle modification of the \(p\)-th feature of \({x}_{n}\). We can express the relevance of the \(p\)-th feature with respect to the data observation \({x}_{n}\) by calculating the divergence of the predictive distributions:$$r(n,p,\varDelta )=\frac{d[{G(y}_{* }{\rm{|}}\mu ({x}_{n}),S({x}_{n})){\rm{||}}{G(y}_{* }{\rm{|}}\mu ({x}_{n}+{\varDelta }_{p}),S({x}_{n}{+\varDelta }_{p}))]}{\varDelta },$$
(5)
where \(d[.{||}.]=\sqrt{2{D}_{{KL}}[.{||}.]},\) with \({D}_{{KL}}[.{||}.]\) as a Kullback-Leibler divergence, and \({\varDelta }_{p}\) as a vector of zeros with \(\varDelta\) on the \(p\)-th entry. This kind of relevance determination has been previously studied in the context of single output Gaussian processes by54, here we extend the application of the relevance metric to the context of multiple-outputs, where \({y}_{* }\) represents a vector prediction of \(D\) points of cell viability associated with the drug concentrations. Therefore, the divergence \({D}_{{KL}}[.{||}.]\) is computed between multivariate distributions with parameters \(\mu \in {R}^{D}\) and \(S\in {R}^{D\times D}\). Since we calculate the relevance per \(n\)-th datapoint, we can compute an average of the predictive relevance for a \(p\)-th feature along all the data observations,$$K{L}_{p}=\frac{1}{N}\mathop{\sum}\limits_{n=1}^{N}r(n,p,\varDelta ),$$
(6)
and use this estimate to rank the \(P\) features of the input data observations \(X\) by their average relevance.Dataset for BRAF biomarker identification in melanomaFor all our experiments we selected three types of drugs that are highly correlated to the BRAF features: drug PLX-4720, SB590885 and Dabrafenib. There are two types of experiments: the first type mainly focuses on modelling the dose–response of melanoma cancer and identifies the relevance ranking of the features; we compare the ranking with the ANOVA method applied to the same data in the GDSC database for computing the degree of association between the genomic features (coding mutations and recurrent copy number alterations) of cancer cell lines and drug sensitivity measured by IC50 values. The second type of experiment aims to provide insights about the fundamental question of what is the threshold amount of cell lines or dose–response curves necessary for obtaining a robust predictive model for a particular type of cancer; we apply our model to BRCA, COREAD, LUAD, SKCM and SCLC cancers with dose–responses from drugs PLX-4720, SB590885 and Dabrafenib.Dataset for KL-relevance of BRAF drugs to compare with ANOVAFor the first experiment we use cell lines of melanoma cancer from both GDSC1 and GDSC2 datasets. The details are the following: GDSC1 has \(N=40\) dose–responses for drug PLX-4720, \(N=35\) for drug SB590885, and \(N=39\) for drug Dabrafenib; the number of input features is \(P=24\). These features are the same 24 input features used in the ANOVA analysis in the GDSC database, where only features found in at least three cell lines of a specific cancer (SKCM) were included in the analysis. GDSC2 has \(N=50\) dose–responses for drug PLX-4720, \(N=45\) for drug SB590885, and \(N=47\) for drug Dabrafenib; the number of input features is also \(P=24\) (also the same used in the ANOVA analysis). The number of drug concentrations is seven (\(D=7\)) in the GDSC2 dataset for all drugs, but in the GDSC1 dataset is nine (\(D=9\)) for drugs PLX-4720 and SB590885, and five \((D=5)\) for the drug Dabrafenib.Dataset for training, across cancer types and increasing size of training setsFor the second experiment we use cell lines of BRCA, COREAD, LUAD, SKCM and SCLC cancers from the GDSC2 dataset; their dose–responses are produced by the drugs PLX-4720, SB590885 and Dabrafenib. Thus, our dataset consists of BRCA (\(N=110\)), COREAD (\(N=108\)), LUAD (\(N=139\)), SKCM (\(N=142\)) and SCLC (\(N=133\)); we split each cancer dataset into 70% training and 30% testing, guaranteeing the number of drugs appearing in training and testing is proportional to the original number in the whole dataset. This dataset contains a total of 780 unique molecular features after filtering out those that have the same values across all dose–responses of specific drug-cell line pairs, which include 279 mutations, 418 CNAs, 71 related to DNA methylation and 12 chemical features of drug compounds. These \(P=780\) input features are used to predict seven drug concentrations (\(D=7\)) that form the dose–response curve. In this second experiment, we examine how the MOGP model’s prediction performance varies with different numbers of DRCs for training and report the performance over the test set. The gradual increment of the DRCs for training the model is as follows: \(\{\mathrm{8,16,27,42,58,74}\}\) for BRCA, \(\{\mathrm{7,15,26,42,57,72}\}\) for COREAD, \(\{\mathrm{10,19,34,53,72,91}\}\) for LUAD, \(\{\mathrm{10,20,35,54,74,93}\}\) for SKCM and \(\{\mathrm{9,18,32,51,69,87}\}\) for SCLC.Training of MOGPIn order to train the MOGP model we have to maximise the log marginal likelihood of Eq. (2) w.r.t the kernel’s length-scale hyper-parameter, the coefficients \({a}_{d,i}\) and the noise parameter \({\sigma }^{2}\). For the covariance kernel, when having a variety of input features as it is our case of genomics where different clusters of data might be identified; for instance as mutation (mu), methylation (met), copy number (cn) and/or drugs’ compounds (dc); the application of a covariance kernel parameterised with a unique length-scale results restrictive to the model. Restrictive in the following sense: let us suppose that the features among the mutation group smoothly covariate then a large length-scale could appropriately account for their covariance behaviours, whilst the features among methylation aggressively covariate then a short length-scale might adequately explain such a behaviour. Thus, if we chose to fit the model with a unique hyper-parameter length-scale this might not be able to sufficiently account for the contrastive behaviours among mu and met features. Hence, a particular kernel per group of features allows the model to have more flexibility by introducing a length-scale hyper-parameter to account for each type of feature. Therefore, instead of using a unique length-scale along all the features we build our kernel through a product of EQ kernels of the form55,$$\begin{array}{l}k(x,x^{\prime} )={k}_{{mu}}(x,x^{\prime} )* {k}_{{met}}(x,x^{\prime} )* {k}_{{cn}}(x,x^{\prime} )* {k}_{{dc}}(x,x^{\prime} ),\\ k(x,x^{\prime} )=\exp \left(-\frac{{{|}}{x}_{{mu}}-x^{\prime}_{{mu}}{{{|}}}^{2}}{{{2l}^{2}}_{{mu}}}\right)* \exp \left(-\frac{{{|}}{x}_{{met}}-x^{\prime}_{{met}}{{{|}}}^{2}}{{{2l}^{2}}_{{met}}}\right)* \exp \left(-\frac{{{|}}{x}_{{cn}}-x^{\prime}_{{cn}}{{{|}}}^{2}}{{{2l}^{2}}_{{cn}}}\right)\\ \qquad \qquad \,\,* \exp \left(-\frac{{{|}}{x}_{{dc}}-x^{\prime}_{{dc}}{{{|}}}^{2}}{{{2l}^{2}}_{{dc}}}\right),\,\end{array}$$
(7)
where each kernel has a sub-index label to indicate its particular operation over the features related to the label, for instance \({k}_{{mu}}(x,x^{\prime} )\) only operates over the mutation features, \({k}_{{met}}(x,x^{\prime} )\) only operates over the methylation features and so on. The application of a particular kernel per type of feature allows the model to have more flexibility by introducing a length-scale hyper-parameter to account for each type of feature, instead of using a unique length-scale along all the features.For the first experiment we built a model per each drug using cross validation with Kfolds = 20, the rank R was cross-validated from \(\{1,…,D\}\), the coefficients \({a}_{d,i}\) were sampled from a normal distribution \(N(\mathrm{0,1})\), each length-scale hyper-parameter was sampled from a uniform distribution with range \((\mathrm{0.01,3}\sqrt{24}),\) we used a total of 840 seeds for sampling the coefficients \({a}_{d,i}\) and length-scales. It is important to highlight that for the kernel product in Eq. (7), we used only two kernels, \({k}_{{mu}}(x,x^{\prime} )\) for mutation features (11 features) and \({k}_{{cn}}(x,x^{\prime} )\) for copy number features (13 features); this is due to the benchmark ANOVA method only having been applied to those types of features.For the second experiment we built a model for each cancer type, we split the data in 70% training and 30% testing dose curves, ensuring that the amount of drugs present in the complete dataset is proportional to the amount in the training and testing sets. From the training data we performed a process of increasing the number of dose–response curves used for fitting the MOGP model. The increment of the dose–response curves for fitting the MOGP is as follows: \(\{\mathrm{8,16,27,42,58,74}\}\) for Breast, \(\{\mathrm{7,15,26,42,57,72}\}\) for COREAD, \(\{\mathrm{10,19,34,53,72,91}\}\) for LUAD, \(\{\mathrm{10,20,35,54,74,93}\}\) for melanoma and \(\{\mathrm{9,18,32,51,69,87}\}\) for SCLC; the dose–response curves are randomly selected from the complete training set, we run six different random seeds to sample each value of increment. To find the best model for each increment (at each random seed) we cross validated with Kfolds = 5, the rank \(R\) was cross-validated from \(\{1,…,D\}\), the coefficients \({a}_{d,i}\) were sampled from a normal distribution \(N(\mathrm{0,1.5}),\) each length-scale hyper-parameter was sampled from a uniform distribution with range \((\mathrm{0.05,1.5}\sqrt{780}),\) we used a total of 108 seeds for sampling the coefficients \({a}_{d,i}\) and length-scales. Unlike the type-1 experiment, for this type-2 experiment we have all the categories of features, mutation \(({mu})\), methylation \(({met})\), copy number \(({cn})\) and drugs’ compounds \(({dc})\); thus we apply the kernel product presented in Eq. (7), where \({k}_{{mu}}(x,x^{\prime} )\) operates over 279 features, \({k}_{{met}}(x,x^{\prime} )\) over 418 features, \({k}_{{cn}}(x,x^{\prime} )\) over 71 features, and \({k}_{{dc}}(x,x^{\prime} )\) over 12 features (refer to the “Range selection for the uniform distributions to initialise the length-scale hyper-parameters” section for detailed information on the initialisation of the length-scale hyper-parameters”).For both types of experiments we optimised the kernel hyper-parameters, the coefficients \({a}_{d,i}\) and the likelihood noise parameter \({\sigma }^{2}\) by means of the LBFGS optimiser56. All the training of the MOGP models was performed using the python library: GPy (version 1.0.7) and Numpy (version 1.21.2).Range selection for the uniform distributions to initialise the length-scale hyper-parametersThe choice of ranges for the uniform distributions of the length-scale hyper-parameter is based on the fact that we expect the length-scale to be positive and up to a certain value that covers possible covariances between the input space features (genomics and drug compounds features in our case). For instance, the choice of \((\mathrm{0.05,1.5}\sqrt{780}),\) ranges from a small length scale \(0.05\) up to \(1.5\sqrt{780},\) where the upper limit is selected as a rule of thumb when dealing with high dimensionalities, in this case the dimensionality is \(P=780\). This rule of thumb provides a sensible way to initialise a length-scale that fits the spread of the input features across all dimensions. With this approach we assume that a length-scale proportional to \(\sqrt{P}=\sqrt{780}\) is feasible for capturing covariances in a high dimensional space. On the other hand, the choice of number of seeds is just the number of possible models that we run in a High Performance Computing (HPC) server to cross-validate and select the best model. The number of seeds used to cross-validate was a trade-off between the time complexity of the model and HPC resources available. The more HPC resources one can access the more alternative initialisation models one can cross-validate. Nonetheless, we found that among the different models cross-validated we were able to find various of them with similar and salient performances.Prediction on GDSC1 and GDSC2 and vice versaIn the interest of analysing the capabilities of our model, we explore two scenarios of the type-1 dataset: for the first scenario, we assume the GDSC1 dataset as the training source and the GDSC2 as the testing one; for the second scenario we assume the contrary, GDSC2 is assigned as the training source and GDSC1 as the testing. It is worth mentioning that the drug concentration doses are not the same for GDSC1 and GDSC2, thus we have to measure the prediction just at the region where both curves overlap. Supplementary Figure 16 shows an example of the different concentrations in the drug response curves for GDSC1 and GDSC2 datasets. The MOGP models would predict the cell viability per drug concentration, i.e., for GDSC1 the MOGP would predict \(D=9\) outputs, and for GDSC2, \(D=7\) outputs. We interpolate the outputs to render the full curves in order to extract the summary metrics AUC, IC50 and Emax.Estimating AUC, IC50 and E-max for observed and predicted dose–responsesTo compute the Area Under the Curve (AUC), the drug concentration to achieve 50% cell viability (IC50), and the fraction of viable cells at the highest drug concentration (Emax), is necessary to evaluate a function \(f(c)\) that describes the dose–response curve amongst the range of minimum drug concentration and maximum drug concentration, i.e., \(c=[{MinDC},{MaxDC}]\). One can compute the AUC through the trapezoidal rule,$${AUC}={\int_{a}^{b}}f(c){dc}\,\approx \,\mathop{\sum}\limits_{i=1}^{I}\frac{f({c}_{i-1})+f({c}_{i})}{2}\bigtriangleup {c}_{i},$$
(8)
where each \({c}_{i}\) represent the \(i\)-th value of a grid that partitions the range \([a,b]\) into \(I+1\) points with partition size, \(\bigtriangleup {c}_{i}={c}_{i}-{c}_{i-1}\); and the range’s values are \(a={MinDC}\) and \(b={MaxDC}\). To compute the AUC presented in the equation above we used the function ‘auc’ from the python package: metrics, implemented in the library scikit-learn (version 1.0.2). On the other hand, we can compute the IC50 by just finding the drug concentration value \({c}_{{IC}50}\) for which \(f({c}_{{IC}50})\,=0.5\), and the Emax is just the value of the function \(f({c}_{{MaxDC}})\) evaluated at \({c}_{{MaxDC}}={MaxDC}.\)Measuring observed dose–response curvesFor each of the cell lines of our datasets we applied the sigmoid function with four parameters (Sig4) as the evaluating function \(f(c)\) that describes our observed dose–response curves. This Sig4 function is our reference dose–response curve from which we extract the true labels AUC, IC50 and Emax for each cell line. The Sig4 can be expressed as:$$f(c)=\frac{1.0}{L+\exp (-\tau * (c-{c}_{0}))}+d,$$
(9)
where \(L,\,\tau ,\,{c}_{0}\) and \(d\) are the four parameters that describe the curve, and \(c\) is the drug concentration for which we expect to evaluate the range, \(c=[{MinDC},{MaxDC}]\)57.Measuring predicted dose–response curvesOn the other hand, the MOGP model predicts the cell viability at \(D\) concentrations as per the GDSC dataset, i.e., the MOGP generates \(D\) predictions that can be expressed as \(\hat{f}({c}_{1}),…,\hat{f}({c}_{D}),\) where \({c}_{d}\) represents the \(d\)-th drug concentration. In order to compute the summary metrics AUC, IC50 and Emax as previously described, one needs to interpolate such \(D\) predictions and obtain a continuous \(\hat{f}(c)\) function that describes our predicted dose–response curves. We applied the piecewise cubic hermite interpolating polynomial (PCHIP) method to obtain \(\hat{f}(c)\). It is a shape preserving method that reduces oscillating behaviours in the interpolation. We used the function ‘pchip_interpolate’ from the python package: interpolate, implemented in the library scipy (version 1.0.2). From such an interpolated function \(\hat{f}(c)\) we can then obtain the predicted AUC, IC50 and Emax metrics in the same way as for the observed dose–response curves.It is worth mentioning that for our experiments we split the IC50 metric between non-responsive and responsive behaviours as follows: \({IC}50 > 1.0\) is non-responsive and \({IC}50\,\le \,1.0\) is responsive. Also, for the drug response curves, we used a normalised range of drug concentrations between \(0.1428\) or \(0.1111\) as the minimum drug concentration for \(D=7\) or \(D=9\) respectively and \(1.0\) as the maximum drug concentration. Any value higher than the maximum is considered non-responsive and we label it as \({IC}50=1.5\) by default. When studying the melanoma cancer, a cancer very responsive to the drugs PLX-4720, SB590885 and Dabrafenib, we also split the AUC and Emax metrics between non-responsive and responsive behaviours as follows: \({AUC}\le 0.55\) is responsive and \({AUC} \,> \,0.55\) is non-responsive, and \(E\max \le 0.5\) is responsive and \(E\max > 0.5\) is non-responsive.Benchmark methodsSRMFSRMF was used to benchmark our method. This method does not use train and test datasets, but instead takes as input a response matrix containing a mix of response values and NaN values. It then utilises the known values to predict a whole new drug response matrix. To work around this difference, for each pair of test and train datasets, we overrode a copy of the test matrix with NaN values and concatenated it with the train dataset. This was used as the input drug response matrix. This ensured that only values from the test datasets were seen, and thus used, by the model when making predictions, while still including the test data observations in the prediction. We then compared the test data drug response values to their corresponding predicted values from the SRMF output. The original SRMF model was designed for IC50, however, we also ran it for AUC and Emax values (we did not take the log of these beforehand). Pearson correlation was used to create the similarity matrix inputs for both cell lines and gene properties. As we used genetic features for our model rather than gene expression, to be consistent, we also used genetic features when calculating cell line similarity when running SRMF. The hyperparameter K was left at its default as it was originally chosen for GDSE data.Lasso regression and elastic netLasso and Elastic Net have been implemented as the baseline methods. Lasso regression is the linear model with L1-norm while Elastic Net takes together the L1 and L2-norm as the penalty in loss function. We built the linear models under two different settings:

1.

The model is trained and predicted separately on different drugs with one type of the genomics features;

2.

The model is trained and predicted with multi-omics features and together with drug chemical features.

In setting 1, we aimed to investigate the contribution of different omics data in predicting the drug response31. In setting 2, we built the linear model in a multi-task learning fashion8, where multiple drugs are trained and predicted in one model by taking the drug chemical features. We provided the performance metrics for both the single-task-single-omics and multi-task-multi-omics models.DeepCDRDeepCDR is a deep learning (DL) method integrating gene expression, mutation, and DNA methylation profiles. In our investigation, we opted for mutations, copy number alterations, and DNA methylation profiles as our cell line input features. To ensure a consistent comparison, we replaced gene expression with copy number alteration. For the drug encoders, we applied the uniform-graph convolutional neural networks as is designed in the original paper. Throughout the training phase, we implemented a 3-fold cross-validation strategy with early stopping. The performance evaluation was conducted on a test set that was distinct from the training and validation sets yet identical to the test sets used in benchmarking with other methods.GraphDRPGraphDRP is a DL method taking mutational profiles and drug chemical graph as input to predict the drug response. In the original implementation, three GNNs are compared as the drug encoder: GCN, GAT and GIN, while GIN was reported to have the best performance. Consequently, GIN was selected for encoding drug features in our analysis. The same cross-validation and early stop as DeepCDR is applied during the training phase. Performance evaluation was systematically conducted on the same test sets, ensuring comparability of results.NeRDNeRD is a DL model that uses miRNA and copy number as cell line features, while both chemical graph and drug fingerprints are used for drugs. In our evaluation, mutation profiles substituted miRNA data to align with our benchmark. The same cross-validation and early stop as DeepCDR is applied during the training phase. The performance is evaluated on the testing set.

Hot Topics

Related Articles