A multi-task graph deep learning model to predict drugs combination of synergy and sensitivity scores | BMC Bioinformatics

This section discusses the proposed MultiComb model, which predicts drug combinations’ synergy and sensitivity scores for a given cell line. As mentioned before, the MultiComb model is an end-to-end MTDL model. Figure 1 illustrates the structure of the MultiComb model. First, features for both drugs and cell lines are extracted. Then, a network is designed to simultaneously predict the synergy and sensitivity scores. The “Data representation” subsection discusses the data representation, while the “Graph convolution network” subsection discusses the graph convolution network. After that, the “Combined cell and drug features” subsection discusses combining the drug and cell line features. Finally, the “Attention model” and “Cross-stitch subnetwork” subsections discuss the attention and cross-stitch networks.

Fig. 1 The structure of the MultiComb model

Data representation

First, we extract the SMILES of drugs from the PubChem website to process the drug features. Then, the drug molecular graph is extracted from the drug SMILES by the free, open-source chemical informatics package RDKit. This molecular graph comprises nodes that represent atoms and edges that denote chemical bonds. The graph can be formally defined as \(G = (N,E)\), where \(N\) denotes the set of \(n\) nodes, each represented as a \(K\)-dimensional feature vector, and \(E\) represents the set of edges, which can be defined by an adjacency matrix \(A\). In the molecular graph, the \(i\)-th atom is represented by \({a}_{i}\in N\), and the chemical bond between the \(i\)-th and \(j\)-th atoms is represented by \({e}_{ij}\in E\). The drug features extracted from a molecular graph are non-Euclidean and not translation-invariant, so a graph neural network is applied rather than a standard convolutional network. DeepChem [13] is an open-source library designed for applying deep learning to problems in chemistry, materials science, and biology.
Within DeepChem, the MolGraphConvFeaturizer [14] tool converts molecular structures into numerical representations suitable for graph-based deep learning models. The MolGraphConvFeaturizer extracts a comprehensive set of features that effectively capture the structural and chemical properties of molecules, making it a powerful tool for representing drugs. It computes a binary vector for each atom in a molecule, capturing essential properties such as atom type, hydrogen bonding type (donor or acceptor), hybridization (‘sp’, ‘sp2’, ‘sp3’), aromaticity, degree of the atom (ranging from 0 to 5), formal charge, and the number of connected hydrogens (ranging from 0 to 4). These features enable effective training of machine learning models on molecular data and are summarized in Table 2.
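As a rough illustration of the binary atom vectors described above, a simplified pure-Python featurizer is sketched below. The property lists, vector layout, and the `adjacency` helper are illustrative assumptions, not DeepChem’s exact encoding:

```python
import numpy as np

# Illustrative property vocabularies (not DeepChem's exact lists).
ATOM_TYPES = ["C", "N", "O", "S", "F", "Cl", "Other"]
HYBRIDIZATIONS = ["sp", "sp2", "sp3"]

def one_hot(value, choices):
    # Unlisted values yield an all-zero vector in this sketch.
    return [1 if value == c else 0 for c in choices]

def atom_features(atom):
    """atom: dict with 'symbol', 'hybridization', 'aromatic',
    'degree' (0-5), 'num_h' (0-4), and 'formal_charge'."""
    feats = one_hot(atom["symbol"], ATOM_TYPES)
    feats += one_hot(atom["hybridization"], HYBRIDIZATIONS)
    feats += [1 if atom["aromatic"] else 0]
    feats += one_hot(atom["degree"], list(range(6)))   # degree 0..5
    feats += one_hot(atom["num_h"], list(range(5)))    # 0..4 attached H
    feats += [atom["formal_charge"]]
    return feats

def adjacency(n_atoms, bonds):
    """bonds: list of (i, j) atom-index pairs; returns symmetric 0/1 A."""
    A = np.zeros((n_atoms, n_atoms), dtype=int)
    for i, j in bonds:
        A[i, j] = A[j, i] = 1
    return A
```

Each molecule then becomes a node-feature matrix (one row per atom) plus the adjacency matrix, which is exactly the input shape the graph network below expects.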
Table 2 Summary of the MolGraphConvFeaturizer features

To represent the cell lines, gene expression data is used. This data is obtained from the Cancer Cell Line Encyclopedia (CCLE) [15], a project dedicated to studying cancer cell lines’ genomes, microRNA expression, and anticancer drug dose responses. The CCLE gene expression data contains 57,820 gene features per cell line. As observed, there is an imbalance between the dimensionality of the drug and cancer cell line features. To address this, the dimensionality of the cell line features is reduced using the LINCS [16] project. The LINCS project identified a set of 1000 carefully selected genes, known as the ‘Landmark gene set,’ that can capture more than 80% of the characteristics of any cancer cell line based on connectivity map data. Subsequently, the genes that intersect between the CCLE gene expression data and the Landmark gene set are selected. Ultimately, 934 genes are selected to represent the final cell line vector, which is then normalized by the tanh-norm method.

Graph convolution network

A graph convolution neural network learns the drug features. During learning, messages are passed between each atom and its adjacent atoms. In this paper, the graph convolution network (GCN) [17] is trained as the graph network in the learning framework to extract drug features. The GCN model utilizes an effective layer-wise propagation mechanism, represented in Eq. (1). The initial input to the GCN consists of the atom properties matrix \(X\in {R}^{n\times 30}\) and the adjacency matrix \(A\in {R}^{n\times n}\). In the adjacency matrix, \({e}_{ij}\in A\) is equal to 1 if there is a chemical bond between the \(i\)-th and \(j\)-th atoms; otherwise, it is equal to 0.$$\grave{X}_{i} = \alpha WX_{i} + b,$$
(1)
where \(W\) and \(b\) are the learned weight and bias, and \(\alpha\) is calculated as in Eq. (2).$$\alpha ={\widehat{D}}^{-1/2} \widehat{A} {\widehat{D}}^{-1/2},$$
(2)
where \(\widehat{A}=A+I\) is the modified graph adjacency matrix with self-loops added, \(I\) represents the identity matrix, and \(\widehat{D}\) denotes the degree matrix of the modified adjacency matrix. After each layer, a LeakyReLU activation function and dropout connections are applied to the output \(\grave{X}\). Also, a regularization method is applied to the weights and outputs of the layers to reduce overfitting between training and testing scores. A global max pooling layer is applied to the output of the final graph layer to transform the \({X}_{i}\) matrix into a feature vector for the next steps.

Combined cell and drug features

First, a subnetwork is learned to extract the features of the cell line. The subnetwork consists of three FC layers, each followed by an activation function. ReLU is used as the activation function following each FC layer, with a dropout rate of 0.2 applied after both the first and second FC layers; no dropout follows the final FC layer. In this paper, the FC layers are regularized by applying techniques to both their weights and outputs to prevent overfitting of the learning model. The first FC layer of the subnetwork receives the gene expression data of the cancer cell line. The FC layers then learn the cell line features, producing the final feature vector of the cancer cell line. On the other hand, the drug feature vector output from the GCN is passed through two fully connected layers with ReLU activation, without dropout, to produce the final drug feature vector. This vector is then concatenated with the cell line feature vector into a single combined vector. A normalization layer is applied after concatenation to maintain the integrity of the combined cell line and drug features. These normalized data are then used as input for the attention model.

Attention model

An attention model [18] is applied to improve the feature representations of the different tasks.
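Before moving on to the attention mechanism, the GCN propagation of Eqs. (1)–(2) and the global max pooling step can be sketched in NumPy. The feature dimension, weight shapes, and LeakyReLU slope are illustrative assumptions, and \(W\) is written here as acting on the feature dimension:

```python
import numpy as np

def gcn_layer(X, A, W, b, negative_slope=0.01):
    """One propagation step: alpha = D^{-1/2}(A + I)D^{-1/2} (Eq. 2),
    then the linear map of Eq. (1), followed by a LeakyReLU."""
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    alpha = d_inv_sqrt @ A_hat @ d_inv_sqrt
    Z = alpha @ X @ W + b
    return np.where(Z > 0, Z, negative_slope * Z)      # LeakyReLU

def global_max_pool(X):
    """Collapse the n x K node-feature matrix to one drug vector."""
    return X.max(axis=0)
```

Stacking a few such layers and pooling yields the fixed-length drug vector that is later combined with the cell line features.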
Also, attention gates are used to guide the learning of the various tasks so that they focus fully on the extracted features. In this paper, multi-head attention is used. Multi-head attention weights the extracted features for each task separately, so it focuses on the important features and minimizes noise. Multi-head attention takes three main inputs: query (\(q\)), key (\(k\)), and value (\(v\)) vectors. Here, for each task, the attention mechanism maps the normalized concatenation layer \({h}_{0}\) to \(q\), \(k\), and \(v\) using distinct linear projection layers, as shown in Eqs. (3), (4), and (5).$${h}_{q}=f\left({w}_{q}{h}_{0}+{b}_{q}\right),$$
(3)
$${h}_{k}=f\left({w}_{k}{h}_{0}+{b}_{k}\right),$$
(4)
$${h}_{v}=f\left({w}_{v}{h}_{0}+{b}_{v}\right),$$
(5)
where \(f\) is the activation function of the attention mechanism, while the \(w\) and \(b\) terms are the learned attention weights and biases, respectively. After that, the dot-product of the query and key is passed through a softmax, as represented in Eq. (6).$$s=softmax\left({h}_{q}*{h}_{k}\right),$$
(6)
Finally, the weighted feature vector of a single attention head is computed as in Eq. (7).$${h}_{f}^{1}=\sum s*\left({h}_{v}* {W}_{s}\right),$$
(7)
where \({W}_{s}\) is the scale weight of the cell line feature. This process is repeated across \(h\) parallel attention heads, and the resulting vectors from each head are concatenated to produce the final vector, as described in Eq. (8). Here, \(h\) is set to 4.$${h}_{f}=concat\left({h}_{f}^{1},{h}_{f}^{2},\dots ,{h}_{f}^{h}\right).$$
(8)
Finally, the output from the multi-head attention mechanism is concatenated with its input according to Eq. (9).$${h}_{ff}=concat\left({h}_{f},{h}_{0}\right).$$
(9)
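A minimal single-task NumPy sketch of Eqs. (3)–(9) follows. The activation \(f\) is taken as the identity, biases are omitted, Eq. (7) is read as an elementwise weighting, and all weight shapes are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_head(h0, Wq, Wk, Wv, Ws):
    """One head: project h0 to q, k, v (Eqs. 3-5), score them (Eq. 6),
    and weight the scaled values (Eq. 7)."""
    hq, hk, hv = h0 @ Wq, h0 @ Wk, h0 @ Wv
    s = softmax(hq * hk)           # Eq. (6)
    return s * (hv * Ws)           # Eq. (7), elementwise reading

def multi_head_attention(h0, heads):
    """heads: list of (Wq, Wk, Wv, Ws) tuples, one per head (h = 4 in
    the paper). Concatenate head outputs (Eq. 8) and the input (Eq. 9)."""
    hf = np.concatenate([attention_head(h0, *p) for p in heads])
    return np.concatenate([hf, h0])
```

The concatenation with \(h_0\) in Eq. (9) acts as a residual-style shortcut, so the downstream task networks see both the attended and the raw combined features.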
Cross-stitch subnetwork

One major factor in an MTDL model is determining the relationships between tasks. A correct task-relationship setting can enhance overall task performance; an inaccurate setup may lead to negative knowledge transfer and decreased prediction accuracy. So, in this paper, the cross-stitch subnetwork [19] is implemented to discover the relationship between the synergy and sensitivity tasks. The cross-stitch subnetwork learns the task relationships during the training of the MultiComb model, using a cross-stitch unit to determine the extent of information sharing needed. The cross-stitch function is defined by Eq. (10).$$\left[\begin{array}{c}{\overline{a} }_{1}\\ {\overline{a} }_{2}\end{array}\right]=\left[\begin{array}{cc}{k}_{11}& {k}_{12}\\ {k}_{21}& {k}_{22}\end{array}\right] \left[\begin{array}{c}{a}_{1}\\ {a}_{2}\end{array}\right],$$
(10)
where \({a}_{1}\) and \({a}_{2}\) represent the first and second input task features, respectively, while \({\overline{a} }_{1}\) and \({\overline{a} }_{2}\) represent the task relationship features for the first and second tasks, respectively. The value \({k}_{ij}\) represents the weight of the relationship between tasks \(i\) and \(j\). In this paper, the final output from the multi-head attention mechanism for each task is fed into the cross-stitch network according to Eq. (11).$$\left({\overline{h} }_{ff1},{\overline{h} }_{ff2}\right)=cross\_stitch\left({h}_{ff1},{h}_{ff2}\right),$$
(11)
where \({h}_{ff1}\) and \({h}_{ff2}\) represent the multi-head attention features for the synergy and sensitivity tasks, respectively. The output vectors \({\overline{h} }_{ff1}\) and \({\overline{h} }_{ff2}\) from the cross-stitch network are then passed through separate FC layers, resulting in new representations \({h}_{s1}\) and \({h}_{s2}\) for each task. Following this, an additional cross-stitch layer is applied, as described by Eq. (12).$$({\overline{h} }_{s1},{\overline{h} }_{s2})=cross\_stitch({h}_{s1},{h}_{s2}),$$
(12)
Finally, the inputs and outputs of the cross-stitch layer are concatenated according to Eqs. (13) and (14).$${h}_{sf1}=concat\left({\overline{h} }_{s1},{h}_{ff1}\right),$$
(13)
$${h}_{sf2}=concat\left({\overline{h} }_{s2},{h}_{ff2}\right).$$
(14)
After capturing the two tasks’ relationship features, the prediction scores are learned by a prediction subnetwork. The prediction subnetwork consists of two FC layers; each task’s features are fed to a separate subnetwork, which outputs the corresponding synergy or sensitivity score.
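The cross-stitch unit of Eq. (10) can be sketched in NumPy as follows; the 2×2 mixing matrix \(K\) shown here is an illustrative initialization, not the values learned during training:

```python
import numpy as np

def cross_stitch(a1, a2, K):
    """Eq. (10): linearly mix two task feature vectors with a 2x2
    matrix K = [[k11, k12], [k21, k22]] that is learned in training."""
    a1_bar = K[0, 0] * a1 + K[0, 1] * a2
    a2_bar = K[1, 0] * a1 + K[1, 1] * a2
    return a1_bar, a2_bar

# Illustrative unit: mostly task-private features with light sharing.
K = np.array([[0.9, 0.1],
              [0.1, 0.9]])
```

During training, diagonal entries near 1 keep the tasks largely separate, while larger off-diagonal entries increase the information shared between the synergy and sensitivity branches.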
