sChemNET: a deep learning framework for predicting small molecules targeting microRNA function

sChemNET: a deep learning framework for predicting drug targets in the presence of sparse and small-size chemical datasets
We developed a deep learning model that predicts small molecules targeting miRNAs (or their downstream targets) from chemical structure alone, by incorporating information about small molecules with both known and yet-unknown biological activity on miRNAs into a neural network. We combined ~2400 “unlabeled” small molecules (with no known activity on miRNAs) with a small number of “labeled” small molecules (i.e., known to affect miRNA expression levels or the expression of miRNA targets) to build a two-layered neural network for small chemical datasets (sChemNET; Fig. 1a). In sChemNET, the chemical structure information of the labeled and unlabeled small molecules is fed into the model and distributed over a set of hidden nodes (Fig. 1b). The output layer contains one node per miRNA, so for a given small molecule’s chemical features the model outputs a predicted score for each miRNA.
Fig. 1: Overview of our deep learning framework for predicting miRNAs targeted by small molecules in the presence of sparse and small-size chemical datasets.
a sChemNET integrates labeled and unlabeled chemical structure information to predict bioactive small molecules against miRNAs or their mRNA targets. (Left) Labeled small molecules (sky-blue) are known to affect the expression level of miRNAs or their mRNA targets, as curated in the SM2miR database. The dotted arrow represents an experimentally verified small molecule-miRNA association, and the up- and down-arrows (green and red, respectively) indicate up- or down-regulation of the expression level. (Right) The Drug Repurposing Hub database was used to obtain thousands of small molecules not yet known to affect miRNAs (unlabeled), shown in green.
b sChemNET is a two-layered, fully connected neural network that incorporates unlabeled chemical structure information during training to improve prediction performance when only a small dataset of bioactive small molecule-miRNA associations is available. Given a small molecule’s chemical fingerprint (obtained from its 2D chemical structure representation), the trained model provides a predicted score for each miRNA. Nodes represent input chemical features (yellow), hidden units (gray), and the output miRNA predicted scores (purple). Solid lines show the connections between layers. Labeled: molecules with known bioactivity; unlabeled: molecules without a bioactivity designation. Different miRNAs are illustrated with different colors.
sChemNET’s key idea is to train the model with a large amount of unlabeled chemical structure information. To learn the probability $\hat{y}_{iu}=p(y_{iu}\mid x_i)$ that the $i$th small molecule, with chemical features $x_i$, affects the $u$th miRNA, sChemNET minimizes the following loss function:
$$L=\frac{1}{2}\sum_{(i,v)\ \text{is labeled}} s_{uv}\left(\hat{y}_{iv}-y_{iv}\right)^{2}+\frac{\alpha}{2}\sum_{(i,v)\ \text{is unlabeled}} \hat{y}_{iv}^{2} \tag{1}$$
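To make Eq. (1) concrete, the sketch below evaluates the loss for one target miRNA $u$ with NumPy. It is a minimal illustration under stated assumptions, not the authors’ implementation: the ReLU hidden activation and sigmoid outputs are assumptions (the text specifies only a two-layer, fully connected network with one output per miRNA), the 167-bit input width mirrors RDKit’s MACCS keys, the helper names `forward` and `schemnet_loss` are hypothetical, and dropout and the training loop are omitted. Labeled pairs ($y_{iv}=1$) are pulled toward 1 with weight $s_{uv}$, and every other pair is pushed toward 0 with weight $\alpha$.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    """Hypothetical two-layer fully connected network.

    X: (molecules, fingerprint_bits) 0/1 matrix, e.g. MACCS keys.
    Returns y_hat: (molecules, miRNAs) scores in (0, 1).
    """
    H = np.maximum(0.0, X @ W1 + b1)  # hidden units; ReLU is an assumption
    return sigmoid(H @ W2 + b2)       # one sigmoid output per miRNA (assumption)


def schemnet_loss(Y_hat, Y, s_u, alpha):
    """Loss of Eq. (1) for a target miRNA u.

    Y_hat, Y: (molecules, miRNAs) predicted scores and labels (1 = known, 0 = unlabeled).
    s_u:      (miRNAs,) weights s_uv in [a, 1] relating miRNA u to each miRNA v.
    alpha:    weight of the unlabeled term.
    """
    labeled = Y == 1
    weighted_sq_err = s_u[np.newaxis, :] * (Y_hat - Y) ** 2
    fit = 0.5 * weighted_sq_err[labeled].sum()        # first sum: labeled pairs
    reg = 0.5 * alpha * (Y_hat[~labeled] ** 2).sum()  # second sum: unlabeled pairs
    return float(fit + reg)


rng = np.random.default_rng(0)
n_bits, n_hidden, n_mirnas = 167, 16, 3
X = rng.integers(0, 2, size=(5, n_bits)).astype(float)
W1 = rng.normal(scale=0.1, size=(n_bits, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_mirnas))
b2 = np.zeros(n_mirnas)
Y = np.zeros((5, n_mirnas))
Y[0, 0] = Y[1, 2] = 1.0               # two toy known associations
s_u = np.array([1.0, 0.4, 0.7])       # toy similarities of miRNA u to each miRNA
Y_hat = forward(X, W1, b1, W2, b2)
print(Y_hat.shape, schemnet_loss(Y_hat, Y, s_u, alpha=0.286))
```

In a full training loop this loss would be minimized over the network weights by gradient descent; the random weights here only illustrate the shapes involved.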
The first summation in Eq. (1) applies a fitting constraint to the labeled chemical information (small molecule-miRNA pairs $(i,v)$ with $y_{iv}=1$) and is designed to learn high prediction scores for known associations between small molecules and miRNAs. To learn a model for each miRNA $u$, sChemNET exploits the labeled information available for all other miRNAs, weighting their relative contribution to learning by their sequence similarity to the target miRNA $u$ through $s_{uv}\in[a,1]$, where $0\le a<1$ (see “Methods” section). The second summation in Eq. (1) is the fitting constraint on the unlabeled chemical information (small molecule-miRNA pairs $(i,v)$ with $y_{iv}=0$). It drives the predicted scores of unlabeled small molecules toward zero for every miRNA during learning, and its overall relative importance is controlled by the hyperparameter $\alpha\in[0,1]$. Typically, $\alpha\ll 1$ is used. Unlabeled small molecules have unknown biological activity against the target miRNA $u$; they are introduced because the chemical dataset available for training the model is small. The goal of the second term is to let the neural network learn from a broader region of chemical space, which it maps to low probability scores. This modeling choice was motivated by our recent work on zero-driven regularization in non-negative matrix decomposition models [27,28].
Predicting small molecules targeting miRNAs or their targets in Homo sapiens
We trained and tested sChemNET using small-size chemical datasets with labeled information about the bioactivity of each small molecule on miRNAs. We used the Small Molecule to miRNA (SM2miR) database [20] to obtain manually curated information on small molecules that affect the expression levels of either specific miRNAs or their corresponding mRNA targets (see “Methods” section). This dataset provides only positive labels, that is, true positives ($y_{iv}=1$ in Eq. 1). There is no explicit source of negative labels, so $y_{iv}=0$ in Eq. (1) denotes unlabeled small molecule-miRNA pairs. SM2miR contains small molecule-miRNA associations across 18 species (Supplementary Fig. 1). For Homo sapiens, we used 1102 associations between 131 small molecules and 126 miRNAs (Supplementary Fig. 2a). The number of bioactive small molecules per miRNA ranges from 5 to 35 and follows a long-tailed distribution (Supplementary Fig. 2a). The average number of bioactive small molecules shared between miRNAs is $1.77\pm 1.51$ (mean $\pm$ std), indicating that most miRNAs share only a few bioactive small molecules (see distribution in Supplementary Fig. 3). To obtain a large set of unlabeled small molecules, we used the Drug Repurposing Hub database [29], which contains structurally and therapeutically diverse small molecules that have reached clinical trials, including most FDA-approved drugs. We obtained 6,302 unlabeled small molecules with unique PubChem CIDs, which we combined with the 131 small molecules from SM2miR into a set of 6,433 small molecules. The chemical input features of each small molecule were its MACCS fingerprint, computed from its SMILES representation.
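Once each molecule is encoded as a binary MACCS fingerprint (for example with RDKit’s `MACCSkeys.GenMACCSKeys`), the chemical-similarity baseline and the Tanimoto < 0.6 “chemically dissimilar” filter used in the evaluations reduce to comparisons between bit vectors. The sketch below assumes the fingerprints are already available as 0/1 NumPy arrays; the helper names are hypothetical, and applying the 0.6 threshold against the training-set bioactive molecules is an assumption about the Methods, which are not reproduced here.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto similarity |a AND b| / |a OR b| between two binary fingerprints."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # convention assumed here for two all-zero fingerprints
    return float(np.logical_and(a, b).sum() / union)


def max_similarity_scores(test_fps, train_active_fps):
    """Chemical-similarity baseline: score = max Tanimoto to any training bioactive molecule."""
    return np.array([max(tanimoto(t, r) for r in train_active_fps) for t in test_fps])


def is_dissimilar(test_fp, train_active_fps, threshold=0.6):
    """True if the test molecule is below `threshold` to every training bioactive molecule."""
    return all(tanimoto(test_fp, r) < threshold for r in train_active_fps)


# Toy 5-bit fingerprints (real MACCS fingerprints are much longer).
train_actives = [np.array([1, 1, 0, 0, 1]), np.array([0, 1, 1, 0, 0])]
test_set = [np.array([1, 1, 0, 0, 0]), np.array([0, 0, 0, 1, 0])]
scores = max_similarity_scores(test_set, train_actives)
ranking = np.argsort(-scores)  # higher similarity is ranked earlier
print(scores, ranking, [is_dissimilar(t, train_actives) for t in test_set])
```

Here the first test molecule shares most bits with a training active (similarity 2/3) and would be excluded from the “chemically dissimilar” subset, while the second would be kept.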
Sequence similarities between miRNAs were obtained by rescaling the Needleman-Wunsch scores of mature miRNA sequences from the miRBase database [30] (see “Methods” section and Supplementary Fig. 4).
Because sChemNET integrates large amounts of chemical structure information even when bioactive chemical datasets are small, we can simulate a realistic scenario in which a small molecule biologically active against a miRNA must be recovered from a large pool of chemicals. To this end, for each known bioactive small molecule-miRNA association, we built a test set of 4000 small molecules in which only one was experimentally determined to be bioactive and the other 3999 were randomly selected small molecules currently not known to affect miRNAs (Fig. 2a). We then performed a systematic evaluation using leave-one-out cross-validation (LOOCV) for all the miRNAs (see “Methods” section). In each LOOCV round, we trained sChemNET on the remaining labeled and unlabeled small molecules and used the trained model to rank all 4000 small molecules in the test set by their predicted scores. Prediction performance was measured as the percentage of held-out bioactive small molecules ranked within the top 100, 300, 500, or 1000 predicted small molecules.
Fig. 2: sChemNET outperforms other methods at predicting small molecules bioactive against 125 miRNAs in Homo sapiens.
a Training and testing sets built from labeled and unlabeled chemical structure information. Molecules and activity data were selected from the SM2miR and Drug Repurposing Hub databases. We used these sets to assess the prediction performance of sChemNET and other computational approaches under a leave-one-out cross-validation procedure. (Left) The training set consists of labeled and unlabeled small molecules. The labeled compounds are small molecules known to be bioactive against at least one miRNA from Homo sapiens.
The unlabeled set of compounds consists of ~2400 randomly selected small molecules with no known activity against the set of miRNAs. (Right) The testing set consists of 4000 small molecules, of which only one is known to be bioactive against the miRNA under evaluation.
b Prediction performance of eight computational methods under our leave-one-out cross-validation procedure. Notched boxplots show the distribution of recall (%) across 125 miRNAs (y-axis) when retrieving the top-K (100, 300, 500, 1000) small molecules from the test set. (Left) All 1102 instances; (Right) only instances that are chemically dissimilar between training and testing sets. sChemNET was run with and without ($s_{uv}=1$) miRNA sequence similarity information in its loss function. FNN stands for Feed-Forward Neural Network, and “random” for scores sampled from a uniform distribution. The chemical similarity baseline ranks all small molecules in the test set by their maximum 2D Tanimoto similarity to the bioactive small molecules in the training set. The distribution shows the variation of recall across the $n=125$ miRNAs for all methods. For the boxplots, the center line represents the median, and the lines extending from both ends of the box indicate the variability outside the quartiles Q1 and Q3, up to the minimum and maximum values. The notch represents the 95% confidence interval of the median.
Following Stokes et al. [31], we selected the model hyperparameters (number of hidden units $n$, unlabeled regularization parameter $\alpha$, number of epochs, learning rate, and dropout) by Bayesian optimization, using LOOCV on the small molecules known to target miR-224-5p. This miRNA was randomly selected and excluded from all further evaluation.
sChemNET performed well with $n=16$, $\alpha=0.286$, dropout = 0.174, and learning rate = 0.0346.
Figure 2b (Left) shows sChemNET’s prediction performance at retrieving bioactive small molecules for 125 miRNAs in Homo sapiens. Recall is shown as a percentage ($y$-axis) as a function of the number $k$ of top-ranked small molecules retrieved from the test set ($x$-axis). sChemNET, shown with and without ($s_{uv}=1$) sequence similarity information, is compared with four machine learning baselines trained on the same input features as sChemNET, namely XGBoost, Logistic Regression (LR), Random Forest (RF), and a Feed-Forward Neural Network (FNN), and with two additional baselines that rank each of the 4000 small molecules in the test set by either (i) the maximum Tanimoto chemical similarity to the bioactive small molecules in the training set (chemical similarity, green bars) or (ii) random scores sampled from a uniform distribution between 0 and 1 (random, brown bars) (see “Methods” section). sChemNET outperforms the baseline methods by 1–9% in the top 100 small molecules retrieved from the test set, 7–21% (top 300), 5–33% (top 500), and 8–29% (top 1000). The average improvement of sChemNET over all competitors is statistically significant (Supplementary Fig. 5). sChemNET performs well even without sequence similarity information in the loss function (see also Supplementary Fig. 6), with only a slight reduction in performance of ~1.81–3.62% across the top-K thresholds. In Supplementary Fig. 7, we also show that sChemNET outperforms the competitors in terms of the area under the receiver operating characteristic curve (AUROC) for the three miRNAs with the largest number of positive labels.
A key question regarding the practical utility of our approach is its ability to predict bioactive small molecules that are chemically dissimilar from those available for training. Figure 2b (Right) shows the prediction performance of sChemNET on instances in which the bioactive small molecule in the test set was chemically dissimilar from those in the training set (Tanimoto similarity <0.6, see “Methods” section). sChemNET significantly outperforms the baseline methods by 5–9% in the top 100 small molecules retrieved from the test set, 10–24% (top 300), 10–40% (top 500), and 12–34% (top 1000). These findings suggest that sChemNET could help discover novel small molecule modulators of miRNAs or their downstream targets.
Because the best available labeled chemical dataset, which we used to train sChemNET, is small, we asked whether sChemNET’s prediction performance varies with the number of bioactive small molecules available for each miRNA target. Supplementary Fig. 8 shows the predicted rank of the active small molecules as a function of the number of labels available for training. sChemNET effectively retrieves bioactive small molecules even when as few as four or five bioactive small molecules are available for training.
We further performed a prospective evaluation in which we used all available data from SM2miR 2015 as the training set and 1180 new associations between 120 small molecules and 123 miRNAs, obtained from the RNAInter 2022 database [32], as the test set. This evaluation reflects a realistic scenario because it preserves the chronological order in which the information became available.
sChemNET outperforms all the baseline methods in the prospective evaluation (Supplementary Fig. 9).
Predicting small molecules targeting miRNAs or their targets in model organisms
To understand whether sChemNET can be helpful for chemical datasets available for mammalian model organisms, we assessed its prediction performance on small molecule-miRNA datasets for Mus musculus and Rattus norvegicus. Because fewer miRNA-small molecule associations are known for these organisms than for Homo sapiens (see the distribution of labeled information in Supplementary Fig. 2b, c), we combined their data with Homo sapiens miRNA information to train sChemNET for each model organism (see Fig. 3a and “Methods” section). As in our evaluation for Homo sapiens, we added chemical data from the Drug Repurposing Hub to obtain a broader range of unlabeled chemical structures, and we performed a LOOCV procedure on the bioactive small molecules against miRNA targets in Mus musculus and Rattus norvegicus, respectively (see “Methods” section). For Mus musculus, we used 272 associations between 44 small molecules and 43 miRNAs, and for Rattus norvegicus, 78 associations between 32 small molecules and 13 miRNAs.
Fig. 3: sChemNET prediction performance evaluation on mouse and rat miRNAs.
a Small molecule and miRNA sets for mammalian model organisms (Mus musculus and Rattus norvegicus) were combined with those available for Homo sapiens to train sChemNET to predict small molecule-miRNA associations for these organisms. Silhouettes of model organisms were obtained from https://www.phylopic.org/.
b Percentage of bioactive small molecules correctly retrieved from the test set for different numbers of small molecules retrieved by each method under a leave-one-out cross-validation procedure. Only instances chemically dissimilar between training and testing sets were considered (Tanimoto chemical similarity <0.6).
(Left) Recall for 272 small molecule-miRNA associations across $n=43$ miRNAs from Mus musculus; (Right) recall for 78 small molecule-miRNA associations across $n=13$ miRNAs from Rattus norvegicus. For the boxplots, the center line represents the median, and the lines extending from both ends of the box indicate the variability outside the quartiles Q1 and Q3, up to the minimum and maximum values. The notch represents the 95% confidence interval of the median.
Figure 3b (Left) shows the prediction performance of sChemNET on Mus musculus miRNA data when only chemically dissimilar instances are considered in the test set. sChemNET performs best without sequence similarity information, retrieving more than 43% of bioactive small molecules within the top 25% of predictions. Figure 3b (Right) shows the corresponding performance of the different methods for chemically dissimilar bioactive small molecules for miRNAs in Rattus norvegicus. On this dataset, sChemNET outperforms the competitors by 6.18–24.67% in the top 300 (7.5%) predictions retrieved and by 2.74–20.50% in the top 1000 (12.5%). Logistic regression performs 0.726% better than sChemNET in the top 100 (2.5%). The prediction performance for the mammalian model organisms when using all small molecule-miRNA instances (i.e., without controlling for chemical similarity) is shown in Supplementary Fig. 11.
Mapping the effects of drugs on miRNAs and experimental validations for miR-451
sChemNET’s effectiveness at computationally predicting small molecules bioactive against miRNAs prompted us to ask whether we could generate a map between miRNAs and the pharmacological and chemical classes of small molecules. To generate this map, for each Homo sapiens miRNA we calculated the enrichment of drug mode of action (MoA) and drug indications among the ~127 of ~6300 small molecules whose predicted scores were at or above the 98th percentile (see “Methods” section).
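The enrichment step can be sketched with a one-sided Fisher’s exact test, which for over-representation is the upper tail of a hypergeometric distribution, followed by the Benjamini–Hochberg correction named in the Fig. 4 caption. This standard-library sketch treats the ~127 molecules above a miRNA’s 98th-percentile score as the “top set”; the one-sided alternative, the helper names, and the toy counts are assumptions for illustration, not values from this study.

```python
from math import comb


def fisher_enrichment_p(k, n_top, K, N):
    """One-sided Fisher exact p-value for over-representation of an annotation.

    N:     molecules scored for the miRNA
    K:     molecules carrying the annotation (e.g. one MoA)
    n_top: molecules in the top-scoring set
    k:     annotated molecules inside the top set
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n_top).
    """
    upper = min(K, n_top)
    tail = sum(comb(K, i) * comb(N - K, n_top - i) for i in range(k, upper + 1))
    return tail / comb(N, n_top)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts (not data from this study): three annotations tested for one miRNA.
N, n_top = 200, 10
counts = {"MoA A": (5, 12), "MoA B": (1, 20), "MoA C": (0, 8)}  # (k, K)
raw = [fisher_enrichment_p(k, n_top, K, N) for k, K in counts.values()]
adj = benjamini_hochberg(raw)
for name, p in zip(counts, adj):
    print(name, "significant" if p < 0.05 else "not significant", p)
```

Only “MoA A”, which is heavily over-represented in the toy top set, passes the adjusted 0.05 threshold; in a heatmap such as Fig. 4b, c, each cell would then be colored by $-\log_{10}$ of its adjusted p-value.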
Figure 4b, c shows heatmaps of the enrichment obtained for drug indications and MoA for selected miRNAs. In the heatmaps, miRNAs are ordered by the distance between their tissue-specific expression patterns, using data from human donors obtained from the miRNA Tissue Atlas database [33] (Fig. 4a).
Fig. 4: Heatmaps of miRNA tissue expression and of enriched small-molecule modes of action and drug indications predicted by sChemNET for selected miRNAs from Homo sapiens.
a miRNAs were ordered by hierarchical clustering of their z-score expression levels in human tissues from the Tissue Atlas database; red areas indicate expression, and the color intensity is proportional to the z-score.
b Enrichment of drug indications predicted by sChemNET for each miRNA. The color is proportional to $-\log_{10}\mathrm{adjP}$, where adjP is the Benjamini–Hochberg-corrected p-value calculated using Fisher’s exact test, with significance assessed at an adjusted threshold of 0.05. White areas indicate non-significant enrichment.
c Enrichment of drug modes of action predicted by sChemNET for each miRNA. The color is proportional to $-\log_{10}\mathrm{adjP}$, where adjP is the Benjamini–Hochberg-corrected p-value calculated using Fisher’s exact test, with significance assessed at an adjusted threshold of 0.05. White areas indicate non-significant enrichment.
We investigated several compelling associations in Fig. 4 in more detail, focusing first on miR-451, an erythrocyte-specific miRNA. To experimentally validate whether the associations predicted by sChemNET for miR-451 are phenotypically and physiologically relevant, we incubated zebrafish embryos with different drug candidates predicted to modulate the miR-451 response. Zebrafish embryos are an optimal model for validating small molecules because they are transparent and allow the physiological effect of drugs to be tested in the whole organism. Because miR-451 is expressed only in erythrocytes, we focused our analysis on the progression of erythrocyte maturation.
At 48 hours post-fertilization, embryos display robust blood circulation. At this stage, the accumulation of mature erythrocytes can be easily assessed in transparent embryos using O-dianisidine, a hemoglobin-specific stain [34]. All drugs were tested in wild-type zebrafish embryos in combination with phenylthiourea (PTU), a chemical known to induce anemia due to oxidative stress when miR-451 activity is impaired, but not in wild-type embryos [35,36]. In this PTU-sensitized background, drugs impairing miR-451 activity induce anemia, whereas drugs boosting miR-451 activity increase erythrocyte production (Fig. 5a).
Fig. 5: Experimental validation in zebrafish embryos of drugs predicted to modulate the activity of miR-451 or the expression of its targets.
a Depiction of our experimental design. Zebrafish embryos were incubated with different drug candidates predicted by sChemNET in combination with phenylthiourea (PTU), a chemical known to induce anemia due to oxidative stress when miR-451 activity is impaired, but not in wild-type embryos. At 48 hours post-fertilization, embryos display robust blood circulation. At this stage, the accumulation of mature erythrocytes can be easily assessed in transparent embryos using O-dianisidine, a hemoglobin-specific stain. Drugs impairing miR-451 activity induce anemia, whereas drugs boosting miR-451 activity increase erythrocyte production (blood circulation).
b Ventral images of 2-day-old embryos stained with O-dianisidine to reveal hemoglobinized cells (brown staining), for wild-type embryos and for embryos treated with docetaxel, $\beta$-elemene, or $\alpha$-calcidol. Blood accumulates in the ventral region (ducts of Cuvier).
c Lateral view of another group of embryos showing accumulation of excess blood in the tail region upon treatment with docetaxel.
d Northern blot quantification of miRNA expression in zebrafish embryos treated with drug candidates.
Total RNA extracted from 2-day-old embryos was analyzed by Northern blot to reveal the expression of miR-451, miR-144, miR-15, and let-7 under different drug treatments.
e Quantification of miRNA expression based on the radioactive signal of miRNA probes after Northern blot assay. Bars represent the averaged normalized miRNA expression from three representative experiments (white circles). Error bars indicate the standard error of the mean. **p = 0.0091, one-way ANOVA.
We selected three small molecules for experimental validation of the miR-451 response: (i) the tubulin polymerization inhibitor docetaxel, predicted at sChemNET’s top-3 position and also known to target BCL2, a known gene target of miR-451; (ii) the vitamin D receptor agonist $\alpha$-calcidol, predicted at sChemNET’s top-5 position; and (iii) $\beta$-elemene, predicted at sChemNET’s top-71 position.
We first treated the embryos with docetaxel. Consistent with the predictions of sChemNET, docetaxel caused a higher accumulation of blood in the ventral region of treated embryos than in untreated siblings (Fig. 5b). This finding confirms that docetaxel has a physiological effect on miR-451-induced erythropoiesis. Higher doses of docetaxel (25 µM) further induced erythrocyte production, and erythrocytes started to pool in the tail region (Fig. 5c).
Our second candidate compound was $\alpha$-calcidol, motivated by sChemNET’s predictions for miR-451 in Fig. 4c, which show enrichment for vitamin D receptor agonists (adjusted $p < 8.40\times 10^{-22}$). We treated zebrafish embryos with $\alpha$-calcidol and observed blood accumulation in the ventral region of the embryos, even at concentrations as low as 10 nM (Fig. 5b). Finally, our third candidate compound was $\beta$-elemene, chosen for its ability to bind the miR-451 targets MMP2 and MMP9 [37,38].
Our experiments confirm that $\beta$-elemene treatment also induces excess blood in the ventral region of the embryos (Fig. 5b).
To determine whether these compounds increased erythrocyte maturation by stimulating miR-451 biogenesis or by modulating its network of mRNA targets, we analyzed the levels of miR-451 and other miRNAs by Northern blot (Fig. 5d, e and Supplementary Fig. 12). miR-451 expression levels did not change upon drug treatment compared with untreated embryos. Consistent with these results, we did not observe changes in miR-144, another erythrocyte-specific miRNA expressed from the same primary transcript as miR-451 [39]. These results suggest that the tested drugs elicit a transcriptional response that mimics the effect of miR-451-mediated regulation.
Only let-7, which is expressed in the hematopoietic tissue and elsewhere in the embryo, accumulated significantly upon treatment with $\alpha$-calcidol (Fig. 5e), a drug known to increase Dicer expression and hence miRNA processing. Because miR-144, unlike miR-451, is also processed by Dicer, and miR-144 levels did not change, we conclude that $\alpha$-calcidol affects Dicer expression outside the hematopoietic tissue.
miRNAs, Vitamin D, and the example of the miRNA-181 isotype family
The most striking association between miRNAs and a drug proved to be vitamin D, which was associated with most of the miRNAs examined (last row of Fig. 4c). This may initially seem anomalous, until one recognizes that the active form of vitamin D, calcitriol or 1,25-dihydroxyvitamin D (1,25(OH)₂D), is active in every tissue and has recently been found to be central to regulating mitochondrial function, which is essential for all tissues.
Our experiments in zebrafish embryos indicated that the vitamin D receptor (VDR) agonist $\alpha$-calcidol acts directly on miRNA processing: the observed upregulation of mature let-7 most likely results from increased Dicer expression.
Since Dicer is a central component of the miRNA processing pathway, we asked whether other miRNAs are associated with other VDR agonists. To experimentally assess the miRNAs predicted for calcitriol, we first profiled miRNAs by sequencing in human neuroblastoma cells (SH-SY5Y) treated with calcitriol. After 24 h of treatment, a small number of miRNAs were differentially expressed in SH-SY5Y cells (Fig. 6a). Two of our predicted miRNAs, hsa-miR-424-5p and hsa-miR-19a-3p, were upregulated, and four predicted miRNAs were downregulated (hsa-miR-21-5p, hsa-miR-92a-3p, hsa-miR-323a-3p, and hsa-miR-328-3p). The mean sChemNET predicted rank for calcitriol-miRNA interactions was lower (i.e., better) for significant miRNAs (mean rank 245) than for non-significant ones (mean rank 288; Fig. 6b).
Fig. 6: Experimental validation of miRNAs targeted by the vitamin D receptor agonist calcitriol.
a Enhanced volcano plot of miRNAs differentially expressed in SH-SY5Y cells upon drug treatment. P-values were obtained with a two-way ANOVA test, and contrasts between control and treated cells were determined using Fisher’s Least Significant Difference (LSD).
b sChemNET predicted rank for calcitriol for the group of miRNAs found to be non-significant (NS) vs. the group found to be significant in terms of p-value and log2FC (S) in SH-SY5Y cells ($n=64$ for NS and $n=6$ for S). For the boxplots, the center line represents the median, and the lines extending from both ends of the box indicate the variability outside the quartiles Q1 and Q3, up to the minimum and maximum values.
c Enhanced volcano plot of miRNAs differentially expressed between calcitriol-treated endothelial progenitor cells and control cells derived from the bone marrow of male Sprague-Dawley rats. P-values were obtained with DESeq2 by the Wald test and adjusted for multiple testing using the Benjamini–Hochberg method.
d sChemNET predicted rank for calcitriol for the group of miRNAs found to be non-significant (NS) vs. the group found to be significant in terms of p-value and log2FC (S) in miRNAs from Rattus norvegicus ($n=7$ for NS and $n=5$ for S). For the boxplots, the center line represents the median, and the lines extending from both ends of the box indicate the variability outside the quartiles Q1 and Q3, up to the minimum and maximum values.
e Box plots of droplet digital PCR (ddPCR) quantification (copies/µl) of miR-181a-5p, miR-181b-5p, miR-181c-5p, and miR-181d-5p in the human non-metastatic MCF10CA1a and metastatic MCF10CA1a-ras breast cancer cell lines with and without calcitriol treatment. The significance between groups is shown by the p-value. There are $n=3$ replicates for each condition. For the boxplots, the center line represents the median, and the lines extending from both ends of the box indicate the variability outside the quartiles Q1 and Q3, up to the minimum and maximum values. The p-values were determined by two-sided pairwise comparisons.
We further tested sChemNET predictions in model organisms. To assess the predictions for miRNAs from Rattus norvegicus, we used previously published miRNA expression data from calcitriol-treated endothelial progenitor cells and control cells derived from the bone marrow of male Sprague-Dawley rats [40] (Fig. 6c and “Methods” section). As shown in Fig. 6d, the mean sChemNET predicted rank for calcitriol-miRNA interactions was lower for significant miRNAs (mean rank 192) than for non-significant ones (mean rank 434).
Our analysis suggests that sChemNET’s predictions can also be helpful for the small miRNA chemical datasets available for model organisms.
Vitamin D and its active form calcitriol have long been associated with calcium and phosphate metabolism, which are directly modulated by the mitochondrion, and vitamin D has been associated with the regulation of mitochondrial respiration, reactive oxygen species (ROS) production, cell proliferation, and cell death. Vitamin D acts through the vitamin D receptor (VDR), and silencing the VDR in a variety of cultured human cells not only modulated mitochondrial respiration, ROS production, and apoptosis but also downregulated the protein levels of critical oxidative phosphorylation (OXPHOS) proteins encoded in both the mtDNA (COX2 and ATP6) and the nDNA (COX5 and ATP5B) [41].
Regardless of a miRNA’s developmental target, the miRNA would also need to modulate mitochondrial bioenergetics to have an integrated effect on cellular and developmental function. This is powerfully demonstrated by miR-2392, which not only enters the mitochondrion to bind the mtDNA but also has “seed” binding sites in 362 nDNA-encoded mRNAs [8,42]. miRNA-181 exemplifies how important it is for a miRNA to regulate both developmental and mitochondrial functions. miRNA-181 is developmentally regulated and predominantly expressed in multiple areas of the brain (Fig. 4a), though it is also active in immune, neuronal, and heart tissues [43,44,45]. As predicted, miRNA-181 has been found to be a powerful negative effector of mitochondrial biogenesis, mitophagy, and apoptosis [45,46].
There are four mature forms of miR-181 (miR-181a-5p, miR-181b-5p, miR-181c-5p, and miR-181d-5p). These are transcribed from three chromosomal clusters: miR-181-a1 and miR-181-b1 on chromosome 1, miR-181-a2 and miR-181-b2 on chromosome 9, and miR-181c and miR-181d on chromosome 19 [45].
In neuronal cells, miR-181a/b act within the cytosol to reduce OXPHOS in favor of glycolysis by inhibiting the mRNAs of PPARGC1A, the gene encoding the master regulator of mitochondrial biogenesis peroxisome proliferator-activated receptor gamma coactivator 1-alpha (PGC-1α), and of the nuclear respiratory factor 1 gene (NRF1), as well as the structural gene mRNAs COX11, COQ10B, and PRDX3 [44,45,47]. In the heart, miR-181c enters the mitochondrion and binds the mtDNA-encoded COXI transcript, suppressing OXPHOS [48].
Given that the four isotypes of miR-181 all affect mitochondrial function but have subtly different mRNA targets, it follows that the clinical effects of vitamin D would overlap differentially with the functional profiles of the different miR-181 isotypes. Since miR-181 is established to directly regulate mitochondrial biogenesis and bioenergetics, and the requirement for mitochondrial function is ubiquitous, it follows that many of the miRNAs for which predictions were made would also modulate mitochondrial function and thus be related to vitamin D metabolism.
miR-181s have been shown to be overexpressed in several cancer types [49,50,51], including breast cancer, and their overexpression has been associated with greater proliferation, invasiveness, and metastasis [52]. Dysregulation of the miR-181 family has been reported in a number of cancer types, including colorectal, breast, lung, and prostate cancers [45,53]. Depending on the target genes involved, miR-181s can also function as tumor suppressors [54]. In addition, VDR agonists have been reported to alter the expression of miR-181 in cancer cells [55].
VDR agonists that can control miRNA expression have been identified as possible cancer treatment drugs [56].
To experimentally determine the impact of a VDR agonist on the miR-181 family (miR-181a, b, c, and d) in breast cancer, we used the non-metastatic MCF10CA1a and metastatic MCF10CA-ras breast cancer cell lines with and without calcitriol treatment (Fig. 6e). We then quantified miRNA concentration for each condition using droplet digital PCR (ddPCR; see “Methods” section). As expected, miR-181a-5p and miR-181c-5p levels (copies/µl) were higher in the metastatic than in the non-metastatic cell line. On average, calcitriol treatment reduced the levels of the miR-181 family in the metastatic cell line. In the non-metastatic cell line, calcitriol increased miR-181a-5p, did not change miR-181b-5p, and decreased miR-181c-5p and miR-181d-5p (Fig. 6e).
