Expanding drug targets for 112 chronic diseases using a machine learning-assisted genetic priority score

Construction of machine learning models to predict phecode diagnosesWe screened 3612 phecodes included in phecodeX to identify 336 phecodes corresponding to non-communicable chronic disease processes across 11 phecode categories (Supplementary Data 1). To identify phecodes associated with chronic physiologic changes, we used LightGBM to construct preliminary gradient boosting models using only age, sex, and 72 laboratory and vital measurements. Models for 112 of 336 phecodes achieved mean areas under the receiver operating characteristic curve (AUROCs) ≥ 0.70 and areas under the precision-recall curve (AUPRCs) ≥ the prevalence of the phecode (Supplementary Fig. 1a; Supplementary Data 2). Model performance was unequally distributed across different phecode categories; models in the endocrine/metabolic, blood/immune, and cardiovascular categories had the highest mean AUROCs, whereas models in the musculoskeletal, dermatological, and sense organ categories had the lowest mean AUROCs.For the 112 phecodes with model performance above our thresholds, we constructed final models that incorporated 165 additional features, including lifestyle factors, medication usage, and diagnostic history (Fig. 1). LightGBM models were robust to feature selection (Supplementary Data 3), and either outperformed or were comparable to XGBoost and random forest models (Supplementary Data 4). Final models had high discrimination, with a median AUROC of 0.85 [interquartile range (IQR) 0.08] (Fig. 2a), and high calibration, with a median Brier score of 0.01 [IQR 0.02] (Supplementary Data 2).Fig. 1: Workflow for constructing ML-GPS.Workflow for constructing ML-GPS, including machine learning models to predict phecode diagnoses across 112 phecodes, genetic association analyses using both observed and predicted phenotypes, and integration of genetic associations with existing genetic evidence.Fig. 2: Generation of and genetic association analyses with predicted phenotypes.a Mean AUROCs (blue) of final models for 112 of 386 phecodes meeting performance thresholds (AUROC ≥ 0.70, AUPRC ≥ phecode prevalence). Numbers at the top of the graph indicate the number of phecodes in each phecode category; each phecode is represented as a grey dot in the background. AUROCs were calculated among 183,021 UK Biobank participants with GP records (see “Study sample” in the Methods section). b Number of genes identified by P (blue), B (orange), and C (green) in common, rare, and ultra-rare variant analyses across 112 phecodes. For common and rare variant analysis, “gene” refers to any gene with a significant variant, whereas for ultra-rare variant analyses, “gene” refers to any gene with a significant test. c Odds ratios for drug indications in Open Targets with 13 variables included in ML-GPS. Note that these odds ratios are for binary encoded variables, whereas ML-GPS uses continuous encoded variables as features (see “Genetic priority scores” in the Methods section). d Odds ratios for drug indications in Open Targets with B-P and C-P; these represent genes identified by B and C not identified by P, respectively. Note that B-P and C-P are not ML-GPS features and are included solely for comparison. Plots c,d represent logistic regression analyses of 112,274 gene-phecode pairs in Open Targets, of which 4116 had a drug indication. Plots a, c and d show means with 95% confidence intervals. Source data are provided as a Source Data file. Abbreviations: AUROC (area under the receiver operating characteristic curve); AUPRC (area under the precision-recall curve); P (observed case/control); B (binarized model probabilities/predicted case-control); C (continuous model probabilities).Compared to preliminary models, these models had median increases in AUROCs and AUPRCs of 0.08 [IQR 0.06] and 0.06 [IQR 0.08], respectively. Further, for 13 phecodes partially definable using a single laboratory or vital biomarker (e.g., hypertension:systolic blood pressure, type 2 diabetes:hemoglobin A1c, and obesity:body mass index), both preliminary and final models outperformed the biomarker in both AUROC and AUPRC for phecode diagnosis (Supplementary Fig. 1b; Supplementary Data 5). Finally, across all phecodes, each quintile increase in predicted phecode probability corresponded to a median odds ratio (OR) of 3.24 for observed phecode presence, whereas participants in the highest quintile had a median OR of 45.97 for observed phecode presence compared to those in the lowest quintile (Supplementary Data 6).There was diverse feature usage across models: for 70 of the 112 phecodes, three or more feature categories (i.e., demographics, measurements, lifestyle factors, medication usage, and diagnostic history) were represented among the top 10 model features (Supplementary Data 7). Important model features were generally consistent with known characteristics of each phecode, such as erythrocyte distribution width and hemoglobin for iron deficiency anemia (BI_160.1) and urate and antigout preparation (M04A) usage for gout (MS_703.1). For eight phecodes, the top 10 model features were all based on diagnostic history; this is consistent with our prior report demonstrating that the presence of some diagnoses can inform the presence of other diagnoses17.Many chronic diseases increase mortality risk1, and 93 of the 112 phecodes were significantly associated with all-cause mortality in the UK Biobank (Supplementary Data 8). The maximum hazard ratio across phecodes was 7.98 (95% CI 7.37–8.64) for CV_420 (cardiac arrest). We also observed that increased quintile of predicted probability was positively associated with all-cause mortality for 110 of the 112 phecodes [all but CM_751.4 (congenital glaucoma) and MS_722.4 (palmar fascial fibromatosis)]. This association was present separately among cases and controls for 68 and 110 of the 112 phecodes, respectively, suggesting that predicted probabilities are associated with increased disease severity and identify probable disease underdiagnosis. Together, these results demonstrate that predicted probabilities are associated with disease risk, severity, progression and underdiagnosis.Analysis of genetic associationsWe modeled the allelic series of each gene-phecode pair (Fig. 1). We performed genome-wide association testing of common variants [minor allele frequency (MAF) ≥ 0.01], exome-wide association testing of single rare coding variants (0.0001 ≤ MAF < 0.01) that were missense or loss-of-function (LOF), and gene-level testing of ultra-rare coding variants (MAF < 0.0001) that were deleterious missense or LOF for three different phenotypes: observed phecode case/control status (P), binarized model probabilities (B), and continuous model probabilities (C).For rare variant analyses, median inflation factors (λ) were 1.04 (P), 1.04 (B), and 1.03 (C). For common variant analyses, median λs were 1.03 (P), 1.06 (B), and 1.34 (C), whereas for ultra-rare variant analyses, median λs were 0.76 (P), 0.89 (B), and 1.03 (C). The increased λ for C in common variant analyses may be attributable to increased identification of causal variants under polygenic inheritance18, whereas the decreased λs for P and B in ultra-rare variant analyses may be attributable to our inclusion of only deleterious missense and LOF variants.Across all phecodes, C identified substantially more genes with significant variants/tests than B, which identified more genes than P (Fig. 2b). Specifically, P, B, and C identified at least one gene for 64, 75, and 111 phecodes for common variant analyses; 40, 46, and 108 phecodes for rare variant analyses; and 53, 61, and 109 phecodes for ultra-rare variant analyses, respectively (Supplementary Data 9). For common variant analyses, B and C identified a median of 30% [IQR 70%] and 34% [IQR 77%] of genes identified by P, respectively, demonstrating the overlap between predicted and observed phenotypes. The percentage of genes identified by P that were also identified by C was significantly associated with the AUROC of the model (β = 3.79, 95% CI 1.72–5.87, p = 5.4 × 10−4) (Supplemental Table 1), suggesting that models with higher discrimination better represent the observed phenotype. Additionally, for common, rare, and ultra-rare variant analyses, C identified only 71% [IQR 50%], 50% [IQR 100%], and 80% [IQR 50%] of genes identified by B, respectively, despite B being a binarization of probabilities used for C. Finally, for each of P, B, and C, median absolute effect sizes per gene were higher for rare and ultra-rare compared to common variant analyses, including in pairwise comparisons of the same genes (Supplemental Table 2).Association of genetic features with drug indicationsGenetic analyses with predicted phenotypes increased the identification of drug indications at the phecode level, with B and C identifying one, two, or more than two genes with drug indications for a greater number of phecodes compared to P (Supplementary Fig. 2a,b). For common variant analyses, C identified a greater number of genes with drug indications than P for 25 phecodes, and for 16 of these phecodes, P did not identify any such genes. This was also true of 9 and 8 phecodes for rare variant analyses and 10 and 9 phecodes for common variant analyses, respectively.Consistent with our prior report7, gene-phecode pairs with existing evidence from EVA-ClinVar, HGMD, OMIM, and L2G were significantly associated with drug indication, with ORs in Open Targets of 6.61 (95% CI 4.50–9.70), 4.87 (95% CI 4.13–5.76), 13.20 (95% CI 7.58–22.99), and 6.68 (95% CI 5.20–8.58), respectively (Fig. 2d; Supplemental Table 3).For common variant analyses, P, B and C had ORs of 7.56 (95% CI 5.08–11.26), 6.28 (95% CI 4.55–8.68), and 3.19 (95% CI 2.53–4.03), respectively (Fig. 2d; Supplemental Table 3). There were no significant differences in ORs between P, B, and C for rare or ultra-rare variant analyses. For rare variant analyses, P, B, and C corresponded to ORs of 16.46 (95% CI 5.95–45.59), 15.62 (95% CI 7.16–34.06), and 8.75 (95% CI 5.17–14.80), respectively, whereas for ultra-rare variant analyses, P, B, and C corresponded to ORs of 6.87 (95% CI 1.95–24.21), 8.66 (95% CI 4.03–18.59), and 4.02 (95% CI 2.35–6.88), respectively. Further, even after subtracting genes identified by P from B and C (i.e., B-P and C-P), we found that these two features were still significantly associated with drug indication, with ORs of 5.21 (95% CI 3.44–7.90) and 2.62 (95% CI 2.01–3.41) for common, 15.25 (95% CI 5.71–40.90) and 7.49 (95% CI 4.18–13.40) for rare, and 10.86 (95% CI 4.49–26.26) and 3.96 (95% CI 2.24–6.99) for ultra-rare variants, respectively (Fig. 2e). Thus, B and C increase the coverage of genes with drug indications.Construction of ML-GPSWe constructed machine learning models to predict whether each distinct gene-phecode pair had an indicated drug. Of 112,274 pairs in Open Targets and 58,674 pairs in SIDER, 4116 and 1883 had indicated drugs, respectively. We included up to 13 features, including three features representing clinical evidence (EVA-ClinVar, HGMD, OMIM), one representing L2G, and nine features incorporating additional evidence from P, B and C common, rare, and ultra-rare variant analyses.We first tested five different model architectures for all 13 features: ElasticNet logistic regression (LR), gradient boosting (GB), GB with continuous feature encoding [GB (CE)], GB (CE) with sample weights based on the number of indicated drugs [GB (CE, number weights)], and GB (CE) with sample weights based on the maximum phase of indicated drugs [GB (CE, phase weights)]. In both Open Targets and SIDER, the GB model significantly outperformed the LR model in AUPRC based on permutation testing (Fig. 3a; Supplemental Table 4), and all three GB models with CE outperformed the GB model without CE. Although there was no significant difference in AUPRC between the three GB models with CE, scores from the GB (CE, phase weights) model resulted in significantly higher ORs for main indication among all drugs and separately among phase IV drugs compared to scores from all other models (Fig. 3b–d). As a sensitivity analysis, we also compared the LightGBM-based GB (CE phase weights) model with XGBoost and random forest models; LightGBM outperformed the latter two models in AUPRC in both Open Targets and SIDER (Supplementary Fig. 3; Supplemental Table 4).Fig. 3: Performance of genetic priority scores with different architectures.a AUPRC for drug indication in Open Targets (holdout testing) and SIDER (external testing). Grey dotted lines show the proportion of gene-phecode pairs with indications in each dataset. b Odds ratios per standard deviation increase in score for any drug indication and separately for drug indications in specific clinical trial phases in Open Targets. Brackets denote the number of gene-phecode pairs with drug indications in each phase. c,d Odds ratios of drug indications for gene-phecode pairs in the top X score percentiles compared to pairs in the 0-50 percentiles in Open Targets (c) and SIDER (d). Plots a–c represent analyses of 112,274 gene-phecode pairs in Open Targets, of which 4116 had a drug indication. Plots a and d represent analyses of 58,674 gene-phecode pairs in SIDER, of which 1883 had a drug indication. All plots show means with 95% confidence intervals. Source data are provided as a Source Data file. Abbreviations: AUPRC (area under the precision-recall curve); LR (logistic regression); GB (gradient boosting); CE (continuous encoding); L2G (locus-to-gene); P (observed case/control); B (binarized model probabilities/predicted case control); C (continuous model probabilities).For the GB (CE, phase weights) model architecture, we next compared the relative contributions of different features by constructing models with L2G, clinical evidence (Clinical), L2G + Clinical, L2G + Clinical + P, or L2G + Clinical + PBC. With each additional set of features, there were significant increases in AUPRC in both Open Targets and SIDER based on permutation testing (Fig. 4a; Supplemental Table 4), with 47.5% and 70.7% increases between the L2G and L2G + Clinical + PBC models in these two datasets, respectively. In Open Targets, each standard deviation in score from the model incorporating evidence from C and B (L2G + Clinical + PBC) corresponded to ORs of 1.26 (95% CI 1.24–1.28) for any drug indication and 1.41 (95% CI 1.37–1.44) for phase IV drug indications (Fig. 4b); these ORs were significantly higher than for scores from all other models and represented 11.6% and 16.9% increases from ORs for the L2G model. Additionally, gene-phecode pairs in the 99–100 compared to 0–50 percentiles for this model had ORs of 6.49 (95% CI 5.60–7.53) and 7.38 (95% CI 6.02–9.03) for drug indication in Open Targets and SIDER, respectively (Fig. 4c,d).Fig. 4: Performance of genetic priority scores with different features.a AUPRC for drug indication in Open Targets (holdout testing) and SIDER (external testing). Grey dotted lines show the proportion of gene-phecode pairs with indications in each dataset. b Odds ratios per standard deviation increase in score for any drug indication and separately for drug indications in specific clinical trial phases in Open Targets. Brackets denote the number of gene-phecode pairs with drug indications in each phase. c, d Odds ratios of drug indications for gene-phecode pairs in the top X score percentiles compared to pairs in the 0-50 percentiles in Open Targets (c) and SIDER (d). Plots a–c represent analyses of 112,274 gene-phecode pairs in Open Targets, of which 4116 had a drug indication. Plots a and d represent analyses of 58,674 gene-phecode pairs in SIDER, of which 1883 had a drug indication. All plots show means with 95% confidence intervals. Source data are provided as a Source Data file. Abbreviations: AUPRC (area under the precision-recall curve); LR (logistic regression); GB (gradient boosting); CE (continuous encoding); L2G (locus-to-gene); P (observed case/control); B (binarized model probabilities/predicted case control); C (continuous model probabilities).We performed a Shapley Additive exPlanations (SHAP) analysis of L2G + Clinical + PBC model predictions in Open Targets to further assess the contributions of each feature to model predictions. The most important features were B (rare variant), C (rare variant), and B (ultra-rare variant); conversely, the OMIM feature had no contribution to final predictions, likely because of redundancy with the HGMD and EVA-ClinVar features (Supplementary Fig. 4). We also analyzed relationships between feature values and SHAP values. For both EVA-ClinVar and HGMD, genes with one clinical variant had higher SHAP values compared to those with none, but additional clinical variants beyond the first did not further increase SHAP values (Supplementary Fig. 5). For L2G, higher scores resulted in higher SHAP values, but in a discrete rather than continuous manner. For P, B, and C features, genes with -log10(p-values) above standard significance thresholds generally had positive SHAP values; however, some genes with -log10(p-values) below these thresholds also had positive SHAP values, demonstrating the utility of continuous encoding of these features.Based on these results, we use scores from the L2G + Clinical + PBC model under the GB (CE, phase weights) model architecture as ML-GPS. Although optimal thresholds for ML-GPS will depend on the user’s goal (e.g., maximizing target coverage for high-throughput screening versus prioritizing a few high-scoring targets for manual screening), we provide precision and recall metrics for different thresholds in Open Targets and SIDER (Supplementary Fig. 6a,b). Precision reflects the proportion of identified gene-phecode pairs with drug indications, whereas recall reflects the proportion of pairs with drug indications that are identified. For example, a ML-GPS threshold of 0.212 (equivalent to 99th percentile on the full dataset of 2,362,626 pairs) balances precision and recall, yielding precision = 0.116 and recall = 0.076 in Open Targets, and precision = 0.137 and recall = 0.094 in SIDER. To prioritize precision, a higher ML-GPS threshold of 0.540 yields precision = 0.400 and recall = 0.014 in Open Targets, and precision = 0.424 and recall = 0.015 in SIDER.Finally, although we could not directly compare ML-GPS with the original GPS due to different phecode definitions7, we compared ML-GPS with a logistic regression model including L2G + Clinical + P features, which approximates GPS. First, there were increases in AUPRC from 0.049 (95% CI 0.045–0.054) to 0.063 (95% CI 0.058–0.069) in Open Targets and from 0.050 (95% CI 0.043–0.056) to 0.066 (95% CI 0.057–0.074) in SIDER (Supplementary Fig. 7a); these represent increases of 28.6% and 32.0%, respectively. Second, ORs per standard deviation increase in score increased from 1.18 (95% CI 1.16–1.20) to 1.26 (95% CI 1.24–1.28) for all drug indications and from 1.27 (95% CI 1.24–1.30) to 1.41 (95% CI 1.37–1.44) for phase IV drug indications (Supplementary Fig. 7b). Third, for the 75-85, 85-95, and 95-98 percentiles of scores in both Open Targets and SIDER, only scores from ML-GPS had ORs greater than one for drug indication (Supplementary Fig. 7c,d), demonstrating the increased coverage of ML-GPS.Construction of ML-GPS with direction of effect (ML-GPS DOE)We extended ML-GPS to predict direction of effect (DOE) in addition to drug indication. ML-GPS DOE is a one-versus-rest classifier that assigns each gene-phecode pair three different probabilities summing to one: probability of no drug indication, probability of an activator drug indication, and probability of an inhibitor drug indication. This differs from our prior implementation of GPS DOE7, which outputs a single positive or negative score based on whether the genetic features are primarily loss- or gain-of-function.In both datasets, there were more inhibitor compared to activator drug indications, with 3019 and 890 in Open Targets and 1288 and 364 in SIDER, respectively. Despite weighting activator drug indications twice as heavily as inhibitor drug indications during training, we still observed higher AUPRCs and ORs for predicting inhibitor compared to activator drug indications. Nevertheless, we similarly observed that the L2G + Clinical + PBC model significantly outperformed all other models for predicting both activator and inhibitor drug indications (Supplemental Table 4).When predicting activator drug indications, the L2G + Clinical + PBC model achieved AUPRCs of 0.018 (95% CI 0.014–0.021) in Open Targets and 0.022 (95% CI 0.014–0.032) in SIDER, respectively (Fig. 5a). In Open Targets, each standard deviation increase in score was associated with an OR of 1.17 (95% CI 1.15–1.20) for any activator drug indication (Fig. 5b), and gene-phecode pairs in the 99-100 compared to 0–50 percentiles had ORs of 6.93 (95% CI 5.28–9.09) in Open Targets and 7.26 (95% CI 4.86–10.86) in SIDER, respectively (Fig. 5c,d). When predicting inhibitor drug indications, the L2G + Clinical + PBC model achieved AUPRCs of 0.052 (95% CI 0.047–0.058) in Open Targets and 0.056 (95% CI 0.046–0.065) in SIDER, respectively (Fig. 6a). In Open Targets, each standard deviation increase in score was associated with an OR of 1.24 (95% CI 1.22–1.26) for any inhibitor drug indication (Fig. 6b), and gene-phecode pairs in the 99-100 compared to 0–50 percentiles had ORs of 6.21 (95% CI 5.24–7.37) in Open Targets and 7.87 (95% CI 6.22–9.94) for inhibitor drug indications in SIDER, respectively (Fig. 6c,d). Given these results, we similarly use scores from the L2G + Clinical + PBC model as ML-GPS DOE.Fig. 5: Performance of direction-of-effect (DOE) genetic priority scores with different features for activator drug indications.a AUPRC for activator drug indications in Open Targets (holdout testing) and SIDER (external testing). Grey dotted lines show the proportion of gene-phecode pairs with indications in each dataset. Inhibitor drug indications were set to 0 (no drug indication). b Odds ratios per standard deviation increase in score for any activator drug indication and separately for drug indications in specific clinical trial phases in Open Targets. Brackets denote the number of gene-phecode pairs with drug indications in each phase. c,d Odds ratios for activator drug indications for gene-phecode pairs in the top X score percentiles compared to pairs in the 0–50 percentiles in Open Targets (c) and SIDER (d). Plots a–c represent analyses of 112,274 gene-phecode pairs in Open Targets, of which 890 had an activator drug indication. Plots a and d represent analyses of 58,674 gene-phecode pairs in SIDER, of which 364 had an activator drug indication. All plots show means with 95% confidence intervals. Source data are provided as a Source Data file. Abbreviations: AUPRC (area under the precision-recall curve); L2G (locus-to-gene); P (observed case/control); B (binarized model probabilities/predicted case control); C (continuous model probabilities).Fig. 6: Performance of direction-of-effect (DOE) genetic priority scores with different features for inhibitor drug indications.a AUPRC for inhibitor drug indications in Open Targets (holdout testing) and SIDER (external testing). Grey dotted lines show the proportion of gene-phecode pairs with indications in each dataset. Activator drug indications were set to 0 (no drug indication). b Odds ratios per standard deviation increase in score for any inhibitor drug indication and separately for drug indications in specific clinical trial phases in Open Targets. Brackets denote the number of gene-phecode pairs with drug indications in each phase. c,d Odds ratios for inhibitor drug indications for gene-phecode pairs in the top X score percentiles compared to pairs in the 0-50 percentiles in Open Targets (c) and SIDER (d). Plots a–c represent analyses of 112,274 gene-phecode pairs in Open Targets, of which 3019 had an inhibitor drug indication. Plots a and d represent analyses of 58,674 gene-phecode pairs in SIDER, of which 1288 had an inhibitor drug indication. All plots show means with 95% confidence intervals. Source data are provided as a Source Data file. Abbreviations: AUPRC (area under the precision-recall curve); L2G (locus-to-gene); P (observed case/control); B (binarized model probabilities/predicted case control); C (continuous model probabilities).As with ML-GPS, we provide precision and recall for different thresholds in Open Targets and SIDER for ML-GPS DOE (Supplementary Fig. 8a–d). For example, for activator drug indications, a probability threshold of 0.084 yields precision = 0.060 and recall = 0.044 in Open Targets, and precision = 0.058 and recall = 0.049 in SIDER. For inhibitor drug indications, a probability threshold of 0.204 yields precision = 0.250 and recall = 0.022 in Open Targets, and precision = 0.280 and recall = 0.035 in SIDER.Analysis of targets and pathways prioritized by ML-GPSWe generated ML-GPS and ML-GPS DOE predictions for all 2,362,636 gene-phecode pairs for which at least one of the 13 features was non-zero. These pairs represented 26,035 distinct genes, of which 18,247 were protein-coding. We directly compared scores from ML-GPS with those from the L2G + Clinical + P model for 127,258 of these pairs where the gene was targeted by any drug in Open Targets or SIDER: among the 5008 pairs with an indicated drug, ML-GPS had higher scores for 55.91% of pairs (Fig. 7a), whereas among the 122,250 pairs without an indicated drug, ML-GPS had lower scores for 58.37% of pairs (Fig. 7b). Similarly, ML-GPS scores ≥ 99th percentile (score > 0.212) had a greater proportion and number of drug indications compared to L2G + Clinical + P scores ≥ 99th percentile (Fig. 7c). These results demonstrate improved identification of drug indications when including C and B as features.Fig. 7: Analysis of targets prioritized by ML-GPS.a, b Direct comparison between scores for ML-GPS versus L2G + Clinical + P models for gene-phecode pairs with a drug indication (a) or without a drug indication (b) in either Open Targets or SIDER. c Number of gene-phecode pairs and the proportion of these pairs with drug indications among ML-GPS and L2G + Clinical + P scores <99th percentile versus ≥ 99th percentile. d Number of gene-phecode pairs and the proportion of these pairs with drug indications among ML-GPS scores <99th percentile versus ≥ 99th percentile and approximated original GPS scores = 0 versus > 0. e,f Proportion of gene-phecode pairs in each score bin with the specified score increase (from L2G + Clinical + P to ML-GPS) with direct (e) or indirect (f) target-disease associations in Open Targets. g Gene set-phecode combinations with the highest normalized enrichment score for ML-GPS. Source data are provided as a Source Data file. Abbreviations: L2G (locus-to-gene); P (observed case control).As evidence of the increased coverage of drug targets offered by ML-GPS, our approximation of the original GPS had non-zero scores for only 9576 of the 2,362,636 gene-phecode pairs [0.4%] (Fig. 7d), representing 5353 distinct genes, 107 phecodes, and 303 drug indications. In contrast, the 23,626 pairs with ML-GPS scores ≥ 99th percentile (score > 0.212) represented 9916 distinct genes, all 112 phecodes, and 696 drug indications; 409 of these indications had no support from the original GPS.The top 23,626 ML-GPS gene-phecode pairs were unequally distributed across phecodes, with EM_239.2 (hyperglyceridemia) having the most pairs (n = 1708) and CV_438.2 (aneurysm of iliac or artery of lower extremity) having the least (n = 26). ML-GPS DOE predicted 2779 of the pairs as more likely to have activator drug indications and 20,847 as more likely to have inhibitor drug indications. Although ML-GPS does not include tractability information as features, many of the prioritized targets appear amenable to drug development: of 9916 distinct genes represented among the top 23,626 pairs, 2589 [26.1%] have either membrane or secreted products, 5014 [50.6%] have favorable tissue specificity, 1458 [14.7%] bind ligands, 1851 [18.7%] bind small molecules, and 618 [6.2%] have predicted pockets (Supplementary Table 5).For 120,728 of all 2,362,636 gene-phecode pairs, there was a large (>30%) increase in score for ML-GPS compared to the L2G + Clinical + P model; these pairs represent targets prioritized only with evidence from the C and B machine learning analyses. We used direct and indirect target-disease associations from Open Targets to examine the evidence supporting these pairs beyond drug indications; these associations include evidence from the published literature and databases not used to construct ML-GPS. A greater proportion of pairs with <10% increase in score had both direct and indirect associations compared to pairs with > 10% increase in score, likely because these pairs have corroborating support from clinical variants, L2G, or P (Fig. 7e,f). However, in the 0.2–0.4, 0.4–0.6, and ≥ 0.6 score bins, 36.6%, 70.6%, and 100% of pairs with 10-20% increase in score and 28.6%, 74.2%, and 100% of pairs with 20-30% increase in score had direct associations, respectively. Further, we manually examined the 50 highest-scoring pairs without drug indications or target-disease associations and found that 33 of these pairs [66%] had supporting genetic, clinical, and/or mechanistic evidence (Supplementary Data 10). These pairs included GBA for NS_324.1 (parkinsonism), USP40 for EM_252.3 (disorders of bilirubin excretion), NAA25 for EM_200.6 (atrophy of thyroid), MMAA for EM_256.3 (mixed disorder of acid-base balance), and PVR for EM_239.1 (hypercholesterolemia).Many of these large score increase pairs represent well-known target-disease relationships, including PCSK9 for EM_239.2 (hyperglyceridemia; score increase from 0.46 to 0.79), ACE for GU_582.2 (chronic kidney disease; score increase from 0.39 to 0.79), GUCY1A1 for CV_401.2 (hypertensive heart disease; score increase from 0.22 to 0.76), NPC1L1 for EM_239.2 (hyperglyceridemia; score increase from 0.11 to 0.65), and ADRB1 for GU_582.2 (chronic kidney disease; score increase from 0.19 to 0.60) (Supplementary Data 11). These are targeted by PCSK9 inhibitors, ACE inhibitors, vericiguat, ezetimibe, and beta blockers, respectively, and ML-GPS DOE correctly predicted the effect direction of all these drugs. However, ML-GPS also identifies viable targets without drug indications, such as LDLR for EM_239.2 (hyperglyceridemia; score increase from 0.17 to 0.73), which ML-GPS DOE predicts as having an activator drug indication. LDLR LOF mutations are associated with elevated plasma triglyceride levels19,20, and LDLR activators are under preclinical investigation for atherosclerosis prevention21. Another is WNT16 for MS_745.9 (pathological fracture; score increase from 0.32 to 0.44), which ML-GPS DOE also predicts as having an activator drug indication; several preclinical studies suggest WNT16 activation may be useful for treating osteoporosis22,23. ML-GPS results could also aid drug development for conditions opposite to the disease phenotype. For example, it identifies TMPRSS6 for BI_160.1 (Iron deficiency anemia; score increase from 0.34 to 0.62): TMPRSS6 mutations cause iron deficiency anemia via elevated hepcidin24, and inhibitors of TMPRSS6 are under investigation for hemochromatosis (iron overload)25. Finally, in cases where ML-GPS targets cannot be directly modulated, indirect modulation or substrate delivery may still be possible: for example, ML-GPS identifies NOS3 (endothelial nitric oxide synthase) for CV_401.2 (hypertensive heart disease; score increase from 0.22 to 0.67), and organic nitrates are commonly used in hypertension and heart disease.Examining the highest scoring ML-GPS gene-phecode pairs overall, we identified additional gene-phecode pairs without drug indications but which had supporting preclinical evidence (Supplementary Data 12). One example is ALOX15 for RE_471.5 (Nasal polyps; score 0.82); ALOX15 is mechanistically linked with airway inflammation26, and ALOX15 inhibitors that reduce nitric oxide production and lipid peroxidation have recently been synthesized27. Another is BMPR2 for CV_406.1 (pulmonary hypertension; score 0.74); although sotatercept, which targets the BMPR-II pathway downstream of BMPR2, demonstrated success in a 2023 phase III trial for pulmonary arterial hypertension28,29, there are no drugs that target BMPR2 directly. ML-GPS also provides supporting evidence for targets currently in phase II/III clinical trials, many of which are first-in-class. One example is LRRK2 for NS_324.1 (parkinsonism; score 0.85); phase I trials of BIIB122 for Parkinson’s disease were recently completed, and a phase III trial is ongoing30,31. Another is LPA for both CV_404.1 (myocardial infarction; score 0.81) and CV_404.2 (coronary atherosclerosis; score 0.58); a phase II trial of olpasiran for cardiovascular disease recently demonstrated efficacy in reducing lipoprotein(a) and a phase III trial is ongoing32. A third example that highlights drug repurposing is CFB for SO_374.5 (macular degeneration; score 0.73); a phase II trial of iptacopan, originally indicated for paroxysmal nocturnal hemoglobinuria, is ongoing for age-related macular degeneration33. A fourth example is MYH7 for CV_414.2 (dilated cardiomyopathy; score 0.69); a phase II trial of danicamtiv is ongoing for primary dilated cardiomyopathy following demonstration of efficacy in rodent models34,35.Finally, we examined the enrichment of the 50 MSigDB hallmark gene sets with increasing ML-GPS scores across the 112 phecodes using single-sample gene set enrichment analysis. For 50 × 112 = 5600 gene set-phecode combinations, there were higher normalized enrichment scores (NES) with ML-GPS compared to L2G + Clinical + P model scores for 3441 combinations [61.4%], and 899 combinations were enriched only with ML-GPS scores. The gene sets with the highest NES for ML-GPS scores were consistent with known disease mechanisms, including PANCREAS_BETA_CELLS for type 2 diabetes (top gene GCK), NOTCH_SIGNALING for coronary atherosclerosis (top gene TCF7L2)36,37, REACTIVE_OXYGEN_SPECIES_PATHWAY for nasal polyps (top gene GPX4)38,39, and MYOGENESIS for hypertrophic cardiomyopathy (top gene TNNT2) (Fig. 7g; Supplementary Data 13). However, ML-GPS identified disease-relevant pathways with high NES that the L2G + Clinical + P model did not; these included UNFOLDED_PROTEIN_RESPONSE for congenital heart disease (top gene CCL2)40, ANGIOGENESIS for coronary atherosclerosis (top gene LPL)41, REACTIVE_OXYGEN_SPECIES_PATHWAY for essential hypertension (top gene FES)42, and HEME_METABOLISM for disorders of iron metabolism (top gene SLC4A1) (Supplementary Data 14). These results further support the biological relevance and potential clinical utility of ML-GPS.

Hot Topics

Related Articles