Phenotypic evaluation of deep learning models for classifying germline variant pathogenicity

Study Design and Participants

We analyzed data from the UK Biobank, a prospective cohort study comprising approximately 500,000 participants aged 37-73 years, living in the UK, who were recruited between 2006 and 2010. UK Biobank participants have matched genomic profiling and longitudinal health record data, including cancer diagnosis history. For this study, all participants with matched exome sequencing profiles and linked health record data were included (total n = 469,623 participants). All participants provided written informed consent, which was approved by the North West Multicenter Research Ethics Committee. As the present study involved reanalysis of fully de-identified preexisting data, no additional approval was required. This study was performed in accordance with the Declaration of Helsinki.

Defining cancer diagnoses

Participants in the UK Biobank were linked to national cancer registries, included as data fields 40006 (ICD10) and 40013 (ICD9). Using the UK Biobank Research Analysis Platform (RAP), we queried the entire cohort for all instances of the “Type of cancer” entry within the cancer registry data, annotated by ICD9 and ICD10 codes. Breast cancer diagnoses were defined by ICD9 (174*) and/or ICD10 (C50.* or D48.6). To assess whether our findings were consistent beyond the context of breast cancer, we also considered diagnoses of ovarian or fallopian tube cancers for BRCA1 and BRCA2 analyses: ICD9 (1830, 1832) and/or ICD10 (C56, C57.0, C57.4, D39.1). All participants with matched genomic and clinical data were included in the analyses comparing variant pathogenicity annotations between ClinVar and each deep learning model. For analyses of breast or ovarian cancer risk in relation to pathogenic vs benign variant carrier status, only female participants were selected.

Analysis of exome sequencing data

Within the UK Biobank RAP, we used the Swiss Army Knife tool to filter and annotate variants in BRCA1, BRCA2, ATM, CHEK2, and PALB2.
Specifically, we used PLINK2 to extract variants in the chromosome regions corresponding to each of the 5 target genes, with a minimum minor allele frequency of 0, a minimum minor allele count of 4, a Hardy-Weinberg equilibrium filter of p < 1 × 10^−15 with the “keep-fewhet” flag activated, and less than 10% missing genotypes and variant calls [26]. From there, we used SNPEFF [27] to annotate each variant by its functional type and impact on protein sequence, with further annotation by SNPSIFT [28] to incorporate annotations from the February 15, 2024 ClinVar data release.

ClinVar [29] labels were consolidated into benign, pathogenic, or VUS categories, with conflicting or missing annotations consolidated as VUSs. For all identified missense coding variants, we used precomputed scores from AlphaMissense [7], EVE [8], and ESM1b [9] to predict variant pathogenicity. For AlphaMissense, we used the default optimized score of 0.34 to distinguish benign vs ambiguous variants, and 0.564 to distinguish ambiguous vs pathogenic variants. For EVE, we used the “EVE_classes_75_pct_retained_ASM” classifications as optimized in the original study. As the EVE database (https://evemodel.org/) did not have predictions available for CHEK2, EVE was excluded from the CHEK2 analyses. For ESM1b, we used the default optimized threshold of −7.5 to distinguish pathogenic vs benign variants; unlike AlphaMissense and EVE, ESM1b was designed to use a single binary threshold such that no variants are annotated as ambiguous.

We examined UK Biobank participants (male or female) with at least one missense variant in each of the five genes and classified the participants as benign, VUS, or pathogenic variant carriers. Given that individual participants may carry multiple variants simultaneously, pathogenic variants were prioritized over VUSs, and in turn VUSs were prioritized over benign variants.
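
The score thresholds and the carrier-level prioritization described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual pipeline: the function names are hypothetical, the handling of scores that fall exactly on a threshold is an assumption, and AlphaMissense "ambiguous" calls are mapped to the VUS label here for convenience.

```python
from typing import Iterable, Optional

# Thresholds quoted in the text.
AM_BENIGN_MAX = 0.34       # AlphaMissense: benign vs ambiguous
AM_PATHOGENIC_MIN = 0.564  # AlphaMissense: ambiguous vs pathogenic
ESM1B_THRESHOLD = -7.5     # ESM1b: single pathogenic vs benign cutoff

# Higher rank wins when a participant carries several variants in one gene.
PRIORITY = {"benign": 0, "VUS": 1, "pathogenic": 2}


def classify_alphamissense(score: float,
                           pathogenic_min: float = AM_PATHOGENIC_MIN) -> str:
    """Three-way call; boundary handling is an assumption of this sketch."""
    if score < AM_BENIGN_MAX:
        return "benign"
    if score > pathogenic_min:
        return "pathogenic"
    return "VUS"  # ambiguous range


def classify_esm1b(llr: float) -> str:
    """Binary call: more negative scores are treated as pathogenic."""
    return "pathogenic" if llr < ESM1B_THRESHOLD else "benign"


def carrier_status(variant_labels: Iterable[str]) -> Optional[str]:
    """Collapse per-variant labels into one label per participant and gene."""
    labels = list(variant_labels)
    if not labels:
        return None  # participant carries no missense variant in this gene
    return max(labels, key=lambda label: PRIORITY[label])
```
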
For instance, if a participant had both pathogenic and benign variants in a given gene, the participant was annotated as a pathogenic variant carrier and not as a benign variant carrier.

We then compared ClinVar and the deep learning model annotations, first at the level of unique variants and subsequently at the participant level, to evaluate the accuracy of the deep learning models in recapitulating ClinVar pathogenicity labels. To model a clinical scenario in which deep learning models might be applied, we generated composite classifiers by starting with ClinVar annotations as a foundation and augmenting them with model-based pathogenicity predictions to specifically classify ClinVar VUSs; in other words, all benign and pathogenic ClinVar annotations were retained for the composite classifiers, and the pathogenicity predictions were only applied to ClinVar VUSs.

Association of variant pathogenicity with cancer risk

To assess whether the pathogenicity classifications were functionally associated with breast cancer risk, we analyzed female participants with a benign vs pathogenic missense variant in the gene of interest, while excluding participants who carried a frameshift, stop gain, stop loss, or start loss variant in that particular gene. We then used Firth’s penalized logistic regression to determine the association of pathogenic vs benign variants with diagnoses of breast or ovarian/fallopian tube cancer (as noted, only females were included in these analyses). In all regression models, we included age at time of enrollment in the UK Biobank as a covariate, binarized at the median age. Participants carrying a VUS (as defined by ClinVar or each deep learning model), but no pathogenic variants, were excluded from the regression analysis, regardless of the presence of co-occurring benign variants (see discussion above).
We further calculated regression models using the composite pathogenicity labels (see above) to simulate the clinical scenario of first relying on ClinVar annotations where available and subsequently employing deep learning models to classify VUSs. To directly assess the predictive utility of deep learning models for classifying VUSs, we then analyzed VUS carriers only (as defined by ClinVar) and assessed whether the predicted pathogenicity labels were associated with cancer risk.

Regression results were reported as log odds ratios (ORs) and p-values, with a significance threshold of p < 0.05. We did not adjust for multiple comparisons. The 95% confidence intervals (CIs) for all log ORs are detailed in the supplementary data.

Assessment of gene-specific thresholds for defining pathogenic variants

To evaluate whether gene-specific thresholds could improve model performance, we focused on AlphaMissense pathogenicity scores. We retained the default threshold of 0.34 to distinguish benign vs ambiguous variants, while varying the threshold to distinguish ambiguous vs pathogenic variants, ranging from 0.564 (the default) to 1, in increments of 0.01. Regression results were reported as described above.
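
The composite labeling and the threshold sweep described in this section can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: all function names are hypothetical, the grid is assumed to stop at the last 0.01 step that does not exceed 1 (so it ends at 0.994), and scores exactly on a threshold are assumed to fall in the ambiguous (VUS) range.

```python
AM_BENIGN_MAX = 0.34            # fixed benign vs ambiguous cutoff (from the text)
DEFAULT_PATHOGENIC_MIN = 0.564  # default ambiguous vs pathogenic cutoff


def threshold_grid(start=DEFAULT_PATHOGENIC_MIN, stop=1.0, step=0.01):
    """Candidate pathogenic cutoffs from `start` upward in `step` increments.

    Assumption: the grid ends at the last value that does not exceed `stop`.
    """
    n_steps = int((stop - start) / step + 1e-9)
    return [round(start + i * step, 3) for i in range(n_steps + 1)]


def classify(score, pathogenic_min):
    """AlphaMissense-style three-way call with a variable pathogenic cutoff."""
    if score < AM_BENIGN_MAX:
        return "benign"
    if score > pathogenic_min:
        return "pathogenic"
    return "VUS"


def composite_label(clinvar_label, model_label):
    """Keep ClinVar benign/pathogenic calls; use the model only for ClinVar VUSs."""
    return model_label if clinvar_label == "VUS" else clinvar_label


def pathogenic_counts(scores, grid):
    """Number of variants called pathogenic at each candidate cutoff."""
    return {t: sum(classify(s, t) == "pathogenic" for s in scores) for t in grid}
```

In a full analysis, the labels produced at each grid value would be collapsed to carrier status and passed to the regression model for the gene in question; that step is omitted here.
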
