Assigning credit where it is due: an information content score to capture the clinical value of multiplexed assays of variant effect | BMC Bioinformatics

Application 1: information content of MAVEs

We used data from prior analyses of MAVEs and previously proposed translations of MAVE data into ACMG evidence criteria (see Methods) to calculate information content. Evidence criteria were converted to odds of pathogenicity following Tavtigian et al. [9]. The posterior probability was then determined using standard Bayesian calculations and converted to information content (see Methods).

We present two example variants in BRCA1 to illustrate this process. To keep the examples simple and focused on the information content conversion, they use only population frequency and functional data.

BRCA1 c.5120T > C was classified as a variant of uncertain significance that is absent from population databases (PM2_supporting). Combining the prior probability of pathogenicity of 0.1 with PM2_supporting evidence using posterior = (odds × prior) / ((odds − 1) × prior + 1), derived from Tavtigian et al. [9], gives a posterior probability of pathogenicity of 0.188. Applying Formula 1 and Formula 2 converts this to an information content of 1 − (−0.188 × log2(0.188) − (1 − 0.188) × log2(1 − 0.188)) = 0.303. BRCA1 c.5120T > C has a functional score of −0.143 [10], which is in the functionally normal range and can be used as BS3 evidence, corresponding to an odds of pathogenicity of 0.053. Combining this with the population evidence and the prior gives a posterior probability of pathogenicity of 0.012, which Formula 1 and Formula 2 convert to an information content of 1 − (−0.012 × log2(0.012) − (1 − 0.012) × log2(1 − 0.012)) = 0.905. Incorporating functional data for BRCA1 c.5120T > C therefore adds 0.905 − 0.303 = 0.602 bits of information.
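The arithmetic for BRCA1 c.5120T > C can be checked with a short Python sketch. It assumes the Tavtigian et al. [9] scaling in which an evidence strength corresponds to odds of pathogenicity of 350^(k/8) (k = 1 for supporting, 4 for strong, negative k for benign evidence), and it takes information content to be 1 minus the binary entropy of the posterior probability, which is what the expanded expressions in the text compute. The function names are illustrative and are not the authors' code.

```python
import math


# Assumed Tavtigian-style scaling: odds of pathogenicity = 350 ** (k / 8),
# where k = 1 (supporting), 2 (moderate), 4 (strong), 8 (very strong),
# and benign evidence uses the same magnitudes with a negative sign.
def evidence_odds(k: float) -> float:
    return 350.0 ** (k / 8)


def posterior(prior: float, odds: float) -> float:
    """Posterior probability of pathogenicity: (odds * prior) / ((odds - 1) * prior + 1)."""
    return odds * prior / ((odds - 1) * prior + 1)


def information_content(p: float) -> float:
    """Information content in bits: 1 minus the binary entropy of p."""
    if p <= 0.0 or p >= 1.0:
        return 1.0  # a certain classification carries the full bit
    entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - entropy


PRIOR = 0.1
pm2_supporting = evidence_odds(1)   # about 2.08
bs3 = evidence_odds(-4)             # about 0.053

# Population evidence only, then population plus functional (BS3) evidence;
# independent evidence is combined by multiplying the odds.
p_pop = posterior(PRIOR, pm2_supporting)
p_func = posterior(PRIOR, pm2_supporting * bs3)
ic_pop = information_content(p_pop)
ic_func = information_content(p_func)

print(f"population only: P = {p_pop:.3f}, IC = {ic_pop:.3f} bits")
print(f"with BS3:        P = {p_func:.3f}, IC = {ic_func:.3f} bits")
print(f"gain:            {ic_func - ic_pop:.3f} bits")
```

Run as written, this prints posterior probabilities of 0.188 and 0.012 and a gain of 0.602 bits. Replacing `bs3` with `evidence_odds(4)` (PS3) corresponds to the c.5288G > T case: the posterior rises to about 0.812, yet the information content stays at about 0.303 bits because binary entropy is symmetric about 0.5.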
The functional data substantially increase the probability that the variant is benign, decreasing uncertainty and increasing information.

BRCA1 c.5288G > T was also classified as a variant of uncertain significance that is absent from population databases (PM2_supporting). Before functional data are incorporated, it has the same probability of pathogenicity (0.188) and information content (0.303) as the previous example. However, BRCA1 c.5288G > T has a functional score of −1.83 [10], which is functionally abnormal and leads to PS3 evidence with a likelihood ratio of 18.8 in favor of pathogenicity. Combining this evidence with the prior probability and PM2_supporting evidence gives a posterior probability of pathogenicity of 0.812, which Formula 1 and Formula 2 convert to an information content of 1 − (−0.812 × log2(0.812) − (1 − 0.812) × log2(1 − 0.812)) = 0.303. Incorporating functional data for BRCA1 c.5288G > T therefore changes the information content by 0.303 − 0.303 = 0.000 bits. The functional data substantially change the probability of pathogenicity, swinging it from 0.188 (VUS leaning benign) to 0.812 (VUS leaning pathogenic). Because entropy is symmetric about 0.5, these two probabilities are equally uncertain: we are no closer to certainty about the variant, so the information content does not change.

We summed bits of information across all variants for which MAVE data were available to calculate the total information content generated by several MAVEs that assessed variants in BRCA1, PTEN, and TP53 [10]. We compared the information gain counted when only changes that resulted in VUS reclassification were considered with the information gain from all single-nucleotide substitutions reported in the MAVE data. Data from MAVEs on amino acid changes that require more than one DNA substitution were excluded from this analysis.

Table 1 Information content in bits for reclassifying VUS as presented in Fayer et al. [10], total information content from all single substitutions reported with functional data in the papers that originally published the data, and total possible missense information

The BRCA1 MAVE [11] examined 3893 variants: 2821 functional, 249 intermediate, and 823 showing loss of function. A functionally normal classification was considered strong benign evidence, and loss of function was considered strong pathogenic evidence. Converting this evidence to posterior probability using a prior probability of 0.1, and then to information content, showed that the BRCA1 MAVE generated 813.2 bits of information.

The PTEN MAVEs [12, 13] examined 8198 variants for effects on protein abundance and 7657 variants for activity. Of the 7639 variants assessed in both assays, 4811 had combined scores considered strong pathogenic evidence and 303 had combined scores considered supporting benign evidence. Converting this evidence to posterior probability using a prior probability of 0.1, and then to information content, yielded a total of 893.6 bits of variant classification information added by the study for changes possible through single missense substitutions.

Four TP53 MAVEs were combined and used to train a naïve Bayes classifier, which made predictions on 7893 variants with scores in each of the four assays [10, 15, 16]. The classifier predicted 5070 variants as normal and 2823 as abnormal. These predictions were assigned moderate benign and strong pathogenic evidence, respectively.
Converting this evidence to posterior probability using a prior probability of 0.1, and then to information content, yielded 160.0 bits of variant classification information for changes possible through single missense substitutions.

Compared with reports that counted only reclassified VUS, our method, which counts all the classification information generated by several MAVEs, increased the reported information content 22-fold for the BRCA1 assay, 85-fold for the PTEN assays, and 3.5-fold for the TP53 assays (Table 1).

Application 2: total missense variant information content in a gene

The total missense information content of a gene can be calculated by counting the number of possible missense variants in the gene. Because each variant can contribute at most one bit of information (a fully certain classification), the total missense information content of a gene, in bits, equals the number of possible missense variants. These values were 12,351; 2721; and 2571 for BRCA1, PTEN, and TP53, respectively. The MAVEs generated 6.7%, 32.8%, and 6.2% of the total possible single-substitution variant classification information for BRCA1, PTEN, and TP53, respectively (Fig. 2).

Fig. 2 Percent of total information found by each study

Application 3: quantifying the apparent information effect of a classification guideline rule change

For well-established functional studies, the 2015 ACMG-AMP guidelines recommend the evidence codes PS3 and BS3, indicating strong evidence for or against pathogenicity, respectively [17]. However, different levels of evidence have been proposed for MAVEs that meet stronger or weaker validation criteria [7, 18,19,20,21]. If the variant classification guidelines or the strength-of-evidence criteria change, the apparent information changes for every variant affected by the change.
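The assay-level totals in Application 1, and the effect of re-weighting evidence described here, can be sketched by summing per-variant information changes. This sketch assumes, since it is not stated in this section, that each variant contributes IC(posterior) − IC(prior) at a prior probability of 0.1, that intermediate calls carry no evidence (k = 0), and that evidence odds follow the 350^(k/8) scaling. The helper names and the "benign evidence downgraded to moderate" scenario are hypothetical illustrations, not a proposed guideline change.

```python
import math


def information_content(p: float) -> float:
    """1 minus the binary entropy of p, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)


def information_change(prior: float, k: int) -> float:
    """Change in information content when evidence of strength k is applied."""
    odds = 350.0 ** (k / 8)  # assumed Tavtigian-style odds scaling
    post = odds * prior / ((odds - 1) * prior + 1)
    return information_content(post) - information_content(prior)


def assay_information(counts: dict[int, int], prior: float = 0.1) -> float:
    """Sum information changes over an assay; counts maps k -> number of variants."""
    return sum(n * information_change(prior, k) for k, n in counts.items())


# BRCA1 MAVE calls from Application 1: 2821 functional (strong benign, k = -4),
# 249 intermediate (no evidence, k = 0), 823 loss of function (strong pathogenic, k = 4).
as_reported = {-4: 2821, 0: 249, 4: 823}
# Hypothetical rule change: functionally normal calls downgraded to moderate benign.
benign_moderate = {-2: 2821, 0: 249, 4: 823}

print(f"as reported:           {assay_information(as_reported):.1f} bits")
print(f"benign evidence at BM: {assay_information(benign_moderate):.1f} bits")
```

Under these assumptions the first total comes to about 813.2 bits, in line with the BRCA1 value reported above, and the hypothetical downgrade lowers the apparent total to about 484 bits with no change in the underlying assay data. The 823 loss-of-function calls contribute negative terms (about −0.44 bits each), because moving a variant from a prior of 0.1 to a posterior near 0.68 increases uncertainty.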
For example, evidence against pathogenicity from PTEN MAVEs is considered supporting evidence, primarily because very few established benign PTEN variants are available for validation. If more validation data become available, the strength of evidence assigned to the MAVE may change, which would change the apparent information content for many variants. Similarly, if ACMG-AMP committees or ClinGen VCEPs refine the level of evidence assigned to a specific rule, the apparent information content would change for many variants. Using the Tavtigian et al. [9] Bayesian framework, we evaluated how applying a single ACMG-AMP evidence level at different prior probabilities changes information content (Fig. 3). This analysis illustrates that the greatest potential information gain occurs when the prior probability is 0.5, where uncertainty is maximal. Differences between points on the same vertical prior line show how shifting the evidence assignment changes apparent information. The analysis also shows that applying benign evidence to a variant with a high prior probability results in an apparent information loss, because uncertainty (entropy) increases (all points below 0 on the y-axis). A similar result occurs when pathogenic evidence is applied to a variant with a low prior probability.

Fig. 3 Plot of the information change for the different types of evidence across different prior probabilities. Information content loss for pathogenic evidence occurs at lower prior probabilities, since these priors already carry high information content for a benign interpretation. Incorporating pathogenic evidence for a variant with a low prior moves the probability in the pathogenic direction and toward greater uncertainty.
The same effect occurs for benign evidence with a high prior probability of pathogenicity, since incorporating benign evidence lowers the probability and moves the classification toward a more uncertain class, reducing information content. The evidence categories shown are pathogenic very strong (PVS), pathogenic strong (PS), pathogenic moderate (PM), pathogenic supporting (PP), benign supporting (BP), benign moderate (BM), benign strong (BS), and benign very strong (BVS). Note that benign moderate and benign very strong are not currently approved categories but are listed in parentheses for completeness.
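The sweep behind Fig. 3 can be sketched as follows: for each evidence category, the code computes the information change IC(posterior) − IC(prior) across a grid of priors. It assumes the same 350^(k/8) odds scaling as the Tavtigian et al. [9] framework, includes BM and BVS only for completeness as in the figure, and uses an illustrative prior grid; the names are hypothetical.

```python
import math

# Assumed exponents k for odds = 350 ** (k / 8); BM and BVS are not approved categories.
EVIDENCE = {"PVS": 8, "PS": 4, "PM": 2, "PP": 1, "BP": -1, "BM": -2, "BS": -4, "BVS": -8}
PRIORS = [0.1, 0.3, 0.5, 0.7, 0.9]


def information_content(p: float) -> float:
    """1 minus the binary entropy of p, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)


def information_change(prior: float, k: int) -> float:
    """Information change when evidence of strength k is applied at a given prior."""
    odds = 350.0 ** (k / 8)
    post = odds * prior / ((odds - 1) * prior + 1)
    return information_content(post) - information_content(prior)


# One row per evidence category, one column per prior probability.
print("prior " + " ".join(f"{p:>6.1f}" for p in PRIORS))
for code, k in EVIDENCE.items():
    row = " ".join(f"{information_change(p, k):+6.2f}" for p in PRIORS)
    print(f"{code:<5} {row}")
```

Negative entries mark apparent information loss: strong pathogenic evidence at a prior of 0.1 gives about −0.44 bits, and strong benign evidence at a prior of 0.9 gives the same value by symmetry. Subtracting two entries in the same column, such as BM minus BP at a prior of 0.1 (roughly 0.12 bits), gives the apparent per-variant effect of moving benign evidence from supporting to moderate, as in the PTEN example.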
