The variation landscape of CYP2D6 in a multi-ethnic Asian population

While extensive research has been conducted on CYP2D6 across various ethnicities, a gap remains in understanding the extent of CYP2D6 pharmacogenetic diversity within Southeast Asian populations. To address this gap, we characterized the distribution of CYP2D6 star alleles and their associated phenotypes in a genetically diverse cohort from Singapore11. This cohort comprises individuals from the country’s three major ethnicities (Chinese, Malay, and Indian) and includes high-coverage short-read whole genome sequences from over 1800 participants. To the best of our knowledge, this study represents the most comprehensive examination of CYP2D6 genetic variation in the Singaporean population, and given the country’s rich diversity, it provides an ideal platform for exploring CYP2D6 variation across Southeast Asia.

We developed a bioinformatics workflow using three distinct tools to mitigate inaccuracies in identifying CYP2D6 variants and diplotypes, given the limitations of the short-read data used in this study. Our workflow includes a consensus algorithm that reports a diplotype call only when at least two of the three tools concur; however, cases in which StellarPGx predicted a potential novel star allele were subjected to additional manual inspection to ascertain the final diplotype calls. This approach, while conservative, was chosen to prioritize the accuracy of star allele assignments. Additionally, our workflow includes steps to translate predicted diplotypes into metabolizer profiles and to identify potential novel alleles. Applying the workflow to the 1850 samples in our cohort, we determined consensus diplotype calls for 1487 samples (over 80% of the cohort), while the remaining ~20% of samples could not be characterized.
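The two-out-of-three rule described above can be sketched as a simple vote over normalized diplotype strings. This is a minimal illustration under stated assumptions, not the study's actual implementation: the function names `normalize_diplotype` and `consensus_call`, the tool keys, and the string format (`*A/*B`, with tandems written as `*36+*10`) are hypothetical, and the manual review of StellarPGx novel-allele predictions is not modeled.

```python
from collections import Counter
from typing import Optional


def normalize_diplotype(diplotype: str) -> tuple:
    """Return an order-independent key, so '*10/*1' and '*1/*10' compare equal.

    Spaces are stripped so that '*36 + *10' and '*36+*10' are treated alike.
    """
    hap_a, hap_b = (h.replace(" ", "") for h in diplotype.split("/"))
    return tuple(sorted((hap_a, hap_b)))


def consensus_call(calls: dict, min_agree: int = 2) -> Optional[str]:
    """Report a diplotype only if at least `min_agree` tools agree on it.

    `calls` maps a tool name to its diplotype string, or None if the tool
    produced no call. With three tools and min_agree=2, at most one
    diplotype can reach the threshold, so ties cannot occur.
    """
    votes = Counter(normalize_diplotype(c) for c in calls.values() if c)
    if not votes:
        return None
    best, n_agree = votes.most_common(1)[0]
    return "/".join(best) if n_agree >= min_agree else None


# Hypothetical calls for one sample (not data from the study).
sample = {"aldy": "*1/*10", "cyrius": "*10/*1", "stellarpgx": "*1/*36+*10"}
print(consensus_call(sample))  # prints *1/*10
```

Normalizing before voting matters because tools may report the same diplotype with the haplotypes in a different order. A sample for which every tool disagrees returns None and would stay uncalled under this simplified rule.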
Notably, most samples that did not reach consensus (90.4%) carried haplotypes containing structural variants (SVs) and potential novel alleles, underscoring the challenges of star-allele analysis of the CYP2D6 locus with short-read sequencing. These challenges arise from the gene’s high sequence homology with the CYP2D7 and CYP2D8 pseudogenes and point to the potential of newer technologies, such as long-read sequencing, for future research. Indeed, technology providers such as Pacific Biosciences and Oxford Nanopore Technologies have begun releasing dedicated PGx workflows, and dedicated software tools such as pangu are already demonstrating improvements over reference calls previously established with short-read sequencing and long-range PCR15.

In our study, we observed distinct patterns in the distribution of CYP2D6 alleles among the Southeast Asian populations examined. Among the most prevalent star alleles, alleles associated with reduced or absent function predominated, with the exception of *1 and *2, which are associated with normal function. *1 was the most prevalent normal-function allele, followed by *2, consistent with previously reported trends in Southeast Asian populations9. Allele frequencies also varied among the Singaporean populations in our study. *36 + *10, *10, and *36 were more prevalent in the Chinese and Malay populations, with Chinese and Malay participants exhibiting approximately six-fold higher *10 allele frequencies than Indian participants. This trend is consistent with well-documented findings of a high prevalence of the *36 + *10 tandem in East Asian populations, including Japanese, Korean, and Chinese9, and with the common identification of *10 as a reduced-function allele in Asian populations8,9. Notably, *36 was not observed in any Indian participant in this study.
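The population comparisons above rest on allele frequencies computed per ethnic group, where each participant contributes two haplotypes. The following is a minimal sketch assuming diplotype strings in the `*A/*B` format; the function names are hypothetical and the toy input is invented for illustration, not drawn from the cohort.

```python
from collections import Counter, defaultdict


def allele_frequencies(samples):
    """Per-population star allele frequencies from (population, diplotype) pairs.

    Each diplotype contributes two haplotypes. A tandem such as '*36+*10' is
    counted as a single haplotype unit, separate from '*10' and '*36'.
    """
    counts = defaultdict(Counter)
    for population, diplotype in samples:
        for hap in diplotype.replace(" ", "").split("/"):
            counts[population][hap] += 1
    return {
        pop: {hap: n / sum(c.values()) for hap, n in c.items()}
        for pop, c in counts.items()
    }


def fold_difference(freqs, allele, pop_a, pop_b):
    """Ratio of an allele's frequency in pop_a to its frequency in pop_b."""
    f_a = freqs[pop_a].get(allele, 0.0)
    f_b = freqs[pop_b].get(allele, 0.0)
    if f_b == 0.0:
        # Allele absent from pop_b: ratio is undefined or unbounded.
        return float("inf") if f_a > 0 else float("nan")
    return f_a / f_b


# Toy input for illustration only; these are not study data.
toy = [
    ("Chinese", "*10/*36+*10"),
    ("Chinese", "*1/*10"),
    ("Indian", "*2/*41"),
    ("Indian", "*1/*10"),
]
freqs = allele_frequencies(toy)
print(freqs["Chinese"]["*10"], freqs["Indian"]["*10"])  # 0.5 0.25
print(fold_difference(freqs, "*10", "Chinese", "Indian"))  # 2.0
```

Keeping `*36+*10` as its own unit is a deliberate choice: if tandems were reported simply as *10, the *10 frequency would be inflated.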
By contrast, the *2, *41, *5, and *4 alleles were more prevalent in the Indian population, reaffirming previous research highlighting the prominence of these alleles in Indian samples16. We also observed 11 haplotypes (*112, *52, *133, *7, *75, *82, *17, *9, *15, *4 + *4, and *69) that were each present in only one individual, expanding on previous findings of rare star alleles in our study population16. Of particular interest was the identification of three individuals with no detectable copies of CYP2D6 (*5/*5), constituting approximately 0.2% of our cohort. This aligns with previous reports that the *5/*5 diplotype occurs at a very low frequency, ranging from 0% to 1.9%, in Southeast Asian populations9.

The frequencies of common star alleles in our population, such as *10, *36 + *10, and *2, differed significantly from the average frequencies estimated by PharmGKB and the 1000 Genomes Project (1KGP) for the equivalent populations. Although PharmGKB may include ethnicities beyond those in SG10K_Health, these discrepancies may also originate from differences in data generation methods (e.g., WGS versus genotyping) and in the star allele calling approaches employed across studies, which lead to variability in the range of detectable star alleles. The frequency of *10 could also be overestimated when *36 + *10 tandems and/or *36 alleles are not reported. Additionally, *2 is considered a “backbone” allele because its defining single nucleotide variants (SNVs) occur in multiple other haplotypes, which may lead to mis-assignments if any of the additional haplotype-defining variants cannot be accurately detected.

After translating diplotypes into metabolizer profiles, we identified actionable variants in over 46% of the population, rising to over 80% when focusing on the ten most common haplotypes. This underscores the potential impact of implementing pharmacogenomics at scale.
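CPIC's genotype-to-phenotype translation for CYP2D6, referred to in this section, sums a per-allele activity value over both haplotypes and maps the total score to a metabolizer category. The sketch below is a simplification under stated assumptions: the activity values cover only a handful of alleles and reflect our understanding of CPIC's published values (including *10 = 0.25 and a normal-metabolizer range of 1.25 to 2.25), the parsing conventions (`+` for tandems, `x2` for duplications) are hypothetical, and any allele missing from the table, such as the uncertain-function alleles discussed here, is reported as indeterminate.

```python
# Illustrative CPIC-style activity values for a small subset of alleles.
# Assumption: verify against the current CPIC CYP2D6 allele functionality table.
ACTIVITY = {
    "*1": 1.0, "*2": 1.0,               # normal function
    "*9": 0.5, "*17": 0.5, "*41": 0.5,  # decreased function
    "*10": 0.25,                        # decreased function
    "*4": 0.0, "*5": 0.0, "*36": 0.0,   # no function
}


def haplotype_score(haplotype):
    """Sum activity over tandem components ('+') times copy number ('x2').

    Returns None if any component is absent from ACTIVITY (e.g. an allele of
    uncertain function), which makes the phenotype indeterminate.
    """
    score = 0.0
    for part in haplotype.replace(" ", "").replace("×", "x").split("+"):
        base, _, copies = part.partition("x")
        if base not in ACTIVITY:
            return None
        score += ACTIVITY[base] * (int(copies) if copies else 1)
    return score


def predict_phenotype(diplotype):
    """Map a diplotype to (phenotype, activity score) using CPIC-style cut-offs."""
    scores = [haplotype_score(h) for h in diplotype.split("/")]
    if None in scores:
        return "Indeterminate", None
    total = sum(scores)
    if total == 0:
        label = "PM"
    elif total < 1.25:
        label = "IM"
    elif total <= 2.25:
        label = "NM"
    else:
        label = "UM"
    return label, total


for d in ["*1/*36+*10", "*10/*36+*10", "*5/*5", "*1x2/*2", "*1/*43"]:
    print(d, predict_phenotype(d))
```

The first example illustrates why carriers of SV-containing alleles can still be normal metabolizers: a *36 + *10 tandem contributes little activity on its own, but paired with a fully functional *1 the total reaches the NM range.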
In our study population, normal metabolizers (NMs) were the most prevalent phenotype (53.9%), whereas ultra-rapid metabolizers (UMs) and poor metabolizers (PMs) accounted for 1.1% and 0.5%, respectively. Diplotypes predicting PM were most frequent in Indian participants (1.2%), followed by Malay (1%) and Chinese (0.4%) participants. Notably, we detected PMs in all three ethnicities, in contrast to previous studies of the same population that did not identify PMs among Chinese individuals16; this difference is likely due to our larger sample size and higher-depth sequencing approach. Compared with global trends, our cohort showed a higher incidence of intermediate metabolizers (IMs), around 42%, than the previously reported 34%17, together with a lower prevalence of NMs, which are typically observed at 64–68% in global populations. This trend may be attributed to the higher prevalence of reduced- or no-function alleles, including *36 + *10, *10, *41, and *36.

Both PMs and UMs have an altered capacity to metabolize CYP2D6 substrates, including codeine, certain antidepressants, and antipsychotics18. After codeine administration, UMs face an elevated risk of toxicity due to increased morphine formation, whereas individuals with non-functional alleles are at risk of inadequate pain relief due to reduced efficacy. In our study, UMs outnumbered PMs by approximately two-fold. Unlike in Caucasians, where the *4 allele predominates and accounts for 70–90% of poor metabolizer status, *4 is infrequent in Asians, which may explain the lower proportion of poor metabolizers in our population17. Lastly, we observed a relatively high proportion of Indian participants (7.9%) with an indeterminate CYP2D6 metabolizer phenotype, highlighting the limitations of current CPIC guidelines for genotype–phenotype translation.
Most of these individuals carried alleles of uncertain function, such as *43, *86, *113, *82, *111, and *112. This underscores the need for extensive allele characterization and phenotypic studies to develop effective precision medicine strategies, particularly for medications metabolized by CYP2D6.

We further examined the prevalence of structural variants (SVs) in CYP2D6, as it remains underexplored in Asian populations8. Most study participants (55.6%) carried at least one SV-containing star allele, with the *36 + *10 tandem being the most prevalent overall. This percentage exceeds the prevalence previously reported for the same population by Chan et al.8. In previous benchmarks using high-depth short-read WGS as input, all three tools in our consensus approach (Aldy, Cyrius, and StellarPGx) demonstrated > 90% recall for known CYP2D6 structural variants, including key star alleles such as *36 + *10, *36 × 2 + *10, and *36 + *10 × 2. Based on these previous validations and the single-tool deficiencies resolved by our consensus approach, we expect false-positive calls to be minimal. However, novel SVs may have been missed, and complex diplotypes (e.g., individuals carrying an SV on each haplotype) may have been miscalled. As more long-read data are generated, we expect an overall improvement in the accurate characterization of SV-driven CYP2D6 variation, particularly in understudied populations. When stratifying metabolizer profiles by the presence or absence of SVs, we detected, as expected, a higher incidence of IMs (56.2%), UMs (1.9%), and PMs (1%) among participants with SV-containing alleles than among those without CYP2D6 SVs.
Interestingly, we still detected a high prevalence of NMs among participants with SV-containing alleles (39.3%), highlighting the importance of determining the exact nature of the CYP2D6 SVs present in each sample and emphasizing the limitations of relying on copy number or SV information alone to predict phenotypic outcomes.

Lastly, our study also provides an initial assessment of the extent of genetic variation that remains undocumented in public databases. Based on a carefully curated subset, we identified 14 potential novel CYP2D6 haplotypes. These include both variants shared across multiple individuals (N = 7) and private events (N = 7). Although individually rare, these novel haplotypes collectively appear in 28 individuals, accounting for 1.5% of our study population. This underscores the significant, yet often overlooked, impact that rare allelic variation could have on precision medicine strategies, both in Asia and globally. The largest share of these novel haplotypes are variations of the *10 allele (N = 6), followed by *2 (N = 3), consistent with the prevalence of these alleles, both of which are among the ten most frequent star alleles in our study populations. All but five of the novel haplotypes carry potentially functional variants previously catalogued in PharmVar as part of other haplotypes; the functional impact of the remaining five was inferred using the Variant Effect Predictor. They encompass a range of genetic changes, including missense and frameshift mutations, which could significantly alter protein function. However, given the complexities of genotyping and star allele calling in CYP2D6, caution is advised when interpreting computationally inferred novel haplotypes, particularly singletons, as these predictions can be influenced by limitations of the underlying data (e.g., short-read sequencing) and of the software used.
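The novel haplotypes described here are framed as variations of a known backbone allele (such as *10 or *2) that carry additional variants. A minimal sketch of that idea follows, assuming variants have already been phased onto one haplotype and allele definitions are available as sets. It is not how StellarPGx or any other tool used in this study works internally, and the variant labels are hypothetical placeholders rather than real PharmVar entries.

```python
def match_backbone(observed, definitions):
    """Find the star allele whose defining variants best explain a haplotype.

    observed: set of variant IDs on one phased haplotype.
    definitions: star allele -> set of its defining variant IDs.
    Picks the largest definition fully contained in `observed` (first one wins
    on ties) and returns it with the leftover variants, which are candidates
    for a novel sub-allele if they are potentially functional.
    """
    best, best_size = None, -1
    for allele, defining in definitions.items():
        if defining <= observed and len(defining) > best_size:
            best, best_size = allele, len(defining)
    if best is None:
        return None, set(observed)
    return best, observed - definitions[best]


# Hypothetical variant labels; these are NOT real PharmVar definitions.
DEFINITIONS = {
    "*1": set(),
    "*2": {"varA", "varB"},
    "*10": {"varB", "varC"},
}

print(match_backbone({"varB", "varC", "varX"}, DEFINITIONS))  # ('*10', {'varX'})
```

Reporting only the backbone (*10 in this example) would silently drop `varX`, which is the risk this section raises for backbone-based reporting.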
While these computational predictions are informative, they are not a substitute for experimental validation. Although we could not validate these haplotypes because we lacked access to DNA samples, our findings offer a valuable foundation for future research, especially as Singapore’s National Precision Medicine program progresses in characterizing a larger portion of its population. A prevalent practice in the field is to report variants based on their backbone allele, which could significantly influence drug dosage recommendations: unexamined variants with potential functional significance, like those uncovered in our study, could affect phenotype assignments and clinical decisions, potentially leading to adverse pharmacological outcomes for patients.

Overall, our research represents a significant step towards a better understanding of CYP2D6 variation in under-represented populations. A thorough understanding of allelic diversity in the populations studied here is key to accurately predicting drug responses and successfully applying pharmacogenetics in Singaporean and global clinical settings, where targeted assays are still the norm. Understanding the genetic variability of these populations is therefore essential to ensure that the most prevalent alleles are included in the list of assay targets. While broader tests such as whole genome sequencing would be ideal for continuously exploring genetic variation and guiding future clinical applications, cost constraints currently limit this approach. However, the ongoing reduction in sequencing costs and the discovery potential of long-read sequencing point to a promising future for this research area. Despite the technological limitations in characterizing our dataset, we believe our analysis is a valuable contribution to the Singaporean and global scientific community.
We recommend the consensus CYP2D6 star allele calling method used in our study for similar analyses, to address the challenges of short-read sequencing. Future studies focusing on the definitive characterization of the novel haplotypes not validated here, along with functional studies assessing their clinical significance, will be crucial for improving clinical pharmacogenomics implementation strategies. In the meantime, we believe that the detailed mapping of CYP2D6 star allele distributions in Southeast Asian populations presented here can serve as a resource for advancing precision medicine and fostering the adoption of proactive pharmacogenetic testing in diverse clinical settings across Asia and worldwide.
