Polygenic risk and rare variant gene clustering enhance cancer risk stratification for breast and prostate cancers

Disease risk can be stratified according to PRS in conjunction with monogenic variants in high-risk genes11,13,21. Using the EPRS approach, we systematically categorized monogenic variants by clustering risk genes according to their odds ratios and PAF values, and then assessed the extent to which PRS modifies risk within each cluster in breast and prostate cancers. EPRS allowed us to observe the joint contributions of monogenic and polygenic effects to cancer risk, improving our understanding of the genetic profile that shapes cancer risk.
PRS alone significantly stratified both breast and prostate cancer risk. For both cancers, the odds ratios of the low- and high-PRS groups differed significantly from that of the intermediate-PRS group in both analyses. These findings show that the cumulative effect of SNPs raises cancer risk and that PRS alone can be used to stratify an individual's risk. For each cancer, we constructed PRS from three different sets of summary statistics using PRSice2, LDpred2, and SBayesR. PRS performance was evaluated using R² on the liability scale, and the best-performing combinations were selected: LDpred2 with summary statistics from Zhang et al.29 for breast cancer, and PRSice2 with summary statistics from Wang et al.30 for prostate cancer (Supplementary Table 9).
In addition, incorporating the monogenic variant effect showed that, within each PRS group, carriers of pathogenic variants had a higher cancer risk than non-carriers. Pathogenic variants increase cancer incidence by disrupting metabolic pathways13,16. In our study, cancer risk varied with both PRS group and carrier status. Notably, the intermediate-PRS group with variants had a significantly higher cancer risk than the high-PRS group without variants. Moreover, for both cancers, the high-PRS group with variants showed the highest risk among all groups.
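A minimal sketch of the tertile-based stratification described above, assuming equal-sized PRS tertiles, a binary carrier flag, and the intermediate-PRS non-carrier group as the reference. The function names are hypothetical, the Wald confidence interval is one common choice rather than the method confirmed by the text, and a real analysis would typically fit a logistic regression with covariates such as age and genetic principal components.

```python
import math
from collections import Counter
from statistics import NormalDist


def prs_tertile(scores):
    """Assign 'low', 'intermediate', or 'high' by equal-sized, rank-based PRS tertiles."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        # rank * 3 // n maps ranks 0..n-1 onto tertile indices 0, 1, 2
        labels[i] = ("low", "intermediate", "high")[rank * 3 // n]
    return labels


def odds_ratio(a, b, c, d, alpha=0.05):
    """OR of a group (a cases, b controls) vs a reference (c cases, d controls).

    Returns (OR, lower, upper) using a Wald interval on the log scale;
    all four counts must be positive.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)


def group_odds_ratios(scores, carrier, case, ref=("intermediate", False)):
    """OR of every (PRS tertile, carrier) group relative to the reference group `ref`."""
    groups = list(zip(prs_tertile(scores), carrier))
    cases = Counter(g for g, y in zip(groups, case) if y)
    controls = Counter(g for g, y in zip(groups, case) if not y)
    c, d = cases[ref], controls[ref]
    return {
        g: odds_ratio(cases[g], controls[g], c, d)
        for g in set(groups)
        if g != ref and cases[g] > 0 and controls[g] > 0  # skip groups with an empty cell
    }
```

For example, `group_odds_ratios(prs, has_pathogenic_variant, is_case)` (hypothetical inputs) returns a dict keyed by tuples such as `("high", True)`, which mirrors the PRS-by-carrier comparison in the text.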
Within each PRS risk group, samples carrying monogenic variants had up to 2-fold higher risk than non-carriers for both cancers. Interestingly, for both cancers, the risk of the low-PRS group with monogenic variants slightly exceeded that of the intermediate-PRS group without variants. Although these odds ratios were not significant, the findings suggest that monogenic effects can amplify risk in individuals with a low polygenic burden.
We thus observed stratified risks within each PRS group depending on the presence or absence of monogenic variants. However, because carrying a monogenic variant can affect risk far more strongly than any single SNP, the additive summation of genetic effects typically used to construct PRS may not adequately capture the contribution of monogenic variants to disease risk. Furthermore, the risk conferred by a monogenic variant can vary depending on the gene that harbors it. Previous studies have primarily focused on the elevated cancer risks associated with well-known risk genes in conjunction with PRS13,22,23. Our EPRS approach instead provides a systematic method for prioritizing and clustering monogenic effects and integrating them with PRS, thereby refining cancer risk stratification. In EPRS, we prioritized genes specific to each cancer type and clustered their monogenic effects based on odds ratios and PAFs. By selecting genes whose odds ratio P-values fell below the Bonferroni-corrected threshold of 0.0022 and whose PAF values were greater than 0, we highlighted the genes with the strongest effect on each cancer type. This selection is consistent with previous studies that associated ATM, BRCA1, BRCA2, CHEK2, and PALB2 with increased breast cancer risk26,31, and HOXB13 and BRCA2 with prostate cancer risk5,27. We then clustered the selected genes to estimate their associated cancer risk.
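The gene filter described above can be sketched as follows. The text reports a Bonferroni-based cutoff of 0.0022 but not the number of tests behind it; 0.05 / 23 ≈ 0.00217 rounds to that value, so the 23 used below is an assumption, not a figure from the study. The gene names and statistics are hypothetical placeholders.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def prioritize_genes(stats, n_tests=None, alpha=0.05):
    """Keep genes whose OR P-value passes the Bonferroni threshold and whose PAF is positive.

    stats maps gene -> (odds_ratio, p_value, paf); n_tests defaults to len(stats).
    """
    threshold = bonferroni_threshold(alpha, n_tests or len(stats))
    return sorted(g for g, (_, p, paf) in stats.items() if p < threshold and paf > 0)


# Hypothetical per-gene statistics, not results from the study.
example_stats = {
    "GENE_A": (3.1, 1e-5, 0.012),    # passes both filters
    "GENE_B": (1.4, 0.01, 0.002),    # fails the P-value filter
    "GENE_C": (0.8, 1e-4, -0.003),   # fails the PAF filter (protective direction)
    "GENE_D": (5.0, 0.0019, 0.004),  # passes both filters
}
selected = prioritize_genes(example_stats, n_tests=23)
```

Here `selected` keeps only `GENE_A` and `GENE_D`; requiring PAF > 0 in addition to significance removes genes whose significant association points in the protective direction.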
In breast cancer, we identified five genes grouped into two distinct monogenic effect clusters, with cluster 1 showing moderate-risk and cluster 2 high-risk effects. In prostate cancer, we identified three genes grouped into two clusters, again with moderate-risk (cluster 1) and high-risk (cluster 2) effects.
We then incorporated these cluster effects within the PRS groups, enabling a finer subdivision of risk groups and their characteristics. When combined with PRS, all monogenic clusters increased the risk of both breast and prostate cancers. However, the magnitude of the increase depended on the monogenic cluster, as revealed by the odds ratios of each risk group. In breast cancer, monogenic effect clusters 1 and 2 retained their moderate- and high-risk effects, respectively, when combined with PRS. In total, 1292 samples were assigned to different groups than under stratification by PRS alone. The odds ratios showed that polygenic and monogenic effects jointly elevated breast cancer risk. Samples in gene cluster 2 within the low-PRS risk group had a higher odds ratio than even the high-PRS group without variants. High-PRS samples in clusters 1 and 2 had higher odds ratios than high-PRS samples with unclustered variants. This detailed stratification by monogenic effect cluster thus revealed finer risk differences. Applying EPRS to prostate cancer reclassified 817 samples into different risk groups compared with PRS alone. As in breast cancer, monogenic clusters 1 and 2 in prostate cancer conferred moderate and high risks, respectively, when combined with PRS. Nevertheless, polygenic effects were more pronounced than monogenic effects in our analysis.
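The text does not state which clustering algorithm grouped the selected genes. As an assumed stand-in, the sketch below runs k-means with k = 2 (Lloyd's algorithm) on standardized log odds ratios and PAFs, then names the cluster with the higher mean log odds ratio "cluster 2" to mirror the moderate/high labelling above. All gene names and values are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers (population SD); constant input maps to zeros."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]


def two_means(points, n_iter=100):
    """Lloyd's k-means with k = 2, initialized at the lexicographically smallest and largest points."""
    centers = [min(points), max(points)]
    labels = None
    for _ in range(n_iter):
        # assign each point to its nearest center (Euclidean distance)
        new_labels = [min((0, 1), key=lambda k: math.dist(p, centers[k])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(mean(coord) for coord in zip(*members))
    return labels


def cluster_genes(stats):
    """stats maps gene -> (odds_ratio, paf); returns gene -> cluster 1 (moderate) or 2 (high)."""
    genes = sorted(stats)
    log_or = standardize([math.log(stats[g][0]) for g in genes])
    paf = standardize([stats[g][1] for g in genes])
    labels = two_means(list(zip(log_or, paf)))
    # Name clusters so that cluster 2 has the higher mean (standardized) log odds ratio.
    mean_by_label = {k: mean(x for x, lab in zip(log_or, labels) if lab == k) for k in set(labels)}
    high = max(mean_by_label, key=mean_by_label.get)
    return {g: 2 if lab == high else 1 for g, lab in zip(genes, labels)}
```

Standardizing both features matters here: PAF values are numerically much smaller than log odds ratios, so without scaling the distance would be driven almost entirely by the odds ratio.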
PRS showed robust predictive performance, with an R² of 0.344 on the liability scale, and odds ratios increased steeply across PRS risk levels. The highest and lowest odds ratios were observed in the high- and low-PRS risk groups of gene cluster 2, respectively. Thus, by incorporating monogenic effect clusters, our analysis distinguished cancer risk groups more sharply than PRS alone.
Our findings highlight that the combined effect of PRS and monogenic clusters can substantially influence cancer risk. This was also evident in the observed cancer prevalence across risk groups. For breast cancer, monogenic cluster 2 showed a higher prevalence across all PRS groups; in the high-PRS group, more than half of the samples in this cluster were diagnosed with breast cancer. BRCA1 and BRCA2, well-established breast cancer susceptibility genes26,32,33,34,35, were assigned to monogenic effect cluster 2. In the high-PRS group, both genes showed high prevalence: 0.67 for BRCA1 and 0.52 for BRCA2 (Supplementary Table 7). The prevalence of prostate cancer also varied with PRS group and monogenic cluster, increasing from lower to higher PRS risk groups. Monogenic effect cluster 2 contained only HOXB13, a causative gene for prostate cancer23,27, which showed a higher prevalence than BRCA2 and ATM in cluster 1 (Supplementary Table 8).
In breast cancer, pathway enrichment analysis revealed significant involvement of DNA damage response and repair mechanisms. Both clusters were enriched in critical pathways such as DNA double-strand break repair, cellular response to DNA damage, and cell cycle checkpoints, underscoring their collective role in maintaining genomic stability.
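The text reports R² on the liability scale without giving the conversion. A widely used option, assumed here, is the transformation of Lee et al. (2012), which rescales the observed-scale R² (from regressing 0/1 case status on the PRS) using the population prevalence K and the sample case fraction P. The study's values of K and P are not given in this section, so any inputs are placeholders.

```python
from statistics import NormalDist


def liability_r2(r2_obs, K, P):
    """Convert observed-scale R² to the liability scale (Lee et al. 2012 transformation).

    r2_obs: R² from regressing 0/1 disease status on the PRS
    K: population prevalence of the disease
    P: proportion of cases in the analysed sample
    """
    nd = NormalDist()
    t = nd.inv_cdf(1 - K)  # liability threshold
    z = nd.pdf(t)          # standard normal density at the threshold
    m = z / K              # mean liability of cases
    C = K * (1 - K) / z**2 * K * (1 - K) / (P * (1 - P))
    theta = m * (P - K) / (1 - K) * (m * (P - K) / (1 - K) - t)
    return r2_obs * C / (1 + r2_obs * theta * C)
```

In a population-based cohort where P ≈ K, theta is zero and the conversion reduces to multiplying the observed R² by K(1 - K) / z².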
Key gene ontology biological processes associated with these pathways included double-strand break repair, DNA repair, and signal transduction in response to DNA damage. Despite these shared roles, each cluster also showed unique pathway enrichments. BRCA1 and BRCA2 in cluster 2 were uniquely involved in pathways such as the ATR-BRCA pathway and homology-directed repair, emphasizing their roles in precise DNA repair through homologous recombination and in apoptotic signaling. Conversely, ATM, PALB2, and CHEK2 in cluster 1 were enriched in pathways related to diseases of DNA repair and response to ionizing radiation, highlighting their roles in signaling and repair under stress conditions.
For prostate cancer, pathway enrichment analysis was performed on two gene clusters: one containing HOXB13 and the other comprising ATM and BRCA2. Both clusters were involved in developmental and differentiation pathways, such as gland development, reproductive system development, and cellular growth. However, each cluster also showed unique pathway enrichments reflecting its distinct functions. ATM and BRCA2 in cluster 1 were enriched in DNA repair and damage response pathways, including homologous recombination, the ATR-BRCA pathway, and the DNA repair complex, emphasizing their critical roles in maintaining genomic stability and preventing mutation propagation. In contrast, HOXB13 was uniquely enriched in pathways related to cellular growth and maturation, indicating a pivotal role in development and differentiation.
This study has some limitations. First, PRS partitioning criteria other than equal-sized tertiles might have yielded better cancer risk prediction. We therefore applied a range of partitioning criteria to our EPRS approach, setting the top and bottom percentages from 5% to 35% in 5% increments.
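The enrichment tool and gene-set database are not named in this section. Over-representation analysis is commonly based on a one-sided hypergeometric test, sketched below under that assumption; the background size, pathway names, and gene sets in any call are hypothetical, and a real analysis would also correct for testing many pathways.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in the pathway, n in the query)."""
    hits = range(k, min(K, n) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


def enrich(query, pathways, n_background):
    """One-sided over-representation P-value of the query genes in each pathway gene set."""
    q = set(query)
    return {
        name: hypergeom_sf(len(q & set(genes)), n_background, len(set(genes)), len(q))
        for name, genes in pathways.items()
    }
```

With clusters of only two to three genes, even a single overlap with a small pathway can yield a low P-value, which is why the choice of background gene universe strongly affects the result.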
The mean AUC was calculated using 5-fold cross-validation; the 30% partitioning criterion yielded the best performance for breast cancer, whereas 20% was optimal for prostate cancer (Supplementary Table 10). These criteria effectively separated the high- and low-PRS risk groups. The groups with the highest and lowest prevalence were unchanged: high-PRS with gene cluster 2 showed the highest risk, and low-PRS without monogenic variants the lowest. Under these criteria, prevalence rose in the highest-risk group and fell in the lowest-risk group (Supplementary Fig. 3). However, the optimal partitioning criterion differed between cancer types. Although we explored various cutoff percentages, there may still be room for improvement because PRS is a continuous variable, and future applications of alternative partitioning methods may improve our understanding of polygenic effects. Nevertheless, this study primarily focused on the potential of systematic risk stratification using genetic profiles. The second limitation is potential bias in selecting genes for systematic prioritization and monogenic effect clustering. Considering all functional genes would be a substantial undertaking; because the two cancers share causal genes, we focused on breast and prostate cancers to reduce the computational and financial burden of the analysis. Candidate genes for each cancer were selected based on previous studies5,25,26,27. Although all pathogenic variants were included regardless of which candidate gene harbored them, each cancer-specific gene was prioritized, and cancer-specific monogenic clusters showed an elevated effect on cancer risk. However, extending this work to additional diseases is necessary for more precise systematic risk stratification. The third limitation arises from the recruitment bias of the UK Biobank, whose participants are predominantly older, more educated, and of European ancestry.
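The cutoff sweep described above can be sketched as follows. The text does not say how risk groups were turned into a score for the AUC, so this assumes each sample is scored by the case prevalence of its (PRS level, monogenic cluster) group, estimated on the training folds, with the PRS cutoffs also taken from the training folds. Folds are assigned by index modulo 5 for brevity (a real analysis would shuffle and stratify), and all function names are hypothetical.

```python
from statistics import mean


def auc(scores, labels):
    """Mann-Whitney AUC: P(case score > control score), counting ties as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def cutoffs(prs, frac):
    """PRS values bounding the bottom and top `frac` of samples (nearest rank)."""
    s = sorted(prs)
    k = max(1, int(len(s) * frac))
    return s[k - 1], s[-k]


def risk_group(prs_value, cluster, lo, hi):
    """(PRS level, monogenic cluster) label; cluster 0 means no pathogenic variant."""
    level = "low" if prs_value <= lo else "high" if prs_value >= hi else "intermediate"
    return level, cluster


def cv_auc(prs, cluster, case, frac, n_folds=5):
    """Mean test-fold AUC; each test fold must contain both cases and controls."""
    n = len(prs)
    aucs = []
    for f in range(n_folds):
        train = [i for i in range(n) if i % n_folds != f]
        test = [i for i in range(n) if i % n_folds == f]
        lo, hi = cutoffs([prs[i] for i in train], frac)
        tally = {}  # group -> [cases, total] in the training folds
        for i in train:
            g = risk_group(prs[i], cluster[i], lo, hi)
            tally.setdefault(g, [0, 0])
            tally[g][0] += case[i]
            tally[g][1] += 1
        fallback = mean(case[i] for i in train)  # used for groups unseen in training

        def prevalence(i):
            g = risk_group(prs[i], cluster[i], lo, hi)
            return tally[g][0] / tally[g][1] if g in tally else fallback

        aucs.append(auc([prevalence(i) for i in test], [case[i] for i in test]))
    return mean(aucs)


def best_fraction(prs, cluster, case):
    """Sweep top/bottom fractions 5%, 10%, ..., 35% and return the one with the highest mean AUC."""
    fracs = [p / 100 for p in range(5, 40, 5)]
    return max(fracs, key=lambda fr: cv_auc(prs, cluster, case, fr))
```

Taking the cutoffs from the training folds keeps the test samples out of the partitioning step, so the AUC for each fraction is not inflated by information from the held-out fold.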
Moreover, these participants generally have healthier lifestyles and a lower prevalence of several health conditions than the general UK population; they are notably less likely to be obese, to smoke, or to drink alcohol daily. These characteristics suggest a ‘healthy volunteer’ bias, which may limit the generalizability of our findings, including the calculated population attributable fractions (PAFs) and the observed cancer prevalence. Future studies should therefore include broader populations36.
In summary, this study aimed to systematically stratify cancer risk by clustering genes carrying pathogenic variants based on odds ratios and PAFs, which we used to infer the risk elevation attributable to rare variants, using sequencing data from the UK Biobank. Our findings suggest that, for breast cancer, relying solely on the widely used odds ratio may not fully capture the risk contribution of certain genes. In our study, the well-known genes BRCA2 and PALB2 had similar odds ratios but different PAFs, which placed them in different clusters, and these clusters showed distinguishable patterns of breast cancer prevalence. Moreover, when combined with a PRS based on common variants, individuals with rare variants in BRCA1 and BRCA2 showed cancer prevalence that was stratified by PRS level. Similar findings were observed for prostate cancer. We therefore suggest that cancer risk stratification benefits from using both rare and common variant information and from incorporating metrics such as PAF in addition to odds ratios when estimating rare variant gene effects.
However, we acknowledge certain limitations in our approach. Gene selection was based on a literature review, which may introduce bias and miss important genes.
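To illustrate why genes with similar odds ratios can differ in PAF, the sketch below computes both from carrier counts. The text does not give its PAF formula; this assumes Miettinen's case-based form, PAF = p_c (OR - 1) / OR, where p_c is the fraction of cases who carry a variant and the OR stands in for the relative risk. The counts for the two hypothetical genes are invented for illustration.

```python
def carrier_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Carrier odds ratio from a 2x2 table of carriers vs non-carriers in cases and controls."""
    a, b = case_carriers, control_carriers
    c, d = n_cases - a, n_controls - b
    return (a * d) / (b * c)


def paf_miettinen(case_carriers, n_cases, or_):
    """Case-based population attributable fraction: p_c * (OR - 1) / OR."""
    return (case_carriers / n_cases) * (or_ - 1) / or_


# Two hypothetical genes: similar ORs, tenfold difference in carrier frequency.
or_x = carrier_odds_ratio(40, 1000, 10, 1000)  # 40 carriers per 1000 cases, 10 per 1000 controls
or_y = carrier_odds_ratio(4, 1000, 1, 1000)    # 4 carriers per 1000 cases, 1 per 1000 controls
paf_x = paf_miettinen(40, 1000, or_x)
paf_y = paf_miettinen(4, 1000, or_y)
```

Both odds ratios are close to 4, yet `paf_x` (about 0.030) is roughly ten times `paf_y` (about 0.003), because PAF also scales with how common carriers are among cases.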
Furthermore, the pathogenicity assessment of rare variants and the gene clustering relied on subjective thresholds, and the optimal PRS grouping thresholds varied between traits, suggesting that they may be trait-specific. More quantitative and statistically rigorous methods, such as gene-based common variant scores combined with rare-variant burden tests, could provide a more robust framework for stratification, but they require a much larger set of genes and substantially more data. Applying such methods effectively will require larger sample sizes and more comprehensive datasets, such as those from whole genome sequencing (WGS). Incorporating WGS data would allow a broader spectrum of variants to be included, giving a more detailed understanding of the roles of low-frequency, rare, and somatic variants in cancer risk. Such enhancements would substantially improve the precision and accuracy of systematic risk stratification.
