Regionally enriched rare deleterious exonic variants in the UK and Ireland

Based on the regional availability of participants with WES data in UKB, we classified samples into 16 geographical regions of origin (Methods). These regions contain individuals who were born within the corresponding region but outside large metropolitan areas, who self-identify as “White British” and who exhibit very similar genetic ancestry based on a principal components analysis of the UKB whole-genome SNP array genotypes. There are two exceptions: the London region, which contains individuals born within a 10-mile radius of the geographical centre of London (i.e., a cosmopolitan control), and the Irish region, for which we selected individuals who self-identify as “Irish” and were born in either Northern Ireland or the Republic of Ireland. We also included UKB participants with Ashkenazi Jewish (AJ) heritage, whom we split into two groups (full and part AJ) based on their genomic information (Methods). Lastly, we added WES data from two cohorts in the Viking Genes programme9,25, from the relatively isolated archipelagos of Shetland and Orkney (the Northern Isles of Scotland), for which the sequencing and variant calling procedures were identical to those used for UKB WES data generation. This gives a total of 20 regions and 44,696 unrelated individuals (Supplementary Fig. 1).

Individuals with Ashkenazi Jewish heritage in UKB

According to the 2021 UK census, more than a quarter of a million respondents answered “Jewish” to the voluntary question on religion. Recent studies have found evidence that such individuals participated in the UKB project, including a study based on identity-by-descent (IBD) analysis of the 500k UKB participants26 and a recent analysis of European haplotype sharing in UKB SNP genotyping data27. An independent clustering analysis based on UKB whole-genome SNP array genotypes also revealed a distinct group of UK individuals who, based on their genetic data and UKB lifestyle questionnaire answers, are likely to be of Jewish ancestry.
Our further analysis of these individuals using the UKB WES data indicated that this group is enriched for some known pathogenic variants causing disorders with higher prevalence in AJ individuals, including a frameshift variant in the HEXA gene causing Tay-Sachs disease (rs387906309, ~50x enrichment in our Jewish ancestry group compared to Central London) and a missense variant in the GBA gene causing Gaucher disease, Type I (rs76763715, ~13x enrichment). Our WES-based Multi-Dimensional Scaling (MDS) analysis revealed two main clusters within this group (Supplementary Fig. 2A). Our hypothesis that these two groups of individuals in the UKB dataset are distinct from each other is supported by two lines of evidence: (a) an MDS analysis based on known biallelic SNPs shows clear separation between the two groups when compared to a control group of London individuals (Supplementary Fig. 2B); and (b) one of the two groups has a higher total number of runs of homozygosity (ROH)28 and a higher proportion of each individual’s genome in ROH than the other, indicating a lesser degree of admixture (Supplementary Fig. 2C). These observations, combined with the fact that the vast majority (95%) of British Jews today are Ashkenazi29, lead us to believe that these two groups consist of individuals with full AJ (e.g., with three or more AJ grandparents) or part AJ (e.g., with two or fewer AJ grandparents, or with other Jewish heritage) heritage. Hereafter we refer to these two groups as full AJ and part AJ for brevity, noting that we cannot rule out the possibility that some Jewish individuals with different heritage (e.g., Sephardi, Mizrahi, Yemenite, Iraqi, Iranian or Georgian Jewish) may also be present in them.
We included the full AJ (1004 unrelated individuals) and part AJ (657 unrelated individuals) groups in the following analyses as representatives of a well-established human isolate population at different stages of admixture with other populations, and as archetypal groups enriched for variants that are rare elsewhere.

Enrichment of shared ultra-rare SNP alleles in the Northern Isles

To check for any potential batch or regional effects in sample collection, storage, handling and bioinformatics processing, we computed the overall variant load for each of the 20 regions after extensive QC filtering (Methods) of the variants discovered by the UKB OQFE alignment and variant calling protocol. With the exception of individuals with AJ heritage, the samples from the remaining 18 regions exhibit virtually identical variant loads, with medians of 31,885 exonic SNPs and 823 INDELs (short insertions or deletions) per person (Supplementary Table 1). To investigate the slightly higher total SNP load observed in individuals with AJ heritage compared to their non-AJ counterparts (~1% for full AJ and ~0.5% for part AJ), we further split the 20 regional variant datasets into “ultra-rare” variants, which have not been observed in any individual in the gnomAD genome dataset (v3.1.1, n = 76,156), and “known” variants, which are found in any gnomAD subpopulation30 with passing variant quality. Compared to the 18 non-AJ regions, whose ultra-rare and known variant loads are comparable to each other (Supplementary Table 1), the two AJ groups exhibit fewer ultra-rare variants and more known variants (i.e., previously observed in gnomAD), with the latter driving up the overall AJ variant load.
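The split into “ultra-rare” and “known” variants and the per-person median loads described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the study’s pipeline: the variant identifiers, the two gnomAD sets (`gnomad_any` for any observation, `gnomad_pass` for observations with passing quality) and the function names are hypothetical, and variants seen in gnomAD only with failing quality are simply dropped from both classes.

```python
from statistics import median


def classify_variant(variant_id, gnomad_any, gnomad_pass):
    """Label a variant as 'ultra-rare', 'known' or None (hypothetical helper).

    gnomad_any:  IDs observed in any gnomAD genome, regardless of quality.
    gnomad_pass: IDs observed in gnomAD with passing variant quality.
    """
    if variant_id not in gnomad_any:
        return "ultra-rare"
    if variant_id in gnomad_pass:
        return "known"
    return None  # seen in gnomAD but failing QC: excluded from both classes (assumption)


def median_variant_loads(carried, gnomad_any, gnomad_pass):
    """Median per-individual counts of ultra-rare and known variants.

    carried: dict mapping individual ID -> set of variant IDs with a
    non-reference genotype in that individual.
    """
    ultra_counts, known_counts = [], []
    for variants in carried.values():
        labels = [classify_variant(v, gnomad_any, gnomad_pass) for v in variants]
        ultra_counts.append(labels.count("ultra-rare"))
        known_counts.append(labels.count("known"))
    return {"ultra-rare": median(ultra_counts), "known": median(known_counts)}


# Toy example with made-up variant IDs.
gnomad_any = {"v1", "v2", "v3"}
gnomad_pass = {"v1", "v2"}  # v3 is in gnomAD but fails QC
carried = {"A": {"v1", "v4"}, "B": {"v1", "v2", "v3"}, "C": {"v2", "v5", "v6"}}
print(median_variant_loads(carried, gnomad_any, gnomad_pass))  # {'ultra-rare': 1, 'known': 1}
```

Applied to each regional dataset separately, the AJ pattern described in the text would appear as a lower ultra-rare median and a higher known median than in the other regions.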
These observations can be explained by the relatively high genomic homogeneity of AJ individuals and by the inclusion of variation data from 1736 AJ participants in the gnomAD dataset (2.3% of all 76,156 individuals). We also observed significant enrichment of shared ultra-rare SNP alleles in the Northern Isles: two-thirds of the ultra-rare variants found in Shetland are shared by two or more unrelated individuals from this region, and more than half of the ultra-rare variants in Orkney are shared among individuals there. In contrast, only one-fifth of ultra-rare SNP alleles are shared among individuals within the London region, for example. This finding confirms our previous result in the isolated Shetland population, which was attributed to founder effects and increased genetic drift8. We note that the proportion of shared ultra-rare variants in Orkney may be underestimated owing to the presence of 23 Orcadian individuals in the gnomAD dataset (via their inclusion in the 1000 Genomes project4), which potentially reduces the number of variants classified as ultra-rare in Orkney.

Rare exonic variation is associated with birthplace

Recent research based on genome-wide genotyping arrays has demonstrated a striking association between genomic variation and place of birth for individuals in the UK and the Republic of Ireland5,6. To assess whether this geographical distinction can be recapitulated from exonic data alone, we assembled a dataset of 10,001 unrelated individuals from UKB and the Northern Isles (492 Shetlandic, 509 Orcadian and 500 randomly chosen individuals from each of the remaining 18 groups). MDS/UMAP analyses based upon rare (MAF < 5%) exonic SNP variation in the joint dataset of the 20 regions (Methods), using the top 20 MDS dimensions, reveal a clear distinction of the full AJ, part AJ, Shetland and Orkney populations from each other and from the mainland regions (Fig. 1A, Supplementary Fig. 3A). Focusing on the 16 mainland UK and Ireland regions, again based upon rare (MAF < 5%) exonic SNP variation in their joint dataset, distinctions among Welsh, English, Scottish and Irish exomes are evident, consistent with previous studies based on genome-wide genotyping arrays5,6 (Fig. 1B, Supplementary Fig. 3B). In addition, our analysis reveals a further differentiation between North and South Welsh individuals and suggests some degree of separation of individuals born in South East Scotland (Fig. 1B), both of which have been observed previously5. We chose rare SNP variants (MAF < 5%) because, empirically, this threshold best recapitulates the previously published results (Supplementary Fig. 4).

Fig. 1: Distinctions among regional populations based upon UMAP projections of rare exonic variation.
The UMAP projections are computed on the top 20 MDS dimensions discovered based on biallelic, non-singleton and linkage-disequilibrium (LD) pruned known SNPs with MAF < 5% in the considered unrelated individuals. (A) UMAP analysis of all 20 groups in our study, illustrating the clear genetic distinction of full AJ, part AJ, Shetland and Orkney individuals from each other and from mainland regions. Despite the careful curation of the genealogical records of the Northern Isles participants, some carry a significant proportion of UK mainland heritage. (B) UMAP analysis focusing on the 16 mainland regions, recapitulating previously known distinctions among Welsh, English, Scottish and Irish regions.

We also computed the pair-wise FST distances (Methods) based upon biallelic, non-singleton, LD-pruned known SNPs with MAF < 5% across the 20 geographical regions (the same set of variants used for the MDS/UMAP analyses above) as another measure of the exonic distance between the regions (Supplementary Table 2).
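As an illustration of how such pair-wise distances might be computed, the sketch below uses Hudson’s FST estimator, combined as a ratio of averages across SNPs, together with a filter for non-singleton SNPs below 5% pooled MAF. The estimator choice, the function names and the input layout (per-SNP alternate allele counts `ac` and called chromosome counts `an` for each region) are our assumptions; the study’s exact method is described in its Methods, and the LD pruning step is not shown.

```python
from itertools import combinations


def keep_snp(ac_total, an_total, max_maf=0.05):
    """Keep non-singleton SNPs whose pooled minor allele frequency is below max_maf."""
    minor_count = min(ac_total, an_total - ac_total)
    return minor_count >= 2 and minor_count / an_total < max_maf


def hudson_fst(ac1, an1, ac2, an2):
    """Hudson FST for two populations, combined across SNPs as a ratio of averages.

    ac*: alternate allele counts per SNP; an*: called chromosomes per SNP.
    """
    num = den = 0.0
    for a1, n1, a2, n2 in zip(ac1, an1, ac2, an2):
        p1, p2 = a1 / n1, a2 / n2
        # Squared frequency difference minus the expected sampling variance.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den


def pairwise_fst(regions):
    """regions: dict name -> (ac, an) lists over the same filtered SNP panel."""
    return {
        (a, b): hudson_fst(*regions[a], *regions[b])
        for a, b in combinations(sorted(regions), 2)
    }


# Toy panel of two SNPs in three hypothetical regions of 50 diploid individuals each.
regions = {
    "RegionA": ([2, 4], [100, 100]),
    "RegionB": ([3, 3], [100, 100]),
    "RegionC": ([9, 1], [100, 100]),
}
for pair, fst in pairwise_fst(regions).items():
    print(pair, round(fst, 5))
```

Because the estimator subtracts the expected sampling variance, nearly identical regions can yield slightly negative values, one reason why very small exonic FST values are best interpreted comparatively rather than absolutely.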
The results further highlighted the clear exonic distinctiveness of the AJ and Northern Isles populations from the 16 mainland regions (Supplementary Fig. 5), suggesting that the individuals from Shetland (mean FST = 0.00091) and Orkney (mean FST = 0.00083) show a degree of exonic divergence from the mainland regions comparable to that of the part AJ (mean FST = 0.00090). In accord with the MDS analysis, the Irish, Welsh and mainland Scottish regions show elevated mean FST distances to each other and to the English regions (0.00024, 0.00015 and 0.00011, respectively), compared to comparisons within England or within mainland Scotland (mean FST = 0.00005 and 0.00004, respectively). The unrooted phylogenetic tree built from the pair-wise FST distances (Supplementary Fig. 6) reiterates the Welsh-English-Scottish-Irish differentiation revealed by our MDS/UMAP analysis.

Identification of regionally enriched deleterious variants

Based on the observed regional stratification in UKB, we sought evidence for potentially deleterious exonic variants enriched in particular geographical regions. We conservatively restricted our analysis to variants predicted to affect the coding potential of canonical transcripts, namely stop gain, start loss, splice donor/acceptor site loss and frameshift variants, as well as missense and splice region variants confidently predicted to be deleterious (CADD score ≥ 30). Among the variants in these classes, we defined as enriched those found at a regional frequency at least 5 times higher than the frequency observed in gnomAD NFE and attaining statistical significance (Methods). Overall, we discovered at least one enriched and potentially deleterious variant in 14 of the 20 considered regions, totalling 67 unique variants.
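The enrichment criterion above combines a fold-change threshold with a significance test whose details are given in the Methods. The following minimal sketch rests on our own assumptions: a one-sided binomial test of the regional alternate allele count against the gnomAD NFE MAF, with a placeholder significance level `alpha` and no multiple-testing correction. The function names and default threshold are hypothetical.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    terms = (
        math.exp(log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                 + j * log_p + (n - j) * log_q)
        for j in range(k, n + 1)
    )
    return min(1.0, math.fsum(terms))


def is_enriched(alt_count, n_individuals, maf_nfe, min_fold=5.0, alpha=1e-5):
    """Flag a variant whose regional MAF is >= min_fold x the gnomAD NFE MAF
    and whose one-sided binomial p-value is below alpha (placeholder value).

    Returns (flag, fold, p_value).
    """
    n_chrom = 2 * n_individuals  # diploid: two chromosomes per individual
    maf_reg = alt_count / n_chrom
    fold = maf_reg / maf_nfe
    p_value = binom_sf(alt_count, n_chrom, maf_nfe)
    return fold >= min_fold and p_value < alpha, fold, p_value


# Hypothetical variant: 8 alternate alleles among 500 unrelated individuals,
# with a gnomAD NFE MAF of 0.0002 (a 40-fold enrichment).
print(is_enriched(8, 500, 0.0002))
```

Requiring both conditions guards against two failure modes: a large fold change driven by only one or two extra alleles in a small region, and a statistically significant but modest excess in a large region.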
These variants are: (i) enriched in one or more of the regions compared to NFE in gnomAD, (ii) predicted to be functional, (iii) implicated in a monogenic disorder and (iv) reported in ClinVar31 as pathogenic/likely pathogenic (Methods). The vast majority (95%) of the discovered variants are previously known but extremely rare, with 90% of these having gnomAD MAF_NFE < 0.0004 (Fig. 2).

Fig. 2: Regionally enriched deleterious variants discovered in the UKB regions of the UK and Ireland.
Each of the 67 discovered variants is represented as a point showing the frequency at which it is found in gnomAD NFE individuals (x-axis) and its regional frequency (y-axis). Note that, for visual clarity, the two axes are on different scales. To aid interpretation of variant enrichment, four guide lines represent regional MAF enrichment of 5 times (solid line) and of 10, 20 and 50 times (dotted lines) compared to gnomAD NFE. Precise enrichment information for each variant is available in the subsequent tables.

We note that all the reported enriched variants are implicated in recessive disorders, i.e., individuals carrying the variant in a heterozygous state (referred to as “carriers”, with only one variant copy of the gene) are not affected; to be affected, individuals must be homozygous. Thus, given that UKB and Viking Genes participants are generally healthy, it is not surprising that all 67 reported variants are observed only in a heterozygous state in our dataset. In the subsequent regional sections, we provide more information on the disorders associated with each of these variants.
We also include an estimate (HOMALT) of the number of individuals in each region who may be expected to develop a disorder by inheriting two variant copies of the gene from their parents, based on the regional carrier frequency and different mating patterns.

Analysis of the reference AJ group within UKB is instructive

Our analysis revealed 24 enriched and potentially deleterious exonic variants in UKB participants of AJ descent; 10 of these variants are shared by full and part AJ, while the remaining 14 are seen exclusively in full AJ (Supplementary Table 3). Most of the identified variants are associated with health conditions previously reported to be significantly enriched in individuals of Jewish origin32: nine are predominantly AJ diseases, three are mostly found in Sephardi-Mizrahi Jews and two are common in all Jewish groups (Supplementary Table 3). In addition, there is a higher incidence of various types of retinitis pigmentosa among individuals with Jewish heritage32, as well as an increased risk of breast and ovarian cancer among AJ women33. The rediscovery of these variants in our UKB analysis supports the effectiveness and accuracy of our approach for identifying deleterious variants enriched in UKB regions.

Enriched and deleterious rare exonic variants in Scotland

We discovered nine enriched (with p-values reported in Supplementary Table 4) and potentially deleterious variants in the regions of Scotland considered in our analysis (Table 1). Four of these variants are specific to the Shetland Islands, and one specific variant was found in each of the Orkney, Strathclyde and South East Scotland regions. Two variants were also shared across regions of Scotland: a variant associated with Usher syndrome, enriched in both Shetland and Strathclyde, and another associated with Bardet-Biedl syndrome, enriched in both Northern Isles populations.
For each of the identified variants we also computed a range for the predicted regional number of individuals homozygous for the variant (HOMALT range, Table 1), with the lower bound based on the assumption of random mating of a region’s individuals with the whole of the UK and Ireland (with MAF_AVE representing the average variant MAF across the 20 regions in our study) and the upper bound based on the assumption of random mating within the region only.

Table 1 Enriched and potentially deleterious variants in samples from Scotland

Enriched and deleterious rare exonic variants in Wales

We identified nine enriched (Supplementary Table 4) and potentially deleterious variants in the Welsh groups in UKB, eight of which were specific to South Wales and one shared with individuals born in North Wales (Table 2). The lack of North Wales-specific variants is likely explained by the almost four-fold smaller sample size of unrelated North Welsh individuals (n = 883) in our study compared to their southern counterparts (n = 3239). Furthermore, not all eight South Wales variants are necessarily specific to this region; some may be shared with the neighbouring English regions (e.g., Gloucestershire, Herefordshire, Shropshire and Cheshire), which were not included in our study because too few unrelated UKB individuals from these regions have WES data available.

Table 2 Enriched and potentially deleterious variants in samples from Wales

Enriched and deleterious rare exonic variants in England

Our analysis of the WES data from individuals born in the ten English regions discovered 22 enriched (Supplementary Table 4) and potentially deleterious variants (Table 3). Apart from a single variant enriched in the North East England region (in the PNP gene), all of the remaining 21 variants were identified in four neighbouring regions: Lancashire, Staffordshire, Nottinghamshire and Yorkshire.
In addition to variants specific to each of these regions, we also identified three variants (in the COL7A1, F11 and COL4A4 genes) shared between two of these regions and one variant (in the ALMS1 gene) shared by individuals born in Lancashire, Staffordshire and Nottinghamshire.

Table 3 Enriched and potentially deleterious variants in samples from England

Enriched and deleterious rare exonic variants in Ireland

The analysis of the 2005 unrelated UKB individuals who self-identify as Irish and were born in either Northern Ireland or the Republic of Ireland identified two enriched (Supplementary Table 4) and potentially deleterious variants (Table 4).

Table 4 Enriched and potentially deleterious variants in Irish individuals born on the island of Ireland

One reason for the relatively small number of enriched variants found in Ireland compared to other mainland regions may be the different sample selection criteria: whereas individuals in England, Scotland and Wales were required to exhibit very similar genetic ancestry based on a principal components analysis of the UKB whole-genome SNP array genotypes, the Irish participants were selected only on the basis of self-identifying as Irish and being born in Northern Ireland or the Republic of Ireland (Methods). As a result, our sample of Irish individuals may contain some with non-Irish ancestry; for example, in the process of selecting Irish individuals we identified and excluded six participants with AJ heritage. Another factor might be the relatively low number of Irish participants with available WES data.
The 2005 unrelated individuals analysed represent the entire population of the island of Ireland (about 7 million), too few to allow the identification of potential within-Ireland differentiating signals.

Cross-regional enriched and deleterious rare exonic variants

A deleterious variant causing a frameshift in the OBSL1 gene (chr2:219568063:G > GT, c.1273dup, p.T425fs, rs762334954) was found to be regionally enriched in the Northern Isles of Scotland (Orkney and Shetland) and, puzzlingly, in geographically distant Wales. However, on closer examination the variant also appears to be measurably enriched in other UKB regions, while failing to meet our stringent enrichment criteria there (Supplementary Table 5). This variant has previously been reported to be associated with 3-M syndrome34, an extremely rare autosomal recessive primordial growth disorder characterised by distinct facial features, radiological abnormalities, normal intelligence and final adult height in the range of 115–150 cm. The exact prevalence of this disorder remains unclear: around 200 cases had been reported worldwide as of 2012 since the first published report in 1975, but the number is predicted to have increased substantially with greater awareness of the disorder and increased availability of genetic testing35. To estimate the practical impact of the elevated frequency of the OBSL1 variant, we considered its effect in each UKB region separately. The variant is predicted to exhibit a regional genetic prevalence of homozygous individuals (computed as MAF_REG²) of 1/16k in Orkney (~1500 times higher than in gnomAD NFE individuals), 1/39k in Shetland (~600 times higher), 1/49k in North Wales (~500 times higher) and 1/518k in South Wales (~45 times higher). Assuming random mating within regions, 1.4, 0.6, 1.4 and 5.2 homozygous individuals are expected to be affected by the condition in the Orkney, Shetland, North Wales and South Wales regions, respectively.
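The genetic prevalence figures above follow from Hardy-Weinberg proportions (prevalence = MAF_REG²), and a similar calculation can be used to reason about the HOMALT ranges in Tables 1–4. The sketch below is one reading of the two bounds described for those tables, not a restatement of the study’s code: for random mating with the whole UK and Ireland, the homozygosity probability for a region’s individual is taken as MAF_REG × MAF_AVE; for random mating within the region only, as MAF_REG². The function names and the regional population size in the example are hypothetical.

```python
def genetic_prevalence(maf):
    """Expected frequency of homozygous individuals under Hardy-Weinberg proportions."""
    return maf ** 2


def homalt_range(n_region, maf_reg, maf_ave):
    """Expected number of homozygous individuals in a region of n_region people.

    Lower bound: random mating with the whole UK and Ireland (maf_reg * maf_ave).
    Upper bound: random mating within the region only (maf_reg ** 2).
    """
    return n_region * maf_reg * maf_ave, n_region * genetic_prevalence(maf_reg)


# Remaining-regions example from the text: mean MAF = 0.000467.
print(f"1 in {1 / genetic_prevalence(0.000467) / 1e6:.1f} million")  # 1 in 4.6 million

# Hypothetical region of 100,000 people with regional MAF 0.004 and average MAF 0.001.
low, high = homalt_range(100_000, 0.004, 0.001)
print(f"HOMALT range: {low:.1f}-{high:.1f}")  # HOMALT range: 0.4-1.6
```

Whenever a variant is regionally enriched (MAF_REG > MAF_AVE), the within-region bound is the larger of the two, which is why it serves as the upper end of the range.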
Given the mean MAF of 0.000467 in the remaining UKB regions, a genetic prevalence of 1/4.6m (~5 times higher than in NFE) can be expected assuming random mating, which translates to 10.7 affected individuals across these regions. Overall, owing to the regionally elevated frequency of the OBSL1 variant, we estimate that up to 19 individuals across the UK and Ireland could be affected by 3-M syndrome as a result of being homozygous for this variant.

Comparison of regional population genetic metrics

Many factors could underlie the observed patterns of rare exonic variation across the 20 regions in our study. In previous work, we evaluated the roles played by founder effects, genetic drift and relaxation of purifying selection in shaping the isolated Shetland genome8. While founder effects appear to play a role in the more isolated populations in our study (e.g., Shetland, Orkney and full AJ), the small number of shared ultra-rare exonic variants per individual in the other groups (Supplementary Table 1) makes it unlikely that founder effects are a major force driving the differentiation observed for the remaining regions. In this section, we compare the 20 regions using estimates of several metrics designed to capture the effects of various forces in shaping regional genetic landscapes. The data underlying these analyses have some important constraints, including: the general UKB participation bias; the fact that WES data cover only a small subset of whole-genome variation and derive from protein-coding regions, which are known to be generally less tolerant of variation than other parts of the human genome; the exclusion from our analyses of individuals born in large metropolitan areas; and the lack of a more suitable reference dataset for individuals with AJ heritage.
Therefore, our results should not be considered absolute estimates of these population metrics, but only a means of comparing the 20 regions in our study.

Regional variant frequency fluctuation

In the previous section, we focused our analyses on a set of variants present in the gnomAD dataset30 with MAF below 1% in Non-Finnish European individuals (MAF_NFE < 1%) that were found to be significantly enriched in one or more of the UKB regions. Here, we perform a similar analysis on a subset of more common variants with 1% ≤ MAF_NFE < 5%, of which each individual in our study carries on average about 1100 (Supplementary Table 1). While the former set is more informative for investigating monogenic disorders, the latter may contain variation relevant to complex polygenic traits and, owing to its larger size, could provide more robust estimates of regional variant fluctuations.

Our results (Fig. 3) show a clear correlation between the proportion of variants showing a significant regional frequency fluctuation compared to the general European population and the observed degree of distinctiveness from the relatively homogeneous English regions (Fig. 1 and Supplementary Figs. 3, 5 and 6). We speculate that the observed proportions of regionally enriched and depleted variants are mainly driven by genetic drift. The ten English, three mainland Scottish and two Welsh regions show relatively small proportions of regionally enriched variants (from 0.19% for East Anglia to 0.77% for North Wales; mean = 0.34%, sd = 0.15%; Supplementary Table 6), which, besides regional variants, likely include variants enriched at the national level relative to Europe. In contrast, the remaining regions, Irish (1.18%), Orkney (3.06%), Shetland (3.62%), part AJ (4.82%) and full AJ (12.7%), exhibit much higher proportions of regionally enriched variants, which correspond well with their levels of geographical or cultural isolation.

Fig. 3: Estimate of regional variant frequency fluctuation.
The numbers for each region represent the number of SNPs in each category as a proportion of all regional SNPs found in Non-Finnish European (NFE) individuals with 1% ≤ MAF_NFE < 5%. The regions in the plot are sorted by the total proportion of enrichment.

Finally, all variants in a population are subject to genetic drift, including common variants. However, it has been shown that variant frequency fluctuation in a population is inversely correlated with the initial variant frequency36. Thus, variants common in Non-Finnish Europeans (MAF_NFE > 5%) are expected to exhibit less informative regional fluctuations than those depicted in Fig. 3 (1% ≤ MAF_NFE < 5%). Furthermore, given their relatively high frequency and the fact that exonic variants with MAF > 5% are mostly synonymous37, common variants are considered less likely to have strong detrimental effects on human health. For these reasons, such variants are outside the scope of the current work.

Regional nucleotide diversity

One important factor affecting the strength of genetic drift and regional variant frequencies is the past and present effective population size (Ne). While a rigorous analysis of regional historic and contemporary effective population sizes is outside the scope of this work, we computed the nucleotide diversity of each region (π = 4Neµ, where the mutation rate µ is generally similar across human populations). We use the estimated nucleotide diversity as a proxy for the effective population sizes potentially affecting the observed variant frequency fluctuations in the 20 regions.

Our nucleotide diversity analysis is based on 44,108 SNP variants with MAF > 5% in the dataset of 10,001 unrelated individuals from 20 regions that are also present in the full gnomAD dataset (Methods).
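A per-region diversity index over a fixed SNP panel can be sketched as the mean unbiased expected heterozygosity, 2n/(2n − 1) × 2p(1 − p), where p is the regional alternate allele frequency and 2n the number of sampled chromosomes. This estimator is our assumption, not a restatement of the study’s Methods. Because only pre-selected SNPs are used, the result is a relative index for ranking regions rather than π per base pair, so Ne = π/(4µ) should not be computed from it directly. Function names and toy counts are hypothetical.

```python
def site_diversity(alt_count, n_chrom):
    """Unbiased expected heterozygosity at one biallelic SNP."""
    p = alt_count / n_chrom
    return n_chrom / (n_chrom - 1) * 2 * p * (1 - p)


def regional_diversity(alt_counts, n_chroms):
    """Mean per-SNP diversity over a shared SNP panel (a relative index)."""
    values = [site_diversity(a, n) for a, n in zip(alt_counts, n_chroms)]
    return sum(values) / len(values)


# Toy panel of three common SNPs typed in 1000 chromosomes per region.
panel = {
    "RegionA": [500, 300, 120],
    "RegionB": [450, 250, 60],
}
ranking = sorted(panel, key=lambda r: regional_diversity(panel[r], [1000] * 3))
print(ranking)  # lowest diversity (smallest Ne proxy) first: ['RegionB', 'RegionA']
```

Using the same SNP panel for every region matters here: comparing indices computed on different panels would conflate differences in SNP ascertainment with differences in effective population size.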
Their presence in a public dataset indicates that they are less likely to be sequencing or variant calling artefacts in our data, while their relatively high frequency (MAF > 5%) suggests that they are less likely to have a functional impact, which mitigates confounding signals from various forms of selection that might affect our nucleotide diversity estimates.

Our results (Fig. 4, Supplementary Table 7) show that individuals with full AJ heritage exhibit the lowest nucleotide diversity, followed by individuals from the other two isolated regions, Shetland and Orkney, then the two Welsh regions and the Irish. The higher nucleotide diversity of the Northern Isles compared to the full AJ can be explained as follows: although the population of each of the two archipelagos is an order of magnitude smaller than the estimated number of AJ individuals living in the UK, nucleotide diversity in the AJ is strongly affected by their medieval bottleneck (estimated at N ~ 350)38, which was about an order of magnitude smaller than those in the Northern Isles39. The modern-day Northern Isles genetic landscape was also significantly affected by the Scots-Norse admixture event16,40,41. While the remaining English and Scottish regions appear to have roughly similar nucleotide diversity, individuals with part AJ heritage exhibit the highest nucleotide diversity in our dataset, due to their recent admixture with the rest of the UK population.

Fig. 4: Estimate of regional nucleotide diversity.
A lower π value implies a smaller effective population size (Ne). The regional π estimates are computed based on known SNPs with MAF > 5% in our 20-region dataset.

Regional strength of purifying selection

Another important factor affecting variation in protein-coding exonic regions is selection: variants that improve Darwinian fitness increase in frequency via positive selection, while those with detrimental effects are removed by purifying selection.
It has been shown previously that isolated populations, owing to their smaller effective population size, exhibit weaker purifying selection8,39. Here, we compare the 20 regions based on regional estimates of the strength of purifying selection.

Our estimate of the strength of purifying selection is based on SNPs found in each regional cohort of 500 unrelated individuals with MAF ≤ 1% and not reported in the full gnomAD dataset, further split into LOF (start lost, stop gained and splice acceptor/donor site) and synonymous variants. To account for the possibility that some of these ultra-rare variants may be sequencing or variant calling artefacts, we compare the 20 regions based on the mean number of LOF variants per individual normalised by the mean number of synonymous variants per individual. By choosing variants not present in gnomAD, we focus our analysis on potentially recent local variants (present in the UK and Ireland, but not reported or ultra-rare elsewhere), and by imposing the regional MAF threshold, we enrich for LOF variants that are likely functional (based on their predicted effect and their rarity in our data) and therefore subject to purifying selection. Our metric is similar to and inspired by the SVxy metric39, which cannot be directly applied in our context.

Our results (Fig. 5) suggest that purifying selection is strongest in the cosmopolitan London region and weakest in the isolated full AJ group, with all Scottish regions, Nottinghamshire, Lancashire and South Wales exhibiting somewhat weaker purifying selection than the remaining regions. Individuals from Strathclyde have the highest LOF/synonymous ratio among non-AJ regions; however, closer examination reveals that this is due to a disproportionate decrease in the mean number of synonymous variants relative to LOF variants in this group (Supplementary Table 8).
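The LOF/synonymous metric described above can be sketched as follows. The consequence labels are Sequence Ontology terms as emitted by annotators such as Ensembl VEP; assigning a single consequence per variant, the input dictionaries and all function names are simplifying assumptions for illustration. Because both means are taken over the same individuals, the ratio of means equals the ratio of total counts.

```python
LOF_TERMS = {"start_lost", "stop_gained", "splice_acceptor_variant", "splice_donor_variant"}


def lof_synonymous_ratio(carried, consequence, regional_maf, gnomad_any, max_maf=0.01):
    """Mean LOF count per individual divided by mean synonymous count per individual.

    carried:      individual ID -> set of variant IDs carried by that individual
    consequence:  variant ID -> single Sequence Ontology consequence term
    regional_maf: variant ID -> MAF within the regional cohort
    gnomad_any:   variant IDs observed anywhere in the full gnomAD dataset
    Only variants with regional MAF <= max_maf and absent from gnomAD are counted.
    """
    n = len(carried)
    lof = syn = 0
    for variants in carried.values():
        for v in variants:
            if v in gnomad_any or regional_maf[v] > max_maf:
                continue
            if consequence[v] in LOF_TERMS:
                lof += 1
            elif consequence[v] == "synonymous_variant":
                syn += 1
    if syn == 0:
        raise ValueError("no qualifying synonymous variants")
    return (lof / n) / (syn / n)


# Toy cohort of two individuals with made-up variants.
consequence = {"a": "stop_gained", "b": "synonymous_variant", "c": "synonymous_variant",
               "d": "synonymous_variant", "e": "stop_gained"}
regional_maf = {"a": 0.001, "b": 0.001, "c": 0.001, "d": 0.001, "e": 0.02}
carried = {"I1": {"a", "b", "d"}, "I2": {"b", "c", "e"}}
print(lof_synonymous_ratio(carried, consequence, regional_maf, gnomad_any={"d"}))
```

A lower ratio would be read as stronger purifying selection, as in Fig. 5. As the Strathclyde case shows, a shift in the denominator alone can move the ratio, so inspecting the two means separately is worthwhile.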
We note that, given their recessive nature, the regional strength of purifying selection can be expected to have minimal impact on our reported set of 67 regionally enriched variants. Further research based on deeper and wider regional datasets may help to evaluate the effect of purifying selection in the dominant context.

Fig. 5: Estimate of regional strength of purifying selection.
A lower LOF/synon ratio implies stronger purifying selection. The regional LOF/synon ratio estimates are computed from variants with regional MAF ≤ 1% and not reported in the full gnomAD dataset. LOF: loss-of-function variants; synon: synonymous variants.

Cross-regional gene flow

The last factor affecting regional stratification that we considered is gene flow, which in our context translates to inter-regional mating and/or migration. To evaluate its effects, we converted the previously computed pairwise FST distance matrix into a similarity matrix by taking the reciprocal of each value. We used this matrix as input to the R package qgraph42 (v 1.9.8) and generated a spring layout using the Fruchterman-Reingold algorithm43. In the resulting network, distances between nodes are expected to reflect the absolute edge weights, with more strongly connected (more similar) regions placed closer together. The colour saturation and width of each edge reflect its absolute weight, scaled relative to the strongest weight in the graph.

Our results (Fig. 6) confirm that the Northern Isles and AJ groups are distant from the mainland regions, and they also reveal several notable patterns. For example, there is a signal of higher gene flow among the three mainland Scottish regions than between them and the other regions. No such signal is seen for the two Welsh regions: the gene flow between them appears to be on the same scale as their gene flow with the remaining mainland regions. This potentially explains the clear separation of the two Welsh regions in our UMAP analysis (Fig. 1). Furthermore, the ten English regions form a loose cluster, within which there is a tighter hub-and-spokes cluster of seven English regions centred on London. This tighter cluster excludes the North East England, Lancashire and Staffordshire regions. Lastly, there is no evident signal of regional preference in the recent admixture of the part AJ group.

Fig. 6: Estimate of cross-regional gene flow among the 20 regions.
The cross-regional gene-flow estimate is computed from the pairwise FST distances among the regions. These distances were calculated using non-singleton known SNPs with MAF < 5% in our chosen set of 10,001 unrelated individuals (the same set of variants used for the UMAP analysis presented in Fig. 1).

Biomedical implications of the regionally enriched deleterious variation

The relatively high genomic homogeneity of individuals with Ashkenazi Jewish heritage is well established, and targeted genetic screens for variants implicated in various monogenic disorders have been adopted in Jewish populations worldwide. Enabled by the large-scale WES data in UK Biobank, we have demonstrated analogous enrichment of deleterious variants within various geographical regions of the UK and Ireland (Tables 1–4). Our results highlight individual variants with large impact. Examples include the FMO3 and SPG7 gene variants in Irish individuals, the SLC7A9 variant in South Wales, the LOXHD1 variant in South East Scotland and the CHEK2 variant in Welsh individuals. They also provide a disorder-centric view, with the following numbers of individuals predicted to be homozygous for the enriched causative variants:
- cystinuria: between 7 and 44 individuals;
- primary ciliary dyskinesia: 2–7 individuals;
- glycogen storage disease: 1–6 individuals;
- Leber congenital amaurosis: 1–6 individuals;
- dystrophic epidermolysis bullosa: 1–5 individuals.
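The predicted homozygote counts above are model-based, and this section does not describe how they were derived. The sketch below is therefore only a generic illustration, not the study's procedure. It assumes Hardy-Weinberg equilibrium. Under that assumption, the expected number of homozygotes for a variant with regional allele frequency q in a population of N individuals is N q². The sketch brackets q with a Wilson score interval. All function names, the ~95% level (z = 1.96) and the input numbers are hypothetical.

```python
import math


def wilson_interval(allele_count, n_alleles, z=1.96):
    """Wilson score interval for an allele frequency (default z gives ~95%)."""
    if n_alleles <= 0:
        raise ValueError("n_alleles must be positive")
    p = allele_count / n_alleles
    denom = 1 + z**2 / n_alleles
    centre = (p + z**2 / (2 * n_alleles)) / denom
    half = z * math.sqrt(p * (1 - p) / n_alleles + z**2 / (4 * n_alleles**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def expected_homozygotes(allele_count, n_alleles, population_size, z=1.96):
    """(low, point, high) expected homozygote counts, N * q**2, assuming HWE."""
    q = allele_count / n_alleles
    q_low, q_high = wilson_interval(allele_count, n_alleles, z)
    return (
        population_size * q_low**2,
        population_size * q**2,
        population_size * q_high**2,
    )


def regional_aggregate(allele_counts, n_alleles, population_size, z=1.96):
    """Sum per-variant (low, point, high) estimates over several variants.

    Crude: bounds are added directly (not a joint interval), and no individual
    is assumed to be homozygous for more than one of the variants.
    """
    totals = [0.0, 0.0, 0.0]
    for count in allele_counts:
        for i, value in enumerate(expected_homozygotes(count, n_alleles, population_size, z)):
            totals[i] += value
    return tuple(totals)


# Invented inputs: 10 alternate alleles among 500 individuals (1,000 alleles),
# projected onto a hypothetical regional population of 1,000,000 people.
low, point, high = expected_homozygotes(10, 1000, 1_000_000)
print(f"{low:.0f} / {point:.0f} / {high:.0f}")
print(regional_aggregate([10, 4], 1000, 1_000_000))
```

Summing per-variant bounds, as `regional_aggregate` does, gives only a rough range for a regional total rather than a proper joint interval. Like any per-variant calculation, it also ignores the locus heterogeneity of the disorders.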
Additionally, some regional aggregates can be estimated from our data. For example, between 30 and 138 Irish individuals are predicted to be homozygous for the regionally enriched deleterious variants, as are 14–99 South Wales individuals, 13–65 South East Scotland individuals and 13–42 Lancashire individuals. These predicted numbers of homozygous individuals clearly underestimate the number of individuals potentially affected by the corresponding conditions. Our calculations do not take into account the genetic and locus heterogeneity of the disorders.

Given the non-negligible total number of individuals predicted to be affected by the reported regionally enriched variation, it seems reasonable to assume that this would already have been ascertained through clinical means. However, the implicated disorders are varied and generally rare, and their cases may be aggregated only at the national level. We therefore believe that our region- and disorder-based approach provides unique insights and complements existing clinical awareness. Moreover, recent reports suggest that clinical ascertainment may be suboptimal even for some well-studied conditions. For example, a large study of pathogenic familial hypercholesterolaemia (FH) variants used targeted sequencing. It found that almost half of the carriers were not previously known to their health provider, and these carriers received a new diagnosis of FH44.
Another study found a pathogenic BRCA1 missense variant to be ~500-fold enriched in Orkney compared with UKB participants. The same study doubled the number of kindreds in which the variant was seen to segregate, relative to those previously known to the NHS13. This again highlights the advantages of cohort sequencing over familial cascade genetic testing.

Our work is a computational study of population genetic data and is not intended to provide insights into the aetiology or treatment of diseases. It aims to gauge the extent to which rare deleterious variants are enriched in regional British and Irish populations. We demonstrate that, even with conservative thresholding, many regional populations are relatively enriched for otherwise very rare deleterious variants. These variants lie within genes already known to cause a range of rare human diseases. To our knowledge, these enrichments have not been observed previously, and they provide useful insights into the regional genetic legacies of recent population dynamics. In addition, the deleterious variants we discover are potentially actionable through inclusion in genetic screening efforts, such as those that already exist for known isolated populations. Examples include:
- screens tailored for individuals with AJ heritage in the UK29, the USA45 and Israel46;
- screens for individuals from Old Order Amish and Old Order Mennonite communities in the USA47;
- a recent small pilot trial on the Orkney outer isle of Westray, offering testing for the BRCA1 variant discussed above (https://www.nhsgrampian.org/news/2023/july/testing-pilot-trial-now-underway-for-orkney-cancer-gene-link/).

We believe that, after careful consideration, reproductive carrier screening could be carried out cost-effectively. It would be informed by a better understanding of the regional landscape of pathogenic rare variation across the British Isles, to which we have contributed.
In the future, this landscape could inform new screening strategies, benefitting from the diverse regional burdens of pathogenic variation within a country, to decrease the burden of Mendelian disease.
