Uncovering genetic loci and biological pathways associated with age-related cataracts through GWAS meta-analysis

Cohorts

To identify risk loci for cataracts, we conducted a GWAS meta-analysis encompassing 67,844 cases and 517,399 controls from a meta-analysis of the United Kingdom Biobank (UKB) and Genetic Epidemiology Research in Adult Health and Aging (GERA) cohorts4; 2920 cases and 17,127 controls from the Mass General Brigham Biobank (MGBB); and 50,961 cases and 287,330 controls from FinnGen. We leveraged data from two independent Australian cohorts, the Raine Study and the Busselton Healthy Ageing Study (BHAS)25,26,27, to derive PRS based on the meta-analysis results and test their predictive ability to validate the associated loci (Fig. 5). A brief description of each cohort included in the present study is provided below.

Fig. 5: Cohorts included in the cataract meta-analysis and polygenic risk score (PRS) analysis. UK Biobank (UKB), Genetic Epidemiology Research in Adult Health and Aging (GERA), Mass General Brigham Biobank (MGBB).

The MGBB (formerly Partners HealthCare Biobank) is a long-term medical research repository located within the Partners HealthCare System in Boston, Massachusetts28. The Biobank collects and stores biospecimens, such as blood and tissue samples, from over 130,000 participants and genomic data from over 65,000 participants, of whom 47% are over 60 years of age. The Biobank operates in accordance with the Declaration of Helsinki to protect the rights and welfare of study participants. We performed a GWAS in 2920 cases and 17,127 controls using PLINK 1.90 beta. The phenotype was defined based on ICD-9/ICD-10 codes in the participants' electronic health records.

The FinnGen Study is a biobank located in Finland that aims to encourage human genetics research and promote the discovery of novel treatments for various diseases. It is one of the largest biobanks in Europe, including over 220,000 participants with genomic data and a median age of 63 years29. The biobank is maintained by the Institute for Molecular Medicine Finland.
The summary statistics used in this study included 50,961 cases and 287,330 controls and are labeled as “senile cataracts” in FinnGen release 8 (https://r8.finngen.fi/pheno/H7_CATARACTSENILE).

The current meta-analysis also includes the previous large multi-ancestry cataract meta-analysis that encompassed GERA and UKB4, using its publicly available summary statistics. The GERA cohort contains clinical and genomic data for over 110,000 participants; 33,145 patients who had undergone cataract surgery and 64,777 controls were included in this study. The UKB is a large prospective study following the health of ~500,000 participants; 34,699 cataract cases, defined as “participants with a self-reported cataract operation (f20004 code 1435) or/and a hospital record including a diagnosis code (ICD-10: H25 or H26)”4, and 452,622 controls were included in the meta-analysis. Cataracts were clinically diagnosed based on ICD-9/ICD-10 criteria in GERA, the MGBB, and FinnGen. Detailed information regarding phenotype definition, genotyping, QC, and imputation procedures for all cohorts is provided in Supplementary Data 6.

GWAS meta-analysis

We conducted an inverse-variance-weighted (IVW) fixed-effect meta-analysis of 121,725 cases and 821,856 controls using METAL30. All variants were aligned to the positive strand on genome build GRCh37 (hg19), and those with a minor allele frequency (MAF) < 1% were removed. Where the MAF was not available for a study (i.e., GERA and MGBB), it was derived from a reference panel of 5000 healthy individuals from the UKB. Linkage disequilibrium (LD) clumping was used to identify independent genome-wide significant loci within a 1 Mb window. Clumping was conducted in PLINK 1.9 with a P value cut-off of 5e-8 and an r-squared threshold of 0.01, using the 1000 Genomes Project reference panel.
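The inverse-variance-weighted fixed-effect combination named above can be illustrated for a single variant. This is a minimal sketch, not METAL's code: the function name `ivw_fixed_effect` and the example numbers are assumptions made for illustration, and the inputs are taken to be per-cohort effect estimates already aligned to the same effect allele.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Combine per-cohort effects for one variant with inverse-variance weights.

    betas: per-cohort effect estimates (e.g. log odds ratios), same effect allele
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p

# Hypothetical effects for one variant in two cohorts of equal precision.
beta, se, z, p = ivw_fixed_effect([0.10, 0.20], [0.10, 0.10])
print(f"beta={beta:.3f} se={se:.4f} z={z:.3f} p={p:.4f}")
```

With equal standard errors the pooled effect is the simple mean of the cohort effects; cohorts with smaller standard errors would pull the estimate toward their own value.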
Using the same clumping configuration, we estimated the independent loci for the previous meta-analysis, which yielded 44 independent loci. Loci shared between the previous meta-analysis and this study's meta-analysis were considered known, while those unique to this study were deemed novel. SNP heritability for the GWAS meta-analysis was estimated using LDSC31 v1.0.1. Manhattan plots were generated using custom code32.

Functional annotation and eQTL

To provide insights into the genes and biological pathways underlying cataract etiology, we used functional annotation as implemented in FUMA (Functional Mapping and Annotation of Genetic Variants)33 v1.3.6 and mapped variants to protein-coding genes using MAGMA (Meta-Analysis Gene-set Mining of GWAS) v1.08. FUMA integrates functional genomics data from various sources, such as GWAS and eQTL, to annotate and prioritize genetic variants. MAGMA performs gene-based and gene-set analyses to identify genes and genetic pathways that are enriched for genetic variants associated with a trait34. We used a p value threshold of 5e-8 to define loci associated with cataracts in the GWAS and applied a Bonferroni correction for the total number of genes in the gene-based analysis (p = 0.05/19,491 genes) and for the total number of gene sets in the gene-set analysis (p = 0.05/17,017 gene sets). Gene-based analysis was used to prioritize associations between genes and cataracts. Gene-set tests, also known as pathway or enrichment analysis, were employed to assess groups of functionally related genes that act together in biological pathways involved in cataract etiology.
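The two Bonferroni thresholds quoted above follow directly from dividing the family-wise error rate by the number of tests. The short sketch below only reproduces that arithmetic; the helper name `bonferroni_threshold` is an assumption for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Test counts as reported in the text.
gene_based_threshold = bonferroni_threshold(0.05, 19_491)  # genes
gene_set_threshold = bonferroni_threshold(0.05, 17_017)    # gene sets

print(f"gene-based: p < {gene_based_threshold:.3e}")
print(f"gene-set:   p < {gene_set_threshold:.3e}")
```

Both thresholds are on the order of 1e-6, roughly two orders of magnitude less stringent than the single-variant genome-wide threshold of 5e-8.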
Gene sets were obtained from MSigDB v7.0 (“Curated gene sets” and “GO terms”) as part of the MAGMA v1.08 analysis incorporated in the FUMA pipeline.

Genes identified through gene-based analysis in MAGMA were further evaluated by integrating blood eQTL data from 2765 individuals35 of the Consortium for the Architecture of Gene Expression using summary-data-based Mendelian randomization (SMR). SMR is a commonly used method to interrogate the association between gene expression and complex human traits using GWAS summary statistics36. To account for multiple testing, we applied a Bonferroni-corrected threshold of 0.05 divided by the number of genes tested (N = 134).

Drug-gene interaction

Genes that were consistent between the MAGMA and SMR analyses were evaluated for potential drug-gene interactions using v4.0 of the Drug-Gene Interaction Database37. The Drug-Gene Interaction Database is a curated database that collects information on both established and predicted interactions between drugs and genes, combining data from various sources to give a comprehensive overview. Drug-gene interactions can inform strategies for preventing cataracts, as they point to drug exposures, whether therapies for comorbid conditions or lifestyle-related factors, that may increase the likelihood of cataracts.

Mendelian randomization

We estimated the putative causal relationship between type 1 diabetes38 and cataracts using two-sample MR in R 4.0.2. The “TwoSampleMR”39 package (v0.5.5) is an R package that enables the estimation of causal effects between an exposure and an outcome of interest using summary-level GWAS data. The package includes various methods such as IVW, weighted median, MR-Egger, MR-PRESSO, and MR-Rucker.
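To make the IVW estimator mentioned above concrete, the sketch below computes a fixed-effect IVW causal estimate from per-SNP exposure and outcome effects. It is a simplified illustration in Python, not the TwoSampleMR implementation (which is written in R and may adjust the standard error for heterogeneity); the function name and example numbers are assumptions.

```python
import math

def mr_ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical instruments whose outcome effects are exactly half their exposure effects.
estimate, se = mr_ivw([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])
print(f"IVW estimate = {estimate:.3f} (SE {se:.4f})")
```

Because every outcome effect in the example is half the matching exposure effect, the estimate is 0.5; with real data the per-SNP ratios differ and the weights decide how much each instrument contributes.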
We selected independent instrumental variables by clumping in PLINK 1.9 with the following parameters: --clump-r2 0.001, --clump-p1 5e-8, --clump-kb 1000. If the IVW estimate showed nominal evidence of a causal association, we re-assessed the MR relationship using a series of alternative MR models, including MR-Egger, weighted median, simple mode, and weighted mode40. We calculated the proportion of variance explained (PVE) by each SNP using the equation below and retained the traits for which the SNPs collectively explained at least 1% of the variance. Here, β is the effect of the variant, MAF is the minor allele frequency, SE is the standard error of β, and N is the sample size.

$$R^{2}=\frac{2\beta^{2}\,\mathrm{MAF}(1-\mathrm{MAF})}{2\beta^{2}\,\mathrm{MAF}(1-\mathrm{MAF})+\mathrm{SE}(\beta)^{2}\,2N\,\mathrm{MAF}(1-\mathrm{MAF})} \tag{1}$$
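Equation (1) can be evaluated per SNP and summed across instruments to apply the 1% cut-off. The following is a minimal sketch: the function names, the tuple layout, and the example values are assumptions for illustration, and summing per-SNP values presumes the instruments are independent (as they are after clumping).

```python
def pve_per_snp(beta, maf, se, n):
    """Proportion of variance explained by one SNP, following Eq. (1)."""
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    genetic = 2.0 * beta ** 2 * maf * (1.0 - maf)
    residual = se ** 2 * 2.0 * n * maf * (1.0 - maf)
    return genetic / (genetic + residual)

def total_pve(snps):
    """Sum per-SNP PVE over independent instruments given as (beta, maf, se, n)."""
    return sum(pve_per_snp(*snp) for snp in snps)

# Hypothetical instruments: (beta, MAF, SE, N).
instruments = [(0.10, 0.50, 0.01, 10_000), (0.08, 0.30, 0.01, 10_000)]
explained = total_pve(instruments)
print(f"total PVE = {explained:.4f}; passes 1% cut-off: {explained >= 0.01}")
```

For the first hypothetical SNP the numerator is 0.005 and the denominator is 0.505, giving a PVE just under 1% on its own; adding the second SNP takes the total above the cut-off.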
Multivariable Mendelian randomization (MVMR) was employed to assess whether genetic variants associated with type 2 diabetes41 influenced the association between type 1 diabetes and cataracts; this analysis is implemented in the “TwoSampleMR” package.

Considering the association between lipid metabolites and cataracts highlighted by the gene-set analysis, we further assessed a potential causal relationship within an MR framework. We used summary statistics for 249 metabolic markers, including amino acids and metabolites related to glycolysis and fatty acids. These data were generated through metabolic profiling on Nightingale Health’s NMR metabolomics platform in over 118,000 UKB participants42. We employed GSMR43 v1.91 to investigate the potential causal relationship between cataracts and these metabolic markers. This method requires only GWAS summary statistics to estimate MR effect sizes and accounts for correlated SNP instruments by modeling LD from a pre-specified reference panel. Additionally, it uses the HEIDI-outlier test to detect heterogeneous SNP outliers. We applied the following parameters to select independent instrumental variants: --clump-r2 0.001, --gwas-thresh 5e-8, --clump-kb 1000, and --heidi-thresh 0.01. To control the type I error rate, we applied a Bonferroni correction, setting the p value threshold to 0.05/249 metabolites = 2e-04. To avoid sample overlap with UKB participants, we excluded the GERA-UKB meta-analysis and based the MR results on a meta-analysis of FinnGen and MGBB.

Polygenic risk scores (PRS)

A PRS estimates an individual’s risk of developing a particular disease by summing the risk alleles the person carries, each weighted by its effect size. PRS can be used to predict an individual’s risk for cataracts and can also help identify individuals who may benefit from early intervention.
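The weighted sum that defines a PRS can be written in a few lines. This is an illustrative sketch only, not the PLINK scoring routine: the function name, the placeholder SNP identifiers, and the effect sizes are assumptions, and real tools may additionally scale or average the score.

```python
def polygenic_risk_score(dosages, weights):
    """Sum effect-allele dosages weighted by per-SNP effect sizes.

    dosages: dict of SNP id -> effect-allele count (0, 1, 2, or an imputed dosage)
    weights: dict of SNP id -> GWAS effect size (e.g. log odds ratio)
    SNPs without a genotype for this person are skipped.
    """
    return sum(weight * dosages[snp]
               for snp, weight in weights.items()
               if snp in dosages)

# Placeholder SNP ids and effect sizes, not taken from the study.
effect_sizes = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": -0.02}
person = {"snp_a": 2, "snp_b": 1, "snp_d": 2}  # snp_c missing, snp_d not in the score

score = polygenic_risk_score(person, effect_sizes)
print(f"PRS = {score:.2f}")
```

Only SNPs present in both the score file and the genotype data contribute, so the example person's score is 2 × 0.10 + 1 × 0.05.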
A brief description of each of the cohorts included in the PRS analysis is provided below.

The Raine Study is a prospective multigenerational observational study from Western Australia44,45. Between 1989 and 1991, 2900 women in the first trimester of their pregnancy were recruited from metropolitan Perth, Western Australia. A total of 2868 offspring (Gen2) were born to these women, and the birth cohort has undergone a series of health and medical examinations since before birth. Comprehensive eye examinations were conducted at 20 and 28 years of age25,26. The Raine cohort consists of individuals under 30 years of age, who do not have age-related cataracts. Therefore, only estimates for ultraviolet radiation (UVR)-related phenotypes (as explained in the section below) were assessed in this cohort.

Blood specimens were obtained from participants during the Gen2 14- and 17-year follow-up assessments. Samples from 1592 participants were analyzed in 2010 using the Infinium HD Human660W-Quad BeadChip Array, while samples from an additional 310 participants were analyzed in 2013 using the Infinium OmniExpress-24 BeadChip Array.

The BHAS is a long-term, population-based study of 5107 adults born between 1946 and 1964, recruited from the City of Busselton, a coastal city in Western Australia. It comprises two examination periods, conducted between 2010 and 2015 (ref. 27) and between 2016 and 2022. The BHAS involves the collection of detailed data on various aspects of health and well-being, including physical, cognitive, and mental health, as well as lifestyle and environmental factors. All follow-ups of the Raine Study and phases of the BHAS have been approved by the University of Western Australia Human Research Ethics Committee and comply with the Declaration of Helsinki.

BHAS participants underwent blood sample collection and were genotyped using the Illumina Infinium Global Screening Array.
Quality control measures were implemented to ensure data accuracy, including the exclusion of SNPs with a call rate <0.95, a Hardy-Weinberg equilibrium p value less than \(10^{-6}\), or MAF less than 0.01. Population outliers were identified and excluded through a principal component analysis against the 1000 Genomes Project reference, retaining participants of European ancestry. The post-quality control data were then imputed against the TOPMed reference panel. SNPs with an imputation accuracy >0.3 and MAF > 0.01 were included in further analysis.

The PRS for cataracts was generated using PLINK 2.0. We selected independent SNPs using the following parameters: --clump-r2 0.05, --clump-p1 5e-8, and --clump-kb 1000. We used a subset of the UK Biobank comprising 5000 healthy individuals as the linkage disequilibrium reference for the clumping process. We used a generalized linear model to assess the association between the scores derived from cataract genome-wide significant SNPs (p < 5e-8) and cataracts in an independent cohort (Busselton, N cases = 389, N controls = 4416). We also compared these PRS results with a PRS derived from the preceding cataract meta-analysis4, employing the same parameters.

We further explored the association between the PRS of cataracts and two surrogate measurements of UVR exposure: pterygium (Busselton N cases = 516, N controls = 4029), a non-cancerous growth of the conjunctiva that is strongly associated with UVR exposure, and conjunctival ultraviolet autofluorescence (CUVAF; N BHAS = 4384, N the Raine Study Gen2 = 1847), a non-invasive and objective method of measuring the amount of UVR exposure at the bulbar conjunctiva46. Participants who had previously undergone pterygium removal surgery were also included in the analysis as cases25,47.
Given the limited statistical power resulting from the scarcity of genome-wide significant instruments for UVR exposure phenotypes, further analysis using MR methods to confirm the causal relationship between UVR exposure and cataracts was not feasible.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
