Genetic association and computational analysis of MTHFR gene polymorphisms rs1801131 and rs1801133 with breast cancer in the Bangladeshi population

Demographic characteristics of the study participantsThe demographic characteristics of the study participants, including age, BMI, family history of cancer, occupation, residence area, educational status, monthly income, menstrual status, age at menarche, and number of pregnancies are presented in Table 1.Table 1 Demographic characteristics of the study subjects.In our study population, more than 75% of participants (both case and control) are within the 30 to 50 years age group. In both groups, the majority (around 90%) of the participants are housewives. Only a handful (10.89% patients and 13.46% controls) of the participants in each group received higher secondary or above levels of education. Participants from both groups (91.09% patients and 89.42% control individuals) predominantly belong to the lower income group (< 20,000 BDT) and reside in the rural area of Bangladesh (74.25% patients and 68.27% control individuals). Breast cancer patients have more than twice (12.87%) the incidence of cancer in family history, compared to control individuals (5.77%). But no significant differences were found between breast cancer patients and healthy controls regarding age, BMI, family history of cancer, occupation, residence area, educational status, monthly income, number of pregnancies, or age at menarche (p > 0.05). However, a statistically significant difference in menopausal status was observed between the patients and control subjects.Clinicopathological data of breast Cancer patientsThe clinical and pathological data of the breast cancer patients enrolled in this study are shown in Table 2.Table 2 Clinicopathological data of the patients included in this study.Almost all (99.5%) patients included in this study had invasive ductal carcinoma (IDC), and only 1 patient had invasive lobular carcinoma (ILC). In terms of hormone receptor status, 74.14% of patients were ER+, 65.49% were PR+, and 69.9% were HER2+. A total of 63.86% of patients had a tumor within 2 to 5 cm, and 74.75% of patients had grade 2 (G2) tumors.MTHFR polymorphisms and the risk of developing breast cancerGel images of restriction digestion products of different genotypes of rs1801133 (C677T) and rs1801131 (A1298C) are shown in Fig. 1.Fig. 1Gel images of restriction digestion products of different genotypes of rs1801133 (C677T) and rs1801131 (A1298C). (A) Genotypes of rs1801133 (C677T). The 198 bp fragment in lanes 2, 4, and 7 indicates the wild-type C/C genotype. The presence of 198 bp, 175 bp, and 23 bp in lanes 1, 3, 6, and 8 indicates a heterozygous C/T genotype, while the presence of 175 bp and 23 bp in lanes 5 and 9 indicates the T/T genotype. Lane 10 contains a 25 bp DNA ladder. (B) The rs1801131 (A1298C) genotype. The 56 bp, 31 bp, 30 bp, 28 bp, and 18 bp fragments in lanes 3, 6, and 9 indicate the homozygous wild-type A/A genotype. The presence of 84 bp, 56 bp, 31 bp, 30 bp, 28 bp, and 18 bp in lanes 4, 7, and 10 indicates a heterozygous mutant A/C genotype, while the presence of 84 bp, 31 bp, 30 bp, and 18 bp in lanes 2, 5, and 8 indicates a homozygous mutant C/C genotype. Lane 1 contains a 25 bp DNA ladder.The frequencies and associations of different genotypes according to different genetic models with the risk of breast cancer, as measured by odds ratios (ORs) at the 95% confidence intervals (CIs), are shown in Table 3.Table 3 Distribution of MTHFR rs1801131 (A1298C) and rs1801133 (C677T) genotypes in study participants and assessment of the risk of breast cancer.In the control group, the genotype frequencies of rs1801131 (A1298C) were 32.69%, 61.54%, and 5.77% for the homozygous wild type (AA), heterozygous variant (AC), and homozygous variant (CC) strains, respectively. In the patient group, the frequencies were 10.39% homozygous wild type (AA), 75.25% heterozygous variant (AC), and 14.36% homozygous variant (CC). The allele frequency of alternate allele C was 36.60% in the control population and 51.98% in breast cancer patients. On the other hand, for rs1801133 (C677T), the control group had 76.92% homozygous wild type (CC), 20.19% heterozygous variant (CT), and 2.89% homozygous variant (TT) genotypes. In addition, in breast cancer patients, the frequencies were 68.32%, 28.22%, and 3.46% for homozygous wild type (CC), heterozygous variant (CT), and homozygous variant (TT), respectively. The minor allele frequency is 12.98% in the control population and 17.57% in the patient population.According to all the genetic models analyzed (codominant, dominant, recessive, and overdominant), rs1801131 was significantly associated with the risk of breast cancer (p < 0.05). According to the OR analysis, AC and CC genotype carriers had 3.85- and 7.82-fold greater risks of developing breast cancer, respectively, in the codominant model. According to the dominant model, AC + CC genotype carriers had a 4.19-fold greater risk of developing breast cancer, and according to the recessive model, carriers of the CC genotype had a 2.74-fold greater risk of developing breast cancer. Overall, compared with carriers of the reference A allele, carriers of the mutant C allele had an increased (OR = 1.89, p < 0.05, 95% CI = 1.33–2.66) risk of developing breast cancer.For rs1801133, no significant association was detected in any of the genetic models. Although the OR analysis revealed a greater risk of developing cancer for both heterozygous (CT) and homozygous mutant (TT) T allele carriers, the difference was not statistically significant (p > 0.05).The genotype distributions of both SNPs in the study participants are shown in Fig. 2.Fig. 2Genotypic distribution of the two SNPs in the study participants. (A) Distribution of rs1801131 and (B) distribution of rs1801133.Distribution of MTHFR rs1801131 (A1298C) and rs1801133 (C677T) genotypes in the study subjects according to menopausal statusThe female subjects were stratified according to their menstrual status (premenopause and postmenopause), and the risk of developing breast cancer in these groups was analyzed. In premenopausal women, rs1801131 was significantly associated with breast cancer risk. AC and CC genotype carriers had 5.06 and 16.25 times greater risks of developing breast cancer, respectively, than did the reference AA genotype carriers. In postmenopausal women, no significant association was found between genotype rs1801131 and breast cancer risk. For rs1801133, no significant association was found with the risk of developing the disease in either group. The results are presented in Table 4.Table 4 Distribution of rs1801131 (A1298C) and rs1801133 (C677T) genotypes in the participants stratified against menopausal status.Association of the target SNPs with tumor grade and size in breast cancer patientsIn the patient group, associations of the polymorphisms with tumor size and grade were analyzed. The results are presented in Tables 5 and 6.Table 5 Distribution of rs1801131 (A1298C) and rs1801133 (C677T) genotypes in patients with different tumor grades.Table 6 Distribution of rs1801131 (A1298C) and rs1801133 (C677T) genotypes in patients with different tumor sizes.The presence of the genotype CT in rs1801133 was significantly associated with an increased risk of larger tumor size (OR: 4.46, 95% CI: 2.22–9.45), but no significant association was found with tumor grade. Conversely, rs1801131 was not associated with either tumor size or grade.Confirmation of RFLP genotyping results by sequencingPCR products of different genotypes were sequenced by the Sanger sequencing method (BTSeq), and all samples matched the results obtained by PCR-RFLP. A representative chromatogram is shown in Fig. 3.Fig. 3Chromatogram showing the sequencing results of different genotypes of rs1801133.Constancy of genotype frequencies, linkage disequilibrium (LD) and haplotype analysisThe Hardy‒Weinberg equilibrium was used to analyze the constancy of the genotype frequencies observed in this study. For rs1801131, deviation from the Hardy‒Weinberg equilibrium was observed in both the control and patient groups, and both combined. However, for rs1801133, the study population was in equilibrium. The results are presented in Table 7.Table 7 Hardy‒Weinberg equilibrium test of 2 SNPs (rs1801131 and rs1801133) in the study subjects.LD analysis and R2 results for the two SNPs are shown in Fig. 4. Although D’ is 0.71, R2 is very low (0.08), which represents weak LD between the SNPs.Fig. 4Linkage disequilibrium (LD) plot for 2 SNPs of the MTHFR gene.Four (4) possible haplotype combinations are possible for two (2) SNPs studied. The results of the haplotype analysis are presented in Table 8.Table 8 Haplotype analysis of 2 SNPs (rs1801131 and rs1801133) in the participants.The AC haplotype (reference alleles at both loci) is protective against breast cancer development (OR = 0.468, 95% CI = 0.332–0.659, p < 0.05), and the CC haplotype is associated with an increased risk of breast cancer development (OR = 1.757, 95% CI = 1.245–2.48, p < 0.05).Total WBC count and ESRBreast cancer patients had significantly greater (p < 0.05) total WBC counts (9,271 ± 2,608 cells/mm3) than did control subjects (8,139 ± 1,140 cells/mm3). The results are shown in Fig. 5.Fig. 5Violin plot demonstrating Total WBC count in breast cancer patients and healthy controls.The ESR was also significantly higher in breast cancer patients (36.69 ± 178.06 mm/1st hour) than in healthy controls (23.8 ± 8.26 mm/1st hour). Comparison between patient and control ESR is shown in Fig. 6.Fig. 6Jitter plot showing ESR in breast cancer patients and healthy controls.In silico analysis of the effects of SNPs on the MTHFR proteinAnalysis of functional consequences of rs1801131 (E429A) and rs1801133 (A222V)Different web-based tools have been used to predict the effect of polymorphisms on the MTHFR protein. All the tools, except SIFT, predicted the A222V (rs1801133) polymorphism as damaging or disease-causing. However, the E429A (rs1801131) polymorphism was predicted to be tolerated or neutral by all the tools. Both MUpro and INPS-MD predicted that both mutations decrease the stability of the protein. The results are summarized in Table 9. The scores from the different tools are shown in Supplementary Table 3.Table 9 Functional effects of the two SNPs.The results from the ConSurf server revealed the functional and/or structural conservation of amino acid residues at the SNP sites. Functional residues are defined as being highly conserved and being in an exposed position on the protein surface, whereas structural residues are defined as being highly conserved and in a buried position. Alterations of amino acid residues in highly conserved regions are more detrimental than those in less stable regions. In MTHFR, A222 was found to be a conserved and buried residue, whereas E429 was an exposed but variable residue. Therefore, the A422V polymorphism might be damaging to the protein.Homology modelingThe 3D structure of the human MTHFR protein (6FCX) obtained from the RCSB PDB had 2 mutations in its sequence. Therefore, the reference sequence from UniProtKB was obtained in FASTA format and used to construct the reference 3D model. Two (2) other models were generated by substituting A222 and E429 with valine and alanine, respectively. Evaluation of the predicted structure by the SWISS-MODEL structure assessment tool, ProSA-Web, and ERRAT yielded satisfactory results, and the models were of good quality. The evaluation scores from the different tools are presented in Supplementary Table 4. The changes in amino acid residues in 3D structures were also visualized and are presented in Supplementary Figs. 1 and 2.Analysis of the impact of mutated residues on protein structuresMutated structures were superimposed onto the predicted wild-type structure using PyMol, and RMSD values were calculated. A higher RMSD indicates a greater deviation between the structures. Both structures had the same RMSD (0.062 Å) as the wild-type structure. Additionally, both structures had the same TM score of 0.99995. TM scores are calculated within a range of 0 to 1, where a score of 0–0.3 indicates random structural similarity and 0.5-0.99999 indicates a good fit between the models. A score of 1 represents a perfect match between the C alpha atoms of the protein backbone. TM-align also provided an RMSD between the protein structures, which was similar to that of PyMol (0.06 Å vs. 0.062 Å). Combining the RMSD value and TM score, it can be deduced that the protein structure, on a broad scale, was not substantially altered by these substitutions. However, interactions with neighboring residues or the cellular environment might be affected by these polymorphisms.The superimposed structures were further analyzed for the impact of the mutated residues on neighboring interactions and bonding patterns. The mutated residues on superimposed structures and bonding patterns of wild-type and mutated residues are shown in Fig. 7.Fig. 7Impact of the mutated residues on neighboring interactions. (A), (C), and (E) correspond to the A222V polymorphism (rs1801133), and (B), (D), and (F) correspond to the E429A polymorphism (rs1801131). (A) The superimposed structure of wild-type alanine and mutated valine at position 222 (the wild-type structure is shown in teal ribbons, and the mutated structures are shown in orange). (B) The structures of wild-type glutamate and mutated alanine at position 429 (wild-type in teal and mutated in yellow.) (C) and (E) show the differences in neighboring interactions due to the A222V polymorphism. (D) and (F) demonstrate the effects of the E429A polymorphism on neighboring interactions. Hydrogen bonds (H-bonds) are denoted in green, and hydrophobic interactions are shown in purple.As shown in Fig. 7(A), the introduction of valine, which substitutes for the alanine residue at position 222, caused an increase in residue size, but because both residues are hydrophobic, neither the charge nor the hydrophobicity changed. Fig. 7(C) shows that the H-bond distance between Ala222 and Val218 was 2.978 Å, whereas the hydrophobic interactions between Ala222 and Ile192 and between Ala222 and Val194 were 5.247 Å and 3.937 Å, respectively. On the other hand, for the mutated valine residue at 222, the H-bond distance with Val218 decreased to 2.961 Å, but another new hydrophobic interaction was introduced with Val179, which had a distance of 4.722 Å (Fig. 7(E)). The previous interactions with Ile192 and Val194 were retained but were reduced to 5.192 Å and 3.46 Å, respectively.For the E429A polymorphism, glutamic acid was replaced by alanine (Fig. 7(B)). This caused differences in size, charge, and hydrophobicity, as glutamic acid is a polar, acidic residue, but alanine is a nonpolar, hydrophobic amino acid. However, no bonds were diminished with neighboring residues. In both cases, 3 H-bonds were observed with Ser427, Ser432, and Glu433. For Glu429, the distances were 3.276 Å, 3.089 Å, and 3.135 Å (Fig. 7(D)). When alanine was introduced at position 429, the distances were 3.270 Å, 3.154 Å, and 3.236 Å, respectively (Fig. 7(F)).The polymorphisms affected the bond angles and bond distances between the C alpha of adjacent amino acids. For the A222V polymorphism, in the wild-type structure, the bond angle between Gly221-Ala222-Asp223 was 121.74°, and in the mutated structure, the angle between Gly221-Val222-Asp223 was 125.32°. The C alpha distance for Gly221-Ala222 was 3.818 Å for the wild-type structure and 3.824 Å for Gly221-Val222 for the mutated structure. Between the C alphas of Ala222-Asp223, a distance of 3.813 Å was observed in the wild-type structure. In the mutated structure, the distance was 3.818 Å. In the structure predicted for E429A, the bond angle between Glu428-Ala429-Ser430 was 87.85°, but it was 88.49° in the wild-type structure, between Glu428-Glu429-Ser430. The bond distance between Glu428 and Glu429 was 3.854 Å, and that between Glu429 and Ser430 was 3.853 Å in the wild-type structure. For the structure with alanine at position 429, the bond distances were 3.836 Å and 3.845 Å for Glu428-Ala429 and Ala429-Ser430, respectively.During the superimposition of the mutated structures with the wild-type structure, as shown in Fig. 7(A) and (B), little to no fluctuations were observed for either amino acid. For A222V, the C alpha was displaced by 0.1 Å compared to that of the wild-type structure. The C beta moved by 0.3 Å, the N by 0.4 Å, and the O atom moved 0.2 Å away from their positions in the wild-type structure. For the E429A polymorphism, the values were 0.1 Å for all four atoms mentioned.The substrate binding site of MTHFR differs from that of the polymorphisms analyzed. The residues important for binding 5-methyl THF and NAD(P)H are Q228, Q267, K270, L271, and L323. The residues were visualized in the superimposed position, and the residues in the mutated structures did not change in position and superimposed perfectly to the reference structure. The residues in the superimposed position are shown in Supplementary Fig. 3.

Hot Topics

Related Articles