Cluster effect for SNP–SNP interaction pairs for predicting complex traits

In Part 1, we evaluated FIRs1k for pairs in a cluster. As shown in Fig. 2a–h and Supplementary Table S1, FIRs1k based on 1pRule were larger than those based on 3pRule. For C1 with a high significance level (p-main = 1.4 × 10⁻⁸) under a sample size of 10,000, the FIRs1k was 82.3% for 1pRule but fell to 29.5% for 3pRule. For C3 with a high significance level (p-main = 2.1 × 10⁻⁷) under a sample size of 10,000, the FIRs1k was 53.0% for 1pRule but fell to 23.8% for 3pRule. These results support that 3pRule can effectively reduce FIRs1k compared with 1pRule. As for the sample size effect, a larger sample size led to a higher FIRs1k, and this trend held for both 1pRule and 3pRule. For example, based on 3pRule, FIRs1k for C3 with a high significance level were 24.8%, 23.8%, and 10.8% under sample sizes of 20,000, 10,000, and 5000, respectively. In addition, the significance level of the main effect (p-main) of the hub SNP also affected FIRs1k: a smaller hub-SNP p-main generally came with a higher FIRs1k. Using the C1 cluster as an example (Fig. 2a), the FIRs1k were 29.5%, 28.8%, and 20.5% for C1 with p-main of 7.5 × 10⁻¹⁰, 9.6 × 10⁻⁸, and 3.8 × 10⁻⁵, respectively, under a sample size of 10,000. For the C3 cluster under a sample size of 10,000 (Fig. 2c), the FIRs1k were 23.8%, 19.5%, and 8.2% for C3 with p-main of 2.1 × 10⁻⁷, 5.0 × 10⁻⁶, and 4.1 × 10⁻⁴, respectively. A similar FIRs1k trend can be observed in other clusters, as shown in Fig. 2.

All FIRs1k results listed in Supplementary Table S1 are summarized in Fig. 3, which shows 72 FIRs1k results by the hub SNP’s p-main and the 2 significance rules. Each data point represents the results of a cluster with 6 null pairs (such as C1H–N1, C1H–N2, and C1H–N6). The FIRs1k of null SNPs in a cluster were positively associated with the significance of the hub SNP’s p-main.
The smaller the p-main, that is, the larger the −log₁₀(p-main), the higher the FIRs1k, and the 3pRule approach reduced FIRs1k compared with the conventional 1pRule. Moreover, FIRs1k were also affected by the MAF of the peripheral SNPs. As shown in Fig. 4, peripheral SNPs with a low MAF tended to have higher FIRs1k than those with a large MAF. Using C1H (C1 with high significance, p-main = 1.4 × 10⁻⁸) under a sample size of 10,000 as an example, the FIRs1k for peripheral SNPs with MAF values of 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5 were 51.2%, 43.0%, 28.2%, 21.6%, 17.2%, and 15.8%, respectively. This means that a C1H pair whose peripheral SNP had a MAF of 0.05 had a 51.2% chance of being a false positive, and that this chance fell to 15.8% when the peripheral SNP had a large MAF of 0.5. Similar trends can be observed for other conditions (Fig. 4).

Figure 3. False identification rates (FIRs1k) by the hub SNP’s main effect (p-main) and the 2 significance rules. Results were based on the 1000 simulation runs of the 72 clusters and their hub SNPs. Two significance rules: 1pRule: p-pair < 2.7 × 10⁻⁷; 3pRule: p-pair < 2.7 × 10⁻⁷, p-pair < p-main for SNP1, and p-pair < p-main for SNP2.

Figure 4. False identification rates (FIRs1k) for 8 sets of SNP–SNP interaction clusters based on 3pRule and 1000 runs. Each set had 3 clusters with a hub SNP at various significance levels, such as C1H, C1M, and C1L for the C1 SNP with a high, medium, and low significance level, respectively. Sample size: 20 K (n = 20,000), 10 K (n = 10,000), and 5 K (n = 5000).

For Part 1, the TIRs1k for the 12 pairs under the 3 sample sizes (n = 20,000, 10,000, and 5000) based on 1000 simulation runs are listed in Supplementary Fig. S1. As shown in Fig. 5 and Supplementary Fig. S1, the TIRs1k for 3pRule were in general lower than, but similar to, those for 1pRule under the same condition. The 3pRule has more stringent criteria to define significance than the 1pRule.
Therefore, we can expect the TIRs1k for 3pRule to be lower than for 1pRule. For example, the TIRs1k for the C1–C2 pair with a highly significant interaction (p-pair = 4.5 × 10⁻¹⁸) were 96.5% vs. 90.9% based on 1000 simulation runs using 1pRule vs. 3pRule, respectively, under a sample size of 20,000. For the C3–C4 pair with a highly significant interaction (p-pair = 3.9 × 10⁻¹³), the TIRs1k were 99.8% vs. 99.1% using 1pRule vs. 3pRule, respectively, under a sample size of 20,000. As for the effect of sample size, TIRs1k was higher for a larger sample size. For example, the TIRs1k for the C1–C2 pair with a high significance of interaction were 90.9%, 82.7%, and 58.7% using 3pRule under sample sizes of 20,000, 10,000, and 5000, respectively. As expected, the significance level of the interaction also decreased as the sample size decreased. For example, the p-pair values for the C1–C2 pair with a high significance of interaction were 4.5 × 10⁻¹⁸, 7.5 × 10⁻¹⁰, and 9.0 × 10⁻⁶ under sample sizes of 20,000, 10,000, and 5000, respectively (Supplementary Table S1). We further evaluated the relationship between the TIRs1k for causal pairs and the p-main of their most significant composite SNP. All TIRs1k results for the 36 conditions by the 2 significance rules are summarized in Fig. 5. Each data point represents the results of a causal pair (such as C1H–C2H). Figure 5 shows a positive relationship between TIRs1k and the significance of the main effect (p-main) of the most significant composite SNP. In addition, the TIRs1k for 3pRule are lower than, but similar to, the TIRs1k for 1pRule. In summary of Part 1, 3pRule can effectively reduce FIRs1k and maintain TIRs1k compared to 1pRule for detecting SNP–SNP interactions.

Figure 5. True identification rates (TIRs1k) by the hub SNP’s main effect (p-main) and the 2 significance rules. In this plot, the most significant SNP in a causal pair was used as a hub.
Results were based on the 1000 simulation runs of the 36 clusters and their hub SNPs. Two significance rules: 1pRule: p-pair < 2.7 × 10⁻⁷; 3pRule: p-pair < 2.7 × 10⁻⁷, p-pair < p-main for SNP1, and p-pair < p-main for SNP2.

Part 2: hybrid study

For the dataset of 614 SNPs in Part 2, the 7 causal pairs (C–C pairs) are only 0.004% of the total 188,191 pairs, so identifying these 7 causal pairs while keeping FPRs low is a challenge. All 7 SNP pairs were significant based on the Bonferroni criterion (p-pair < 2.7 × 10⁻⁷), and their p-pair values ranged from 5.7 × 10⁻¹⁸ (rs17632542–rs4783709) to 3.4 × 10⁻⁹ (rs266876–rs9521694). Interestingly, each causal pair had at least 1 SNP with a significant main effect. Table 1 shows that SNP pairs with a low p-main of the composite SNP tended to be more significantly associated with the outcome. We further tested correlations between the p-pair and p-main values for the powerful SIPI approach and the conventional AA-full model approach. Among the 91 pairwise interactions based on the 14 observed SNPs, a significant positive correlation was observed between the p-pair and the p-main of the most significant composite SNP for the SIPI approach (Spearman r = 0.73, p < 0.001) but not for the AA-full approach (Spearman r = 0.17, p = 0.118). This demonstrated that the high correlation between p-pair and the corresponding p-main values exists in the SIPI approach but not in the low-power AA-full approach.

Among the 14 clusters with a hub SNP involved in the 7 C–C pairs, FPRcluster was 0% for the 6 SNPs with a p-main ≥ 1 × 10⁻⁴ and MAFs ranging from 0.14 to 0.44. For the remaining 8 SNPs with a p-main < 1 × 10⁻⁴, the FPRcluster values for these 8 clusters are summarized in Table 2. The following FPRcluster discussions are primarily based on 3pRule. We observed that the pairs with a hub SNP with an insignificant main effect had 0% FPRcluster regardless of the hub SNP’s MAF.
The clusters tended to have a high FPRcluster if the hub SNP had a significant p-main and a large MAF. In particular, rs4802755 had a significant main effect (p-main = 1.8 × 10⁻⁷) with a large MAF (0.46), so its cluster yielded the highest FPRcluster (38.7%) among the clusters. In contrast, rs17632542 had the most significant main effect (p-main = 2.2 × 10⁻¹⁵) but a low MAF of 0.06. Its FPRcluster was therefore 15.0%, which is lower than the FPRcluster of the rs4802755 cluster. For the pair rs2271095–rs7446, both SNPs had a significant main effect. rs7446 and rs2271095 had similar MAFs of 35% and 31%, respectively. rs2271095 (p-main = 2.0 × 10⁻⁶) had a more significant main effect than rs7446 (p-main = 2.0 × 10⁻⁵), so rs2271095 had a higher FPRcluster than rs7446 (FPRcluster = 2% vs. 0.3%). In addition, these cluster effects can be observed in Supplementary Fig. S2a, which shows 4 obvious clusters with a small p-pair. The most significant cluster is the rs17632542 cluster, followed by the rs2569735 cluster, which matches the order of their p-pair and p-main values.

Table 2. Cluster-level false positive rates (FPRcluster) in a SNP-pair cluster by 2 significance rules and minor allele frequency (MAF) of peripheral null SNPs.

Among the 14 SNPs in the 7 causal pairs, the 8 SNPs with a significant main effect formed clusters (Table 2). To demonstrate the C–N pairs, we randomly selected one null SNP from each of the 6 MAF groups (MAF = 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5). For each of these 8 SNPs, the results of the 6 C–N pairs are shown in Supplementary Table S2. None of the 6 null SNPs was significantly associated with the outcome (p-main = 0.212–0.746). Under the same hub SNP, the p-pair of a C–N pair decreased as the MAF of the null SNP decreased. For the cluster with rs17632542 as the hub SNP, p-pair values were 0.455, 7.7 × 10⁻¹⁴, and 1.1 × 10⁻¹⁵ for a null SNP with a MAF of 0.5, 0.3, and 0.05, respectively.
Furthermore, the results for the 600 null SNPs and these 8 hub SNPs are summarized in Table 2. For the clusters with a hub SNP with a p-main < 2.7 × 10⁻⁷, such as rs17632542, rs2569735, rs1058205, and rs4802755, the FPR range was 39.8–79.5% for 1pRule and 15.0–38.7% for 3pRule. For the clusters with a hub SNP with a p-main > 2.7 × 10⁻⁷, the FPR range was 0–7.8%, the same for 1pRule and 3pRule. Consistent with Part 1, the hybrid study results (Part 2) confirm that 3pRule resulted in a noticeably lower FPR than 1pRule. The relative change in FPR for 3pRule compared with 1pRule was −77% (from 66.5 to 15%) for the cluster of rs17632542, −76% (from 79.5 to 19.2%) for the cluster of rs2569735, −54% for the cluster of rs1058205, and −20% for the cluster of rs4802755. These results demonstrated that 3pRule could effectively reduce FPR, especially for the top pairs. In summary, the magnitude of the FPRs depends on how the hub SNP’s p-main compares with the significance cut-point for p-pair. The FPRs for the C–N pairs tended to be high when the hub SNP had a small p-main, especially when its p-main was below the criterion defining the significance of the SNP–SNP interactions (p-pair < 2.7 × 10⁻⁷).

To identify the causes of the cluster effects for SNP–SNP interactions, we first tested the LD status between the hub SNP and significant null SNPs. Among the 4 large clusters with a KLK3 SNP as a hub (rs17632542, rs2569735, rs1058205, and rs4802755), there were 90, 115, 111, and 232 significant null pairs, respectively. The LD r² between each of these null SNPs and its corresponding hub SNP was close to 0 (range = 0–0.0004). For these 4 KLK3 clusters, the pairwise LD r² values among the null SNPs in the same cluster were also close to 0 (range = 0–0.001, Supplementary Table S4). Thus, we can conclude that LD among the involved SNPs is not the reason for the cluster effect of SNP–SNP interactions.
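
The relative FPR changes quoted above for the rs17632542 and rs2569735 clusters can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `relative_change_pct` is hypothetical, and only the two clusters whose 1pRule and 3pRule FPR values are both stated in the text are included.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return (after - before) / before * 100.0


# Cluster-level FPR values (in %) quoted in the text, as (1pRule, 3pRule).
clusters = {
    "rs17632542": (66.5, 15.0),
    "rs2569735": (79.5, 19.2),
}

for hub, (fpr_1p, fpr_3p) in clusters.items():
    change = relative_change_pct(fpr_1p, fpr_3p)
    print(f"{hub}: {change:.0f}% relative change in FPR (1pRule -> 3pRule)")
```

Rounded to whole percentages, the two values agree with the −77% and −76% reported for these clusters.
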
Next, we evaluated whether the significant null pairs in a cluster were highly correlated with the causal pair in the same cluster (such as C1–N1 and C1–N2 correlated with C1–C2). The results showed that pairs involving null peripheral SNPs with a small MAF tended to be highly correlated with the causal pair, which led to false positives. As shown in Table 2, null SNPs with a smaller MAF tended to have a higher FPR than those with a larger MAF. For example, FPR values for the rs4802755 cluster decreased from 80 to 6% as the MAF of null SNPs increased from 5 to 50%. The mean correlation between rs4802755–rs4473378 and the significant null pairs involving the hub SNP rs4802755 showed a decreasing trend (r = 0.88 to 0.64) as the MAF of null SNPs went up from 5 to 50%. Similar trends can be observed for the other clusters of rs17632542, rs2569735, and rs1058205. Finally, we also tested correlations among the 7 KLK3 SNPs in Table 1. The 7 KLK3 SNPs tested in Part 2 were not in strong LD with one another (all pairwise r² < 0.8, range = 0.03–0.77). All r² values were less than 0.6 except for rs2569735 and rs1058205 (r² = 0.77).

To evaluate the performance of the bootstrap + 3pRule approach, the related TPR results are summarized in Table 3. The TPR results for all the causal pairs in Table 3 show that the TPR values under 1pRule and 3pRule are very similar. In Table 3, all 7 causal pairs had > 75% TPR based on both the 1pRule and 3pRule approaches. The TPR for the causal pair rs17632542–rs4783709 was 95% for 1pRule and 91.8% for 3pRule based on the 500 bootstrap runs. For these 7 causal pairs, the TPRs from these two methods (3pRule and 1pRule) of defining statistical significance for SNP–SNP interactions differed by only 0.2–3.2%.
Thus, 3pRule, despite its more stringent criterion, performed similarly to 1pRule in terms of TPR for the causal pairs.

Table 3. True positive rates (TPR) by causal SNP pairs based on bootstrap results.

The FPR results of the bootstrap + 3pRule approach are shown in Table 4. Although the FPR looked small (0.81% for 1pRule and 0.35% for 3pRule) in the original dataset, there were still many false-positive pairs because of the large number of pairwise interaction tests (188,191 pairs): 1538 significant pairs using 1pRule and 672 pairs using 3pRule. This showed that 3pRule could remove 56% of the false-positive findings compared with 1pRule. Moreover, the bootstrap method can dramatically reduce the number of false-positive pairs. Using the bootstrap method under 3pRule, the number of significant pairs was only 86, 48, and 31 when using the ≥ 75%, ≥ 80%, and ≥ 85% bootstrap criteria, respectively. With the criterion of ≥ 75% of bootstrap runs plus 3pRule for selecting significant pairs, this approach maintained 100% TPR while reducing false-positive findings by 95%, from the 1531 pairs identified using the conventional 1pRule without bootstrap validation to 79 pairs (overall FPR from 0.82% to 0.04%). With a more stringent criterion of ≥ 90% of bootstrap datasets plus 3pRule, false-positive findings can be reduced further to 2% of the 1531 significant null pairs (= 26/1531), but TPR is reduced to 71.4% (5 out of the 7 pairs).

Table 4. Evaluation of the 3pRule + bootstrap approach for detecting SNP–SNP interactions based on a dataset with 614 SNPs.

For N–N pairs, the mean and median of p-main values for the 600 null SNPs were 0.33 and 0.29, respectively. For demonstration, we randomly selected one null SNP from each of the 6 MAF groups. The p-main and p-pair values of the 15 N–N pairs based on the selected 6 null SNPs are shown in Supplementary Table S3. The p-pair values for the pairwise interactions among these 6 null SNPs (6.0 × 10⁻³ to 0.928) were insignificant.
For the 179,700 N–N pairs, FPRs were 0% using both 1pRule and 3pRule with the original and bootstrap datasets (Table 4), and the mean and median p-pair values were 0.13 and 0.07, respectively, with an interquartile range of 0.028–0.147. The significance levels of the 7 C–C pairs and the selected null pairs are shown in Supplementary Fig. S2. As shown in Supplementary Fig. S2a, most of the N–N pairs were less significant than the C–N pairs. As shown in Supplementary Fig. S2b, for the distribution of the significance levels of the 179,700 N–N pairs, most of them (99.94%) had a p-pair ≥ 1 × 10⁻⁴. This result demonstrated that 1 × 10⁻⁴ can be used as the cut-point to select promising SNP–SNP interaction pairs. This is also why we used p-pair < 1 × 10⁻⁴ in Part 1. Because N–N pairs are insignificant, excluding them from SNP–SNP interaction detection can be used as a variable-reduction strategy.
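
The two significance rules compared throughout this section can be written down compactly. The following is a minimal sketch rather than the authors' implementation: the function names are hypothetical, the family-wise level of 0.05 behind the Bonferroni threshold is an assumption (it is consistent with, but not stated alongside, the 2.7 × 10⁻⁷ cut-point and the 188,191 tests), and the null SNP's p-main of 0.5 is an illustrative value chosen from inside the reported 0.212–0.746 range. The hub p-main and the three p-pair values are those reported for the rs17632542 cluster.

```python
from math import comb

N_SNPS = 614
N_PAIRS = comb(N_SNPS, 2)      # number of pairwise tests: 188,191
ALPHA = 0.05                   # ASSUMED family-wise error level
BONFERRONI = ALPHA / N_PAIRS   # roughly 2.7e-7, matching the reported cut-point
THRESHOLD = 2.7e-7             # cut-point as reported in the text


def significant_1p(p_pair: float, threshold: float = THRESHOLD) -> bool:
    """1pRule: the interaction p-value alone must pass the threshold."""
    return p_pair < threshold


def significant_3p(p_pair: float, p_main_snp1: float, p_main_snp2: float,
                   threshold: float = THRESHOLD) -> bool:
    """3pRule: p-pair must pass the threshold AND be smaller than both p-main values."""
    return p_pair < threshold and p_pair < p_main_snp1 and p_pair < p_main_snp2


HUB_P_MAIN = 2.2e-15   # rs17632542, as reported
NULL_P_MAIN = 0.5      # HYPOTHETICAL value within the reported 0.212-0.746 range

# p-pair values reported for null SNPs with MAF 0.5, 0.3, and 0.05.
for p_pair in (0.455, 7.7e-14, 1.1e-15):
    print(p_pair,
          significant_1p(p_pair),
          significant_3p(p_pair, HUB_P_MAIN, NULL_P_MAIN))
```

Under these inputs, the pair with p-pair = 7.7 × 10⁻¹⁴ passes 1pRule but fails 3pRule because its p-pair is not smaller than the hub SNP's p-main, which is the mechanism by which 3pRule removes false positives in clusters with a highly significant hub.
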
