A comparative analysis of mutual information methods for pairwise relationship detection in metagenomic data | BMC Bioinformatics

Mutual information provides meaningful results on identifying associated pairsEach mutual information estimator in this study has the property that its magnitude (or score) is proportional to the strength of the dependency it measures–i.e., a higher score corresponds to a stronger pairwise association. Therefore, the task of detecting relationships can be viewed as a classification problem where pairs with null relationships are assigned a score of low magnitude while pairs with dependent relationships are assigned a score of high magnitude. Here, we treat each metric as a binary classifier and assess its classification ability on single-actor (amensal, commensal) and dual-actor (exploitative) asymmetric relationships. Receiver operating characteristic (ROC) curves were constructed for each metric using an equal number of dependent relationship pairs and null relationship pairs. Table 1 shows the AUCs (and corresponding 95% confidence intervals) of all mutual information estimators when detecting each relationship type, based on count table data generated from various prior distributions.Table 1 Areas under the ROC curves (AUCs) for each metric under exploitative, commensal, and amensal relationshipsTable 1 AUC results shown are for tools tested on TMM [41] normalized data (n = 50) across several prior distributional assumptions under exploitative, commensal, and amensal relationships. Best results for each prior distribution by relationship type are indicated by bolded font. Bootstrapping was used to produce 95% confidence intervals. It was ensured that a 1:1 parity existed between positive and negative (dependent and null) examples – in particular, each bootstrapped test consisted of 100 dependent and 100 null pairs. For the KSG, LNC, and naïve partitioning methods, only the results from the best performing parameters are included, whereby performance was assessed by the average AUC across all distributions (best parameters are provided in parentheses).Table 1 shows that the highest performance was observed in commensal interactions, with an average AUC of 0.905 across all methods and prior distributions. This was followed by exploitative interactions (average AUC of 0.782) and amensal interactions (average AUC of 0.748). For commensal interactions, AUCs ranged from 0.806 to 0.977, indicating robust discriminatory power across all distributions. This high performance is expected, as commensal relationships exhibit the least deviation from linearity among all tested relationships. Conversely, all methods showed a decline in performance for exploitative and amensal interactions. AUCs for exploitative relationships ranged from 0.606 to 0.941, while for amensal relationships, they ranged from 0.467 to 0.870, indicating a significant decrease in performance compared to commensal interactions.When reviewing method-specific performances, the two machine learning estimators using KL-divergence in their formulation, MINE and NWJ, outperformed the other approaches. Across all prior distributions and relationship types, MINE achieved an average AUC of 0.870, while NWJ closely followed with 0.843. Following them in performance were LNC, KSG, DoE, MIC, and the partitioning approach with average AUCs of 0.836, 0.832, 0.797, 0.762, and 0.742, respectively, across all settings. For individual relationship types, MINE was the best approach for both exploitative and commensal interactions, achieving average AUCs of 0.864 and 0.946, respectively. NWJ was the best approach for amensal interactions with an average AUC of 0.852 across all prior distributions. In contrast, DoE, MIC and the naïve partitioning were the worst performing methods, all yielding average AUCs below 0.8 across all relationship types. Outside of the commensal setting, DoE and MIC achieved AUCs above 0.8 only in the case of exploitative relationships with a log-normal prior, while the naïve partitioning approach did not surpass an AUC of 0.8 in any exploitative or amensal scenario. Furthermore, DoE registered sub-0.7 AUCs in three of the fifteen possible combinations of relationship types and prior count distributions, while MIC and naïve partitioning each produced four of such instances. This suggests that purely distributional or grid-based mutual information estimators have limitations and may not be ideal in the context of studying ecological relationships.It should be noted that results shown for the machine learning-based tools (MINE, NWJ, DoE) were based on scores generated by neural networks that were not tuned for hyperparameters, so their performance is likely sub-optimal. Because each unique interaction pair requires its own independent network, tuning for hyperparameters would result in a drastic increase of the already significant computational time and resources. However, despite the trade-offs between model optimization and performance, MINE and NWJ outperformed all other metrics in over half of distribution \(\times\) relationship type settings, with 12 of the 15 best results coming from one of these two approaches. When DoE is included, the machine learning-based tools produced 13 of the 15 best results.Mutual information estimators can reveal associations not detected by conventional correlation measuresIn the following sections, we analyze the performance of each metric as a statistical test where the null hypothesis (H0) is that a pair of variables is independent, and the alternative hypothesis (H1) is that a pair of variables share a dependent relationship. This is done to enable the direct comparison amongst the MI estimators and with their traditional correlation counterparts. The significance of each pair’s interaction (i.e., its p-value) was determined empirically by permutation and subsequently corrected for multiple testing using the Benjamini–Hochberg procedure [42] (Methods).Figure 2 displays the true positive rates (TPRs) of each metric when applied to simulated count tables; box and whiskers are constructed from 1,000 bootstrapped runs where true ecological relationships are tested against an equal number of null relationships. Immediately, it can be seen that all metrics are much more consistent at detecting commensal relationships than exploitative or amensal ones–in every case, commensal relationships were consistently assigned the highest TPRs regardless of prior distribution. For the correlation measures (Pearson’s correlation coefficient and Spearman’s rank correlation coefficient), amensal relationships were detected at a higher rate than exploitative ones, and for the MI estimators, exploitative relationships are generally better detected than amensal ones. It can also be observed that methods generally perform worse when analyzing data from discrete prior count distributions. This trend is broken in the case of the traditional correlation methods where an increase in performance for discrete distributions over continuous ones is observed (Fig. 2B, C). Interestingly, in the case of commensal relationships where conditions are closest to linearity, conventional correlation measures were only able to produce results on par with the worst performing mutual information estimators (Fig. 2B). However, in the case of the most difficult to detect interactions, amensal relationships, the conventional measures, namely the Spearman rank correlation coefficient, produced the best results (Fig. 2C) although the TPRs are low for all methods tested.Fig. 2The true positive rate (TPR) of different methods for detecting (A) exploitative, (B) commensal, and (C) amensal relationships based on different prior distributions. Results for log-normal, exponential, negative binomial, gamma, and beta negative binomial distributed data are distinguished by blue, light orange, green, dark orange, and pink boxplots respectively. TPR values are collected from 1,000 bootstrapped samples of true and null pairwise interactions. Results are separated on the x-axis by method. Boxplots were constructed using results from 1,000 bootstrapped iterations where the TPR was calculated after randomly sampling (with replacement) 100 true positive pairwise relationships and 100 null relationships. Results are shown for data that was TMM normalized and p-values that were corrected using the Benjamini–Hochberg procedureEffects of normalization techniques and choice of multiple test correction on performanceWhen analyzing metagenomic data, choice of prior count distribution and normalization technique is imperative to the quality of results [43]. All data used for analyses up to this point have been TMM normalized and resulting p-values have been corrected using the Benjamini–Hochberg false discovery rate correction. While these approaches are common in the field, they are not unilaterally considered the “best” normalization method or the “best” way to perform multiple testing correction. To this end, we explored the effects of different combinations of normalization approaches and multiple testing correction methods on the task of interaction detection.Figure 3 displays how TMM [41], RLE [44], and TSS (total sum scaling) normalizations impacted the ability of each method to (A) detect true pairwise interactions (TPR–true positive rate) and (B) avoid false detection of null ones (FDR–false discovery rate) for exploitative relationships. The results suggest that underlying count distribution plays a much more significant role in the detection ability of each method than choice of normalization (Fig. 3A). All approaches (with the exception of MINE and NWJ) showed stability in TPR across the three normalization procedures–the differences in performance seen in the figures can be primarily attributed to varying distributions. Figure 3B shows that the effect of normalization type and count distribution on the FDR of each method generally reflected those seen with the TPR. These results are also reflected in commensal and amensal relationship cases (Additional file 2: Figure S1, S2).Fig. 3Effects of normalization and distribution for each method on (A) TPR and (B) FDR for exploitative relationships. Generally, normalization does not impact results as much as data distribution. Two of the machine learning methods (MINE and NWJ) are exceptions to this, as restricting their input to TSS normalized data renders them uninformativeAddressing the multiple testing problem is an important step in supporting the statistical validity of any p-value. Due to its general acceptance as a standard, the bulk of this analysis has relied on the Benjamini–Hochberg procedure’s false discovery rate correction. However, there is no hard rule that requires this to be the correction method of choice. Figures 4 and 5 display the TPRs and FDRs, respectively, of each method for various significance thresholds in the context of exploitative relationships when using Benjamini–Hochberg procedure corrected p-values, Bonferroni corrected p-values, and q-values [62, 63]. Of the correction methods that produced viable results, the Benjamini–Hochberg procedure proved to be the more conservative approach (Fig. 4, blue). Replacing p-values with q-values consistently improved TPR albeit a small to moderate increase in the FDR; this held true for all metrics and for both empirical and parametric approaches of determining q-values (Fig. 5, green and red). On the other side of the spectrum, an overly conservative approach like the Bonferroni correction (Figs. 4, 5, orange) can lead to a complete loss of detection ability (TPR of 0.0 across all methods).Fig. 4True positive rates (TPRs) for varying significance thresholds using the Benjamini–Hochberg procedure (blue), Bonferroni (orange), empirical q-values (green), and parametric q-values (red). Both empirical and parametric q-value approaches produce a higher TPR for the same significance threshold than the Benjamini–Hochberg procedureFig. 5Respective false discovery rates for the data presented in Fig. 4. Both empirical (green) and parametric (dark orange) q-value approaches usually result in a slight increase in FDR for the same significance threshold than the Benjamini–Hochberg procedure (blue). The shaded blue regions in each plot correspond to FDR values at or below each significance thresholdData sparsity reduces performance, but the effects are mitigated by increasing sample sizeA final analysis on simulated data was interested in the effect of zero-inflated counts on detection ability. It is well established that biological count data is often of this form–approaches have been developed to address the issues sparsity introduces as they have been shown to have a profound impact on model performance [45,46,47]. We model zero-inflated data by subtracting the mean entry of each generated count table from itself, setting any negative values to zero. Here we return to treating the detection of pairwise relationships as a classification problem and assess the AUC of each metric as a binary classifier. As expected, performance drops for all tools when zeros are inflated (Table 2, top). However, increasing the number of samples used in estimation (Table 2, bottom) restores the effectiveness of nearly every approach along with some gains in performance. When augmenting the sample size of zero inflated tables from 50 to 200, MINE, NWJ, KSG, and naïve partitioning saw the largest increases in AUC at 0.155, 0.147, 0.153, and 0.163, respectively, DoE and LNC saw more moderate increases in AUC at 0.112 and 0.130, respectively, and MIC saw the smallest increase in AUC at 0.098 across prior distributions.Table 2 AUCs for each metric under exploitative relationships with zero inflated countsTable 2 AUC results for tools tested on zero inflated, TMM normalized data across several count prior distributions with n = 50 samples (top) and n = 200 samples (bottom) for the exploitative relationship. Similar results for the commensal and amensal cases are provided in Additional file 1: Tables S1 and S2.Application of mutual information estimators in the study of C. diff infectionTo explore mutual information estimators in the real data setting, we applied several of the aforementioned metrics to a publicly available dataset originating from a study on the dynamics of the microbiome following treatment of recurrent C. diff infection (CDI) [48]. This dataset contains 16S rRNA profiles of the microbiomes of 38 CDI patients and their respective treatment donors. Sequencing data was retrieved from NCBI using the accession PRJEB19232 and processed to yield genus-level counts (Methods).MINE, MIC, and KSG were chosen as representative MI estimators for further analysis due to their performances on simulated data. Using only CDI sample data as input, we restricted analyses to pairwise combinations amongst the 30 most abundant genera on average across those samples. Significance for each pair of genera was determined independently of other pairs using permutation (Additional file 3). Figure 6A shows the concurrence among the 20 most significant interaction pairs from each MI estimator for CDI patients. We observe that a large majority of each method’s top-ranking pairs are unique, with no pairs being shared amongst all three approaches. When compared against the correlation approaches, we find that MINE shares only 3 of its 20 highly ranked pairs with each of the Pearson and Spearman correlation coefficients while MIC (sharing 9/20 and 11/20 with Pearson’s and Spearman’s coefficients, respectively) and KSG (sharing 7/20 and 10/20 with Pearson’s and Spearman’s coefficients, respectively) displays much more overlap. Taking the top pairs from all three MI estimators, we see that their superset covers two-thirds (19/28) of the pairs identified by either the Pearson or Spearman correlation coefficients (Figs. 6B, 7).Fig. 6Venn diagrams detailing overlap of significant relationships found in the CDI dataset (A) between MI estimators and (B) between MI estimators and correlation measures for the case group. Only the top 20 most significant pairs of each metric are used in the construction of each diagram. (C, D, E) Scatter plots and accompanying density estimations for various relationships found by MI estimators. In each case, there is evidence of an exploitative interaction type, supported by the simultaneous shift of one genus to larger abundances (Enterobacter, Lactobacillus, Escherichia-Shigella) and the other to smaller abundances (Bacteroides, Bifidobacterium, Romboutsia) when comparing controls (blue) to cases (red). Abundance data is plotted after a \(\text{log}\left(x+1\right)\) transformFig. 7A Flowchart of the data simulation technique. (1) A \(d\times d\) target covariance matrix σ with diagonal elements equal to one and off-diagonal elements equal to zero is generated. (2) Using the target covariance matrix, \(n\) \(d\)-dimensional multivariate normal vectors with mean zero and covariance matrix σ are drawn resulting in an \(n\times d\) matrix. (3) Their values transformed into quantiles using the standard normal cumulative distribution function. (4) One of five marginal distributions are imparted on each of the \(d\) columns by applying the chosen distribution’s inverse cumulative distribution function. (5) Various interaction relationships (exploitative, commensal, and amensal) are introduced between random pairs of columns (representing microbes), producing a final table that simulates an ecological environment in the context of this study. B Description of each marginal distribution used in this study. The parameters of each distribution were randomly selected from ranges that resulted in each distribution having a comparable mean, \(\mu\), and standard deviation, \(\sigma\)While these results speak to the utility MI estimators have in linear settings, we are primarily concerned with their application in non-linear settings. For this, we searched for instances where interaction pairs were highly ranked by MI estimators but deemed insignificant by both correlation methods. Scatter plots and accompanying density estimation curves are presented for three of such cases (Fig. 6C-E), all of which can be described by the exploitative ecological relationship. For example, Fig. 6C displays the relationship between Bacteroides and Enterobacter. Multiple species of Bacteroides have been implicated in contributing to a healthy microbiome [49]. Specifically, bacteria from this genus have bile salt hydrolase, allowing them to hydrolyze bile acids (BAs) resulting in host benefits through BA signaling [50]. Proliferation of Enterobacteriaceae has been linked to increased gut inflammation [51] and more aggressive diagnoses of ulcerative colitis [52] as well as Crohn’s disease [53]. Past research has demonstrated that members of Enterobacteriaceae fare particularly well in microbiotas subject to BA dysmetabolism [54]. When examining the plot of CDI patient data alone, there is no obvious interaction taking place between the two genera. However, when control data is included in the plot, one can see a clear directional shift in the distributions of each genus. Reliance on correlation alone would result in this relationship not being detected (p-values of 0.5788 and 0.4431 for Pearson’s and Spearman’s correlations, respectively); however, all three MI estimators denoted this as a significant relationship (p-values of 0.01607, 0.01198, and 0.001996 for MINE, MIC, and KSG, respectively.

Hot Topics

Related Articles