Decoding pulsatile patterns of cerebrospinal fluid dynamics through enhancing interpretability in machine learning

After the experts annotated the CSF lumen in the PC-MRI frames, velocity and acceleration features were extracted from each phase frame. Following the training of the ML models, performance evaluation and interpretability analysis were conducted by computing accuracy metrics and SHAP values for the models’ predictions.

The training data consisted of 1824 images after interpolation of frames determined through fivefold cross-validation conducted on the training dataset. Pulsatile features, including velocity and acceleration features, were used to segment the CSF lumen and obtain a quantitative report of CSF flow. To enhance segmentation accuracy, the predicted image underwent erosion and dilation with a 2 × 2 square filter, which eliminates irrelevant areas that were erroneously predicted as ROI. Fig. 4 shows a detailed examination of the segmentation result for one participant, using the XGB, LR, and RF ML algorithms. Panel (a) features an original phase MRI image frame, while panel (b) provides the ground truth mask, labeled by expert clinicians. Panels (c), (e), and (g) present the predicted ROIs denoting the CSF lumen, generated by the respective ML models. Panels (d), (f), and (h) depict the eroded and dilated versions of the predicted images, obtained by applying a 2 × 2 square filter to the output of each corresponding model. Notably, applying morphological operations such as erosion and dilation refines the model predictions, resulting in improved delineation of CSF structures. Panels (k) and (l) further illustrate the success of our models by presenting waveforms of the mean velocity of CSF flow through the labeled and predicted ROIs, respectively.

Figure 4 presents an illustrative example depicting the segmentation of the CSF lumen in PC-MRI of a 20-year-old female.
Panel (a) displays an original phase MRI image frame, while panel (b) displays the ground truth mask labeled by expert annotators. Panels (c) and (d) show, respectively, the ROI predicted by XGB as the CSF lumen and the eroded and dilated version of that prediction obtained with a 2 × 2 square filter. Panels (e) and (f) present waveforms of the mean velocity of CSF flow through the labeled and predicted ROIs, respectively. These waveforms offer a comparison of the CSF flow characteristics captured by the ground truth and by the predicted segmentation results.

Accuracy metrics

The overall precision, recall, and F1 score (or Dice similarity coefficient) were calculated to evaluate the segmentation results of the ML models using fivefold cross-validation. These metrics were derived from the four components of the confusion matrix: True Positive (TP), representing successful identification of positive cases; True Negative (TN), indicating correct identification of instances in the negative class; False Positive (FP), denoting instances incorrectly predicted as positive; and False Negative (FN), denoting instances incorrectly predicted as negative. Precision is calculated as TP/(TP + FP) and recall (sensitivity) as TP/(TP + FN). The F1 score, the harmonic mean of precision and recall, is frequently employed to quantify the ratio of overlap for both classes and is defined as

$$F1\ score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
(1)
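As a concrete illustration of Eq. 1, the following minimal sketch computes precision, recall, and the F1 score from a pair of flattened binary masks. The function names and the toy masks are hypothetical and are not taken from the study; the sketch only assumes that both masks are equally long sequences of 0/1 labels.

```python
def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN for two equally long sequences of 0/1 labels."""
    if len(truth) != len(pred):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def precision_recall_f1(truth, pred):
    """Return (precision, recall, F1); a ratio with a zero denominator is 0.0."""
    tp, _tn, fp, fn = confusion_counts(truth, pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical 8-pixel masks (1 = CSF lumen, 0 = background).
truth_mask = [1, 1, 1, 0, 0, 0, 0, 1]
pred_mask = [1, 1, 0, 0, 1, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(truth_mask, pred_mask)
tp, tn, fp, fn = confusion_counts(truth_mask, pred_mask)
dice = 2 * tp / (2 * tp + fp + fn)  # Dice coefficient, equal to F1 for binary masks
print(precision, recall, f1, dice)
```

For binary masks the Dice similarity coefficient 2TP/(2TP + FP + FN) is algebraically identical to the F1 score, which is why the two names are used interchangeably above.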
The weighted augmented average (WAA), which weights each class by its importance ratio (TP number of the class / total instance number), is used to assess the accuracy of each class individually [37]. Each metric is averaged with WAA according to Eq. 2, where \(N\) is the sample size, \(n\) is the number of classes, \({V}_{i}\) is the evaluation metric calculated for the ith class, and \({C}_{i}\) is the number of members of the ith class:

$$WAA = \frac{1}{N}\sum_{i=1}^{n}{V}_{i} \times {C}_{i}$$
(2)
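A short sketch of Eq. 2 follows. It assumes that \(N\) equals the sum of the class counts \({C}_{i}\); the function name and the per-class numbers are hypothetical and chosen only to show how class imbalance affects the average.

```python
def weighted_augmented_average(values, counts):
    """Eq. 2: sum of per-class metric V_i times class size C_i, divided by N.

    Assumes N is the total number of instances, i.e. the sum of the counts.
    """
    if len(values) != len(counts):
        raise ValueError("one metric value is required per class")
    n_total = sum(counts)
    if n_total == 0:
        raise ValueError("class counts must not all be zero")
    return sum(v * c for v, c in zip(values, counts)) / n_total


# Hypothetical per-class F1 scores for an imbalanced pixel classification:
# background (900 pixels) and CSF lumen (100 pixels).
class_f1 = [0.98, 0.80]
class_counts = [900, 100]

waa = weighted_augmented_average(class_f1, class_counts)
unweighted = sum(class_f1) / len(class_f1)
print(waa, unweighted)
```

Because background pixels dominate, the weighted value sits much closer to the background score than the unweighted mean does, which is worth keeping in mind when reading weighted metrics for a small ROI.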
Table 2 provides precision, recall, and F1-score metrics for three ML models (LR, RF, XGB) across multiple folds in fivefold cross-validation. The average metrics highlight the robustness of the models, with slight variations in performance across folds. XGB demonstrates the highest average precision, recall, and F1-score among the evaluated models.
Table 2. Precision, recall, and F1-score metrics of ML models (LR, RF, XGB) in 5-fold cross-validation.

Receiver operating characteristic (ROC) curves and the area under the curve (AUC) of the LR, RF, and XGB models are shown in Fig. 5. The XGB model outperforms the LR and RF models in terms of AUC: the mean AUC for XGB across the fivefold cross-validation is 0.958, surpassing RF (mean AUC = 0.944) and LR (mean AUC = 0.684).

Figure 5. The ROC curves and AUCs demonstrating the ability to segment the CSF lumen in the thoracic SAS, with mean fivefold AUCs of XGB = 0.958, RF = 0.944, and LR = 0.684.

Flow metrics

Typically, a PC-MRI report includes statistical metrics such as mean, median, and peak CSF flow velocities, as well as stroke volume, which is the volume of CSF displaced during each cardiac cycle. The computation of these flow quantities relies on the regions delineated by radiologists. Fig. 6 visualizes the metrics compared in the analysis of CSF flow measurements. The figure shows the dynamic changes in CSF flow velocity across 32 frames, capturing the variations throughout a single cardiac cycle.

Figure 6. The comparative metrics in an analysis of CSF flow measurements. The figure illustrates the dynamic changes in CSF flow velocity throughout a single cardiac cycle across 32 frames. Each light gray line represents the flow velocity data for an individual pixel within the CSF lumen. The dashed red line represents the average flow rate; the dashed blue and green lines represent the median and peak values, respectively. Vertical gray dashed lines connect the mean flow velocity values to the x-axis, emphasizing temporal changes during the cardiac beat. The shaded areas under the mean curve correspond to the stroke volume, calculated as the integral of the mean flow velocity curve. Each computed area is annotated between the two mean values that bound it.
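The quantities described for Figure 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `velocities[f][p]` is assumed to hold the velocity of ROI pixel `p` in frame `f`, the peak is assumed to be the value of largest magnitude, the frame interval `dt` is a hypothetical parameter, and the area between two consecutive mean values is approximated with the trapezoidal rule. How the per-interval areas are combined (signed or absolute) and scaled by the ROI area is not specified in the text, so the sketch returns them unaggregated.

```python
from statistics import mean, median


def frame_statistics(velocities):
    """Per-frame mean, median, and peak velocity over the ROI pixels."""
    means = [mean(frame) for frame in velocities]
    medians = [median(frame) for frame in velocities]
    # Assumption: "peak" is the velocity with the largest magnitude, sign kept.
    peaks = [max(frame, key=abs) for frame in velocities]
    return means, medians, peaks


def trapezoid_areas(curve, dt):
    """Area under the curve between each pair of consecutive frames."""
    return [0.5 * (a + b) * dt for a, b in zip(curve, curve[1:])]


# Hypothetical data: 4 frames, 3 ROI pixels, velocities in arbitrary units.
velocities = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [0.0, -1.0, -2.0],
    [-1.0, -1.0, -4.0],
]
dt = 0.5  # hypothetical time between frames

means, medians, peaks = frame_statistics(velocities)
areas = trapezoid_areas(means, dt)
print(means, medians, peaks, areas)
```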
The graph in Fig. 6 depicts the pattern of CSF flow over time within a single cardiac beat. The visualized metrics clarify the flow quantities used to evaluate the accuracy of the ML algorithms.

To evaluate the accuracy of the ML algorithms in producing the flow quantities, we performed a comparative analysis of CSF flow measurements between the labeled and predicted ROIs. As part of this evaluation, we used the Interclass Correlation Coefficient (ICC), a metric that quantifies the degree of agreement between two quantities. The ICCs for stroke volume and for the average, peak, and median flow velocity parameters across the annotated and predicted regions in a single fold are computed as follows:

$$ICC = \frac{1}{N}\sum_{j=1}^{N}\frac{\sum_{i=1}^{F}\left({L}_{i}-\overline{L}\right)\left({P}_{i}-\overline{P}\right)}{\sqrt{\sum_{i=1}^{F}{\left({L}_{i}-\overline{L}\right)}^{2}}\,\sqrt{\sum_{i=1}^{F}{\left({P}_{i}-\overline{P}\right)}^{2}}}$$
(3)
where \(N\) is the number of samples in each fold, \(F\) is the number of frames in each PC-MRI slab, and \({L}_{i}\) and \({P}_{i}\) (\(i = 1, 2, 3, \dots, F\)) are the values (mean, peak, and median) for frame \(i\) of one cardiac cycle in the labeled ROI and the predicted ROI, respectively.

Table 1 presents the first-fold cross-validation results of the LR, RF, and XGB models. The table reports the ICC values for stroke volume and for the mean, peak, and median velocity of the flow through the regions labeled by experts and the regions predicted by the ML models, measuring the consistency and agreement between the predicted and actual values. To aid interpretation, Fig. 7 shows box plots of the stroke volume, mean, peak, and median ICC values, which convey the distribution and variability of the data. XGB demonstrates the highest stroke volume and mean ICCs, 0.7899 and 0.7749 respectively, followed by RF with 0.7794 and 0.7628 and LR with 0.7448 and 0.7592. Because they depend on a single data point, peak values are highly susceptible to noise, potentially leading to less reliable correlation estimates. Consequently, ML models with lower recall tend to show lower peak CSF flow correlations; the LR model, which had the lowest recall, exhibited the weakest peak correlation. Thus, while the stroke volume, mean, and median correlations are consistently strong and positive, the peak correlation, being sensitive to individual data points, is weaker.

Figure 7. Box plots of ICC quantities for fold 0.

Explainability metrics

XGB offers three types of feature importance assessment: frequency-based, gain-based, and coverage-based importance, each of which provides a global importance value for each feature.
Figure 8 shows the feature importance analysis of the XGB model. In Fig. 8A, vel_4, vel_6, acc_6, acc_28, and vel_1 are the five features used most consistently across all stages of the DTs in the boosting process. This frequent usage suggests that these features play a significant role in shaping the final predictions. In Fig. 8B, vel_6, acc_5, acc_7, acc_6, and vel_4 are the five primary contributors to the model’s accuracy; their substantial gain values indicate that including them significantly improves the model’s ability to reduce prediction errors. In Fig. 8C, vel_6, acc_7, acc_5, acc_6, and vel_4 are the five features most frequently chosen for constructing splits within the DTs, suggesting their widespread presence in the internal structure of the model.

Figure 8. XGB feature importance graphs: (A) weight feature importance, (B) gain feature importance, (C) cover feature importance.

While traditional feature importance methods usually provide an independent global importance value for each feature, SHAP values consider interactions between features and provide more accurate insights into feature importance. SHAP also provides localized insights into the probability that a pixel within the PC-MRI image resides within the CSF lumen. The global mean SHAP explanatory diagram of the fold-0 XGB model is presented in Fig. 9. The features vel_6, vel_4, acc_7, acc_6, and acc_5 have the five highest SHAP values, in that order. Each feature is continuous, and the features are arranged vertically by their average influence on the prediction outcomes.
It becomes evident that the features from the initial quarter of the cardiac cycle have the greatest influence on predictions of the CSF region.

Figure 9. The global SHAP explanatory diagram.

In addition to the global values, we calculated local SHAP values for a single sample of the dataset to assess the contribution of each pulsatile feature to predictions on a PC-MRI slab, thus providing deeper insight into the behavior of the XGB model in predicting pixels within the CSF lumen. Before exploring the local SHAP results, it is instructive to consider Fig. 10, which details the sample for which we calculated SHAP values. We visualized the velocity and acceleration flows of this sample, restricted to those passing through the CSF lumen in order to simplify the visualization. On the visualization, the top 5 features by global SHAP importance are marked with vertical red lines.

Figure 10. The visualization of flow velocity and acceleration values across a sample’s frame sequence. The top 5 features by global SHAP importance (vel_6, vel_4, …) are marked with vertical red lines.

The plots in Fig. 11 provide insight into how each feature affects the model’s predictions for a 16-year-old male sample. The left plot represents the contribution of each feature to the model’s predictions on a pixel-by-pixel basis in the PC-MRI phase images. Features with the highest impact exhibit SHAP values extending to the right, indicating a positive contribution to the prediction of CSF lumen. Notably, the acc_19 feature, which displays a wide spread and high impact, appears to have outliers; these could lead to deviations from the model’s prediction norms and should be taken into consideration. The right plot in Fig. 11, the waterfall graph, shows that all features have positive SHAP values, consistent with the left graph. The trend shows that as the values of the top 15 features increase, the XGB model’s prediction of CSF lumen also increases. The feature vel_6, with the highest absolute SHAP value of 0.13, has the greatest influence, consistent with its average effect across all predictions.

Figure 11. Left: Local SHAP scores for the XGB model displayed as a beeswarm plot. Scores are shown for each feature for fold 0. As shown in the ‘feature value’ legend, a high value is indicated in red and a low value in blue; for binary variables, red indicates a value of 1 (i.e., CSF lumen) and blue indicates a value of 0 (i.e., not CSF lumen). Right: A SHAP waterfall plot illustrating the feature-level contributions to a PC-MRI image pixel prediction made by the model. The graph depicts the journey from the baseline prediction (starting point) to the final prediction for the given phase image. Each feature is shown as a step, with upward steps indicating positive contributions.
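The additive reading of the waterfall plot, in which the baseline prediction plus the per-feature contributions equals the final prediction, can be checked on a toy example. The sketch below computes exact Shapley values by brute-force enumeration of feature subsets. It is not the tree-based algorithm that SHAP libraries apply to XGB models, and the three-feature model, its coefficients, and the single all-zero baseline are hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a single baseline.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                members = set(coalition)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_model(v):
    """Hypothetical score with one interaction between features 0 and 2."""
    return 0.1 + 0.5 * v[0] + 0.2 * v[1] + 0.3 * v[0] * v[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, x, baseline)
reconstructed = toy_model(baseline) + sum(phi)
print(phi, reconstructed, toy_model(x))
```

In this toy model the interaction term is split equally between the two features that share it, which illustrates how Shapley-based attributions account for feature interactions rather than scoring each feature independently.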
