Boundary-aware convolutional attention network for liver segmentation in ultrasound images

We selected several high-performing image segmentation algorithms for comparative experiments, including UNet9, RDeeplab42, TransUNet15, UNeXt43, LWBNAUNet44, and SegNeXt35. RDeeplab is an improved structure based on Deeplabv3+ with ResNet18 as the backbone; its core component is a multi-scale feature extraction module, Atrous Spatial Pyramid Pooling (ASPP). SegNeXt is the first image segmentation network to employ a convolutional attention mechanism. The remaining models are variants of or improvements upon UNet. This section evaluates the segmentation performance of all models from both quantitative and qualitative perspectives and discusses in depth their differences on the liver segmentation task. We emphasize that, to ensure a fair comparison, we did not fine-tune from weights pre-trained on other datasets; all models were trained from scratch.

Quantitative analysis of Dataset A

This section presents the quantitative results of all experimental models on Dataset A, including an analysis of the training curves and of the predictive performance on the test set. We also conducted a fourfold cross-validation experiment on BACANet and report the results of each fold.

Figure 5 illustrates the mean validation DSC of all models over 50 epochs. Under the same number of training epochs, the models differ clearly in how well they generalize to the validation set. UNet’s validation DSC varies the most: it starts at a relatively low value early in training and drops sharply in the middle stage, indicating that a simple stack of convolutional modules may struggle to capture the essential features of the liver region. In contrast, the other models reach a validation DSC above 0.75 early in training and fluctuate more stably. RDeeplab extracts rich multi-scale features through its powerful ASPP module, which enhances the representation of liver features; its DSC peaks in the 34th epoch. TransUNet introduces Transformer blocks to model global dependencies in the feature maps, reaching a peak of 0.8835 in the 23rd epoch. SegNeXt and LWBNAUNet strengthen key features through carefully designed attention mechanisms, reaching optima of 0.8870 and 0.8932 in the 5th and 34th epochs, respectively. UNeXt enriches its feature representation with tokenized multi-layer perceptrons and outperforms the other comparative models, reaching a peak of 0.8971 in the 15th epoch. Finally, BACANet, by incorporating explicit boundary supervision and an effective convolutional attention mechanism, captures finer detail in liver regions and boundaries; consequently its validation DSC surpasses all comparative models, reaching an optimum of 0.9207 in the 30th epoch. This indicates that BACANet has stronger learning and generalization capabilities, allowing it to understand the characteristics of liver US images and segment the liver anatomy precisely.

Fig. 5 Validation DSC over epochs for all models on Dataset A.
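The per-epoch curves in Fig. 5 amount to simple bookkeeping: evaluate on the validation set after every epoch and keep the weights from the epoch with the highest mean DSC. Below is a minimal PyTorch-style sketch of this model-selection loop; `model`, `optimizer`, the two loaders, `train_one_epoch`, and `mean_val_dsc` are assumed placeholders, not routines from the paper.

```python
import torch

best_dsc, best_epoch = 0.0, -1
history = []  # mean validation DSC per epoch, as plotted in Fig. 5

for epoch in range(50):
    train_one_epoch(model, train_loader, optimizer)  # assumed training step
    dsc = mean_val_dsc(model, val_loader)            # assumed DSC evaluation
    history.append(dsc)
    if dsc > best_dsc:
        # Keep the peak-DSC weights; this checkpoint's metrics would be reported.
        best_dsc, best_epoch = dsc, epoch
        torch.save(model.state_dict(), "best.pth")
```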
Table 2 shows the best predictive results of all models on the validation set of Dataset A, with every model achieving a DSC above 0.86. Further analysis reveals that UNet has a low TPR of 0.854, suggesting that it is deficient in capturing liver region features and incorrectly predicts some liver pixels as background. TransUNet, RDeeplab, LWBNAUNet, UNeXt, and SegNeXt exhibit DSC and IOU of around 0.89 and 0.80, respectively, indicating strong perceptual ability for locating the liver body; however, their higher ASD values point to limitations in refining the liver boundary. DensePSPUNet was the first model evaluated on Dataset A; under the same training and validation conditions, it achieved a DSC of 0.913 and an IOU of 0.841, outperforming all comparative models. Nevertheless, BACANet excelled in overall performance, achieving a DSC of 0.921, an IOU of 0.854, and an ASD of 3.783, confirming its effectiveness in refining liver boundary details. BACANet also achieved excellent PPV, TNR, and TPR of 0.920, 0.976, and 0.924, respectively, demonstrating that it effectively differentiates liver regions from the background and segments liver US images precisely.

Table 2 Best validation results for all models on Dataset A; bold values indicate the best results.
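For reference, every metric in Table 2 can be computed from a pair of binary masks. The following is a plain NumPy/SciPy sketch rather than the authors’ implementation; ASD is returned in pixels (multiply by the physical spacing if needed), and both masks are assumed non-empty.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def overlap_metrics(pred, gt):
    """DSC, IOU, PPV, TPR, TNR from two same-shaped binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = (pred & gt).sum()
    fp = (pred & ~gt).sum()
    fn = (~pred & gt).sum()
    tn = (~pred & ~gt).sum()
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "IOU": tp / (tp + fp + fn),
        "PPV": tp / (tp + fp),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
    }

def average_surface_distance(pred, gt):
    """Symmetric ASD: mean distance between the two mask boundaries."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    pred_border = pred & ~binary_erosion(pred)
    gt_border = gt & ~binary_erosion(gt)
    # Distance of each boundary pixel to the nearest boundary pixel of the other mask.
    d_pred_to_gt = distance_transform_edt(~gt_border)[pred_border]
    d_gt_to_pred = distance_transform_edt(~pred_border)[gt_border]
    return np.concatenate([d_pred_to_gt, d_gt_to_pred]).mean()
```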
Dataset A comprised only eight volunteers, and the validation set included data from just two of them, which may limit a comprehensive assessment of the model’s performance. To fully utilize all experimental data and evaluate the model holistically, fourfold cross-validation was employed. Table 3 lists the number of training and validation images in each fold, along with BACANet’s average performance on the validation data. BACANet consistently achieved a DSC above 0.91 and an IOU above 0.84 in every fold, with best values of 0.941 (DSC) and 0.889 (IOU). The mean DSC and IOU across the four folds were 0.925 and 0.862, with standard deviations of 0.013 and 0.023, respectively, demonstrating stable performance and excellent generalization. Overall, the experimental results on Dataset A robustly confirm BACANet’s ability to segment liver US images precisely and to delineate liver boundaries in detail.

Table 3 Results of the fourfold cross-validation on Dataset A for BACANet.
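A volunteer-level fourfold split can be reproduced with scikit-learn’s GroupKFold, which guarantees that no volunteer contributes images to both the training and validation folds; whether the paper used exactly this utility is an assumption. `build_bacanet`, `train`, and `evaluate` stand in for project routines.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy stand-in: a dataset index with one volunteer ID per image.
volunteer_ids = np.repeat(np.arange(8), 100)   # 8 volunteers, 100 images each
image_idx = np.arange(len(volunteer_ids))

fold_dsc = []
for train_idx, val_idx in GroupKFold(n_splits=4).split(image_idx, groups=volunteer_ids):
    model = build_bacanet()                    # hypothetical constructor
    train(model, image_idx[train_idx])         # hypothetical training routine
    fold_dsc.append(evaluate(model, image_idx[val_idx]))  # per-fold mean DSC

print(f"DSC: {np.mean(fold_dsc):.3f} ± {np.std(fold_dsc):.3f}")
```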
Quantitative analysis of Dataset B

This section presents a quantitative analysis of all experimental models on Dataset B, beginning with the variation of BACANet’s individual losses. It then reports predictive metrics for all models on both the validation and test sets. Ablation experiments verify the impact of the different modules on BACANet’s overall segmentation performance, and the models’ parameter scales and inference speeds are briefly discussed.

Figure 6 illustrates the variation of BACANet’s individual losses during training. All losses gradually decrease and stabilize as the number of epochs increases. The total loss curve shows periodic fluctuations because a cosine annealing schedule is used to adjust the learning rate. After 100 epochs, the body cross-entropy loss and Dice loss decreased to lows of 0.0515 and 0.0267, respectively; the boundary binary cross-entropy loss and mean squared error loss reached lows of 0.0336 and 0.0075; and the total loss reached a minimum of 0.0773. Each component thus contributes to the optimization of the total loss, with the body loss playing the dominant role and the boundary loss a supportive one.

Fig. 6 Individual loss variations of BACANet during the training process.
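Assuming single-channel sigmoid outputs for both decoders and equal weighting of all terms (the weighting used in the paper is not restated here), the four-part objective could be assembled as in the sketch below.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for a single-channel sigmoid output of shape (N, 1, H, W)."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    denom = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (denom + eps)).mean()

def total_loss(body_logits, body_gt, bnd_logits, bnd_gt):
    # Body branch: cross-entropy plus Dice, as described in the text.
    l_body = (F.binary_cross_entropy_with_logits(body_logits, body_gt)
              + dice_loss(body_logits, body_gt))
    # Boundary branch: binary cross-entropy plus MSE on the boundary map.
    l_bnd = (F.binary_cross_entropy_with_logits(bnd_logits, bnd_gt)
             + F.mse_loss(torch.sigmoid(bnd_logits), bnd_gt))
    return l_body + l_bnd
```

The periodic fluctuations in the total loss curve are consistent with a restarting schedule such as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`; a plain `CosineAnnealingLR` would decay the learning rate once without restarts.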
Figure 7 presents the average performance of the models on the validation set of Dataset B as a radar chart. Among the comparative models, UNet performs least favorably, with a DSC of 0.9079 and an IOU of 0.8384, indicating limited feature-extraction capability and weaker generalization to unfamiliar data. SegNeXt performs notably better, effectively capturing multi-scale global feature information, with a DSC and IOU of 0.9293 and 0.8726, respectively. Overall, BACANet stands out as the best model, achieving the highest DSC and IOU at 0.9456 and 0.8987, respectively, demonstrating its robustness and effectiveness in the segmentation task.

Fig. 7 Best validation results of different models on Dataset B.

Table 4 reports the predictive performance of the experimental models on the test set of Dataset B. All models achieved a DSC above 0.90. UNet performed worst, with a DSC of 0.903, an IOU of 0.834, and an ASD of 4.691. The other comparison models all achieved a DSC above 0.92 and an IOU above 0.85, but their higher ASD values indicate deficiencies in refining the liver boundary. BACANet achieved the best results across all segmentation metrics, with a DSC of 0.950, an IOU of 0.907, and an ASD of 2.075. Compared with UNet, BACANet improved DSC and IOU by 0.047 and 0.073, respectively, and reduced ASD by 2.616. These results demonstrate BACANet’s predictive capability on unseen liver US images, precisely segmenting both the liver’s main body and its boundary.

Table 4 Test results for all models on Dataset B; bold values indicate the best results.

Table 5 presents the ablation experiments on BACANet’s core modules. The baseline is a lightweight UNet with ResNet10t as the backbone, from which the boundary decoder, MDCAM, and EAG were removed. The proposed modules were then added to the baseline incrementally, and each variant was trained and evaluated on the test set. After explicitly introducing the boundary supervision branch, the model’s DSC and IOU improved by 0.013 and 0.024, respectively, demonstrating the auxiliary role of the boundary decoder in segmenting the liver’s main body. Introducing MDCAM enhanced the model’s ability to capture global features, raising the DSC and IOU further to 0.939 and 0.889. Finally, feeding boundary features into the decoder through the EAG improved the body decoder’s ability to locate the liver and refine its boundary, reaching optimal DSC, IOU, and ASD of 0.950, 0.907, and 2.075, respectively.

Table 5 Results of ablation experiments on the core modules of BACANet.
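An ablation study like Table 5 is conveniently driven by one flag per module, so that every variant shares the same data, schedule, and evaluation. The driver below is hypothetical; `build_bacanet` accepting these flags, along with `train` and `evaluate`, are assumptions.

```python
# Each flag toggles one of the proposed modules on the shared baseline.
configs = [
    dict(boundary_decoder=False, mdcam=False, eag=False),  # baseline
    dict(boundary_decoder=True,  mdcam=False, eag=False),  # + boundary branch
    dict(boundary_decoder=True,  mdcam=True,  eag=False),  # + MDCAM
    dict(boundary_decoder=True,  mdcam=True,  eag=True),   # full BACANet
]

for cfg in configs:
    model = build_bacanet(**cfg)        # hypothetical constructor
    train(model, train_loader)          # identical schedule for every variant
    print(cfg, evaluate(model, test_loader))  # DSC / IOU / ASD on the test set
```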
Table 6 lists the parameter count P (in millions, M) and the per-image inference time on CPU (in seconds, s) and GPU (in milliseconds, ms) for the different models. For BACANet, only the parameters required for inferring the liver’s main body are counted, since the boundary branch performs no forward propagation at inference time. UNet and TransUNet have large parameter counts, at 31.03 M and 105.27 M respectively, with corresponding TCPU of 2.085 s and 3.008 s; they may therefore be unsuitable for real-time liver US segmentation under resource-constrained conditions. In contrast, BACANet has 7.56 M parameters (around 24% of UNet) and a TCPU of 0.32 s (around 15% of UNet). UNeXt has the fewest parameters, at 1.47 M, and the shortest TCPU, at 0.093 s; however, its overall performance on the test set remains inferior to BACANet’s, with a lower DSC and coarser boundaries. These results indicate that, for liver US segmentation, BACANet successfully balances inference speed and accuracy. Additionally, except for TransUNet and SegNeXt, the per-image GPU inference time of the remaining models is within 5 ms. Note that TGPU depends on many factors, including the deep learning framework, GPU version, and model architecture, and powerful computational resources may not be available in real-time scenarios, so the listed TGPU values are for reference only.

Table 6 Parameter count and inference time for different models.
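Both quantities in Table 6 are easy to measure in PyTorch. The sketch below assumes a 1×1×256×256 input and averages repeated forward passes; it is illustrative, not the paper’s benchmarking harness.

```python
import time
import torch

def param_count_m(model):
    """Trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

@torch.no_grad()
def cpu_time_s(model, shape=(1, 1, 256, 256), runs=20):
    """Average per-image forward time on CPU, in seconds."""
    model.eval().cpu()
    x = torch.randn(shape)
    model(x)  # warm-up pass so one-off allocations do not skew the timing
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs
```

Timing TGPU requires one extra step: CUDA kernels launch asynchronously, so `torch.cuda.synchronize()` must be called before reading the clock. As noted above, such figures also depend on the framework and GPU, so they should be read as indicative.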
Qualitative analysis of Dataset B

This section presents the qualitative results of all experimental models on Dataset B. We first visualize the predictions of the liver’s main body and then discuss the quality of the feature maps generated within the models.

Owing to speckle noise, shadows, and other artifacts in US images, accurately locating the liver region and refining its anatomical boundary is challenging. Moreover, in liver and gallbladder US examinations the technician employs various scanning planes, such as longitudinal, transverse, and oblique views, to obtain detailed images of the corresponding locations. As shown in Fig. 8, the first and second rows of images focus on the right and left lobes of the liver, respectively, while the third row emphasizes the gallbladder, resulting in an incomplete depiction of the adjacent liver morphology.

Fig. 8 Visualization of the predicted results on test images for different models. The red region indicates the liver mask, the blue region represents the predicted results, and the green area shows the overlap between the two.

Further analysis of the segmentation results shows that most models can roughly locate the liver region but differ in their detailed predictions. In the first row, the liver occupies a large proportion of the image and its boundary in the upper right corner is unclear, leading to more false negatives (red areas) for the comparative models. In contrast, BACANet positions the liver accurately early in the network and, through effective supervision via SLKM, identifies liver features deep in the network, yielding superior segmentation. In the second row, UNet, TransUNet, and RDeeplab exhibit more false positives (blue areas), possibly because redundant features extracted at larger scales impair segmentation accuracy. BACANet’s explicit supervision strategy handles the challenging liver boundary in the upper right, producing the most accurate segmentation. The third row mainly depicts the gallbladder, whose distinct morphology and contrast with the liver allow all models to segment it relatively accurately; nonetheless, BACANet delineates the liver boundary most precisely, producing the most complete and smooth result.
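The colour scheme of Fig. 8 maps directly onto the confusion between ground truth and prediction: red pixels are false negatives (mask only), blue are false positives (prediction only), and green are true positives. A small NumPy sketch of such an overlay; blending it onto the grayscale US frame is omitted for brevity.

```python
import numpy as np

def overlay(gt, pred):
    """RGB overlay in the style of Fig. 8 from two binary masks."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    rgb = np.zeros((*gt.shape, 3), dtype=np.uint8)
    rgb[gt & ~pred] = (255, 0, 0)   # red: liver mask only (false negatives)
    rgb[pred & ~gt] = (0, 0, 255)   # blue: prediction only (false positives)
    rgb[gt & pred] = (0, 255, 0)    # green: overlap (true positives)
    return rgb
```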
Figure 9 shows the internal feature maps of the different models while decoding liver US images, revealing differing levels of understanding of the liver’s main body. UNet roughly identifies the liver region but exhibits indistinct boundaries. In contrast, SegNeXt and UNeXt provide more precise localization and clearer boundaries by capturing contextual information at multiple scales, enhancing their understanding of the liver’s characteristics. BACANet goes further: it gathers multi-scale global information through MDCAM and captures liver boundary features through boundary supervision and the EAG module. As depicted in Fig. 9, BACANet’s internal feature maps highlight not only the liver body but also the liver boundary, which is absent in the comparative models. Analyzing these feature maps reveals BACANet’s regions of interest in the liver segmentation task and improves the interpretability of its predictions.

Fig. 9 Visualization of internal feature maps for different models.
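Feature maps like those in Fig. 9 can be extracted from any trained PyTorch model with a forward hook, without modifying the architecture. In this sketch the layer name `decoder.stage3` and the single-image tensor `image` are hypothetical placeholders.

```python
import torch

feature_maps = {}

def grab(name):
    """Return a forward hook that stores the module's output under `name`."""
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

# Attach the hook to an arbitrary decoder block (name is hypothetical).
handle = model.get_submodule("decoder.stage3").register_forward_hook(grab("dec3"))
with torch.no_grad():
    model(image.unsqueeze(0))  # one US image, assumed shape (1, H, W)
handle.remove()

# Average over channels to get a single 2-D activation map for display.
fmap = feature_maps["dec3"].mean(dim=1)[0].cpu().numpy()
```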
The qualitative analysis of the liver segmentation results confirms BACANet’s superiority in precise liver segmentation, with visual outcomes better than those of the other models. Inspection of the internal feature maps further reveals BACANet’s deeper understanding of liver boundary information, which significantly improves its predictive accuracy. Together, these observations validate BACANet’s effectiveness and superiority in liver segmentation tasks.