A mixed Mamba U-net for prostate segmentation in MR images

Dataset

We evaluate the efficacy and performance of MM-UNet on two publicly accessible prostate MR image segmentation datasets. The PROMISE12 dataset27 was released for the prostate segmentation challenge held by the International Medical Image Processing Organizing Committee in 2012. It originates from four medical institutes and comprises patients with benign conditions as well as prostate cancer. Of the 80 available T2-weighted axial prostate MR images, 50 have expert segmentation masks. The ASPS13 dataset28, used in the NCI-ISBI 2013 Automated Segmentation of Prostate Structures challenge, consists of 60 T2-weighted MR images, all from patients with prostate cancer. In each training case, the true prostate boundaries were annotated by a physician. The test set contains 10 prostate MR images for which the ground truth is not provided.

Implementation details and evaluation metrics

The proposed model is implemented in Python with the PyTorch framework, and experiments are conducted on hardware equipped with four NVIDIA RTX 4090 GPUs. After normalizing the prostate MR images21, we expanded the training data with several online augmentation techniques, including random Gaussian noise addition, random flipping, and random rotation. During training, volumes are randomly cropped to 128 × 128 × 128, and cross-entropy loss is used as the loss function. For all experiments, Adam29 was employed as the optimizer with a poly learning-rate schedule, a weight decay of 5 × 10−4, and an initial learning rate of 1 × 10−4.
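The online augmentation and poly learning-rate schedule described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the noise standard deviation (0.1), the poly power (0.9), and the function names are assumptions.

```python
# Sketch of the training-time augmentation (random crop, flip, 90-degree
# rotation, Gaussian noise) and the poly learning-rate decay. Hyperparameters
# marked below are illustrative assumptions, not the paper's exact values.
import numpy as np

def augment(volume, crop=128, rng=np.random.default_rng()):
    """Randomly crop, flip, rotate, and add Gaussian noise to a 3D volume."""
    # Random 128x128x128 crop (assumes the volume is at least crop^3).
    starts = [rng.integers(0, s - crop + 1) for s in volume.shape]
    v = volume[tuple(slice(s, s + crop) for s in starts)].copy()
    # Random flip along each axis.
    for ax in range(3):
        if rng.random() < 0.5:
            v = np.flip(v, axis=ax)
    # Random 90-degree rotation in a randomly chosen plane.
    axes = rng.permutation(3)[:2]
    v = np.rot90(v, k=rng.integers(0, 4), axes=tuple(axes))
    # Additive Gaussian noise (sigma = 0.1 is an assumed value).
    return v + rng.normal(0.0, 0.1, v.shape)

def poly_lr(base_lr, step, max_steps, power=0.9):
    """Poly learning-rate schedule; power = 0.9 is an assumed default."""
    return base_lr * (1.0 - step / max_steps) ** power
```

In practice `poly_lr` would be wired into the Adam optimizer via a per-iteration scheduler, with `base_lr` set to the paper's 1 × 10−4.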
We employ five-fold cross-validation for training and testing to obtain a fair and dependable assessment of the various methods30, and we apply the same data preprocessing and learning strategy in all experiments to ensure a fair comparison31.

To evaluate the prostate segmentation results objectively and measure model effectiveness, we utilize three performance metrics32: Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (95HD), and Average Symmetric Distance (ASD). Higher DSC values and lower 95HD and ASD values indicate better segmentation accuracy. They are computed as follows:

$$ DSC(X,Y) = \frac{2|X \cap Y|}{|X| + |Y|} $$
(4)
$$ 95HD = \max_{k95\%} \left( d_{HD}(X,Y),\, d_{HD}(Y,X) \right) $$
(5)
$$ d(x,A) = \min_{y \in A} d(x,y) $$
(6)
$$ ASD = \frac{\sum_{x \in B_{X}} d(x, B_{Y}) + \sum_{y \in B_{Y}} d(y, B_{X})}{|B_{X}| + |B_{Y}|} $$
(7)
where X denotes the prediction, Y denotes the ground truth, \(| \cdot |\) denotes the cardinality operation, \(d_{HD} (X,Y)\) is the Hausdorff distance between X and Y, \(\max_{k95\% }\) is the maximum taken at the 95th percentile, \(B_{X}\) and \(B_{Y}\) are the boundaries of X and Y, and \(d(x,A)\) is the minimum Euclidean distance from voxel x to surface A, computed with the image's actual spatial resolution.

Comparisons with the state-of-the-art methods

We evaluate our method against nine state-of-the-art methods: classic general segmentation networks (Attention U-Net33, V-Net6, and 3D U-Net34), Transformer-based medical segmentation models (TransUNet35, SwinUNETR36, and UNesT37), and methods specifically designed for prostate segmentation (MSD-Net21, CCT-Unet11, and CAT-Net38). For all evaluation metrics, we report performance as mean ± standard deviation.

Table 1 quantitatively compares the segmentation performance of MM-UNet and the other methods on the PROMISE12 and ASPS13 datasets. According to the experimental results, our method outperforms the other techniques on both public prostate MR imaging datasets. On the PROMISE12 dataset, our method achieved an average DSC of 92.39%, 95HD of 3.43 mm, and ASD of 1.42 mm, the best values among all methods. Compared with the traditional V-Net, MM-UNet improves DSC, 95HD, and ASD by 3.35%, 0.78 mm, and 0.61 mm, respectively, demonstrating its efficacy. UNesT and MSD-Net are the best-performing Transformer-based and prostate-specific methods, respectively, and our method exceeds their DSC scores by 1.42% and 0.51%, respectively.
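For concreteness, the three evaluation metrics defined above can be computed for binary masks as in the following sketch. This is an illustration, not the authors' evaluation code, and it assumes isotropic 1 mm voxel spacing; in practice the distances would be scaled by the image's actual spacing.

```python
# Illustrative implementation of DSC, 95HD, and ASD for binary 3D masks.
# Assumes isotropic 1 mm voxels; real evaluation must use the image spacing.
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial import cKDTree

def surface_voxels(mask):
    """Boundary voxels of a binary mask: the mask minus its erosion."""
    return np.argwhere(mask & ~binary_erosion(mask))

def _surface_distances(a_pts, b_pts):
    """Minimum Euclidean distance from each point of a_pts to the set b_pts."""
    return cKDTree(b_pts).query(a_pts)[0]

def dsc(x, y):
    """Dice Similarity Coefficient, Eq. (4)."""
    return 2.0 * np.logical_and(x, y).sum() / (x.sum() + y.sum())

def hd95(x, y):
    """95% Hausdorff Distance, Eq. (5): 95th-percentile symmetric maximum."""
    bx, by = surface_voxels(x), surface_voxels(y)
    return max(np.percentile(_surface_distances(bx, by), 95),
               np.percentile(_surface_distances(by, bx), 95))

def asd(x, y):
    """Average Symmetric Distance, Eq. (7)."""
    bx, by = surface_voxels(x), surface_voxels(y)
    return (_surface_distances(bx, by).sum() +
            _surface_distances(by, bx).sum()) / (len(bx) + len(by))
```

A perfect prediction yields DSC = 1 and 95HD = ASD = 0; shifting a mask by one voxel lowers DSC and makes the surface distances nonzero, matching the intuition behind the equations.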
In terms of DSC, 95HD, and ASD, the proposed method demonstrates improved reliability for prostate MR image segmentation overall. Similarly, on the ASPS13 dataset, our method achieved the best performance, with a DSC of 92.17%, 95HD of 3.61 mm, and ASD of 1.67 mm. Compared with CCT-Unet, which ranks second, our method improves DSC and 95HD by 0.94% and 0.28 mm, respectively.

Table 1 Quantitative comparison of our method and others on the PROMISE12 and ASPS13 datasets.

Figures 5 and 6 display qualitative results of our method and the competitors on representative cases. Across slices from different situations, our method produces fewer incorrect segmentations, and its boundaries are closest to the ground truth. The careful architectural design of MM-UNet yields considerable and consistent performance gains on prostate MR images, as demonstrated by both the quantitative and qualitative analyses. These outcomes show the effectiveness of MM-UNet in handling the varied and complex semantics of prostate regions in this challenging task.

Fig. 5 Qualitative comparison of our method and others on the PROMISE12 dataset. The red line represents the ground truth, and the blue line represents the segmentation results of the various models.

Fig. 6 Qualitative comparison of our method and others on the ASPS13 dataset. The red line represents the ground truth, and the blue line represents the segmentation results of the various models.

Ablation study

We conduct an ablation study on the PROMISE12 dataset to evaluate the performance of our network when equipped with different modules, thereby demonstrating the efficacy of each component. As the baseline, we employ the 3D U-Net34 architecture. To lower the computational complexity, we use a 4-layer encoder instead of the 5-layer structure of the typical U-Net design.
To ensure a fair comparison, all competitors in the ablation experiments were run in the same computing environment with the same data augmentation.

We conduct step-by-step ablation experiments by substituting our proposed 3D Res2Net encoder, global context-aware module (GCM), adaptive feature fusion module (AFFM), and multi-scale anisotropic convolution module (MACM) for the corresponding modules in the baseline. As the quantitative results in Table 2 show, compared with the baseline, the 95HD of Model 1 is reduced by 0.11 mm, confirming the advantage of the 3D Res2Net encoder. The DSC of Model 2, Model 3, and Model 4 improves by 0.81%, 0.98%, and 0.57%, respectively, over Model 1, demonstrating the efficacy of GCM, AFFM, and MACM. The DSC of MM-UNet exceeds that of Model 5, Model 6, and Model 7 by 0.66%, 1.19%, and 0.94%, respectively, indicating that combining GCM, AFFM, and MACM further enhances performance. Relative to the baseline, MM-UNet achieves significant improvements of 2.49% in DSC and 1.13 mm in 95HD, demonstrating the superior segmentation performance of the proposed network. Figure 7 qualitatively demonstrates the advantages of the proposed modules for prostate MR image segmentation. Comparing the results of Model 1 and Model 2 shows that introducing GCM gives the network a stronger ability to identify the prostate region, confirming that GCM fully captures global context information to refine segmentation edges. Model 5 shows less over-segmentation and under-segmentation than Model 2, indicating that AFFM adaptively fuses low-level and high-level information to improve the model's segmentation ability. MM-UNet integrates GCM, AFFM, and MACM simultaneously and produces the most comprehensive and smooth segmentation results.
These results prove that the synergy between the modules effectively improves segmentation of the prostate edge.

Table 2 Prostate segmentation performance of the different models in our system.

Fig. 7 Visual comparison between different models in our system for prostate segmentation. White, green, and red represent correct segmentation, under-segmentation, and over-segmentation, respectively.

Further, we conducted ablation experiments on GCM and MACM to explore the best-performing configurations. Table 3 displays the ablation results for GCM: both the Residual Block and the HardSwish activation contribute to the final performance. We also examined the selection of multi-scale convolution kernels and the necessity of anisotropic convolution in MACM, with results in Table 4. Combining small-kernel and large-kernel convolutions captures multi-scale feature information well, handling the scale changes between different image instances. Moreover, compared with ordinary 3D convolution, anisotropic convolution improves 95HD by 0.12 mm, indicating that it exploits the 3D context of MR images with anisotropic resolution more effectively.

Table 3 Ablation study on the global context-aware module.

Table 4 Ablation study on the multi-scale anisotropic convolution module.
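The intuition behind anisotropic convolution can be illustrated with plain array convolutions. The kernel shapes below (3 × 3 × 3 vs. 3 × 3 × 1) are illustrative assumptions, not the exact kernels of MACM: an in-plane 3 × 3 × 1 kernel avoids mixing information across coarsely sampled slices, whereas an isotropic 3 × 3 × 3 kernel blends neighboring slices.

```python
# Sketch of isotropic vs. anisotropic 3D convolution on a volume whose
# through-plane (z) resolution is coarser than its in-plane resolution.
# Kernel shapes are illustrative, not the paper's exact MACM kernels.
import numpy as np
from scipy.ndimage import convolve

volume = np.random.default_rng(0).normal(size=(32, 32, 16))  # coarse z-axis

iso_kernel = np.ones((3, 3, 3)) / 27    # ordinary isotropic 3D kernel
aniso_kernel = np.ones((3, 3, 1)) / 9   # anisotropic, in-plane-only kernel

iso_out = convolve(volume, iso_kernel, mode="nearest")
aniso_out = convolve(volume, aniso_kernel, mode="nearest")
# With the (3, 3, 1) kernel, each output slice depends only on its own
# input slice, so slices with large physical spacing are not smeared together.
```

In a network, such kernels would be stacked (e.g. mixing 3 × 3 × 1 and k × k × 3 branches) so that in-plane detail and through-plane context are handled at scales matched to the voxel spacing.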
