Medical image segmentation with UNet-based multi-scale context fusion

Datasets

We assess the performance of our proposed model using two gland segmentation benchmark datasets: GlaS33 and MoNuSeg34. The GlaS dataset comprises 165 images from T3 and T4 stages of 16 H&E-stained tissue sections of colorectal adenocarcinoma. It consists of a Training Part, Test Part A, and Test Part B. In this study, the 85 images from the training part are used for training, while the 80 images from test parts A and B are used for testing. The MoNuSeg dataset comprises 44 images, with 30 designated for training and 14 for testing. These images capture a variety of nuclear appearances from different patients, disease states, and organs, and the dataset includes over 21,000 meticulously annotated nuclear boundaries.

Implementation details

The network model was implemented in PyTorch, with hardware consisting of dual Intel Xeon E5-2678 v3 @ 2.50 GHz CPUs, 32 GB of RAM, and a single Nvidia RTX 3080 GPU with 10 GB of VRAM. For the GlaS and MoNuSeg datasets, the batch size was set to 4, the number of epochs to 2000, and early stopping was triggered after 200 epochs without improvement. The input resolution was 224 \(\times\) 224, the patch size was 16, and the Adam optimizer was used with a learning rate of 0.001. Binary cross-entropy and Dice loss were employed as the training loss functions. To ensure reproducibility, the CUDA and Python random seeds were fixed. Evaluation used the Dice coefficient and Intersection over Union (IoU) as performance metrics. To eliminate the randomness of a single experiment, fivefold cross-validation was conducted, and the results were averaged along with their standard deviations.

Loss function

Based on the characteristics of binary cross-entropy and the Dice similarity coefficient, this paper designs a hybrid loss function, in which the value of \(\alpha\) is 0.5 and the value of \(\beta\) is 0.5.
The formula is as follows:$$\begin{aligned} L = \alpha {L_{Dice}} + \beta {L_{BCE}}. \end{aligned}$$
(1)
In the two-class medical image segmentation task, the commonly used loss function is the binary cross-entropy loss function$$\begin{aligned} L_{\mathrm{BCE}} = - \frac{1}{N}\sum \limits _{i = 1}^N \left( g_i \cdot \ln \left( p_i \right) + \left( 1 - g_i \right) \cdot \ln \left( 1 - p_i \right) \right) , \end{aligned}$$
(2)
where \(g_i\) is the ground-truth class of pixel \(i\), and \(p_i\) is the prediction for the corresponding pixel. The binary cross-entropy loss function can effectively alleviate the vanishing-gradient problem of the network. However, because it evaluates every pixel in the same way, the majority class dominates the loss on images with unbalanced categories, which biases the optimization direction of the network and affects the experimental results. The other loss function used is the Dice loss function$$\begin{aligned} L_{Dice} = 1 - \frac{2\sum \limits _{i = 1}^N g_i p_i}{\sum \limits _{i = 1}^N g_i + \sum \limits _{i = 1}^N p_i}. \end{aligned}$$
(3)
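Taken together, Eqs. (1)-(3) can be sketched as a PyTorch module. This is a minimal illustration: the class name, the `eps` smoothing term (a common numerical stabilizer not present in Eq. (3)), and the flattened reduction are assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class HybridLoss(nn.Module):
    """Hybrid loss L = alpha * L_Dice + beta * L_BCE, following Eqs. (1)-(3)."""

    def __init__(self, alpha=0.5, beta=0.5, eps=1e-6):
        super().__init__()
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.bce = nn.BCELoss()  # expects probabilities p_i in (0, 1)

    def forward(self, p, g):
        # p: predicted probabilities, g: binary ground truth, same shape
        bce = self.bce(p, g)
        inter = (p * g).sum()
        # Dice term of Eq. (3); eps avoids division by zero on empty masks
        dice = 1 - (2 * inter + self.eps) / (p.sum() + g.sum() + self.eps)
        return self.alpha * dice + self.beta * bce
```

With \(\alpha = \beta = 0.5\) as in the paper, both terms contribute equally, so the BCE term keeps gradients smooth while the Dice term counteracts class imbalance.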
The Dice loss function is the most common in medical image segmentation and is better suited to unbalanced sample distributions. However, its training loss curve is unstable, which makes it difficult to judge convergence.

Comparison with advanced methods

To verify the performance of the proposed TBSFF-UNet model, we compare it with other CNN-based and Transformer-based methods, including UNet, Attention UNet, UNet++, and UCTransNet, using their originally released code. The comparison of experimental results and model complexity is shown in Table 1, where the best results are shown in bold. As can be seen from Table 1, TBSFF-UNet has 17% more Params and 11% more FLOPs than UNet. Compared with UNet++, TBSFF-UNet has 57% fewer Params and 75% fewer FLOPs; compared with Attention UNet, 55% fewer Params and 48% fewer FLOPs; and compared with UCTransNet, Params are reduced by 76% and FLOPs by 20%. TBSFF-UNet achieves the best results on both the GlaS and MoNuSeg datasets, which shows that the proposed TBSFF-UNet network delivers good, reliable performance while remaining lightweight.
Table 1 Fivefold cross-validation results of each model on the GlaS and MoNuSeg datasets.

Figure 3 shows the visualization of the segmentation results of the different models. It can be seen that TBSFF-UNet achieves excellent performance and is more accurate than the other models. The salient boundary regions are also highly coherent, which again verifies the effectiveness and advancement of the TBSFF-UNet method.

Figure 3 Illustration of the proposed TBSFF.

Ablation studies

The ablation studies on the proposed module are shown in Table 2. "Base+TBSFF" outperforms the other networks on both the GlaS and MoNuSeg datasets, which confirms the effectiveness of our proposed module and shows that contextual information fusion is necessary for improving segmentation performance. The previous experiments prove that adding the skip connections and the TBSFF module effectively improves network performance. To further evaluate the improvement, we concatenated the skip connections and adjusted the number of channels through convolution to exclude the impact of TBSFF on network performance; this model is defined as UNet+CAT. As the experimental results in Table 2 show, UNet+CAT performs better than UNet on both the GlaS and MoNuSeg datasets, which verifies the necessity of skip connections for improving network performance and also proves that the skip connections we added are effective.
Table 2 Ablation experiments on the GlaS and MoNuSeg datasets.

To examine the TBSFF module, we define UNet with the TBSFF module as Baseline+TBSFF and compare it with the Baseline and Baseline+CAT models. As can be seen from Table 2, on the GlaS and MoNuSeg datasets, Baseline+TBSFF performs significantly better than Baseline and Baseline+CAT, which means that the TBSFF module has a positive effect on improving the network.
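For reference, the Dice coefficient and IoU used to score all of the experiments above can be computed from binarized masks as follows. This is a sketch under assumptions: the function names, the `eps` smoothing term, and the 0.5 binarization threshold are illustrative and not taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred, gt, eps=1e-7):
    """IoU = |P ∩ G| / |P ∪ G| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

# Example: binarize a probability map at 0.5 before scoring
probs = np.array([[0.9, 0.2], [0.8, 0.1]])
gt = np.array([[1, 0], [1, 1]])
pred = (probs > 0.5).astype(np.uint8)
```

In a fivefold protocol such as the one described above, these per-image scores would be averaged per fold and then across folds, with the standard deviation reported alongside the mean.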
