Multi-scale input layers and dense decoder aggregation network for COVID-19 lesion segmentation from CT scans

Description of datasets

In assessing the efficacy of our proposed method, we conducted a rigorous evaluation on two widely used COVID-19 lesion segmentation datasets: Vid-QU-EX30 and QaTa-COV19-v231. These datasets, known for their comprehensive coverage and diverse range of COVID-19 lesion images, serve as valuable benchmarks for gauging the performance and generalizability of our approach. Table 1 provides a detailed overview of the specifications of each dataset, including the number of images, the resolution, and other metadata needed to contextualize the experimental results. In addition, to convey the content and variability of these datasets more intuitively, Fig. 6 shows representative examples of the COVID-19 lesion images used in our evaluation.

Table 1 Descriptions of the datasets.

Vid-QU-EX

Researchers at Qatar University compiled the Vid-QU-EX dataset specifically to address the urgent need for a comprehensive dataset in the context of COVID-19. It consists of 1864 training images, 466 validation images, and 583 test images. Detailed information and specifications are available at: https://www.kaggle.com/datasets/anasmohammedtahir/covidqu.

QaTa-COV19-v2

A collaborative effort between researchers at Qatar University and Tampere University produced the QaTa-COV19-v2 dataset, which consists of 5359 training images, 1786 validation images, and 2113 test images. Its COVID-19 chest X-ray images cover the wide range of manifestations and changes observed in clinical practice. Unlike previous iterations, this version introduces ground-truth segmentation masks for the COVID-19 pneumonia segmentation task. These masks serve as valuable annotations, providing pixel-level delineations of COVID-19 pneumonia lesions in chest X-ray images. Detailed information is available at: https://www.kaggle.com/datasets/aysendegerli/qatacov19-dataset/data.

Fig. 6 Challenging cases of COVID-19 lesion images. The first and second rows: original images and corresponding gold labels on the Vid-QU-EX dataset. The third and fourth rows: original images and corresponding gold labels on the QaTa-COV19-v2 dataset.

Implementation details

Our work is built on the PyTorch platform, a dynamic framework known for its versatility and scalability in deep learning tasks. Experiments were run on an NVIDIA Quadro RTX 6000 graphics card with 24 GB of GPU memory. Before training, the dataset was pre-processed and the images were cropped into patches of size 256 \(\times\) 256, which served as the inputs to our method. During the training phase, we employed the Adam optimizer for its effectiveness in converging toward good solutions. With hardware limitations in mind, the initial learning rate was set to \(10^{-3}\), the batch size to 32, and the number of epochs to 250.
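For concreteness, the training configuration above can be expressed as a short PyTorch sketch. The network and data below are trivial stand-ins (the actual MD-Net architecture and dataset loaders are not reproduced here); only the optimizer choice, learning rate, batch size, epoch count, and 256 \(\times\) 256 input size come from the settings reported above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for MD-Net and the COVID-19 data; only the hyper-parameters
# (Adam, lr=1e-3, batch size 32, 250 epochs, 256x256 inputs) follow the text.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 1))
images = torch.randn(64, 3, 256, 256)                    # pre-cropped 256x256 patches
masks = torch.randint(0, 2, (64, 1, 256, 256)).float()   # binary lesion masks
loader = DataLoader(TensorDataset(images, masks), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial learning rate 10^-3
criterion = nn.BCEWithLogitsLoss()                         # assumed segmentation loss

for epoch in range(250):                                   # 250 training epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```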
Figure 7 illustrates how the loss and accuracy values evolve during the iterative training and validation process. The training loss decreases consistently over time, indicating that the model is progressively learning from the data: it starts around 0.5 and declines steadily, reaching very low values (approximately 0.05) by the 250th epoch. The training accuracy rises quickly in the initial epochs, reaching about 95% after only 50 epochs, and continues to improve slightly as training progresses, approaching 99% towards the end of training. This indicates that the model learns to classify the training data with high precision. The validation loss and validation accuracy fluctuate more, especially in the earlier epochs; despite these fluctuations, the validation accuracy stabilizes at around 95%, which indicates relatively strong performance.

Fig. 7 The changes in loss and accuracy values during training and validation of MD-Net. The first row: results on the Vid-QU-EX dataset. The second row: results on the QaTa-COV19-v2 dataset.

Evaluation metrics

To ensure a rigorous and unbiased assessment of our model's performance, we employed three key evaluation metrics: the Dice value32,33, the Matthews correlation coefficient34,35, and the Jaccard index36,37. These metrics quantitatively gauge the efficacy and accuracy of the model's predictions across tasks and datasets, and are defined as:

$$\begin{aligned} Dice&=\frac{2TP}{2TP+FN+FP}, \end{aligned}$$
(8)

$$\begin{aligned} Mcc&=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FN)(TP+FP)(TN+FN)(TN+FP)}}, \end{aligned}$$
(9)

$$\begin{aligned} Jaccard&=\frac{TP}{TP+FN+FP}, \end{aligned}$$
(10)

where TP and TN denote instances correctly identified as positive and negative, and FP and FN denote instances incorrectly classified as positive and negative, respectively.
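For reference, the three metrics can be computed directly from the confusion-matrix counts, as in the following sketch mirroring Eqs. (8)-(10); the two small masks at the end are purely illustrative, and the denominators are assumed to be non-zero.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Dice, Mcc and Jaccard from binary masks, following Eqs. (8)-(10)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = float(np.sum(pred & gt))     # true positives
    tn = float(np.sum(~pred & ~gt))   # true negatives
    fp = float(np.sum(pred & ~gt))    # false positives
    fn = float(np.sum(~pred & gt))    # false negatives

    dice = 2 * tp / (2 * tp + fn + fp)
    mcc = (tp * tn - fp * fn) / np.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    jaccard = tp / (tp + fn + fp)
    return dice, mcc, jaccard

# Illustrative usage with two tiny binary masks.
pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(segmentation_metrics(pred, gt))  # approximately (0.667, 0.333, 0.5)
```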
Ablation studies

Table 2 provides a comprehensive overview of the ablation experiments carried out on the Vid-QU-EX dataset. The evaluation criteria encompass the Dice value, Matthews correlation coefficient, and Jaccard index, which are indicative of segmentation accuracy. The baseline model, the U-Net architecture, serves as the foundation for comparison against a series of augmented models. Notably, the addition of SE-Conv, designed to recalibrate channel-wise feature responses, yields a noticeable improvement across all metrics. Similarly, the multi-scale input layers (MIL) further improve performance by injecting complementary multi-scale context, and the dense decoder aggregation (DDA) adaptively fuses deep and shallow decoder features. However, the most compelling findings emerge from the synergistic integration of these components: the combination Baseline+MIL+DDA+SE-Conv records the highest segmentation accuracy, with a Dice value of 0.8425, an Mcc of 0.8176, and a Jaccard index of 0.7292. This analysis not only validates the effectiveness of the individual strategies but also underscores the importance of their cohesive integration for advancing semantic segmentation.

Table 2 Ablation experiment of MD-Net on the Vid-QU-EX dataset (bold represents the best result).

Furthermore, we present the visualization outcomes of the module ablation experiment. As depicted in Fig. 8, the first row shows images from the test set and the second row shows the corresponding ground-truth labels. The third to last rows are the predicted segmentation results of the Baseline and of the Baseline with MIL, SE-Conv, DDA, MIL+DDA, MIL+SE-Conv, SE-Conv+DDA, and MIL+DDA+SE-Conv added, respectively. A detailed comparison of these visualizations makes it clear that the segmentation quality of the full MD-Net model is noticeably better than that of the basic U-Net backbone and of the partial combinations of MIL, SE-Conv, and DDA. In addition, MD-Net shows commendable adaptability in handling complex and challenging scenes characterized by low contrast and blurred boundaries. This adaptability stems from MD-Net's ability to blend deep and shallow features in its decoding structure, which helps to delineate target-region boundaries more precisely.

Fig. 8 Visualization of ablation results on the Vid-QU-EX dataset. (a,b) original images and corresponding gold labels on the Vid-QU-EX dataset. (c-j) are the results of Baseline, Baseline+MIL, Baseline+SE-Conv, Baseline+DDA, Baseline+MIL+DDA, Baseline+MIL+SE-Conv, Baseline+SE-Conv+DDA, and Baseline+MIL+DDA+SE-Conv, respectively.
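The SE-Conv block is described above only as recalibrating channel-wise feature responses; the sketch below therefore assumes a standard squeeze-and-excitation design (global average pooling, a channel bottleneck, and sigmoid gating) wrapped around a 3 \(\times\) 3 convolution, with the channel sizes and reduction ratio chosen arbitrarily for illustration rather than taken from MD-Net.

```python
import torch
import torch.nn as nn

class SEConv(nn.Module):
    """Illustrative SE-style convolution block (assumed design, not MD-Net's)."""
    def __init__(self, in_ch, out_ch, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Squeeze: global pooling; excite: channel bottleneck with a sigmoid gate.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // reduction, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        feat = self.conv(x)
        return feat * self.gate(feat)  # recalibrate channel-wise responses

# Example: a 32-channel feature map recalibrated into 64 channels.
block = SEConv(32, 64)
print(block(torch.randn(1, 32, 256, 256)).shape)  # torch.Size([1, 64, 256, 256])
```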
Comparisons with the state-of-the-art methods

Table 3 Results of different models on the Vid-QU-EX dataset.

To validate the effectiveness of our proposed method in accurately segmenting infected areas within CT images, we conducted a comprehensive evaluation of various models on the Vid-QU-EX dataset. The comparison networks included U-Net, Attention-U-Net, DCANet, M-Net, DCSAU-Net, MCDAU-Net, META-Unet, MSRAformer, Swin-Transformer, MCAFNet, MDUNet, and DualA-Net, all of which were evaluated under the same experimental environment.

After 250 epochs of training on the pre-processed COVID-19 datasets, we tested each network and compared the segmentation results using the recorded evaluation metrics. As shown in Table 3, U-Net displays commendable performance across all metrics, with a Dice score of 0.8265, an Mcc of 0.7992, and a Jaccard index of 0.7051. MSRAformer, Swin-Transformer, and DualA-Net consistently underperform the traditional U-Net on several key measures, whereas Attention-U-Net, DCANet, M-Net, DCSAU-Net, MCDAU-Net, META-Unet, MCAFNet, and MDUNet outperform U-Net across all metrics, with improvements in Dice, Mcc, and Jaccard index. However, MD-Net achieves the best results on all three evaluation metrics, which indicates that its segmentation results have a high similarity to the true labeled lesion areas and that the boundary agreement between the segmentation results and the labeled areas is also high. Notably, MD-Net demonstrates robust capabilities in accurately identifying COVID-19 lesion areas and delivers decent segmentation performance even for smaller regions.

To compare the models more clearly, we performed a visual analysis of the segmentation results, shown in Fig. 9. In the COVID-19 lesion segmentation task, U-Net exhibits obvious over-segmentation, resulting in rough edges, uneven contours, and poorly placed details. Drawing on the effectiveness of the attention mechanism, Attention-U-Net achieves performance comparable to U-Net; however, both U-Net and Attention-U-Net still fall short of providing satisfactory segmentation results. Transformer models such as MSRAformer and Swin-Transformer excel at capturing global context through long-range dependencies in the image, but in the COVID-19 lesion segmentation task their inability to focus on local features with sufficient precision leads to inaccurate or incomplete segmentation. DCANet, MCDAU-Net, META-Unet, and DualA-Net have difficulty preserving edge-detail textures, which results in blurred boundaries and instances of missed or falsely detected areas. Thanks to their multi-scale and attention mechanisms, MCAFNet and MDUNet produce visually clearer and more accurate segmentation maps, but they can struggle with the small, irregular infection areas that are often involved. In contrast, M-Net, a variant of U-Net, emerges as a promising solution by integrating multi-scale input layers and side output layers, yielding commendable results. In addition, through its primary feature conservation mechanism, DCSAU-Net exploits both low-level and high-level semantic information and shows strong segmentation performance. Nevertheless, the MD-Net method is able to segment even the smallest infections scattered throughout the COVID-19 lesion region, which highlights the superior accuracy of our method.

Fig. 9 Visualization of different models on the Vid-QU-EX dataset. The first and second rows: original images and corresponding gold labels on the Vid-QU-EX dataset.
The third to last rows are the predicted results of U-Net, Attention-U-Net, DCANet, M-Net, DCSAU-Net, MCDAU-Net, META-Unet, MSRAformer, Swin-Transformer, MCAFNet, MDUNet, DualA-Net, and MD-Net.

Table 4 Results of different models on the QaTa-COV19-v2 dataset.

Second, we performed the evaluation on the QaTa-COV19-v2 dataset; the results are shown in Table 4. Notably, our model achieved strong scores on the key evaluation metrics, with a Dice score of 0.8395, an Mcc of 0.8232, and a Jaccard index of 0.7311. These measures indicate that the model can accurately delineate diseased areas, even in regions with low contrast. Compared to U-Net, MD-Net demonstrated notable improvements across all metrics, with increases of 1.02% in Dice score, 1.33% in Mcc, and 1.67% in Jaccard index. Furthermore, when compared to the second-best DCANet method, MD-Net exhibited marginal yet consistent improvements: the Dice score, Mcc, and Jaccard index increased by 0.08%, 0.03%, and 0.08%, respectively. Although the gain in accuracy is small, MD-Net has clear advantages in terms of parameters and efficiency. Thus, based on a comprehensive evaluation considering both performance and computational complexity, MD-Net emerges as the optimal choice. Its ability to achieve high segmentation accuracy while maintaining reasonable computational demands positions it as a promising solution for the precise detection and segmentation of COVID-19 lesions in medical imaging applications.

Furthermore, we complement the quantitative analysis with a visual examination of the segmentation outcomes generated by our model, illustrated in Fig. 10. A close examination of the visual results shows that MD-Net captures local detail with remarkable accuracy. Overall, our model produced the best results on COVID-19 lesions, confirming its better generalization. Through the combination of quantitative and qualitative evaluations, we affirm the advantages of MD-Net as a versatile and reliable tool for COVID-19 lesion segmentation in medical imaging.

Fig. 10 Visualization of different models on the QaTa-COV19-v2 dataset. The first and second rows: original images and corresponding gold labels on the QaTa-COV19-v2 dataset. The third to last rows are the predicted results of U-Net, Attention-U-Net, DCANet, M-Net, DCSAU-Net, MCDAU-Net, META-Unet, MSRAformer, Swin-Transformer, MCAFNet, MDUNet, DualA-Net, and MD-Net.

Efficiency analysis

To ensure a fair and thorough comparison, we performed an extensive efficiency analysis across the thirteen state-of-the-art models, using the number of parameters (Params) and frames per second (FPS) as the key evaluation criteria, as reported in Tables 3 and 4. U-Net stands out as a model that balances computational resources well, with relatively few parameters, a small model size, and a high FPS, positioning it as one of the most computationally efficient networks in our study. Similarly, DCSAU-Net and DualA-Net are efficient models that use fewer parameters and require shorter training times, which enhances their suitability for real-time applications. Despite their advanced architectures, models such as MSRAformer and Swin-Transformer demand significantly more computational resources, both in parameters and in training time; however, this increased complexity does not necessarily translate into superior segmentation performance.
In contrast, MD-Net strikes a favorable balance between precision and efficiency. With a compact network size of just 8.5747 MB and a per-frame processing time of 73 to 75 milliseconds, MD-Net proves to be a highly practical solution that provides advanced capability for diagnosing COVID-19 lesions without requiring a significant amount of computing power.
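The two efficiency criteria used in Tables 3 and 4 can be measured as in the following sketch. The network here is a toy stand-in rather than MD-Net, the 256 \(\times\) 256 input matches the crop size used for training, and converting the parameter count to megabytes assumes 4 bytes per float32 parameter.

```python
import time
import torch
import torch.nn as nn

def param_size_mb(model):
    """Parameter storage in MB, assuming 4 bytes per float32 parameter."""
    return sum(p.numel() for p in model.parameters()) * 4 / (1024 ** 2)

@torch.no_grad()
def measure_fps(model, runs=50):
    """Average frames per second for single 256x256 inputs on the current device."""
    model.eval()
    x = torch.randn(1, 3, 256, 256)
    model(x)                                  # warm-up pass
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return runs / (time.perf_counter() - start)

# Toy stand-in network; replace with the model under evaluation.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 1))
print(f"Params: {param_size_mb(net):.4f} MB, FPS: {measure_fps(net):.1f}")
```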
