A full-scale lung image segmentation algorithm based on hybrid skip connection and attention mechanism

Experimental configuration

The experiments were run on a Windows 11 workstation with an AMD Ryzen 7 5800H processor with Radeon Graphics (3.20 GHz), 16 GB of RAM, and an NVIDIA GeForce RTX 3050 Laptop GPU (4 GB VRAM). The development environment was PyCharm 2022 with the PyTorch 1.12.1 framework. During training, the loss function used the TaskAlignedAssigner positive-sample allocation strategy together with the Distribution Focal Loss. The batch size was 64, the model was trained for 100 epochs with the optimal checkpoint selected for validation, and the initial learning rate was 0.001. Stochastic gradient descent (SGD) was chosen as the optimizer for updating the model parameters.

Evaluation metrics

To verify the validity of the method proposed in this paper, the following quantitative evaluation metrics are used. Precision measures the accuracy with which the model predicts positive cases; the higher the value, the better the predictive performance. Recall measures the model's coverage of positive examples; a higher value likewise indicates better predictive performance. Pixel accuracy (PA) is the ratio of correctly classified pixels to the total number of pixels in the segmentation result and measures the overall accuracy of the segmentation algorithm. Mean Intersection over Union (mIoU) is the ratio of the intersection to the union of the true labels and the predictions for each class, averaged over classes; the larger the value, the better the segmentation result. The Dice coefficient measures the similarity between the ground-truth and predicted masks; the higher the value, the better the prediction of the model.
mAP is the average of the areas enclosed by the Precision-Recall curves of the individual classes; the higher the value, the better the performance of the algorithm. GFLOPs denotes the number of floating-point operations and measures the computational complexity of the model; the larger the GFLOPs value, the higher the model complexity and the lower the computational efficiency. The formulae are given in Eqs. (5)-(11).

$$Precision = \frac{{N_{TP} }}{{N_{TP} + N_{FP} }}$$
(5)
$$Recall = \frac{{N_{TP} }}{{N_{TP} + N_{FN} }}$$
(6)
$$PA = \frac{{N_{TP} + N_{TN} }}{{N_{TP} + N_{FP} + N_{TN} + N_{FN} }}$$
(7)
$$Dice = 2 \times \frac{Precision \times Recall}{{Precision + Recall}}$$
(8)
$${\text{mIoU}} = \frac{1}{k + 1}\sum\limits_{i = 0}^{k} {\frac{{N_{TP} }}{{N_{FN} + N_{FP} + N_{TP} }}}$$
(9)
$$AP = \sum\limits_{i = 0}^{k - 1} {[Recall(i) - Recall(i + 1)] \times Precision(i)}$$
(10)
$$mAP = \frac{{\sum\limits_{i = 1}^{k} {AP_{i} } }}{k}$$
(11)
where k denotes the number of classes, NTP is the number of true positives (positive samples predicted as positive), NFP the number of false positives (negative samples predicted as positive), NFN the number of false negatives (positive samples predicted as negative), and NTN the number of true negatives (negative samples predicted as negative).

Datasets

The Montgomery County chest X-ray set28 was collected in collaboration with the Department of Health and Human Services, Montgomery County, Maryland, USA. It contains 138 frontal chest X-rays from the Montgomery County tuberculosis screening program, of which 80 are normal cases and 58 show manifestations of tuberculosis. The dataset can be used for both segmentation and classification tasks; sample images are shown in Fig. 6. To increase the number of samples, we apply data augmentation consisting of 180-degree rotation and image flipping. The data can be downloaded from: https://openi.nlm.nih.gov/imgs/collections/NLM-MontgomeryCXRSet.zip.

Fig. 6 Montgomery County chest X-ray lung dataset images.

The Shenzhen chest X-ray set29 is a digital tuberculosis image database created by the Third People's Hospital of Shenzhen, China, in collaboration with Guangdong Medical College. It contains 336 radiographs exhibiting tuberculosis and 326 normal cases. The dataset consists of two parts, the original images and the segmentation masks; sample images are shown in Fig. 7. The data can be downloaded from: https://openi.nlm.nih.gov/imgs/collections/ChinaSet_AllFiles.zip.

Fig. 7 Shenzhen chest X-ray lung dataset images.

Since the Montgomery County and Shenzhen chest X-ray sets are of the same modality, the two datasets are combined in the experiments of this paper.
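For reference, the pixel-level metrics in Eqs. (5)-(9) can be computed directly from the four confusion counts. The following is a minimal sketch; the function and variable names are ours, not from the paper:

```python
# Pixel-level segmentation metrics from confusion counts, following Eqs. (5)-(9).
# tp, fp, fn, tn: true-positive, false-positive, false-negative, true-negative
# pixel counts for one class.

def precision(tp, fp):
    return tp / (tp + fp)                       # Eq. (5)

def recall(tp, fn):
    return tp / (tp + fn)                       # Eq. (6)

def pixel_accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)      # Eq. (7)

def dice(tp, fp, fn):
    # Eq. (8): Dice = 2PR/(P+R), which simplifies to 2*tp / (2*tp + fp + fn)
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def miou(per_class_counts):
    # Eq. (9): per_class_counts is a list of (tp, fp, fn) tuples,
    # one per class (including background), averaged over the k+1 classes.
    ious = [tp / (tp + fp + fn) for tp, fp, fn in per_class_counts]
    return sum(ious) / len(ious)

# Toy example for a binary lung/background segmentation:
tp, fp, fn, tn = 90, 10, 5, 895
print(precision(tp, fp))            # 0.9
print(pixel_accuracy(tp, fp, fn, tn))  # 0.985
```

For the background class the roles of positives and negatives swap, so its (tp, fp, fn) tuple is (tn, fn, fp) of the foreground class.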
After data preprocessing, the total number of samples is 1896, which we randomly divide into 70% training, 20% test, and 10% validation sets.

Ablation experiment

To demonstrate the effectiveness of the proposed method on the dataset, different combinations of the improvements are evaluated. YOLOv8 is used as the base network, into which the hybrid skip connection and the attention gate are incorporated. The results of the ablation experiments are shown in Table 1.

Table 1 Results of ablation experiments.

From the results in Table 1, Method 4, which incorporates both the hybrid skip connection and the attention gate into the base network, achieves the best Precision, Recall, and mAP@0.5 among the different combinations, which demonstrates the effectiveness of the proposed algorithm. Figure 8 shows the heat maps of the different models on the dataset: Method 1 is the base approach, Method 2 adds the attention gate, Method 3 adds the hybrid skip connection, and Method 4 combines the hybrid skip connection and the attention gate. As Fig. 8 shows, the heat map of Method 2 highlights the segmentation features more clearly than that of Method 1, confirming the validity of the attention gate. The heat map of Method 3 is more pronounced and more concentrated than that of Method 1, confirming the effectiveness of the hybrid skip connection. The heat map of Method 4 is more prominent and concentrated than those of Methods 1, 2, and 3, showing that Method 4 is the best of the four.

Fig. 8 Heat map of ablation experiments.

Comparison experiment

To verify the performance of the proposed algorithm against other methods, comparison experiments are carried out for semantic segmentation methods (Table 2) and instance segmentation methods (Table 3). Tables 2 and 3 report the mean values over three runs together with the margin of error.

Table 2 Comparison experiment of semantic segmentation methods.

Table 3 Comparison experiment of instance segmentation methods.

According to the results in Table 2, the proposed algorithm changes Precision, Recall, pixel accuracy (PA), Dice, and mean Intersection over Union (mIoU) by 0.5%, 0.7%, 0.3%, -0.4%, and 1.4%, respectively, compared with the best results of the other algorithms. Although Dice decreases slightly relative to the best value, the proposed algorithm can be considered superior overall. The results in Table 3 show that, in terms of Precision, Recall, mAP@0.5, mAP@0.5-0.95, and GFLOPs, the proposed algorithm improves by 2%, 7.7%, 2.9%, and 2.1% and reduces GFLOPs by 3.2 compared with Mask R-CNN19, and improves by 1.2%, 4.3%, 2.3%, and 1% and reduces GFLOPs by 0.1 compared with YOLOv820. This demonstrates that the proposed algorithm achieves the best trade-off between computing performance and model complexity. Figure 9 shows the semantic segmentation results; the error regions are enlarged and arranged from top to bottom, with the sequence numbers 1, 2, 3, and 4 corresponding to the numbered regions in the full image. The results show that the proposed algorithm segments the subtle features of the image well.
Compared with other semantic segmentation algorithms, the proposed algorithm preserves image features better. As shown by the red boxes in the figure, the other methods lose subtle features, while our method is closest to the reference segmentation, which demonstrates the sophistication of the proposed algorithm.

Fig. 9 Semantic segmentation effect diagram.

Figure 10 shows comparison experiments on the dataset with representative instance segmentation algorithms. The results show that the proposed algorithm achieves higher instance segmentation confidence than the Mask R-CNN19 and YOLOv820 algorithms, which demonstrates the advancement of the proposed algorithm.

Fig. 10 Instance segmentation effect diagram.

To further evaluate the algorithm, we use the loss curves to examine the convergence of the different algorithms and the point at which training stabilizes during the iterative process. The curves in Figs. 11 and 12 are the average loss curves obtained from three 100-epoch training runs.

Fig. 11 Semantic segmentation method loss curves.

Fig. 12 Instance segmentation method loss curves.

The loss curves in Fig. 11 show that our method maintains lower loss values than the other compared methods throughout the iterations and stabilizes after 70 epochs. This demonstrates that the proposed method converges faster and is more stable for semantic segmentation. In Fig. 12, our method converges faster in the initial phase, with loss values lower than those of the other compared methods; after 40 epochs, our loss values change slowly and stably.
This shows that our method converges better and the model is more stable during instance segmentation.
