A deep learning based assisted analysis approach for Sjogren’s syndrome pathology images

IoU loss function simulation experiment

Simulation experiment 1

To evaluate the performance of S-MPDIoU against other IoU loss functions, we conduct simulation experiment 1. Different IoU loss functions are highly sensitive to the learning rate, so even small changes in the learning rate may lead to significant differences in the final convergence behavior. To ensure an equitable comparison, the learning rate is fixed at 0.02. This small value is chosen deliberately, as it allows all IoU loss functions to converge while minimizing the impact of learning rate changes on the final result.

Assuming the four parameters of the ground truth bounding box are \(\left( x^{gt},y^{gt},w^{gt},h^{gt} \right)\) and the four parameters of the predicted bounding box are \(\left( x,y,w,h\right)\), the loss defined in this simulation experiment is:

$$\begin{aligned} Loss=\left| x^{gt}-x \right| +\left| y^{gt}-y \right| +\left| w^{gt}-w \right| +\left| h^{gt}-h \right| \end{aligned}$$
(21)
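To make the setup reproducible in outline, the sketch below shows one possible implementation of a single regression case in simulation experiment 1 (a minimal PyTorch sketch under the settings described next: 100,000 epochs, learning rate fixed at 0.02, convergence declared when the loss of Eq. (21) falls below 0.01). It is not our exact code; GIoU is implemented only as an illustrative example of an IoU-based loss, and any of the compared losses, including S-MPDIoU, would be substituted in its place.

```python
# Minimal sketch (assumption: PyTorch) of one regression case in simulation
# experiment 1. GIoU is shown only as an illustrative example; the other
# compared losses, including S-MPDIoU, would replace giou_loss.
import torch

def giou_loss(pred, gt):
    # Boxes are given as (cx, cy, w, h); convert to corner coordinates.
    px1, py1 = pred[0] - pred[2] / 2, pred[1] - pred[3] / 2
    px2, py2 = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx1, gy1 = gt[0] - gt[2] / 2, gt[1] - gt[3] / 2
    gx2, gy2 = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    inter_w = (torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(min=0)
    inter_h = (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(min=0)
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    iou = inter / union
    # Smallest enclosing box provides the GIoU penalty term.
    cw = torch.max(px2, gx2) - torch.min(px1, gx1)
    ch = torch.max(py2, gy2) - torch.min(py1, gy1)
    return 1 - (iou - (cw * ch - union) / (cw * ch))

gt = torch.tensor([1.0, 1.0, 0.5, 0.5])                        # fixed ground truth box
pred = torch.tensor([7.0, 1.0, 1.0, 2.0], requires_grad=True)  # horizontal case
lr = 0.02                                                      # fixed learning rate

for epoch in range(100_000):
    loss = giou_loss(pred, gt)
    loss.backward()
    with torch.no_grad():
        pred -= lr * pred.grad
    pred.grad.zero_()
    # Convergence criterion of Eq. (21): L1 distance of the four box parameters.
    if (pred.detach() - gt).abs().sum() < 0.01:
        print(f"converged after {epoch + 1} epochs")
        break
```

Repeating this loop with the predicted center coordinates listed in the Fig. 5 caption reproduces the horizontal, diagonal, and vertical cases.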
The training epochs are set to 100,000. A predicted bounding box is considered converged when the loss is less than 0.01. The specific experimental results are shown in Fig. 5. The experiment examines the convergence behavior in three cases: horizontally aligned boxes, diagonally aligned boxes, and vertically aligned boxes. The legend indicates the different IoU loss functions and the number of epochs needed to achieve convergence. Notably, in Fig. 5, S-MPDIoU achieves convergence in all three cases.

Fig. 5 Diagram of simulation experiment 1. The diagram depicts three cases: horizontal, diagonal, and vertical, arranged from left to right. The ground truth bounding box center coordinates are fixed at [1, 1], with both width and height set to 0.5. The predicted bounding box has a fixed width of 1 and a height of 2, with center coordinates of [7, 1], [7, 7], and [1, 7], respectively.

As seen in Fig. 5, the initial predicted bounding boxes are larger than the ground truth bounding box. The convergence processes of DIoU and CIoU exhibit distinct shape changes. Initially, they tend to enlarge the dimensions of the predicted bounding box and then shift the box towards the ground truth. Once the predicted bounding box encompasses the ground truth, DIoU and CIoU shrink its dimensions to achieve convergence.

On the other hand, EIoU and SIoU initially shrink the width and height of the predicted bounding box. As one or both of the dimensions converge, the predicted bounding box gradually moves closer to the ground truth. Notably, EIoU exhibits significantly faster convergence than SIoU. This difference can be attributed to the complexity of SIoU's penalty term, which potentially introduces additional interference, thereby slowing down the convergence process.

The convergence of GIoU is more challenging and unstable. In the diagonal case, GIoU behaves similarly to DIoU; in the vertical and horizontal cases, it behaves similarly to EIoU. However, all of these convergence processes are very slow.

MPDIoU does not exhibit significant deformation in any of the three cases. Its convergence trend is first to reduce the distance and then to adjust the dimensions. However, when the predicted bounding box is very close to the ground truth yet does not overlap it, IoU does not contribute to the gradient calculation, and the gradient contribution from MPDIoU's penalty term diminishes sharply, leading to a notably slow convergence process.

S-MPDIoU exhibits a convergence trend analogous to MPDIoU. In the diagonal case, where the centers of the predicted and ground truth bounding boxes form a 45-degree angle, the baseline length remains identical to that of MPDIoU. However, the introduction of the scale factor enables S-MPDIoU to sustain a considerable gradient even when the predicted bounding box is close to but does not overlap the ground truth bounding box, ultimately achieving a convergence speed almost 6 times faster. In the vertical and horizontal cases, the baseline in the S-MPDIoU penalty term takes into account the angle between the centers of the two bounding boxes, further improving convergence speed by almost 10 times.

This simulation experiment clearly demonstrates that S-MPDIoU preserves the strengths of MPDIoU, effectively preventing abrupt shape variations and wandering during training. In each epoch, it consistently converges along a fixed direction between the two bounding boxes.
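To make the gradient-diminution behavior of MPDIoU concrete, the following sketch compares the gradient magnitude of an MPDIoU-style loss for a distant predicted box and for a nearby but still non-overlapping one. The implementation follows the published MPDIoU definition (squared top-left and bottom-right corner distances normalized by the squared input image dimensions) and is only an illustration; the S-MPDIoU scale factor and redefined baseline are defined earlier in this paper and are not reproduced here.

```python
# Illustration of MPDIoU's vanishing gradient for close but non-overlapping boxes.
# Assumptions: boxes are (x1, y1, x2, y2) in pixels on a 640 x 640 input image,
# and MPDIoU = IoU - (d1^2 + d2^2) / (w^2 + h^2), with d1, d2 the top-left and
# bottom-right corner distances (published MPDIoU definition).
import torch

IMG_W, IMG_H = 640, 640

def mpdiou_loss(pred, gt):
    inter_w = (torch.min(pred[2], gt[2]) - torch.max(pred[0], gt[0])).clamp(min=0)
    inter_h = (torch.min(pred[3], gt[3]) - torch.max(pred[1], gt[1])).clamp(min=0)
    inter = inter_w * inter_h
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2   # top-left corners
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2   # bottom-right corners
    return 1 - iou + (d1 + d2) / (IMG_W ** 2 + IMG_H ** 2)

gt = torch.tensor([100.0, 100.0, 150.0, 150.0])
for offset in (300.0, 60.0):            # far away vs. close but non-overlapping
    pred = (gt + offset).requires_grad_(True)
    mpdiou_loss(pred, gt).backward()
    print(f"offset {offset:5.0f}px: |grad| = {pred.grad.abs().sum().item():.6f}")
# The gradient magnitude shrinks roughly in proportion to the remaining corner
# distance, which is the slowdown that S-MPDIoU's scale factor counteracts.
```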
By introducing the scale factor and redefining the penalty term, S-MPDIoU overcomes the gradient-diminution problem encountered by MPDIoU, leading to a marked improvement in convergence speed.

Simulation experiment 2

To further evaluate the convergence performance of different IoU loss functions, we conduct simulation experiment 2. In Fig. 6, panels A and B simulate feature maps of sizes 20 \(\times\) 20 and 40 \(\times\) 40, and the right-hand figures show the convergence behavior of the different IoU loss functions. Inside circles with radii of 10 and 20, respectively, 50 center coordinates of predicted bounding boxes are randomly generated and represented as blue dots. Additionally, 5 center coordinates of ground truth bounding boxes are randomly generated and represented as red dots. Each dot corresponds to 7 bounding boxes with varying aspect ratios, shown as red boxes in the left-hand figures. These boxes maintain an area of 1, with aspect ratios of 7:1, 4:1, 2:1, 1:1, 1:2, 1:4, and 1:7. Furthermore, each predicted bounding box has 7 additional scales: 0.5, 0.67, 0.75, 1, 1.33, 1.5, and 2. Therefore, the total number of regression cases is 85,750 = 50 \(\times\) 7 \(\times\) 7 \(\times\) 5 \(\times\) 7.

Fig. 6 Schematic diagram of simulation experiment 2. (A) simulates the 20 \(\times\) 20 feature map and (B) simulates the 40 \(\times\) 40 feature map. The right-hand figures show the convergence performance of the different IoU loss functions.

To reduce the complexity of the experiment and accelerate convergence, the learning rate is fixed at 0.02, a predicted bounding box is considered converged when the loss is less than 0.5, and the training epochs are set to 10,000.

Figure 6A simulates the 20 \(\times\) 20 feature map. During the early stage of training, since the predicted bounding boxes are relatively close to the ground truth, EIoU effectively adjusts their width and height, leading to a swift reduction in the calculated loss. In the later stage of training, however, the insufficient gradient contribution from its distance penalty term causes its convergence to slow down. S-MPDIoU, on the other hand, maintains a stable convergence trend and ultimately converges earlier than EIoU.

Figure 6B simulates the 40 \(\times\) 40 feature map. In this case, the distance between the predicted and ground truth bounding boxes is further increased, which significantly increases the difficulty of convergence for loss functions such as CIoU, DIoU, and GIoU that rely more heavily on IoU. In the early stage of training, these loss functions constantly enlarge the width and height of the predicted bounding box, but the resulting movement cannot compensate for the loss caused by the shape change, so the total loss increases. They begin to converge only as the deformation of the predicted bounding box gradually tapers off and ultimately stops. However, because of the large distance between the predicted and ground truth bounding boxes and the insufficient gradient contribution from the distance penalty term, their convergence is very slow.

MPDIoU's advantages are especially apparent when the distance between the predicted and ground truth bounding boxes is relatively large. Its penalty term contributes significantly to the gradient, allowing the predicted bounding box to move quickly towards the ground truth during the early stage of training.
However, once the predicted bounding box approaches the ground truth, MPDIoU still suffers from a sharp decrease in gradient, resulting in a slowdown in convergence speed. Nevertheless, its overall convergence speed is superior to the other IoU loss functions mentioned above. Although EIoU converges relatively quickly, its convergence also slows down with distance. In contrast, S-MPDIoU consistently maintains a faster convergence speed, demonstrating a significant advantage.

Cell detection experiment

Experimental environment and parameter settings

The original YOLOv8 algorithm provides models at five different scales: N, S, M, L, and X. Although the structure of these five models remains the same, each has a different depth and width, resulting in different sizes and complexities. In this paper, we test and analyze YOLOv8n's ability to detect lymphocytes.

The model is trained on an Intel Core i9-13900 CPU and an NVIDIA RTX 4060 (8 GB) GPU. The software environment is Windows with Python 3.11, the PyTorch 2.0.1 deep learning framework, and CUDA 11.8. The implementation uses libraries such as torch, torchvision, PyYAML, OpenCV, Matplotlib, and NumPy.

The training epochs are set to 100 with a batch size of 6. The input image size is 640 \(\times\) 640. The initial learning rate is set to 0.001, and Adam is used as the optimizer. The weight decay is 0.005, and the momentum is set to 0.937. All default data augmentation methods are disabled.

Experimental dataset

The experimental dataset used in this paper is sourced from WSIs of labial gland biopsy specimens from YanTaiShan Hospital. The use of this dataset has been approved by the Hospital Ethics Review Committee. Because of the large size of a WSI, direct detection is not feasible. Therefore, we segment the original WSIs into 640 \(\times\) 640 patches at the highest resolution and create a dataset containing 600 images under the manual screening and annotation of professional physicians. It should be noted that the average number of lymphocytes in a single image exceeds 30, which makes the manual labeling process extremely cumbersome and time-consuming. The sole detection target is the lymphocyte, and a single image contains a significant number of lymphocytes with distinctive and consistent features. We therefore manually annotate 600 images and choose a lighter model for training to balance training cost against annotation complexity. The dataset is divided into training, validation, and testing sets in an 8:1:1 ratio. An example of the annotated images is shown in Fig. 7, where the green boxes represent manually annotated lymphocytes.

Fig. 7 Illustration of the manually annotated dataset. Green boxes indicate the manually annotated lymphocytes.

Evaluation indicators

We use a set of standard metrics to evaluate the performance of the improved YOLOv8n in the lymphocyte detection task. The primary metrics considered in this paper are Recall (R), Precision (P), and mean Average Precision (mAP). Since the sole object class in our dataset is the lymphocyte, these metrics can be expressed as follows:

$$\begin{aligned} & mAP=\int_{0}^{1} P\left( R \right) \, dR \end{aligned}$$
(22)
$$\begin{aligned} & P=\frac{TP}{\left( TP+FP \right) } \end{aligned}$$
(23)
$$\begin{aligned} & R=\frac{TP}{\left( TP+FN\right) } \end{aligned}$$
(24)
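As a concrete illustration of how these quantities are computed in practice, the sketch below derives P, R, and the single-class mAP from a confidence-sorted list of detections. It is a simplified sketch, assuming that detections have already been matched to ground truth boxes at a fixed IoU threshold (yielding the TP/FP flags, with TP, FP, and FN as defined in the following paragraph) and that the P(R) curve is integrated by all-point interpolation; the exact matching and interpolation rules of the YOLOv8 validator may differ.

```python
# Simplified single-class P/R/AP computation corresponding to Eqs. (22)-(24).
# Assumption: each detection already carries a TP (1) or FP (0) flag from
# IoU-based matching against the ground truth; num_gt = TP + FN.
import numpy as np

def precision_recall_ap(tp_flags, confidences, num_gt):
    order = np.argsort(-np.asarray(confidences))        # highest confidence first
    tp = np.asarray(tp_flags, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / num_gt                            # Eq. (24)
    precision = cum_tp / (cum_tp + cum_fp)              # Eq. (23)
    # All-point interpolation approximates the integral of Eq. (22).
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]            # monotone precision envelope
    ap = np.sum((r[1:] - r[:-1]) * p[1:])
    # P and R are reported at the lowest confidence cut-off.
    return precision[-1], recall[-1], ap

# Hypothetical example: 5 detections against 4 annotated lymphocytes.
p, r, ap = precision_recall_ap([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.5], num_gt=4)
print(f"P={p:.2f}  R={r:.2f}  AP={ap:.2f}")
```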
In the formulas above, TP (true positive) refers to instances that are correctly predicted as positive, TN (true negative) refers to instances that are correctly predicted as negative, FP (false positive) refers to instances that are incorrectly predicted as positive, and FN (false negative) refers to instances that are incorrectly predicted as negative.

Experimental results

We first evaluate the performance of different IoU loss functions; the results are presented in Table 1. As can be seen from Table 1, S-MPDIoU achieves the highest mAP among all IoU loss functions, and its precision and recall are also stable. Compared with the CIoU used in YOLOv8, its detection precision decreases slightly by 0.7%, but the recall increases by 7.8%, mAP.5 increases by 4.3%, and mAP.95 increases by 2.1%. These results fully demonstrate the effectiveness of S-MPDIoU.

Table 1 Detection performance of different IoU loss functions.

We then conduct an experiment to compare and analyze the detection performance of different attention mechanisms; the results are shown in Table 2. Compared with SE, CBAM, GAM, CA, ECA, EMA, and SA, the MDA module has slightly lower recall but higher precision and mAP, with relatively few parameters. Its detection precision increases by 0.5%, recall increases by 6.3%, mAP.5 increases by 3.6%, and mAP.95 increases by 1.9%. This experiment shows that integrating MDA modules into the backbone of the detection model helps enhance and fuse features, achieving an effective balance between detection accuracy and parameter efficiency.

Table 2 Detection performance of different attention mechanisms.

The heatmaps presented in Fig. 8 demonstrate the efficacy of the different attention mechanisms. As shown in the figure, the attention area of MDA is smoothly distributed and comprehensive, clearly focusing on the prominent features of cells while overcoming background interference that would otherwise distract attention. Even cell features in the edge area are effectively captured, further demonstrating the advantages of the MDA module.

Fig. 8 Heatmaps of different attention mechanisms.

We then conduct an ablation experiment to evaluate the effectiveness of these improvements. As shown in Table 3, after integrating the above improvements into YOLOv8n, the final precision of the model reaches 80%, a relative decrease of 0.4%, while the recall reaches 86.1%, a significant increase of 9.1%. In addition, mAP.5 reaches 88.5%, a relative increase of 3.2%, and mAP.95 reaches 37.9%, a relative increase of 2%. These results confirm the effectiveness of the improvements proposed in this paper.

Table 3 Results of the ablation experiment.

We also conduct a comprehensive comparative experiment to rigorously evaluate the performance of our improved YOLOv8 model against several state-of-the-art models with similar parameter budgets and capabilities, including YOLOv9t, YOLOv10n, RT-DETR27, GOLD-YOLO28, and PP-YOLOE29. The results, presented in Table 4, show the balance our model achieves between detection accuracy and complexity.

Specifically, our improved YOLOv8 model demonstrates a notable increase in mean average precision (mAP) compared to YOLOv9t and YOLOv10n. This enhancement in accuracy is particularly significant given the competitiveness of these models and underscores the effectiveness of our proposed improvements. Furthermore, when compared to the transformer-based RT-DETR, our model achieves comparable accuracy.
In addition, while the experimental results indicate that our model's accuracy is marginally lower than that of the powerful GOLD-YOLO, it significantly reduces the number of parameters and the complexity, underscoring its ability to strike an effective balance between performance and efficiency. Similarly, when compared to the highly optimized PP-YOLOE, our model achieves competitive results, demonstrating its robustness and adaptability across different optimization strategies.

Table 4 Algorithm comparison results.

The detection effect of our improved model is shown in Fig. 9, where the red boxes represent the lymphocytes detected by the model. As shown in the figure, the improved model fully learns the characteristics of lymphocytes, follows the criteria for lymphocyte discrimination, and overcomes the interference caused by complex backgrounds. For similar-looking cells such as epithelial cells and plasma cells, it largely achieves correct discrimination and completes the detection and localization task.

Fig. 9 Detection effect of our improved model. The red boxes represent the lymphocytes detected by the model.

Auxiliary diagnostic system for Sjogren's syndrome

As presented in Fig. 10, the top three images show the lesions annotated by the physicians, while the bottom three images show the lesions identified by the auxiliary diagnostic system. The system effectively identifies and labels suspicious lesions that roughly correspond to those annotated by the physicians, indicating that it achieves high accuracy in lesion discernment and can assist in pathological diagnosis to a certain extent.

Fig. 10 Effect of the auxiliary diagnostic system. The top three images show the lesions annotated by the physicians, and the bottom three images show the lesions identified by the auxiliary diagnostic system.

Limitations

The experimental results show that our diagnostic system attains a high level of detection accuracy, thereby partially fulfilling the objective of assisting physicians in diagnosing Sjogren's syndrome. Nevertheless, the system still has certain limitations. First, because the dataset is small, the model may not have fully learned the underlying patterns and relationships within the data. This limitation can lead to underfitting, where the model fails to capture the complexity of the data and is unable to generalize well to new, unseen examples; as a result, its performance may be suboptimal, and it may struggle to make accurate predictions. Second, the system performs pathological diagnosis directly on pathological images, and the preparation of actual pathological images involves multiple steps, including specimen collection, processing, staining, and digitalization. The practical application of the system in clinical practice therefore still requires careful validation and integration into existing workflows to ensure accuracy, reliability, and efficiency in patient care.
