Adaptive loss-guided multi-stage residual ASPP for lesion segmentation and disease detection in cucumber under complex backgrounds

The main issues faced in cucumber disease spot segmentation and disease detection include:

1. Pixel ratio imbalance. This issue stems from the sparsity of target-area pixels, which skews the ratio of background pixels to target pixels, as in the task of extracting small disease spots. As shown in the figure, disease spot pixels make up only a small fraction of the leaf pixels, so they are easily lost during segmentation. Moreover, the large number of easily classified background pixels generates a substantial aggregate loss, so the total loss is dominated by the background rather than the disease spot pixels. This directly reduces training efficiency and severely degrades segmentation results.

2. High number of hard samples. Hard samples arise mainly from images with complex backgrounds captured in natural environments. Interference from background clutter, overlapping leaves, shadow occlusion, uneven lighting, and similar factors turns the affected pixel regions into hard samples, leading to incomplete leaf edge segmentation and difficulty in extracting disease spots. All of these hard-to-distinguish pixels impair disease spot extraction.

To address these issues, this study makes improvements in two respects: the loss function and the model structure. An adaptive loss function mitigates the low training efficiency that arises when the summed loss of many easy samples dominates the total loss, which in turn alleviates pixel imbalance and the loss of precision caused by hard samples. Performing leaf and disease spot segmentation in separate stages reduces interference from complex background pixels: the first stage uses the Leaf-ASPP network to segment leaf contours, and the second stage uses the Spot-ASPP network to extract disease spot areas.

Adaptive loss function

The adaptive loss function is introduced to resolve the pixel ratio imbalance and the excessive number of hard samples in segmenting diseased cucumber leaves, problems that the traditional Cross-Entropy (CE) loss cannot handle. While a balanced loss can effectively alleviate class imbalance, it overlooks the problem caused by an excess of hard samples. Therefore, the CE loss and the balanced loss are improved to derive an adaptive loss, whose optimization effect is discussed below. As a classic loss function for semantic segmentation in image processing, the binary Cross-Entropy loss is defined in Eq. (1):

$$CE\left( y,p \right) = \begin{cases} -\log\left( p \right), & \text{if } y = 1 \\ -\log\left( 1-p \right), & \text{if } y = 0 \end{cases}$$
(1)
where \(y \in \{0,1\}\) is the ground-truth label of the pixel and \(p \in [0,1]\) is the probability that the model predicts the pixel belongs to the class \(y = 1\). In the context of this paper, segmentation decides whether each pixel is a foreground or a background pixel; in the first stage, \(y = 1\) indicates that the pixel belongs to the target leaf area.

For easily classified pixels, i.e., those with probability values far greater than 0.5, the CE loss is very small. However, because such pixels are so numerous, their accumulated loss can greatly exceed that of the hard-to-distinguish pixels and overwhelm it, leading to insufficient training and poor network performance; their loss therefore cannot be ignored.

\(\alpha\text{-}balanced\) CE loss is a common remedy for class imbalance. Introducing a balancing factor \(\alpha\) into the CE loss yields Eq. (2):

$$\alpha\text{-}balanced\;CE\left( y,p \right) = \begin{cases} -\alpha\log\left( p \right), & \text{if } y = 1 \\ -\left( 1-\alpha \right)\log\left( 1-p \right), & \text{if } y = 0 \end{cases}$$
(2)
In practice, with \(\alpha\) set via cross-validation, \(\alpha\text{-}balanced\) CE loss can increase the weight of the minority class. Although it assigns the same weight to all samples of a category, it reduces the influence of the dominant class on the loss. However, the hard samples in the data must still be considered, for example, how to correctly separate leaf pixels covered by shadows, raindrops, or dust, or pixels overlapping with other leaves in the background, so as to exclude interference. \(\alpha\text{-}balanced\) CE loss cannot effectively solve the hard-sample problem, so a new adaptive loss function is proposed to address it.

Whether the model can actively focus on hard-to-classify pixels during training, without manually set weights, is the key issue in disease spot segmentation. A modulation factor \(\left[ \sin\left( p+\pi \right)+1 \right]\) (equivalently \(1-\sin p\)) is introduced into the CE loss. This factor decays as the classification confidence increases, thereby changing the weight of hard-to-classify pixels and sparse-category pixels in the overall loss. The adaptive loss function \(AL\) is given in Eq. (3):

$$AL\left( y,p \right) = \begin{cases} -\left[ \sin\left( p+\pi \right)+1 \right]\log\left( p \right), & \text{if } y = 1 \\ -\left[ \sin\left( 1-p+\pi \right)+1 \right]\log\left( 1-p \right), & \text{if } y = 0 \end{cases}$$
(3)
Here \(p\) is the probability the model assigns to the pixel belonging to the \(y = 1\) class, and the value of the modulation factor is determined by \(p\): it decays as \(p\) increases, thereby reducing the loss of easily classified pixels. Equation (3) has the following properties:

(1) When the probability \(p\) is small, the pixel is hard to classify, and the modulation factor grows as \(p\) decreases. When \(p = 0\), the modulation factor \(\sin\left( p+\pi \right)+1\) equals 1, so the loss of hard pixels is left essentially unscaled and keeps its full contribution to the overall loss.

(2) When the probability \(p\) increases, the pixel is easy to classify, and the modulation factor shrinks as \(p\) rises, so the loss of easily classified pixels decreases. When \(p\) reaches 1, the modulation factor attains its minimum value \(1-\sin 1 \approx 0.16\), and the pixel's loss is reduced to a minimum.
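To make the behaviour of Eq. (3) concrete, the following is a minimal PyTorch sketch of the adaptive loss. The function name `adaptive_loss`, the `eps` clamp, and the mean reduction are illustrative assumptions, not the authors' released code:

```python
import math

import torch


def adaptive_loss(p: torch.Tensor, y: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Adaptive loss of Eq. (3): CE weighted by sin(q + pi) + 1 = 1 - sin(q),
    where q is the probability assigned to the ground-truth class."""
    p = p.clamp(eps, 1.0 - eps)              # guard against log(0)
    q = torch.where(y == 1, p, 1.0 - p)      # probability of the true class
    factor = torch.sin(q + math.pi) + 1.0    # 1.0 at q=0, ~0.16 at q=1
    return (-factor * torch.log(q)).mean()


# The factor leaves hard pixels (q near 0) essentially unscaled and
# down-weights confident, easy pixels: q=0 -> 1.00, q=0.5 -> 0.52, q=0.9 -> 0.22.
p = torch.tensor([0.9, 0.2, 0.6])            # predicted foreground probabilities
y = torch.tensor([1, 0, 1])                  # ground-truth labels
print(adaptive_loss(p, y))
```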

The modulation factor dynamically adjusts the weight according to how difficult pixels of each category are to classify, thereby adaptively adjusting the loss value. This reduces the impact of the aggregate loss of easily classified pixels on model performance. The adaptive loss function thus pays dynamic attention to pixels of both categories at different difficulty levels: it alleviates, to a certain extent, the excess of hard samples in leaf segmentation, adaptively assigns gradually decreasing weights to easily classified background pixels, improves segmentation accuracy, and enhances network performance. At the same time, it effectively mitigates the pixel ratio imbalance in disease spot segmentation. Overall, the adaptive loss function effectively improves the network's disease spot segmentation performance.

Two-stage LS-ASPP network model

In complex environments most leaves overlap one another; background clutter and irrelevant leaves overlap the target leaf and degrade segmentation. In addition, the background may contain diseased leaves, and regions resembling disease spots can also interfere with target leaf segmentation. A single-stage segmentation task may therefore yield an incomplete disease spot area, and the low segmentation accuracy leads to inaccurate disease detection. The segmentation task is consequently refined into two stages, from obtaining the diseased leaf outline to extracting the disease spot area, optimizing the segmentation process and improving precision. This study uses Atrous Spatial Pyramid Pooling (ASPP) as the benchmark network structure and designs a two-stage model for cucumber diseased leaf and disease spot segmentation.

The two-stage LS-ASPP segmentation model consists of Leaf-ASPP and Spot-ASPP, both built on the ASPP benchmark structure. The first stage uses Leaf-ASPP to extract the target leaf from the complex scene; the second stage uses Spot-ASPP to segment a more complete disease spot area from the segmented diseased leaf. Each stage focuses on only one specific target, reducing the difficulty of segmentation; a minimal sketch of this inference flow is given below.
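The following sketch shows one plausible way the two stages chain together at inference time. The function name, the sigmoid/threshold post-processing, and masking the image with the stage-one leaf mask are assumptions for illustration; the paper specifies only that Spot-ASPP operates on the segmented diseased leaf:

```python
import torch
import torch.nn as nn


def two_stage_inference(image: torch.Tensor,
                        leaf_aspp: nn.Module,
                        spot_aspp: nn.Module,
                        threshold: float = 0.5) -> torch.Tensor:
    """Two-stage LS-ASPP inference: Leaf-ASPP isolates the leaf, then
    Spot-ASPP segments lesions only inside the extracted leaf region."""
    leaf_prob = torch.sigmoid(leaf_aspp(image))       # (N, 1, H, W) leaf map
    leaf_mask = (leaf_prob > threshold).float()
    leaf_only = image * leaf_mask                     # suppress background pixels
    spot_prob = torch.sigmoid(spot_aspp(leaf_only))   # lesion map inside the leaf
    return (spot_prob > threshold).float() * leaf_mask
```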
Leaf-ASPP

In real scenarios the image background often contains overlapping leaves, which makes it difficult to accurately distinguish the contours of the target leaf from those of other leaves. Moreover, uneven illumination, raindrops, and dust also directly affect segmentation. To address these issues and enhance the capture of cucumber diseased leaf outlines, we improved the original Atrous Spatial Pyramid Pooling (ASPP), naming the optimized network Leaf-ASPP. The main structural optimization is the replacement of the ASPP module with the Mult Residual ASPP (MRA) module, enhancing the model's ability to perceive diseased leaf outlines in complex backgrounds. The Leaf-ASPP network consists of encoder and decoder parts; its architecture is shown in Fig. 5.

Fig. 5 Architecture of the Leaf-ASPP network model

To improve diseased leaf segmentation in complex scenes, we introduced the Mult Residual ASPP module, also referred to as the MRA-Net network, to capture multi-scale leaf outline features. In general, the larger the receptive field, the better the network perceives and judges each pixel. However, in large networks the growing number of layers and the frequent use of up-sampling and down-sampling to process features can cause loss of detail and reduced segmentation accuracy.

Both the Mult Residual ASPP and ASPP modules use dilated convolutions to enlarge the receptive field and obtain feature maps at different scales. The original ASPP, however, mainly applies three parallel dilated convolutions with a basic 3 × 3 kernel to one feature map, so the features extracted by the kernels are similar, difficult pixel features cannot be distinguished, and diseased leaf outlines cannot be captured accurately. We therefore improved the original network; embedding the MRA module enhances its edge extraction ability.

As shown in Fig. 5, each branch of the MRA module consists of an ordinary convolution, a dilated convolution, and an attention module. Unlike the original spatial pyramid pooling structure, the branches of the MRA module use different kernel sizes: in the ordinary convolutions, different kernel sizes give each branch a different receptive field, so each branch's feature map captures different information and improves feature distinguishability. Finally, the outputs of the branches are fused into multi-scale features.

The encoder outputs two kinds of features: low-level and high-level. Low-level features are extracted by the Xception backbone and mainly contain shallow information such as disease spot outlines and shapes. High-level features are produced by the backbone followed by the Residual ASPP and mainly contain deep information such as texture and color.

The Residual ASPP feeds the input features into three 1 × 1 convolution modules, four dilated attention convolution units, and one residual unit. Each dilated attention convolution unit consists of an atrous convolution module, a 3 × 3 convolution module, and an attention module, while the residual unit is composed of a 1 × 1 convolution module and an attention module. The dilation rates of the four dilated attention convolutions are 1, 3, 3, and 5, all with 3 × 3 kernels. The output of each dilated attention convolution unit is added to the output of the residual unit, giving the four output feature maps of the Residual ASPP. Finally, the four feature maps are concatenated and the merged result is passed through a 1 × 1 convolution module, yielding the high-level features; a sketch of this module is given below.

The decoder receives the low-level and the optimized high-level features from the encoder. First, the low-level features pass through the attention module and a 1 × 1 convolution layer, yielding a small-scale refined low-level feature map. Then the up-sampled high-level features are concatenated with these shallow features to obtain a fused feature map. Finally, the fused map passes through a 3 × 3 convolution layer and up-sampling to produce the network's prediction base map.

By improving on the original ASPP module, the MRA module's multi-scale feature extraction discards more irrelevant information and sharpens the model's perception of diseased leaf edge pixels, significantly improving the original network's segmentation performance.
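A minimal PyTorch sketch of the Residual ASPP is given below, assuming the dilation rates (1, 3, 3, 5) and 3 × 3 kernels stated in the text. The internal design of the attention module, the squeeze-and-excitation style gate used here as its stand-in, and the sharing of one residual unit across branches are assumptions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn


class BranchAttention(nn.Module):
    """Squeeze-and-excitation style gate standing in for the paper's
    attention module; its exact internal design is an assumption."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)


class ResidualASPP(nn.Module):
    """Residual ASPP sketch: four dilated 3x3 attention branches with
    rates 1, 3, 3, 5, each summed with a 1x1 residual unit, then
    concatenated and fused by a 1x1 convolution."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r),
                nn.ReLU(inplace=True),
                BranchAttention(out_ch))
            for r in (1, 3, 3, 5))
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1),
            BranchAttention(out_ch))
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        res = self.residual(x)                       # shared residual unit
        outs = [branch(x) + res for branch in self.branches]
        return self.fuse(torch.cat(outs, dim=1))     # concat + 1x1 fusion
```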
Spot-ASPP

The first segmentation stage yields the diseased leaf contour. The resulting leaf image contains only sparse disease spot features, and spot pixels account for a small proportion of the total leaf pixels, which makes disease spot extraction in the second stage harder and lowers segmentation accuracy. The original network's spatial pyramid pooling is therefore optimized again to enhance segmentation performance. The main improvements are: (1) adjusting the dilation rates in the ASPP module to reduce the loss of detail information; (2) introducing the Convolutional Block Attention Module (CBAM) to re-emphasize important features, capture small-area pixels such as disease spots, and suppress irrelevant leaf information, improving disease spot segmentation accuracy. The improved network structure is named Spot-ASPP, and its framework is shown in Fig. 6.

Fig. 6 Framework of the Spot-ASPP network

First, to enhance disease spot segmentation, the smaller dilated convolutions of the original network are retained, namely those with dilation rates of 2, 4, 6, and 8; the improved structure is referred to as the CBAM-Net network. The receptive field is mainly enlarged by increasing the dilation rate, but this weakens the correlation of adjacent local information in the feature map and directly loses small-target detail. The CBAM-Net structure therefore keeps dilated convolutions with smaller dilation rates, which are better suited to extracting small disease spot pixels and achieve more precise segmentation.

Second, to enhance the robustness of the model's segmentation performance, CBAM is introduced after the optimized ASPP network. In the channel attention module, the feature maps are fed into a max pooling layer (MaxPool) and an average pooling layer (AvgPool); the resulting descriptors are passed to a shared Multilayer Perceptron (MLP), producing the channel attention map (see Fig. 7).

Fig. 7 Architecture of the channel attention module

The channel attention module uses the AvgPool and MaxPool branches to aggregate the channel information of the feature maps, producing two spatial context descriptors from max pooling and average pooling respectively. After a matrix summation, the fused attention map is multiplied by the input features. This effectively strengthens the extraction of important features and the expressive power of the representation.
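A minimal PyTorch sketch of this channel attention is shown below, following the standard CBAM formulation; the sigmoid after the summation and the reduction ratio of 16 are conventional CBAM choices assumed here, since the text states only the pooling, shared MLP, summation, and multiplication steps:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAttentionCBAM(nn.Module):
    """CBAM channel attention (Fig. 7): max- and average-pooled channel
    descriptors pass through a shared MLP, are summed, and the resulting
    map re-weights the input feature channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # shared MLP realised as two 1x1 convolutions on (N, C, 1, 1) tensors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))  # AvgPool branch
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))   # MaxPool branch
        scale = torch.sigmoid(avg + mx)              # fused channel attention map
        return x * scale                             # re-weight input features
```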
Model training

Experiment configuration

To validate the effectiveness of the U-shaped LS-ASPP network model, the proposed method is applied to the task of cucumber leaf disease spot image segmentation and compared with other methods and models. The network is trained and tested on the Ubuntu 18.04 LTS 64-bit operating system. The method is implemented in Python 3.7, with PyTorch 1.10.2 as the open-source deep learning framework. The hardware platform comprises an Intel(R) Core(TM) i9-10900F CPU @ 2.80 GHz, 32 GB of memory, and an NVIDIA GeForce RTX 3080Ti GPU with 26 GB of video memory. CUDA 11.6.0 and cuDNN 10.2 are used to accelerate network training.

Model parameters

The convolutional layers of the U-shaped LS-ASPP network are initialized following the Unet pre-training scheme, which PyTorch has since adopted as its default parameter initialization; the negative slope of the activation function is 0. Kaiming initialization is designed for deep networks with nonlinear activations and effectively prevents activation outputs from exploding or vanishing during forward propagation, thus accelerating convergence. The learning rate is 0.0001, the number of training epochs is 15, the total number of iterations is 360, and the batch size for disease spot segmentation training is 4. The optimizer is Adam [17], with a weight decay of 0.00005; a sketch of this setup is given below.
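The sketch below wires up these reported hyperparameters in PyTorch. The placeholder `model` stands in for either LS-ASPP stage, and the `init_weights` helper is an illustrative assumption reflecting the Kaiming scheme (negative slope 0) described above:

```python
import torch
import torch.nn as nn


def init_weights(module: nn.Module) -> None:
    """Kaiming initialization for convolutions (negative slope 0), matching
    the PyTorch default initialization described in the paper."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_uniform_(module.weight, a=0)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# `model` is a hypothetical stand-in for either LS-ASPP stage.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
model.apply(init_weights)

# Hyperparameters reported in the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-5)
EPOCHS, BATCH_SIZE = 15, 4
```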
