The impact of fine-tuning paradigms on unknown plant diseases recognition

This section outlines the datasets, including their splits and experimental settings. We also discuss the vision transformer31, a popular feature extractor in recent research. We do not delve into its specific parameters, layers, or self-attention mechanism; instead, we focus on the advantages of this architecture and the rationale for its selection. Lastly, we describe the fine-tuning methods, OOD detection methods, and evaluation metrics used in our study.

Problem statement

We first define the anomaly detection problem. The training set is \(D^{train}=\{(x_i, y_i)\}_{i=1}^{N}\), where \(x_i\), \(y_i\), and N denote the sample, its label, and the number of images, respectively. We define the set of known classes as \(K = \{k_1,k_2,k_3,\ldots,k_t\}\), so \(y_i \in K\). In particular, for the few-shot setting the training set is \(D^{train}=\{(x_i, y_i)\}_{i=1}^{M \cdot t}\), \(M \in \{2,4,8,16\}\), where M and t denote the number of training samples per known class and the number of known classes, respectively. We assume that there exists a set of unknown classes \(U = \{k_{t+1},\ldots\}\), which the model does not see during training but may encounter during inference, with \(K \cap U = \emptyset\); that is, unknown samples do not share labels with the training data. Anomaly detection can then be defined as a binary classification problem, where a higher score \(S(x_i)\) for a sample \(x_i\) indicates higher uncertainty: a sample whose score exceeds the threshold \(\gamma\) is classified as unknown, and otherwise as known, as formalized in Eq. (1):
$$Decision_{\gamma}(x_i) = \begin{cases} \text{Unknown Class}, & S(x_i) > \gamma \\ \text{Known Class}, & S(x_i) \le \gamma \end{cases}$$
(1)
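As a minimal illustration of the decision rule in Eq. (1), the sketch below (our own NumPy example; the score values and threshold are hypothetical, not from the paper) applies a threshold \(\gamma\) to a vector of uncertainty scores \(S(x_i)\):

```python
import numpy as np

def decide(scores: np.ndarray, gamma: float) -> np.ndarray:
    """Eq. (1): samples whose uncertainty score exceeds gamma are flagged as unknown."""
    return np.where(scores > gamma, "Unknown Class", "Known Class")

# Hypothetical uncertainty scores S(x_i) for three test samples
scores = np.array([0.12, 0.87, 0.55])
print(decide(scores, gamma=0.5))  # ['Known Class' 'Unknown Class' 'Unknown Class']
```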
Datasets

For our experiments on open-set recognition, we employ the Cotton32, Mango33, Strawberry34, and Tomato35 disease datasets. We illustrate samples from these datasets in Fig. 3 and provide the detailed training and testing splits in Table 2. Additionally, the Plant Village dataset35 is used for our OOD detection and few-shot OOD detection experiments. In addition to the ID categories shown in Fig. 1, we present the OOD categories in Table 5 of the experimental section; because of the extensive range of categories in the Plant Village dataset, they are not all displayed here. We confirm that all aspects of our study, including both experimental research and field studies on plants, were conducted in strict accordance with the relevant guidelines and legislation. This compliance covers institutional protocols as well as national and international regulations on plant research. For further details, please refer to our repository.

Table 2. Class information of the datasets.

Cotton disease dataset32 comprises five plant diseases: Aphids, Army Worm, Bacterial Blight, Powdery Mildew, and Target Spot. Its primary focus is on leaf diseases, with no images of diseases affecting stems, buds, flowers, or bolls. The dataset features a balanced class distribution, with around 800 images per category, and was collected under real-world conditions. In addition, it provides 800 images of healthy leaves.

Mango leaf disease dataset33, compiled by Ahmed et al., is a comprehensive collection of 4000 mango leaf images, each with a resolution of 240×320 pixels. The dataset encompasses seven specific mango leaf diseases and healthy leaves. Each disease category contains roughly 500 images, ensuring a balanced distribution across the eight classes. The images, mainly captured with mobile phone cameras, originate from four mango orchards in Bangladesh.

Figure 3. Dataset examples used for open-set recognition. Class numbers are provided at the bottom; please refer to Table 2 for class names.

Strawberry disease dataset34, released by Afzaal et al., includes 2500 images that capture various strawberry diseases. The data were collected using camera-equipped mobile phones in both real-field and greenhouse settings, mainly across multiple greenhouses in South Korea. The dataset, covering the early, middle, and late stages of the diseases, was designed to support disease detection and segmentation. To ensure consistency with the other datasets, we supplemented it with images of healthy strawberry leaves from the Plant Village dataset35.

Tomato disease dataset35 focuses on tomato diseases. It includes ten categories of tomato leaves, encompassing nine disease types and one healthy leaf category. This dataset is notable for its imbalanced sample distribution, ranging from 300 to 5000 samples per category, which adds complexity to the analysis. For comprehensive evaluation, we utilize color, grayscale, and segmented images.

Plant Village dataset35 is a vast collection of 54,309 images, covering 14 crop species and a wide range of diseases, including fungal, bacterial, oomycete, viral, and mite-induced diseases. It also features healthy leaves for twelve crop species.
For our study, we used images of 12 types of healthy leaves as the in-distribution (ID) dataset and constructed six out-of-distribution (OOD) datasets based on species categorization: apple (3 types), corn (3), grape (3), potato (2), tomato (9), and others (6). We also evaluated OOD detection performance under a few-shot learning setting using this partitioning.

Overview of framework

We present an overview of the framework for this study in Fig. 4. The post-hoc out-of-distribution detection pipeline consists of two steps. The first step involves training or fine-tuning the model on a training set. In the second step, post-hoc OOD detection methods are applied to obtain uncertainty scores, such as those based on energy or maximum softmax probability.

In our study, we employ the ViT-Base model as a feature extractor to assess the effectiveness of various fine-tuning paradigms in open-set recognition (OSR), out-of-distribution (OOD) detection, and few-shot OOD detection. One significant advantage of ViT is its ability to achieve remarkable performance on large-scale image datasets with minimal architectural modification; for example, a Transformer architecture originally designed for textual data can be adapted to visual tasks with few changes. It can also be readily fine-tuned for specific tasks such as object detection36, keypoint localization37, and image segmentation28, further demonstrating its flexibility and ability to generalize across different vision tasks.

Figure 4. The architecture of three fine-tuning paradigms for OOD detection. Step 1 compares three fine-tuning paradigms for the ViT model, where visual prompt tuning adapts the model by adding a set of learnable tokens to the input space. Step 2 shows the pipeline of post-hoc OOD detection methods.

In Step 1, we use ViT-B/16, pre-trained on ImageNet-21k, as our pre-trained model. To address the OOD detection problem, we explore two traditional fine-tuning paradigms, full fine-tuning (FFT) and linear probing (LPT), and an efficient fine-tuning paradigm, visual prompt tuning (VPT). The three fine-tuning paradigms are illustrated in Fig. 4; all of them involve fine-tuning the classifier head. In Step 2, the logits output by the classifier head are transformed into uncertainty scores \(S(x_i)\) by different post-hoc OOD detection methods. If \(S(x_i)\) exceeds a threshold \(\gamma\), the sample is treated as an OOD sample, thus achieving out-of-distribution detection.

OOD detection methods

The classification head can be regarded as a mapping from the input image's features \(F\in R^d\) to the label space \(L\in R^c\), where c is the number of classes in the ID dataset and L contains the logit values for each class. Post-hoc OOD detection methods estimate the distributions of ID and OOD data by converting these logit values into an uncertainty score, thereby separating the two. The advantages of post-hoc methods lie in their ease of use and the fact that they require no modification of the training process or loss function. Consistent with the OOD detection methods used in13, in this paper we employ five commonly used post-hoc methods: energy25, entropy11, variance38, maximum softmax probability (MSP)11, and maximum logits (ML)24. The conversion from logit values to uncertainty scores is given in Eqs. (2) to (6):
$$Energy = -\log \sum_{j=1}^{K} e^{z_j/T}$$
(2)
$$Entropy = \mathrm{Entropy}\left(\frac{e^{z_i/T}}{\sum_{j=1}^{K} e^{z_j/T}}\right)$$
(3)
$$Variance = -\mathrm{Variance}\left(\frac{e^{z_i/T}}{\sum_{j=1}^{K} e^{z_j/T}}\right)$$
(4)
$$MSP = -\max_i\left(\frac{e^{z_i/T}}{\sum_{j=1}^{K} e^{z_j/T}}\right)$$
(5)
$$\text{Max-Logits} = -\max_i\left(\frac{z_i/T}{\sum_{j=1}^{K} z_j/T}\right)$$
(6)
where \(z_i\) denotes the logit of class i, and T denotes the temperature scaling factor. In this paper, we use \(T=1\) by default.
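To make Eqs. (2)–(6) concrete, the following NumPy sketch (our own illustrative implementation, not the authors' code; variable names are ours) converts the logit vector of a single sample into the five uncertainty scores:

```python
import numpy as np

def uncertainty_scores(z: np.ndarray, T: float = 1.0) -> dict:
    """Post-hoc uncertainty scores for one sample with logits z over K known classes, Eqs. (2)-(6)."""
    z = z / T                                       # temperature scaling
    p = np.exp(z - z.max())                         # softmax, shifted for numerical stability
    p = p / p.sum()
    return {
        "energy": -np.log(np.exp(z).sum()),         # Eq. (2)
        "entropy": -(p * np.log(p + 1e-12)).sum(),  # Eq. (3): entropy of the softmax distribution
        "variance": -p.var(),                       # Eq. (4): negative variance of the softmax
        "msp": -p.max(),                            # Eq. (5): negative maximum softmax probability
        "max_logits": -(z / z.sum()).max(),         # Eq. (6), in the normalized form given above
    }

# Hypothetical logits for one test image over K = 4 known classes
print(uncertainty_scores(np.array([3.1, 0.2, -1.0, 0.5])))
```

A sample is then flagged as OOD when its chosen score exceeds the threshold \(\gamma\), exactly as in Eq. (1).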
Evaluation metrics

FPR@959: FPR@95 can be interpreted as the probability that a negative (out-of-distribution) example is misclassified as positive (in-distribution) when the true positive rate (TPR) is as high as \(95\%\). The true positive rate is computed as TPR = TP/(TP + FN), where TP and FN denote true positives and false negatives, respectively. The false positive rate is computed as FPR = FP/(FP + TN), where FP and TN denote false positives and true negatives, respectively.

The area under the receiver operating characteristic curve (AUROC)39: By treating ID data as positive and OOD data as negative, various thresholds can be applied to generate a range of true positive rates (TPR) and false positive rates (FPR), from which AUROC is computed.

The area under the precision-recall curve (AUPR)39: Using the precision and recall values, we compute the AUPR. Note that for AUROC and AUPR, higher values indicate better OOD detection performance, while a lower FPR@95 value indicates better OOD detection performance.

In-distribution accuracy (ID Acc.)40: OOD detection and open-set recognition also require evaluating the model's performance on ID samples; therefore, we use accuracy as the evaluation metric for ID samples.
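As an illustration of how these metrics can be computed from uncertainty scores, here is a small sketch using scikit-learn (again our own example, not the authors' evaluation code); following the definition above, ID samples are treated as positives:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve

def ood_metrics(scores_id: np.ndarray, scores_ood: np.ndarray) -> dict:
    """AUROC, AUPR, and FPR@95 from uncertainty scores S(x), where higher means more likely OOD."""
    y_true = np.concatenate([np.ones_like(scores_id), np.zeros_like(scores_ood)])  # ID = positive
    y_score = -np.concatenate([scores_id, scores_ood])  # negate so that higher = more likely ID
    fpr, tpr, _ = roc_curve(y_true, y_score)
    fpr_at_95 = fpr[np.argmax(tpr >= 0.95)]              # FPR at the first threshold with TPR >= 95%
    return {
        "AUROC": roc_auc_score(y_true, y_score),
        "AUPR": average_precision_score(y_true, y_score),  # average precision as AUPR
        "FPR@95": fpr_at_95,
    }

# Hypothetical scores: ID samples should receive lower uncertainty than OOD samples
rng = np.random.default_rng(0)
print(ood_metrics(rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)))
```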