Integrated ensemble CNN and explainable AI for COVID-19 diagnosis from CT scan and X-ray images

This section covers the specifics of the datasets employed in the study, the evaluation criteria for assessing the model’s performance, and the model tuning and hyperparameter optimization procedures.

Datasets

For this research, we employed publicly available chest radiographs and CT scans from Kaggle and other repositories to create and validate our models for C-19 classification. The datasets were meticulously selected to provide a balanced representation of both C-19 and NC-19 cases, and several preprocessing steps were applied to enhance image quality and maintain consistency across datasets. Figure 3 illustrates examples of C-19 and NC-19 cases obtained from these radiographic and CT imaging datasets. Recognizing that rigorous dataset preparation directly impacts model reliability and accuracy, we further incorporated two external datasets (one of chest radiographs and one of CT scans) to evaluate our models’ generalization ability in diverse real-world situations and across various data sources.

The public chest radiograph and CT scan datasets used in this work consist of confirmed C-19 cases obtained from various public sources. The collected images were preprocessed to remove noise, normalize pixel values, and resize them to a consistent shape. Additionally, augmentation techniques (e.g., rotations, flips, and crops) were employed to expand the dataset size, improve model generalization, and correct class imbalances.

Of the 2,482 CT scan images in one dataset, 1,252 were classified as positive for C-19, while the remaining 1,230 represented patients without the virus.

Another dataset consists of 30,386 images obtained from 16,648 patients; of these, 16,194 images were identified as C-19 positive and 14,192 as C-19 negative.

A third dataset encompasses 16,752 CT scan slices sourced from 7 different public repositories, featuring 7,593 scans marked as C-19 positive and 9,159 as NC-19.
It is continually updated to include diverse images from multiple origins.

A further dataset comprises 3,616 images positive for C-19 and 10,192 images of NC-19 cases, for a total of 13,808 images; it is likewise updated periodically and collected from various sources.

Data preprocessing

All images were resized to 224 × 224 pixels to ensure uniform processing. The CT scan dataset used in this study consisted of preprocessed 2D slices, extracted from the original 3D volumetric CT scans by the dataset providers; our analysis was therefore based solely on these 2D slices, without any additional modification or selection from the original 3D data. The CT scan data were segmented into training, validation, and test subsets in a 70%/10%/20% ratio. The training subset was used to build the model, the validation subset to evaluate performance and fine-tune hyperparameters, and the test subset for the final evaluation. We ensured that the distribution of positive and negative samples remained balanced across these subsets. Similarly, the CXR dataset was divided into training and validation subsets in an 80:20 ratio, with testing images provided separately. Table 3 shows the distribution of images within both the X-ray and CT scan datasets, detailing the allocation of images for training, validation, and testing.

To improve the model’s generalization and address class imbalance, data augmentation techniques (rotation, flipping, and cropping) were applied exclusively to the training sets of all datasets. These methods simulate variations that may occur in real-world scenarios, thereby enhancing the robustness of the model.

Evaluation metrics

Model’s performance evaluation metrics

To evaluate the performance of the individual CNN models and the proposed ensemble model, we employed a range of evaluation metrics. These metrics are represented by Eqs.
(1), (2), (3), (4), and (5); the exact mathematical expressions for each of these evaluation criteria are detailed below.

Fig. 3 Illustrative examples of COVID-19 (C-19) and non-COVID-19 (NC-19) cases from X-ray and CT image datasets.

Table 3 Image distribution across different subsets in the chest X-ray and CT scan datasets.

$$Accuracy=\frac{TP_{c}+TN_{n}}{TP_{c}+TN_{n}+FP_{c}+FN_{n}}$$
(1)

$$Specificity=\frac{TN_{n}}{TN_{n}+FP_{c}}$$
(2)

$$Sensitivity=\frac{TP_{c}}{TP_{c}+FN_{n}}$$
(3)

$$G\text{-}mean=\sqrt{Sensitivity\times Specificity}$$
(4)

$$F1\text{-}Score=\frac{2TP_{c}}{2TP_{c}+FP_{c}+FN_{n}}$$
(5)

To measure the accuracy of our model in classifying C-19 cases, we consider four quantities: true positives (correctly identified C-19 cases, TPc), false positives (cases mistakenly labeled as C-19, FPc), true negatives (correctly identified NC-19 cases, TNn), and false negatives (C-19 cases incorrectly labeled as NC-19, FNn). Assessing our model’s ability to differentiate between C-19 and NC-19 cases relies on these four quantities.

Table 4 Details of the newly added layers in the pretrained CNN models.

Table 5 Details of the hyperparameter settings for the training of the pretrained models.

Model’s interpretation evaluation metrics

To assess the effectiveness of the interpretation methods, we use the following metrics44:
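As a concrete illustration, the five classification metrics of Eqs. (1)–(5) can be computed directly from the four confusion-matrix counts. This is a minimal sketch of those formulas (our own illustration, not the study's code; the function name is assumed):

```python
import math

def classification_metrics(tp_c, tn_n, fp_c, fn_n):
    """Evaluation metrics of Eqs. (1)-(5) from confusion-matrix counts.

    tp_c / fp_c: correctly / mistakenly identified C-19 cases;
    tn_n / fn_n: correctly / mistakenly identified NC-19 cases.
    """
    accuracy = (tp_c + tn_n) / (tp_c + tn_n + fp_c + fn_n)  # Eq. (1)
    specificity = tn_n / (tn_n + fp_c)                      # Eq. (2)
    sensitivity = tp_c / (tp_c + fn_n)                      # Eq. (3)
    g_mean = math.sqrt(sensitivity * specificity)           # Eq. (4)
    f1_score = 2 * tp_c / (2 * tp_c + fp_c + fn_n)          # Eq. (5)
    return {"accuracy": accuracy, "specificity": specificity,
            "sensitivity": sensitivity, "g_mean": g_mean,
            "f1_score": f1_score}
```

For instance, 80 true positives, 90 true negatives, 10 false positives, and 20 false negatives yield an accuracy of 0.85, a sensitivity of 0.80, and a specificity of 0.90.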

Impact on Decision Making: This metric quantifies the change in classification decisions when the key regions identified by the XAI methods are excluded. Let f(z) denote the decision function of the deep learning model, i.e., the function that produces the classification outcome for an input image z. The Decision Impact Ratio (DIR) is evaluated as:

$$DIR = \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}_{\left(f\left(z_{j}\right) \neq f\left(z_{j} - r_{j}\right)\right)}$$
(6)
where \(\mathbb{1}_{\left(f\left(z_{j}\right) \neq f\left(z_{j} - r_{j}\right)\right)}\) is an indicator function that equals 1 if the decision changes when the critical region \(r_{j}\) is omitted from the image \(z_{j}\), and 0 otherwise.
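The DIR of Eq. (6) can be sketched by comparing the model's decisions on the original and region-masked images. This is our own illustration (not the study's implementation), assuming the two sets of class predictions have already been computed:

```python
def decision_impact_ratio(decisions, decisions_masked):
    """DIR (Eq. 6): fraction of the N images whose predicted class
    changes when the XAI-identified critical region is removed.

    decisions[j]        -- f(z_j), class predicted for the original image
    decisions_masked[j] -- f(z_j - r_j), class predicted after masking r_j
    """
    if len(decisions) != len(decisions_masked):
        raise ValueError("both lists must cover the same N images")
    changed = sum(1 for a, b in zip(decisions, decisions_masked) if a != b)
    return changed / len(decisions)
```

For example, if masking the highlighted regions flips 2 of 4 decisions, DIR = 0.5; a DIR near 1 indicates that the XAI method highlights regions the classifier truly depends on.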

Impact on Confidence: This metric assesses the decrease in confidence scores when the key regions recognized by the XAI methods are omitted. Let γ(z) denote the confidence function of the deep learning model, i.e., the function that calculates the classification probability for an input image z. The Confidence Impact Ratio (CIR) is given by:

$$CIR=\frac{1}{N}\sum_{j=1}^{N}\max\left(\gamma\left(z_{j}\right)-\gamma\left(z_{j}-r_{j}\right),0\right)$$
(7)
where \(\gamma\left(z_{j}\right)\) represents the confidence score for the j-th image, and \(\gamma\left(z_{j}-r_{j}\right)\) the confidence score when the critical region \(r_{j}\) is excluded.

Model tuning and hyperparameter optimization

To adapt the pretrained models to both the CT and X-ray image datasets, we adjusted the top layers (fully connected layers or classification heads) by adding further layers designed to improve the training process; these additional layers are displayed in Table 4. A learning rate of 0.003 and a batch size of 32 were determined to be the most effective hyperparameters. Table 5 provides an overview of the hyperparameters used for the pretrained models; these settings proved the most effective across all models. Although the number of training epochs varied (100 for the CT scan images and 10 for the X-ray images), the other hyperparameter settings were uniform across both datasets. The training process minimized a categorical cross-entropy loss, with optimization performed using the Adam algorithm.
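As an illustrative sketch of the training objective only (our own example, not the study's code), the categorical cross-entropy loss that Adam minimizes can be written as follows for one-hot labels and softmax outputs:

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot ground-truth rows, e.g. [1, 0] for the C-19 class;
    y_pred: softmax probability rows from the classification head;
    eps guards against log(0) for degenerate predictions.
    """
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps))
                     for t, p in zip(t_row, p_row))
    return total / len(y_true)
```

A confident correct prediction gives a loss near 0, while a uniform prediction over the two classes gives ln 2 ≈ 0.693, which is what gradient descent pushes the classification head away from.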
