Pretrained subtraction and segmentation model for coronary angiograms

After training the model, it is further fine-tuned on either the FS-CAD or the XCAD dataset. The former is used to produce the best-performing model for the fine-grained segmentation of small vessels; this model is referred to as the FS-Model. The latter serves as the basis for quantitative comparisons with models trained using other methods and is denoted the XCAD-Model. Owing to the limited sample sizes of the FS-CAD and XCAD datasets, we employ the previously described data augmentation strategies to mitigate overfitting during fine-tuning. The number of fine-tuning epochs is set to 100. For additional details regarding the training and testing procedures, readers are directed to our source code repository.

Deep subtraction results

It is important to note that our 'deep subtraction' method differs fundamentally from traditional DSA. While we use the term 'subtraction' for ease of comparison, our method does not perform pixel-wise subtraction between two frames. Instead, our generator \({G}_{xy}\) learns to produce a 'virtual mask' for any given input frame, effectively addressing the challenges posed by cardiac motion and other dynamic factors in angiographic imaging.

Both the PT-Model and the FS-Model generate what we term "deep subtraction" outputs: subtracted angiograms in which the majority of nonvascular tissue is effectively eliminated. Given the absence of quantitative standards for assessing subtraction techniques, Fig. 2 provides a visual comparison between deep subtraction and DSA.

Fig. 2 Comparison between deep subtraction and DSA. The first column shows the original images, while the second column displays the results obtained through DSA. The third and fourth columns present the effects achieved using deep subtraction.

The second column of Fig. 2 displays the results achieved using DSA, a technique that relies on identifying a mask frame within a given continuous video sequence for subtraction.
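For contrast, conventional DSA amounts to a per-pixel difference between each live frame and a chosen mask frame. A schematic NumPy sketch (illustrative only, not part of our pipeline; the function name is ours):

```python
import numpy as np

def dsa_subtract(frames, mask_idx=0):
    """Schematic conventional DSA: difference each live frame against one mask frame.

    frames: list of uint8 greyscale frames. Contrast-filled vessels appear dark,
    so (mask - live) renders them bright after clipping to the valid range.
    """
    mask = frames[mask_idx].astype(np.int16)
    return [np.clip(mask - f.astype(np.int16), 0, 255).astype(np.uint8)
            for f in frames]
```

Any structure that moves between the mask frame and a live frame (e.g., the diaphragm or lung markings) fails to cancel in this difference, which is the motion-artefact problem that limits DSA in coronary imaging.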
While DSA can successfully remove static anatomical structures such as the ribs and vertebrae, it falters in addressing artefacts induced by motion, particularly those originating from cardiopulmonary activity. Such artefacts are markedly visible in areas such as the lung markings and the diaphragm.

In sharp contrast, the third and fourth columns highlight the advantages of deep subtraction. This approach eliminates the need for a predetermined mask, leveraging I2I translation to implicitly generate a corresponding mask for each frame. Deep subtraction thereby overcomes a primary limitation of DSA: because DSA cannot use an optimal mask for each frame, its background removal is incomplete, a shortcoming that has fostered hesitancy among cardiovascular physicians to fully embrace the technique. Deep subtraction removes the background more cleanly, attaining a performance level previously observed only in anatomically stable regions, such as in cerebral angiograms. Moreover, the fine-tuned FS-Model demonstrates superior subtraction outcomes compared to those of the PT-Model, as evidenced by its clearer depiction of small blood vessels and more complete removal of catheters. In summary, both models markedly outperform DSA in coronary angiography.

Segmentation results

Evaluation metrics

Common metrics for evaluating medical image segmentation performance include pixel accuracy (PA), intersection over union (IoU), and the Dice coefficient. PA quantifies the proportion of correctly classified pixels within an image. However, its reliability can be compromised under class imbalance; for instance, in our dataset the background comprises the larger portion of the pixels, thereby disproportionately influencing the PA score. The IoU is a prevalent metric in semantic segmentation.
It measures the area of overlap between the ground truth and the predicted segmentation, normalized by the area of their union. Owing to its simplicity and efficacy, the IoU is widely used. The Dice score is a closely related metric, calculated as twice the area of overlap divided by the total number of pixels in the segmented and ground-truth images. Although the IoU and Dice scores are closely related, we report both to facilitate comparisons across the medical and computer vision domains. For the primary analysis, readers may focus on the Dice score, which is more commonly used in medical image segmentation.

Evaluation results

After applying thresholding to the outputs of the FS-Model, we obtain vessel segmentation images. Figure 3 shows qualitative results, demonstrating the efficacy of our deep subtraction and segmentation algorithms on the test set of the LM-CAD dataset. Each column in the figure constitutes a sample organized as "original image", "deep subtraction", and "segmentation". The segmentation results clearly show that not only the primary branches of the coronary artery but also their secondary and tertiary vessels are precisely segmented. Furthermore, pathological alterations such as stenoses are effectively preserved in the segmentation outputs.

Fig. 3 The effects of deep subtraction and vessel segmentation. The first row contains the original images, the second row the deep subtraction images, and the third row the semantic vessel segmentation images. Both the deep subtraction and segmentation results enable enhanced visualization of pathological alterations such as stenoses.

Given the limited sample size of the FS-CAD dataset, we use a fivefold cross-validation strategy for performance evaluation (Table 1). In this scheme, the dataset is partitioned into five equal subsets.
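All three metrics can be computed directly from binary masks. A minimal NumPy sketch (the function names are ours, not from the paper's repository); note that Dice and IoU are monotonically related via Dice = 2·IoU / (1 + IoU):

```python
import numpy as np

def pixel_accuracy(pred, gt):
    # Fraction of pixels classified correctly (sensitive to class imbalance).
    return float((pred == gt).mean())

def iou(pred, gt):
    # Intersection over union of two boolean masks.
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0

def dice(pred, gt):
    # Twice the overlap divided by the total foreground pixel count.
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return float(2 * inter / total) if total else 1.0
```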
One of these subsets is held out for testing, while the remaining four are used for training. This process is repeated five times, with a different subset serving as the test set each time, and the final performance metric is the average of the five individual test results. Our FS-Model achieves a Dice score of 0.828, corroborating its robust segmentation capability. In comparison, the PT-Model, which is not fine-tuned on the FS-CAD dataset, achieves a respectable Dice coefficient of 0.792 via the AutoThresh method. A baseline U-Net model with the same architecture as the PT-Model but random initialization, trained exclusively on the FS-CAD dataset, records a significantly lower Dice score of 0.657. This underscores the utility of pretraining: the PT-Model already captures the majority of the vascular features and requires only minimal fine-tuning on a small sample set to adapt effectively to a specific task.

Table 1 Model performance achieved on the FS-CAD dataset.

Considering that the ground-truth annotations for the vessels in the FS-CAD dataset are near the limits of human visual discernment, the high Dice score achieved by the FS-Model attests to its ability to segment even the most diminutive vascular structures.

Validation on the XCAD dataset

To further substantiate the advantages of pretraining, we extend our experiments to the XCAD dataset, a publicly available coronary vessel segmentation dataset comprising 126 images with human-annotated vessel boundaries. Unlike in the FS-CAD dataset, the ground-truth annotations in the XCAD dataset focus primarily on larger vessels. Given that the FS-Model is specifically designed for comprehensive vessel detection, directly comparing its performance on the XCAD dataset, which emphasizes larger vessels, may not yield a fair assessment.
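The fivefold protocol can be sketched as follows (a generic illustration; `train_eval_fn` is a hypothetical stand-in for training a model on the given indices and returning its test-set score):

```python
import numpy as np

def five_fold_indices(n_samples, seed=0):
    # Shuffle sample indices, then split them into five near-equal folds.
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), 5)

def cross_validate(n_samples, train_eval_fn):
    # Hold out each fold once for testing, train on the other four,
    # and report the mean of the five per-fold scores.
    folds = five_fold_indices(n_samples)
    scores = []
    for k in range(5):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
        scores.append(train_eval_fn(train_idx, test_idx))
    return float(np.mean(scores))
```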
The PT-Model, however, is designed to learn generalized feature representations of coronary arteries, making it adaptable to various downstream tasks. To adapt the PT-Model to the specific characteristics of the XCAD dataset, we fine-tune it, yielding what we refer to as the XCAD-Model. Following the original authors' training and evaluation protocols, we employ threefold cross-validation. The resulting scores are documented in Table 2; the data for the other methods and models are taken directly from the work of Ma et al.19. The comparisons presented in this section, particularly in Tables 1 and 2, are designed to demonstrate the effectiveness of our approach in scenarios with limited annotated data. While traditional methods might achieve better results with larger annotated datasets, our goal is to show that good performance can be achieved with minimal manual annotation by leveraging pretraining on unannotated data, addressing the common challenge of limited annotations in medical imaging.

Table 2 Model performance achieved on the XCAD dataset.

To obtain the supervised learning scores, we conduct a threefold cross-validation evaluation on the XCAD dataset. Domain adaptation methods, such as MMD (Bermudez et al., 2018) and YNet27, transfer knowledge from annotated datasets in the source domain to unannotated datasets in the target domain. Among the unsupervised methods, IIC28 is a clustering-based approach, while ReDO28 uses an adversarial architecture to extract the object mask of the input. Self-supervised vessel segmentation (SSVS)19, proposed by Ma et al., employs adversarial learning to acquire vascular representations from unlabelled samples and includes a fractal synthesis module to generate synthetic vessels.
SSVS was previously the best-performing unsupervised method on XCAD but fails to surpass the supervised methods. In our tests, the XCAD-Model achieves the highest Dice score, 0.755, surpassing even the purely supervised learning methods. This high score aligns with our expectations, as the XCAD-Model is fine-tuned from the PT-Model, which had already learned vascular features through cycle-consistent training. Intriguingly, the PT-Model, which is never trained on the XCAD dataset, still achieves a Dice coefficient of 0.715 once the AutoThresh method is applied. This result slightly lags behind supervised learning but confirms the robust generalization ability of the PT-Model and establishes it as the best-performing unsupervised method, significantly surpassing SSVS. In contrast, alternative methods such as MMD, YNet, IIC, and ReDO register Dice scores below 0.6, revealing a substantial performance gap relative to supervised learning.

Figure 4 provides a visual representation of the segmentation results produced on the XCAD dataset. Compared with the ground truth, all the models display only marginal differences when segmenting larger vessels; the disparities lie mainly in the identification of secondary and tertiary vessels as well as catheters. Remarkably, the XCAD-Model outperforms the purely supervised learning methods in recognizing catheters. The PT-Model, which has never been trained on this specific dataset, also approximates the performance of the supervised methods and demonstrably outperforms the previously best unsupervised or self-supervised method, SSVS.

Fig. 4 Visualization of the vessel segmentation results produced on the XCAD dataset.

Stenosis detection

We additionally develop a task with potential clinical value.
On the basis of the PT-Model, we fine-tune a network capable of identifying vascular stenosis locations within coronary angiography images, referred to as SDNet. Specifically, we select 60 coronary angiography images with stenosis and manually annotate the stenosis sites. A subset of 10 images is designated as the test set, while the remaining images are used for training and validation. Figure 5 illustrates the efficacy of SDNet in detecting coronary stenosis on the test set. Despite the limited number of training samples, SDNet proficiently identifies stenosis sites, attaining a Dice coefficient of 0.56 on the test set. For comparison, a U-Net model trained from scratch (without pretraining) converges more slowly and performs worse, achieving a Dice coefficient of merely 0.35. This outcome underscores the utility of the PT-Model as an exceptionally robust pretrained model. Comprehensive details of the training process and additional results from this experiment are provided in Supplementary Materials F.

Fig. 5 Performance of the vascular stenosis detection network (SDNet) on the test dataset. The first row shows the original images, while the second row presents the manual annotations (highlighted in white). The third row displays the detection outcomes of SDNet. Instead of a binary representation, the results are visualized on a pseudocolour scale, on which increasing red intensity indicates an increasing likelihood that a pixel belongs to a vascular stenosis site.
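Several results above rely on an AutoThresh step to binarize the soft model outputs; its implementation is available in our source code repository. As an illustrative stand-in only (not the paper's actual method), an Otsu-style automatic threshold, which picks the cut that maximizes between-class variance of the intensity histogram, can be computed as:

```python
import numpy as np

def otsu_threshold(img):
    """Return an automatic binarization threshold for a uint8 greyscale map.

    Illustrative Otsu implementation: choose the threshold that maximizes
    the between-class variance of the normalized intensity histogram.
    """
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    hist /= hist.sum()
    bins = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # all mass on one side; no valid split here
        mu0 = (hist[:t] * bins[:t]).sum() / w0
        mu1 = (hist[t:] * bins[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

A binary vessel mask is then simply `img >= otsu_threshold(img)` (or `<=` when vessels are dark on a bright background).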
