Prospective clinical evaluation of deep learning for ultrasonographic screening of abdominal aortic aneurysms

Study design and ethical approval

This prospective evaluation study was conducted at Kaohsiung Chang Gung Memorial Hospital in Taiwan between June and August 2023. Study approval was granted by the institutional review board of the hospital (IRB number: 202102311B0), and written consent was obtained from each participant.

Data collection and preprocessing

To develop the DL model, we collected ultrasound images from the ultrasound machines in the ED of Kaohsiung Chang Gung Memorial Hospital, including the Sonosite Edge II and Hitachi Noblus, from January 2019 to December 2021. Ultrasound images focusing on the abdominal area were collected, and images unrelated to abdominal aorta examination were excluded, resulting in a dataset of 2101 labeled ultrasound images. This dataset was split at an 8:2 ratio into training and validation sets. We also collected 492 ultrasound images from a regional hospital for external validation.

As the original ultrasound images vary in size, we opted for the ‘letterbox’ preprocessing method to standardize them to the model’s required dimensions. Letterboxing scales the image while maintaining the original aspect ratio; any space remaining after scaling is filled with background, mimicking the effect of placing a picture into an envelope, hence the term ‘letterbox’. Each ultrasound image was resized to 600 × 400 pixels, and any information that might reveal personal identity was removed. Two medical experts in point-of-care ultrasound then manually labeled the selected anatomical structures, namely the aorta, inferior vena cava (IVC), and spine, with polygon masks (Supplemental Fig. 1). We adopted the widely used labeling software Labelme25 and saved the annotation files in COCO dataset format. This large, annotated dataset served as the foundation for training the AI models, enabling them to recognize and correctly identify these structures in ultrasound images.

Development of the DL model

We developed the DL model to offer real-time, continuous guidance during scanning, assisting users in obtaining videos for AAA screening. The model, which emulates physician expertise, uses a You Only Look Once (YOLO) architecture, known for its real-time object detection capabilities. It is specifically tailored to analyze ultrasonographic images, focusing on identifying anatomical structures including the abdominal aorta, spine, and inferior vena cava. The architecture employed in our study is YOLOv5 instance segmentation: the input is an ultrasound image, and the inference output includes a bounding box and pixel-area identification for each category (abdominal aorta, spine, and inferior vena cava). Because the AI application in this project comprises two tasks, real-time aorta recognition for guidance and post-scan calculation of the maximal aortic diameter, the former requires a faster model while the latter requires a more accurate one. Therefore, among the YOLOv5 architectures, we trained YOLOv5s (225 layers, 7.4 million parameters, approximately 26 GFLOPs of computation) for real-time guidance and YOLOv5m (367 layers, 21.2 million parameters, approximately 73 GFLOPs of computation) for calculating the maximal aortic diameter.

Training was conducted on an NVIDIA RTX 3090 GPU (24 GB) using the PyTorch framework. Because the ultrasound training data were relatively limited, we employed transfer learning from YOLOv5 weights pretrained on the COCO dataset. We used the SGD optimizer with a batch size of 16 and a learning rate of 0.01, and trained each model for up to 100 epochs with early stopping if no improvement in validation performance was observed for 5 epochs.
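In outline, this fine-tuning recipe corresponds to the following minimal sketch, assuming the Ultralytics YOLOv5 repository (v7.0 or later, which includes instance segmentation) is checked out locally and the annotations have been exported to a YOLOv5 segmentation dataset described by a hypothetical aorta-seg.yaml; the file names are illustrative, not the study’s actual configuration:

```python
# Fine-tune YOLOv5 segmentation models from COCO-pretrained checkpoints.
# Run from the root of a local clone of https://github.com/ultralytics/yolov5.
from segment import train  # YOLOv5's segmentation training entry point

for weights in ("yolov5s-seg.pt", "yolov5m-seg.pt"):  # guidance / measurement models
    train.run(
        data="aorta-seg.yaml",  # hypothetical dataset config: train/val paths, 3 classes
        weights=weights,        # transfer learning from COCO-pretrained weights
        epochs=100,             # upper bound on training epochs
        batch_size=16,
        optimizer="SGD",        # repo's default hyperparameters set lr0 = 0.01
        patience=5,             # stop early after 5 epochs without improvement
    )
```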
The results of this validation are shown in Supplemental Table 1.

Integration of the DL model with POCUS

The POCUS equipment used in this study was the ArtUs-EXT-1H from Telemed, an FDA-certified platform for capturing raw ultrasound signals, used in conjunction with a portable tablet computer. The tablet ran Windows 11 on an Intel CPU (detailed specifications are listed below). To accelerate the inference speed of the deployed model, the trained YOLOv5 model was converted from the PyTorch (.pt) format to the OpenVINO IR (FP16) format and executed via the OpenVINO Runtime. We used Python’s built-in ctypes library to load the DLL (dynamic link library) provided by Telemed, allowing real-time ultrasound images to be captured within the Python program. The program also performs model inference and uses OpenCV to visually mark the identified aorta. After integration, the software continuously monitors and processes the ultrasound display through its application programming interface (Supplemental Fig. 2, Supplemental Video 1). Additionally, to accommodate the need to adjust parameters such as TGC (time gain compensation) during scanning, a Tkinter control panel is displayed for the operator to adjust these settings as necessary.
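A rough sketch of this inference path is shown below, assuming the OpenVINO 2022+ Python API and a 640 × 640 model input; the IR file name is illustrative (such a file could be produced, e.g., with YOLOv5’s export.py or OpenVINO’s Model Optimizer), and the frame capture through the Telemed DLL via ctypes is elided:

```python
import cv2
import numpy as np
from openvino.runtime import Core  # OpenVINO 2022+ Python API

core = Core()
model = core.read_model("yolov5s-seg-fp16.xml")  # converted IR file; name illustrative
compiled = core.compile_model(model, "CPU")      # runs on the tablet's Intel CPU
out = compiled.output(0)                         # detection output of the model

def infer(frame_bgr):
    """Run one forward pass on a frame grabbed via the Telemed DLL (elided)."""
    img = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (640, 640))  # in practice the letterbox transform is applied
    blob = np.ascontiguousarray(img.transpose(2, 0, 1)[None]).astype(np.float32) / 255
    return compiled([blob])[out]       # raw predictions, to be filtered by NMS
```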
Prospective study design

Patients at least 65 years old visiting the outpatient clinic of the Cardiology department at the study hospital were recruited between June and August 2023. Individuals were excluded if they were unable to lie flat or were unable or unwilling to provide informed consent.

Ten registered nurses without prior experience performing or interpreting ultrasonography were recruited from hospital personnel. Each nurse underwent a 15-min tutorial to become familiar with the POCUS machine and DL guidance and, before undertaking the study, performed one practice scan on a volunteer model to become familiar with the software’s user interface. The nurses were instructed to acquire a 10-s standard abdominal aorta tracing video under DL guidance. As a control, a duplicate scan was obtained by a physician using the same POCUS machine on the same day but without AI guidance; the physician also labeled the maximal width of the aorta for the control scan. The nurse scans were conducted independently, solely with DL guidance, and always preceded the control scans. Following each scan, the Telemed POCUS machine stored two ultrasonography videos at 20 frames per second, which the DL system then processed to predict the maximal width of the abdominal aorta. Fig. 1 illustrates the study design.

Upon completion of all study and control scans, a panel of three expert physicians (Y.-C.Z., X.-H.L., and F.-J.C.), each blinded to whether a nurse or a physician had performed the scan, independently assessed whether each scan was of diagnostic quality; this assessment served as the primary endpoint. For cases with discrepancies, the majority rule was applied, with the judgment agreed upon by at least two experts taken as the gold standard. All expert readers were board-certified physicians in Cardiology or Emergency Medicine. The time to complete the study, defined as the interval from placing the probe on the patient’s abdomen to completing the scan, was recorded. The maximal aortic width predicted by the DL model was compared with expert measurements as the secondary endpoint.

Statistical analysis

The study sought to evaluate the performance of nurses conducting AAA screening under DL guidance. Continuous variables are reported as medians and interquartile ranges (IQR), and categorical variables as numbers and percentages. For the primary endpoint, the proportion of qualified studies, as judged by the expert panel, was compared between DL-guided and physician scans using a non-inferiority test, which provided a p-value assessing whether the DL-guided scans were not inferior to physician scans within a pre-specified margin. The maximal abdominal aortic width measurement and the time required to complete the study were evaluated as secondary endpoints and compared using the Mann–Whitney U test. For both primary and secondary parameters, the proportion judged clinically evaluable is reported with 95% confidence intervals (CIs).

To measure the maximal width of the aorta from the ultrasound video frames, we used the YOLOv5m architecture for bounding box prediction. Each video is 10 s long at a frame rate of 20 frames per second (fps), yielding 200 frames per video. The model outputs bounding boxes, confidence scores, and class labels for the objects detected in each frame. We applied postprocessing steps, including non-maximum suppression26, to filter out overlapping bounding boxes, ensuring that only the most confident prediction for the abdominal aorta is retained. We then extracted the four coordinates of the bounding box, representing its top-left and bottom-right corners. To calculate the width of the aorta, we averaged the differences between the coordinates on the vertical and horizontal axes, that is, the mean of the bounding box’s height and width. This process loops through each frame of the video, and the largest width, together with the corresponding image frame, is stored for human expert inspection.
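This frame-by-frame measurement loop can be sketched as follows, assuming a detect() helper that returns post-NMS detections as (x1, y1, x2, y2, confidence, class_id) rows with the aorta as class 0; both the helper and the class index are illustrative assumptions, not the study’s actual code:

```python
def max_aorta_width(frames, detect, aorta_class=0):
    """Scan every video frame and keep the largest aortic width found.

    frames: iterable of image frames (200 for a 10-s video at 20 fps).
    detect: assumed helper returning post-NMS rows of
            (x1, y1, x2, y2, confidence, class_id) for one frame.
    """
    best_width, best_frame = 0.0, None
    for frame in frames:
        boxes = [d for d in detect(frame) if int(d[5]) == aorta_class]
        if not boxes:
            continue  # no aorta detected in this frame
        x1, y1, x2, y2, _, _ = max(boxes, key=lambda d: d[4])  # most confident box
        width = ((x2 - x1) + (y2 - y1)) / 2  # mean of box width and height
        if width > best_width:               # keep the frame with the largest width
            best_width, best_frame = width, frame
    return best_width, best_frame            # stored for human expert inspection
```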
