Improving prognostic accuracy in lung transplantation using unique features of isolated human lung radiographs

Study cohort
Clinical EVLP cases performed from 2008 to 2022 at Toronto General Hospital were considered in this study. All bilateral EVLP cases that had both 1 h and 3 h radiographs were included, yielding a total of n = 650 EVLP cases and n = 1300 radiographs. The cohort was split temporally in an 80:20 ratio into a training set (n = 520) and a validation set (n = 130): the training set comprised cases performed between 2008 and 2020, and the validation set comprised cases from 2020 to 2022.

Inclusion and ethics
This study included all EVLP cases performed at our institution from 2008 to 2022. In accordance with the Declaration of Helsinki, University Health Network (UHN) Research Ethics Board (REB) and institutional approval was obtained for the collection, storage, and analyses of the biospecimens and data used in this study (UHN REB#12-5488-13 and UHN REB#11-0170-AE); informed patient consent was obtained from study participants.

Data collection and storage
The Toronto EVLP protocol has been previously reported4. Physiological and biological assessments were conducted using an ICU-grade ventilator, perfusate samples from the EVLP circuit, and pressure monitors. Radiographs were taken every 2 h15. Tabular data were stored in the Toronto Lung Transplant Program database, and radiographs on the UHN Picture Archiving and Communication System (PACS).

Image processing pipeline
The computational pipeline consisted of two main stages: pretraining and finetuning. Established CNN architectures, including ResNet-5020, ResNeXt-5021, RexNet-10022, EfficientNet-B223, EfficientNet-B323, and DenseNet-12124, were used in both stages. The pretraining stage was performed with the PyTorch Image Models library25 available on GitHub.
The CNN architectures were pretrained separately using two public chest X-ray datasets from the National Institutes of Health: the full ChestX-ray14 dataset (n = 112,120), expanded from the ChestX-ray8 dataset26, and a subset of ChestX-ray14 with cleaner labels (n = 10,000) published by the CADLab researchers27.

The finetuning stage was developed in Python (version 3.10) using the PyTorch and Lightning libraries and is composed of four modules: data loading, model training, on-the-fly validation, and inference. The weights from the pretrained CNNs were used to initialize the models, which were then finetuned on EVLP radiographs. The models were trained separately to classify donor lung outcomes into three classes: (1) transplanted lungs with recipient mechanical ventilation <72 h, (2) transplanted lungs with recipient mechanical ventilation ≥72 h, and (3) lungs deemed unsuitable for transplant. Model performance was evaluated on the validation set using accuracy and the area under the receiver operating characteristic curve (AUROC). In the finetuning stage, the pipeline was designed to process the 1 h and 3 h images from a given EVLP case separately and then perform a single classification using the concatenated latent features from both images. An overview of this pipeline is shown in Fig. 3.

Fig. 3: CNN-based image processing pipeline. 1 h and 3 h X-ray images are simultaneously processed by convolutional layers, and a single classification is performed for both images of the same lungs. (CNN, convolutional neural network.)

Class activation mapping
As a method for interpreting CNN classifications, gradient-weighted class activation mapping (GradCAM)28 maps of the ResNet-50 model were generated using the pytorch-grad-cam library29 available on GitHub, together with the PyTorch Image Models and Seaborn packages. The last convolutional layer in the third block of the ResNet-50 architecture was used to generate saliency maps.
The activation saliencies were shown on the original images at full resolution.

Manual scoring of radiographs
As described in the previous study15, the manual labeling of the X-ray images was derived from scoring radiographic consolidation, infiltrate, atelectasis, nodule, and interstitial line findings across six lung regions (right upper lobe, right middle lobe, right lower lobe, left upper lobe, lingula, and left lower lobe). In this dataset, infiltrate was defined similarly to ground-glass opacity: an area of abnormally increased density through which the underlying lung markings can still be observed. The total score of each finding across all lung regions was used in subsequent analyses.

Principal component analysis
The ResNet-50 model and the 115 cases in the validation cohort were used for the following analysis. A set of latent features describing the input images was extracted from the second-to-last layer of the CNN, just before the classifier. Principal component (PC) analysis was performed to summarize the latent features with ten vectors. The ten PCs from each EVLP case were correlated with donor information and EVLP parameters using Pearson correlations. Within the 115 cases, PCs from 38 cases were correlated with manually derived radiographic labels from our previous study15 using Spearman correlations. The resulting heatmap was generated using the corrplot package (version 0.92) in R.

Extreme gradient boosting model
The extreme gradient boosting (XGBoost) model30 is a state-of-the-art machine learning method for analyzing tabular data. It builds an ensemble of decision trees, with each tree improving overall performance by correcting the errors of its predecessors. In this study, two XGBoost models were built using the validation cohort from the CNN image analysis (n = 115), which was in turn split 80:20 into training and validation sets.
The first XGBoost model was trained in a similar fashion to InsighTx10, using EVLP physiological and biological parameters to classify the transplant endpoints described above. A second XGBoost model was trained on the same cohort using both the EVLP tabular data and the ten PCs summarizing the latent CNN features. The mean and standard deviation of model performance were derived through bootstrapping. Statistical significance was assessed with one-tailed t-tests and Mann-Whitney U tests at an alpha of 0.05. Model development and evaluation were performed with the Python scikit-learn library.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
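As a closing illustration of the feature fusion used for the second gradient-boosting model, the sketch below concatenates tabular parameters with ten latent-feature PCs, applies an 80:20 split, and fits a boosted-tree classifier. Synthetic data and scikit-learn's GradientBoostingClassifier stand in for the real EVLP variables and the XGBoost implementation; all array sizes except the cohort size (n = 115) are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cases = 115                                 # validation cohort size from the text
tabular = rng.normal(size=(n_cases, 20))      # stand-in EVLP physiological/biological parameters
latent_pcs = rng.normal(size=(n_cases, 10))   # ten PCs of the CNN latent features
X = np.hstack([tabular, latent_pcs])          # fused feature matrix
y = rng.integers(0, 3, size=n_cases)          # three transplant-outcome classes

# 80:20 split, mirroring the split described above
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_va)
print(proba.shape)  # (23, 3): one probability per outcome class
```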
