Detection of diffusely abnormal white matter in multiple sclerosis on multiparametric brain MRI using semi-supervised deep learning

Image dataset

The data used in this study included multiparametric MRI data accessed from the CombiRx Phase-III multi-site clinical trial (identifier: NCT00211887) on relapsing-remitting MS (RRMS) patients [30]. MRI scans were acquired across different scanner models and vendors (General Electric Healthcare, Milwaukee, USA; Philips Healthcare, Best, the Netherlands; Siemens Healthineers, Erlangen, Germany) at both 1.5-T (85%) and 3-T (15%) field strengths. A total of 1006 patients were enrolled at baseline and scanned using a standard MRI protocol including 2D FLAIR and 2D dual-echo turbo spin echo images (yielding proton density weighted and T2w images of 0.94 mm × 0.94 mm × 3 mm voxel dimensions), and pre- and post-contrast T1-weighted images with geometry identical to the FLAIR and dual-echo images. 3D T1-weighted images were also acquired. Baseline scans from the trial were used in this study.

Our analysis of this anonymized dataset was approved by our Institutional Review Board (IRB). Across all sites within the CombiRx trial, patients were recruited and scanned with IRB approval, and written consent was obtained from each patient. All research was performed in accordance with the Declaration of Helsinki.

After image quality assessment [31], all images were filtered using anisotropic diffusion to reduce noise [32,33], and FLAIR and T1w images were registered with the T2w images. Skull stripping and bias field correction were performed, and image intensities were normalized [34]. Brain volumes were initially segmented into 4 tissue classes (NAGM, white matter including both NAWM and DAWM, cerebrospinal fluid or CSF, and T2L) using MRI automated processing (MRIAP) [35], an automated segmentation pipeline that is based on both parametric and non-parametric techniques.
Lesion masks were manually inspected and corrected by imaging experts.

Expert evaluation

Two neuroimaging experts, a neuroradiologist and an MS neurologist, with 10+ and 15+ years of experience in analyzing MRI of MS, respectively (AK and JAL), independently segmented DAWM volumes from a total of 40 randomly chosen scans. Prior to manual segmentation, the two readers had practice sessions to identify the best MRI contrast types and window settings for DAWM visualization and manual segmentation of DAWM. The manual annotation process began with prior segmentations of the brain and MS lesions as the starting point. Using the ITK-Snap software (version 3.8.0, www.itksnap.org), voxels were reclassified as DAWM based on their appearance on the FLAIR and T2w images (Fig. 1).

Figure 1. Examples of DAWM presentation on FLAIR and T2-weighted images in three MS patients. On zoomed FLAIR images, focal lesions are labeled with red arrows, and DAWM is labeled with yellow arrows. A consensus segmentation by two expert physicians (five tissues: NAGM = grey, NAWM = white, CSF = teal, T2L = salmon pink, DAWM = sky blue) is shown in the middle column.

Special care was taken in areas with possible ambiguity between DAWM and other tissues. Delineation of DAWM from adjacent gray matter structures is especially challenging due to overlap in signal intensity, such as in white matter lying along the superior margins of the basal ganglia or in the cortex. Segmentation in these regions was avoided, and the central deep white matter of the brain was taken as the focus of this study. The signal intensity of the large ascending/descending white matter bundles of the brain may also mimic the DAWM signal intensity, such as in the white matter of the corona radiata.
To avoid this mimic, only areas with discrete signal changes from the adjacent white matter bundles were segmented.

After initial independent segmentation, the two readers generated a set of consensus segmentations using their previous independent segmentation as a prior. Maps displaying overlap and disagreement of DAWM segmentations were generated, and regions of disagreement were reviewed and either eliminated or accepted to create consensus segmentation maps.

Of the 40 cases assessed by the two readers, 15 were randomly selected for fine-tuning the segmentation model (described in the next section) and the remaining 25 were withheld as an independent test set.

DAWM-Net segmentation

Figure 2 provides a flowchart summarizing the DAWM segmentation process, which consists of three stages. In this study, 2D U-Nets were used as the DL segmentation model in all experiments.

Figure 2. Segmentation approach for DAWM. (Stage 1) Development of heuristic DAWM segmentation based on class score thresholds from a 2D U-Net trained for 4-class segmentation, (Stage 2) generation of imperfect DAWM segmentations for weakly-supervised training of a 2D U-Net, and (Stage 3) fine-tuning on a small set of reader cases to create a final DAWM-Net. Segmentations by algorithm and readers (dark blue) were compared on a withheld test set of 25 patients. Segmentations from an automated intensity thresholding method [22] were also tested and their performance was compared to DAWM-Net.

In the first development stage, the heuristic U-Net-based segmentation algorithm was developed based on the observations that DAWM has signal intensity (1) intermediate to NAWM and T2L and (2) similar to that of NAGM. First, a 4-class (4 tissues and background) U-Net was trained for brain segmentation based on the validated MRIAP segmentation on the CombiRx anonymized imaging dataset.
Based on the histogram of the U-Net class scores, heuristic thresholds were identified such that voxels meeting observations (1) and (2) were reclassified as DAWM [36]. Additional information on this technique can be found in the Supplementary Material.

In the second stage, the brain segmentation maps, now augmented with the heuristic DAWM segmentation, were used as the target labels to train another 5-class U-Net with weak supervision, with a similar architecture to that of the 4-class U-Net.

In the third stage, a subset of 15 cases with corresponding consensus segmentations by the two expert readers was used for further fine-tuning of the U-Net (10 cases for training, 5 cases for validation). This resulted in the final trained network, referred to as DAWM-Net, which segments the brain into background and 5 tissue classes: NAGM, NAWM, CSF, T2L, and DAWM.

To assess the contribution of different image contrasts (FLAIR, T2w, T1w, or proton-density weighted) to the segmentation performance, a sensitivity analysis was performed. In this experiment, a separate set of models was trained, each omitting one image contrast, and their segmentation performance was evaluated on the test set.

Finally, an intensity thresholding method for automated DAWM and T2L segmentation was applied to the test set for comparison [22]. In that method, an initial tissue segmentation of the brain was performed, and DAWM and focal lesions were detected using two intensity thresholds based on the mode of the NAWM signal on the T2w images. Our implementation of this technique is described in the Supplementary Material.

Statistical analysis

DAWM volumes were extracted, and the mean and standard deviation were calculated across the 25 patients in the held-out test set. Inter-reader agreement was assessed using the Dice Similarity Coefficient (DSC) and Spearman correlation of DAWM volumes.
Segmentation performance of the proposed DAWM-Net and the automated intensity thresholding method was assessed against consensus segmentations using DSC and Spearman correlation. DSC scores were compared between segmentation methods using a paired t-test. Finally, Bland–Altman analysis was used to determine bias and the 95% confidence limits of agreement (95% LOA) between the segmentation methods [37]. Descriptive statistics, t-tests, and correlations were calculated using the Python NumPy module (version 1.19.5). Two-sided p values below 0.05 were considered statistically significant.
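
As an illustration of the agreement metrics described above, the following is a minimal sketch of the Dice Similarity Coefficient and of the Bland–Altman bias with 95% limits of agreement. It is not the study's code: it is written in plain Python so that it runs without dependencies, and the function names, the flat 0/1 mask representation, the convention for two empty masks, and the toy values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient for two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


def bland_altman(volumes_a, volumes_b):
    """Return (bias, lower LOA, upper LOA) for paired volume measurements."""
    if len(volumes_a) != len(volumes_b):
        raise ValueError("volume lists must be paired")
    diffs = [a - b for a, b in zip(volumes_a, volumes_b)]
    bias = mean(diffs)
    # 95% limits of agreement: bias +/- 1.96 * sample SD of the differences.
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Toy example (hypothetical values, not study data).
reader_mask = [0, 1, 1, 1, 0, 0, 1, 0]
model_mask = [0, 1, 1, 0, 0, 1, 1, 0]
dsc = dice_coefficient(reader_mask, model_mask)

reader_volumes = [10.0, 12.0, 9.0, 15.0]
model_volumes = [9.0, 13.0, 8.0, 13.0]
bias, loa_lower, loa_upper = bland_altman(reader_volumes, model_volumes)

print(f"DSC = {dsc:.2f}")
print(f"bias = {bias:.2f}, 95% LOA = [{loa_lower:.2f}, {loa_upper:.2f}]")
```

In practice, the masks would be the flattened DAWM label volumes from two segmentations of the same scan, and each volume list would hold one DAWM volume per test patient for a given method.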
