Finite element models with automatic computed tomography bone segmentation for failure load computation

In the following section, we detail the datasets used for the two segmentation tasks and for the simulations, as well as the simulation parameters and software used for failure load assessment. We also explain the pre-processing pipeline developed to be applied prior to the training phase of the neural networks, and describe our approach for comparing simulations based on manual and automatic segmentations.

The study protocol was approved by the French Ethics Committee (CPP SUD-EST 1 France) under registration number ID-RCB: 2019-A01202-55. All procedures were conducted in compliance with national and European regulations. All included patients received clear information and provided written consent, and informed consent was obtained from all subjects and/or their legal guardian(s).

Datasets

We use two datasets for bone segmentation: one publicly available for the vertebrae and one from the project for the femurs. Existing datasets with CT-scans of the femur along with manual segmentations are either not available or lack the accuracy required to train robust models, especially on the femoral head. For the femur segmentation task, the MEKANOS cohort was used (Hospices civils Lyon, agreement number N. 21 5467, May 28th, 2021, Supplementary Table S5). This cohort consists of eleven in-vivo CT-scans of hips, in which both femurs are present. These scans were acquired in clinical routine following a specific procedure (constant table height, QA Mindways quality phantom, 120 kV, 270 mAs, pitch 1, fields of view 360 mm and 200 mm, reconstruction: standard filter B, 512 × 512 matrix, slice thickness 0.7 mm) with acquisition systems from three manufacturers (General Electric, Philips and Siemens). A few femurs have metastatic osteolytic lesions, which complicates the segmentation task but allows trained models to segment metastatic bones more effectively, which is important for our study.
From this database, eighteen femurs were manually segmented (4 femurs were not available). As a secondary test dataset, to better assess the robustness of the chosen trained model, additional femurs (n = 16, 9 patients) from four different centers were added a posteriori from the MEKANOS cohort, along with manual segmentations. Inter-operator variability was also measured using 6 ex-vivo femurs (Supplementary Table S4). On those femurs, the impact of inter-operator variability on failure load is illustrated in Supplementary Fig. S2.

For the vertebrae segmentation task, we used two publicly available datasets: VerSe2019 and VerSe2020 [29]. These datasets contain 374 CT-scans of various sizes, all with manual segmentations. The number of vertebrae on each scan ranges from 3 to 25, with all types of vertebrae present in the dataset. In this study, we retained 363 patients, excluding those with an additional transitional T13 vertebra. Figure 3 shows examples of axial, coronal and sagittal slices taken from those datasets. The data used for the simulation tests consist of 12 femurs from 6 patients (6 scans), among which 6 are healthy and 6 have metastases, as well as 15 vertebrae from 2 patients (2 scans), among which 13 are healthy and 2 have metastases (1 thoracic and 1 lumbar), all taken from the MEKANOS database.

Figure 3: Example of CT data used for vertebrae and femur segmentation.

Simulation pipeline

For both vertebrae and femur failure load simulations, a custom finite element simulation pipeline, described in Fig. 4, was used with dedicated parameters to compute the failure load of the considered bones. For femurs, we used a published model from Sas et al. based on voxel-based hexahedral meshes [10]. These meshes were obtained either from a manual CT-segmentation using the software 3D Slicer, or from an automatic segmentation produced by a neural network described in part D.
The intensity values in the CT-scan were converted to bone density using the calibration phantom included in the acquisition [10]. Each element of the mesh was attributed a bone density corresponding to the densities of its voxels. The bone density of each element was then used to compute the mechanical parameters of the non-linear constitutive law [10] (cf. Supplementary Fig. S1).

Figure 4: Simulation pipeline for fracture risk assessment.

The grey density values were converted to Young's modulus using the calibration phantom included in the acquisitions [10]:

$$E\;(\mathrm{MPa}) = 14900\,\rho_{\mathrm{QCT}}^{1.86}$$
(1)
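As a minimal sketch of the density-to-stiffness conversion of Eq. (1) (the function name and units are our own; this is not the authors' implementation), with the QCT-calibrated density in g/cm³ and the modulus in MPa:

```python
def young_modulus_femur(rho_qct):
    """Young's modulus (MPa) from QCT-calibrated bone density (g/cm^3), Eq. (1).

    Sketch only: units and function name are assumptions, not the paper's code.
    """
    return 14900.0 * rho_qct ** 1.86
```

In a voxel-based hexahedral mesh, this mapping would be applied element-wise, each element receiving the modulus corresponding to its attributed bone density.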
Using Ansys software (version 2021 R1), an axial compression was applied to the femurs, mimicking a standing position. An incremental displacement was applied to the top nodes of the femoral head (quasi-static) until a maximal displacement was reached. The nodes at the distal end of the diaphysis were constrained to a null displacement. The failure load is defined as the maximum load occurring during the simulation [10].

For the vertebrae simulations, a quadratic tetrahedral mesh (10 nodes) with an element volume of 1 mm³ was used, based on experimental data from [33]; for the numerical model, we used the elasticity-density relationship from [34], using the calibration phantom:

$$E\;(\mathrm{MPa}) = 3230\,\rho_{\mathrm{QCT}} - 34.7$$
(2)
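The vertebral elasticity-density relationship of Eq. (2) can be sketched in the same way (again, the name and the clamping of negative moduli at very low densities are our assumptions):

```python
def young_modulus_vertebra(rho_qct):
    """Young's modulus (MPa) from QCT-calibrated bone density (g/cm^3), Eq. (2).

    Sketch only. The max() guard against negative moduli at densities below
    ~0.011 g/cm^3 is our addition, not part of the published relationship.
    """
    return max(3230.0 * rho_qct - 34.7, 0.0)
```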
We used a linear elastic, perfectly plastic constitutive law with a yield strain of 1.5% [35]. The failure criterion was a reduction of 1.9% of the total vertebral height [36]. All the simulations were also run using Ansys software.

Pre-processing pipeline

For femur segmentation, we propose a fully automated segmentation method with a pre-processing pipeline, illustrated in Fig. 5, to facilitate the deep learning training. Our dataset contains only a few annotated scans, and dedicated pre-processing is important to ensure the robustness of the proposed models.

Figure 5: Segmentation pipeline used for femur segmentation.

The pre-processing pipeline consists of several steps: after selection and manual expert annotation of the femurs, the volumes are all resampled to the median voxel size (0.78 × 0.78 × 0.67 mm), then cropped, when both femurs are present, to separate them into two distinct volumes. The split left femurs are then flipped to obtain a comparable orientation for all volumes. To further increase the homogeneity of the dataset, especially the spatial orientation of the femurs, the flipped left femurs and the right femurs are co-registered using affine transforms, which makes the global spatial orientation more similar across femurs. Co-registration ensures the robustness of the network despite the small amount of training data, while preserving the fully automatic nature of our pipeline. The resulting volumes are then all normalized before being used as input to the convolutional neural network. In addition to supporting the proper training of the neural network, the pre-processing pipeline increases the size of the dataset, thanks to the splitting of the initial volumes.

Neural networks

We used several convolutional neural networks, all based on the U-Net architecture [31]. We implemented a 2D multi-planar U-Net, as well as a 3D U-Net, for femur segmentation.
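Two of the simpler pre-processing steps described above, mirroring the left femurs and normalizing the intensities, can be sketched with NumPy as follows (the left-right axis index and the use of z-score normalization are our assumptions; resampling, cropping and affine co-registration are omitted):

```python
import numpy as np

def flip_left_femur(volume):
    # Mirror a left femur along the left-right axis (assumed to be axis 2)
    # so that every femur shares a right-femur orientation.
    return np.flip(volume, axis=2)

def normalize(volume):
    # Zero-mean, unit-variance intensity normalization before feeding the
    # volume to the network; the epsilon guards against constant volumes.
    return (volume - volume.mean()) / (volume.std() + 1e-8)
```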
We compared the results with nnUNet, the state-of-the-art convolutional neural network for medical image segmentation [32]. Three 2D U-Nets were trained on axial, coronal and sagittal slices for 500 epochs, and the 3 resulting segmentations were then fused using majority voting. The 3D U-Net model was trained using random patches of size 64 × 64 × 64 for 300 epochs. Data augmentation (random rotations, translations, shearing and scaling) was applied on-the-fly to prevent overfitting. All custom U-Nets were trained using the Adam optimizer (β1 = 0.9, β2 = 0.999) and a Dice loss, with a learning rate α = 2 × 10−4 and a batch size of 16 for the 2D U-Nets, and α = 3 × 10−5 and a batch size of 4 for the 3D U-Net. We also added morphological post-processing operations based on binary dilation and erosion to remove small unwanted islands and improve the segmentation results.

The nnUNet architecture used is '3d fullres', with automatically selected patch sizes (238 × 196 × 208 for femur segmentation and 205 × 205 × 205 for vertebrae segmentation) and default parameters, trained for 1000 epochs. The optimizer is stochastic gradient descent with an initial learning rate of 0.01, and the batch size was set to 2 for both trainings. We used this architecture alone for vertebrae segmentation, because its femur segmentation results were comparable to those of our custom 3D U-Net, the amount of vertebrae training data was sufficient to avoid the need for dedicated pre-processing, and the only pre-processing operations were those performed automatically by nnUNet.

The models were trained on an Nvidia P100 GPU with 16 GB of VRAM. The total training time on the femur dataset was 12 h per axis for the 2D U-Nets, 16 h for the 3D U-Net and 48 h for nnUNet.
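The majority-voting fusion of the three planar U-Net outputs can be sketched as follows, assuming binary (0/1) masks of identical shape (a minimal sketch, not the authors' code):

```python
import numpy as np

def majority_vote(seg_axial, seg_coronal, seg_sagittal):
    # Fuse three binary segmentations, one per 2D U-Net orientation:
    # a voxel is kept as foreground when at least two of the three agree.
    stacked = np.stack([seg_axial, seg_coronal, seg_sagittal], axis=0)
    return (stacked.sum(axis=0) >= 2).astype(np.uint8)
```

With three voters, this is equivalent to taking the per-voxel median, and it suppresses isolated false positives produced by a single orientation.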
This substantial difference is also present during inference, where nnUNet takes up to 30 min per prediction, whereas the standard models take at most 3 min.

Segmentations and simulations comparison

To quantify the segmentation results, we used the Sørensen-Dice score (noted DSC) to evaluate the similarity between the ground truth and the automatic segmentations ([0; 1], where 1 is the best), as well as the Hausdorff distance (noted HD) to evaluate the maximum errors of the automatic segmentations (in mm, the smaller the better). All metrics are computed on 3D volumes. We used a fivefold cross-validation to quantify the results more accurately. Among the 18 available femurs, 12 were used for training, 4 for validation and 2 for testing. For vertebrae segmentation, 242 scans were used for training, 61 for validation and 60 for testing.

To compare the influence of the segmentation on the failure load simulations, we computed the failure load on 12 femurs, using the automatic segmentations and, for comparison, the expert manual annotations. We also compared results using automatic and manual segmentations on 15 thoracic and lumbar vertebrae. In both cases, we additionally applied simple morphological operations (dilation/erosion), with either one or two iterations, to the automatic segmentations as a way to introduce variability. The objective is to investigate the effect of slight segmentation variations on the resulting failure load.

Statistical analysis

Statistical tests were performed using SPSS software (SAS Institute, Cary, NC). Differences among groups were evaluated using a non-parametric test (Friedman test). When a significant overall F value (P < 0.05) was present, differences between individual group means were tested using Dunn's multiple comparison post-hoc tests. Only comparisons with the manual segmentation are presented. For all tests, P < 0.05 was considered statistically significant. Data are presented as mean ± standard error.
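For reference, the DSC used to evaluate the segmentations above can be sketched as follows (a minimal binary-mask sketch; the handling of two empty masks is our convention, and the Hausdorff distance is omitted):

```python
import numpy as np

def dice_score(pred, truth):
    # Sørensen-Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```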
