Deep learning-assisted segmentation of X-ray images for rapid and accurate assessment of foot arch morphology and plantar soft tissue thickness

To address the aforementioned challenges, our study amassed a substantial dataset of weight-bearing lateral foot X-ray images, a type for which there is currently no publicly available dataset. A set of 1497 images was retrospectively collected from the foot and ankle database of Huashan Hospital (Shanghai, China), spanning the last decade, with personal information anonymized and ethics review approval obtained. Utilizing deep learning image segmentation techniques, we preprocessed these images by adjusting grayscale, removing noise, and normalizing the images, enhancing the model’s robustness, stability, and accuracy29. We then trained a deep neural network to perform precise segmentation of the first metatarsal (FM), talus (TA), calcaneus (CA), and navicular (NAVI) bones, as well as the overall foot boundary. This approach enabled automated, standardized, and batch processing for precise computation of FAM and PSTT, thereby yielding significant time and cost savings.

Our study focuses on analyzing the homogeneity and heterogeneity within large datasets, employing data-driven methods to identify patterns of similarity and dissimilarity across population groups. We specifically explored the correlation between FAM and PSTT among diverse demographic groups. Section “Methods” details the methodology, including data sourcing, dataset composition and preprocessing, development of the deep learning image segmentation models, and evaluation metrics for FAM and PSTT. Section “Results” presents the results, elaborating on the performance of the segmentation models, data outcomes for FAM and PSTT, and the correlation analyses across different demographic groups. Section “Discussion” discusses the methodologies, results, and hypotheses, concluding with a summary and future outlook of the research. The overall study workflow is depicted in Fig. 1.

Fig. 1 The overall workflow of this study.

Human ethical statements

We confirm that all methods were carried out in accordance with relevant guidelines and regulations. We confirm that all experimental protocols were approved by the Ethical Review Committee of Huashan Hospital, Fudan University (HIRB). Because this is a retrospective study, all images were provided anonymously, and this paper reports only aggregate statistical results over the dataset, the requirement for informed consent was waived. This waiver was approved by the Ethics Committee of Fudan University, ensuring compliance with ethical standards for the use of pre-existing data where participant identification is not disclosed.

Dataset

The application of deep learning for image detection and segmentation requires a substantial dataset. Because no public X-ray image dataset of this kind is available, we undertook a retrospective data collection to facilitate efficient and cost-effective research. We compiled 1497 weight-bearing lateral full foot X-ray images from Huashan Hospital’s foot and ankle imaging database, spanning from 2013 to 2022 and involving 1098 patients. The data, stored in DICOM file format30,31, include demographic details such as sex, age, and imaging timestamps. All data samples were anonymized during processing and subsequent research phases to ensure privacy.
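As an illustration of how such DICOM attributes can be accessed, the following minimal sketch uses the pydicom library; the file name and the exact tags present in any given file are assumptions for illustration, not details taken from the study.

```python
# A minimal sketch (not the study's code) of reading the DICOM attributes
# mentioned above with pydicom; the file path below is hypothetical.
import pydicom

ds = pydicom.dcmread("foot_lateral_weightbearing.dcm")  # hypothetical path

# Demographic and acquisition metadata stored alongside the pixel data
sex = ds.get("PatientSex")          # e.g. "M" / "F"
age = ds.get("PatientAge")          # e.g. "045Y"
study_date = ds.get("StudyDate")    # imaging timestamp, e.g. "20190312"
# Pixel spacing in mm/pixel; which tag is populated depends on the device
spacing = ds.get("ImagerPixelSpacing", ds.get("PixelSpacing"))

# 16-bit grayscale pixel array used for downstream preprocessing
image_16bit = ds.pixel_array
print(sex, age, study_date, spacing, image_16bit.dtype, image_16bit.shape)
```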
To account for the developmental stage of children’s skeletons, we excluded samples from individuals under the age of 14 years. The collection process also involved manual screening by foot and ankle surgeons to exclude images from patients with skeletal or soft tissue foot defects, a history of foot ulcers, neurological joint diseases, post-foot surgery conditions, and those unable to walk independently.

The X-ray images were sourced from medical imaging devices produced by several manufacturers, including GE, Canon, Philips, CARESTREAM, and KODAK. These devices capture images with an average pixel spacing of approximately 0.14 mm/pixel. The X-rays are collected as grayscale images with a depth of 16 bits, and the resolution of the collected images ranges from 1010 to 4260 pixels in length (columns) and from 965 to 4259 pixels in width (rows). For visualizing and processing the X-ray images, we employed the PyDicom library, a medical image processing tool, to parse DICOM files and convert the X-ray grayscale images into JPG format for easier handling32. Table 1 presents the basic information of these data samples. In this study, each X-ray image is treated as an individual data sample. This includes both left and right foot X-ray images of the same patient and multiple images taken from the same patient over the past decade, without filtering for duplicate individuals in the dataset.

Table 1 Statistical information of data samples.

Image preprocessing

From the dataset, 220 images were randomly selected and divided into training, validation, and testing sets, with 180, 20, and 20 images, respectively. Under the supervision of foot and ankle surgeons, these images were manually annotated for precise boundary delineation of the entire foot and the four bone structures (FM, TA, CA, and NAVI) using the LabelMe library. These annotations served as the ground truth for model training33. Once the model’s accuracy and generalization were confirmed, it was applied to all sampled images to automatically calculate metrics related to FAM and PSTT. This facilitated large-scale data analysis to investigate the factors influencing these measurements.

Additionally, to enhance the robustness and generalization capability of the model, we employed the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm for contrast enhancement34 and converted the 16-bit X-ray images into 8-bit images with sufficient contrast. Subsequently, image normalization was performed to reduce differences in brightness and contrast, mitigating the model’s susceptibility to outliers or extreme pixel values. Next, to reduce computational complexity and memory usage, we used bilinear interpolation35 to resize the original images to a unified size of 384 × 576 pixels as model input, while ensuring that the key semantic information in the images was preserved. Figure 2 illustrates an example weight-bearing lateral foot X-ray image.

Fig. 2 A weight-bearing lateral foot X-ray image and manually annotated ground truth: (a) the original grayscale image parsed from the DICOM file, which serves as the input to the model; (b) the manual labeling results using the LabelMe library, stored as a JSON file; (c)–(g) the boundaries of the entire foot and the FM, TA, CA, and NAVI bones obtained from parsing the JSON file. For visual clarity, these label boundaries are overlaid on the original image, although each labeled image is actually a binary black-and-white image; (h) visualization of the various labeled images with different pixel values overlaid on one image.

Deep learning image segmentation model

In our selection of deep learning network models, for the task of calcaneus (CA) segmentation we evaluated four models widely used in medical image segmentation: FCN36, U-Net37, SegNet38, and DeepLab V3+39. Because the DeepLab V3+ model achieved the best performance, we chose it for automatic image segmentation. To enhance robustness and accuracy, we constructed five independent DeepLab V3+ segmentation models, one for the entire foot boundary and one for each of the four bone boundaries (FM, TA, CA, NAVI). Each model was trained separately to optimize its parameters. The input image dimensions were standardized to 384 × 576 pixels, and outputs were binarized using the sigmoid function40. To ensure reproducibility, all training runs were conducted with fixed seed settings. PyTorch was used for model construction and training, with the following parameters: Adam optimizer41, a learning rate of 10⁻⁴, a batch size of 4, and 20 training epochs. The training environment was macOS Ventura 13.2.1 with a 4-core CPU, 16 GB RAM, and PyTorch version 1.8. For the loss function and evaluation metrics, we selected the Dice coefficient and Intersection over Union (IoU). The Dice coefficient is particularly sensitive to small targets, making it well suited to precise segmentation of smaller anatomical structures, while IoU is well suited to large-target detection and segmentation tasks. We therefore used Dice loss during training to optimize the model’s ability to detect small variations, and employed IoU as the evaluation metric to assess the overall accuracy and integrity of the segmentation across larger areas42.

Additionally, in rare extreme cases in the test set where X-ray images contained high-intensity artifacts, the model could misclassify noise and contamination during segmentation. Therefore, post-processing was applied to the segmentation masks using the DBSCAN clustering algorithm43. This step retained the largest clustered area as the target region and set the values of smaller, misclassified noise regions to 0, eliminating interference in subsequent tasks such as extracting bone axes and calculating PSTT.
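The following is a minimal sketch, under stated assumptions, of the preprocessing steps described above (CLAHE, 16-bit to 8-bit conversion, normalization, and bilinear resizing) using OpenCV. The CLAHE clip limit, the tile grid size, and the width/height ordering of the 384 × 576 target are illustrative assumptions, not parameters reported by the study.

```python
# A sketch of the preprocessing pipeline described above (assumed, not the
# published implementation), operating on the 16-bit array parsed from DICOM.
import cv2
import numpy as np

def preprocess(image_16bit: np.ndarray) -> np.ndarray:
    # CLAHE contrast enhancement on the 16-bit grayscale image
    # (clip limit and tile size are assumptions)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(image_16bit.astype(np.uint16))

    # Rescale to 8 bits while preserving the full dynamic range
    image_8bit = cv2.normalize(enhanced, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Bilinear interpolation to the fixed model input size; cv2.resize takes
    # (width, height), and the ordering relative to 384 x 576 is an assumption
    resized = cv2.resize(image_8bit, (384, 576), interpolation=cv2.INTER_LINEAR)

    # Normalize to [0, 1] to reduce brightness/contrast differences across devices
    return resized.astype(np.float32) / 255.0
```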
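As a sketch of the training configuration described above, the snippet below assumes torchvision’s DeepLab V3+ implementation together with hand-rolled Dice loss and IoU functions; the study does not specify these implementation details, so the model constructor, seed value, and channel handling are assumptions.

```python
# A minimal sketch of one of the five binary segmentation models with Dice
# loss and an IoU metric (assumed implementation, not the authors' exact code).
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

def dice_loss(pred, target, eps=1e-6):
    # pred: sigmoid probabilities, target: binary mask, both shaped (N, 1, H, W)
    inter = (pred * target).sum(dim=(1, 2, 3))
    denom = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()

def iou_score(pred, target, threshold=0.5, eps=1e-6):
    pred_bin = (pred > threshold).float()
    inter = (pred_bin * target).sum(dim=(1, 2, 3))
    union = pred_bin.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3)) - inter
    return ((inter + eps) / (union + eps)).mean()

torch.manual_seed(0)                       # fixed seed for reproducibility (value assumed)
model = deeplabv3_resnet50(num_classes=1)  # one model per structure (foot, FM, TA, CA, NAVI)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, learning rate 1e-4

def train_step(images, masks):
    # images: (4, 3, 576, 384) floats (grayscale assumed replicated to 3 channels
    # for the ResNet backbone); masks: (4, 1, 576, 384) binary ground truth
    model.train()
    optimizer.zero_grad()
    probs = torch.sigmoid(model(images)["out"])  # sigmoid output, thresholded at evaluation
    loss = dice_loss(probs, masks)               # Dice loss for training
    loss.backward()
    optimizer.step()
    return loss.item(), iou_score(probs.detach(), masks)  # IoU as evaluation metric
```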
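For the DBSCAN post-processing step, a minimal sketch of retaining only the largest clustered region in a binary mask might look as follows; the eps and min_samples values are illustrative assumptions.

```python
# Keep the largest cluster of foreground pixels and zero out smaller,
# misclassified noise regions (assumed implementation of the described step).
import numpy as np
from sklearn.cluster import DBSCAN

def keep_largest_region(mask: np.ndarray, eps: float = 2.0, min_samples: int = 5) -> np.ndarray:
    points = np.argwhere(mask > 0)          # coordinates of foreground pixels
    if len(points) == 0:
        return mask

    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)

    valid = labels >= 0                     # ignore DBSCAN noise points (label -1)
    if not valid.any():
        return np.zeros_like(mask)
    largest = np.bincount(labels[valid]).argmax()

    cleaned = np.zeros_like(mask)
    rows, cols = points[labels == largest].T
    cleaned[rows, cols] = 1                 # keep only the largest clustered area
    return cleaned
```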
Calculation and evaluation of FAM and PSTT indicators

In this study, we focused on three primary descriptors of FAM, as advised by foot and ankle surgeons: the angle between the axes of the first metatarsal and the talus (“angle-fm-ta”), the inclination of the calcaneus axis relative to the plantar surface (“angle-ca-plantar”), and the longitudinal arch height (LAH). Additionally, we measured PSTT at the forefoot and rearfoot regions.

To calculate the “angle-fm-ta” and “angle-ca-plantar” in weight-bearing lateral foot X-ray images, we first applied the Principal Component Analysis (PCA) algorithm44 to determine the principal axes of the segmented FM, TA, and CA bones. We then calculated the angle between the principal axes of the FM and TA to obtain the “angle-fm-ta.” This method mirrors the standardized manual angle measurements performed by surgeons using X-ray reading software, reducing subjective variability. The “angle-ca-plantar” was defined, as suggested by surgeons, as the angle between the main axis of the CA (derived by PCA) and the horizontal plane. For the calculation of LAH, we identified the center of the NAVI bone using the PCA algorithm and defined the LAH as the distance from the NAVI bone center to the median of the PST boundary points on the forefoot and rearfoot. Figure 3 displays schematic diagrams of these measurements for the left (a) and right (b) feet. Notably, we stipulated that the “angle-fm-ta” is the angle between the FM axis and the TA axis, potentially resulting in angles greater than 180°.

Fig. 3 Schematic diagrams of the calculation of the FAM and PSTT metrics for the (a) left foot and (b) right foot.

We also measured PSTT by calculating the distance from the lowest boundary point of the FM to the foot’s lower border directly beneath it, denoted as the forefoot PSTT (arrow A in Fig. 3). Similarly, the rearfoot PSTT was measured from the lowest point of the CA to the foot boundary beneath it (arrow B in Fig. 3).

For comparative analysis, we calculated the foot length (FL), defined as the distance between the outermost points of the toe and heel, marked by a red line in Fig. 3. The LAH and PSTT values were then normalized by dividing by the FL, yielding the normalized indicators: normalized LAH, normalized forefoot PSTT, and normalized rearfoot PSTT.
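A simplified sketch of these geometric measurements is given below. It assumes binary masks fm_mask, ta_mask, ca_mask, and foot_mask produced by the segmentation models (hypothetical variable names) and a standard image orientation with the plantar surface toward the bottom of the image; handling of axis orientation for reflex angles and the conversion from pixels to millimetres via the pixel spacing are omitted.

```python
# A minimal, simplified sketch (not the study's exact implementation) of the
# PCA-based bone axes, the FM-TA angle, the calcaneal inclination, and the
# forefoot PSTT measurement described above.
import numpy as np

def principal_axis(mask: np.ndarray):
    # PCA over the foreground pixel coordinates: the eigenvector with the
    # largest eigenvalue of the covariance matrix is the bone's main axis.
    pts = np.argwhere(mask > 0).astype(float)   # (row, col) coordinates
    center = pts.mean(axis=0)
    cov = np.cov((pts - center).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, np.argmax(eigvals)], center

def angle_between(v1, v2):
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# angle-fm-ta: angle between the FM and TA principal axes
fm_axis, _ = principal_axis(fm_mask)
ta_axis, _ = principal_axis(ta_mask)
angle_fm_ta = angle_between(fm_axis, ta_axis)

# angle-ca-plantar: CA axis relative to the horizontal (column direction)
ca_axis, _ = principal_axis(ca_mask)
angle_ca_plantar = angle_between(ca_axis, np.array([0.0, 1.0]))

# Forefoot PSTT: vertical distance (in pixels) from the lowest FM boundary
# point to the foot's lower border in the same column.
fm_pts = np.argwhere(fm_mask > 0)
lowest_fm = fm_pts[fm_pts[:, 0].argmax()]               # largest row = lowest point
foot_col = np.argwhere(foot_mask[:, lowest_fm[1]] > 0)  # foot pixels in that column
forefoot_pstt = foot_col.max() - lowest_fm[0]
```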
