An enhanced deep learning method for the quantification of epicardial adipose tissue

Study population

In this study, 108 patients aged between 48 and 70 years (mean age ± standard deviation: 60.1 years ± 6.0; 62 of 108 [57.4%] were men) undergoing routine CCTA scans at the Second Xiangya Hospital of Central South University from April 2022 to September 2022 were included, and their CCTA data were analyzed for EAT volume quantification. The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the Second Xiangya Hospital of Central South University (Approval number: LYEC2024-0042). Owing to the retrospective nature of the study, the need for informed consent was waived by the Ethics Committee of the Second Xiangya Hospital of Central South University.

CCTA imaging

All CCTA examinations were conducted using a (192 × 2)-slice dual-source CT scanner (Somatom Definition Force, Siemens Healthineers, Germany) at the Second Xiangya Hospital of Central South University. The scan extended from the apex of the heart to the tracheal bifurcation. Scan parameters were as follows: prospective "FLASH" ECG gating with tube-current modulation in the angular and longitudinal directions (CareDose 4D, Siemens proprietary technology) was employed. The pitch was set at 3.2 with a collimation of (196 × 0.6) mm in the craniocaudal direction. The tube voltage ranged from 80 to 120 kV, depending on the body size of the patient. Patients typically received approximately 50-70 ml of contrast medium, followed by 50 ml of saline, injected at a rate between 4 and 6 ml per second, unless solely arterial contrast was needed. The initiation of CT scanning depended on the injection site, injection velocity, duration of injection, and expected circulation time. Imaging commenced at 70% of the R–R interval. All CCTA images were reconstructed with a slice thickness of 3 mm at intervals of 3 mm using a vascular kernel. All scans were performed during free breathing.
The resolution of each contrast CT slice was chosen to be N = 512.

Creation of manual segmentation dataset

To establish and validate our deep learning method, the EAT in CCTA images was manually segmented for all 108 patients. All manual segmentations of the EAT were performed by X.L (radiologist with 5 years of experience) and SQ.H (radiologist with 3 years of experience). These segmentations were also cross-checked by XB.L (with 20 years of cardiac surgery experience) and J.L (with 21 years of radiology experience) using 3D Slicer software (version 4.10.2) to reach a consensus. For each patient, images were extracted from the picture archiving and communication system and imported in DICOM format into the validated post-processing software 3D Slicer (version 4.10.2)25. Manual EAT segmentation and quantification occurred in two phases using the mediastinal kernel. Initially, two labels were established: one to encapsulate the pericardium and its internal area, and another to store the EAT. The pericardial line, visible in CCTA images, was delineated using slice-by-slice drawing in the axial view, with automatic interior filling. Different perspectives were switched during annotation to ensure accurate pericardial labeling. EAT was defined as fat-like tissue located between the myocardium and the visceral pericardium, with intensity limits of [-190 HU, -30 HU]. All patient information was anonymized before manual segmentation. The manual segmentation schematic is presented in Fig. 1.

Figure 1. Labeled images during the labeling process. The labeling process is implemented in the order from (A) to (D).
Part (A) is the original contrast CT slice, (B) is the drawing of the pericardium, (C) is the label of the pericardium and its inner region in axial, sagittal, coronal and 3D views, and (D) shows the EAT fat pixels in axial and 3D views.

CNN architecture

We first establish and train a deep CNN, denoted \(P_\theta\), to predict the pericardium in each contrast CT slice for the segmentation task under consideration. Here, \(\theta\) represents the parameters determined during the training process. The deep neural network \(P_\theta\) is expected to solve an image segmentation task, distinguishing the pericardium from the background. For this purpose, we parameterize \(P_\theta\) using a CNN known as U-Net, renowned for its U-shaped architecture. The original version of U-Net was first proposed by Ronneberger et al. for biomedical image segmentation26. In this work, we adopt a modified version of U-Net, as shown in Fig. 2, which is similar to the neural network in27. Each green and orange element represents a multi-channel feature map. The left and right segments of \(P_\theta\) correspond to the contracting path and the expansive path, respectively. In the expansive path, each up-sampled output is combined with the corresponding multi-channel feature map from the contracting path. Both paths employ a series of convolutional layers using (3 × 3) convolutions with zero-padding and a stride of 1, each followed by batch normalization (BN) and a rectified linear unit (ReLU). Additionally, the contracting path integrates (2 × 2) max pooling layers for down-sampling, while the expansive path utilizes (2 × 2) transposed convolutions for up-sampling. It is worth noting that during training, we exclusively employ CT slices containing the pericardium. However, during testing, we utilize complete series of CT slices from actual patients, which include slices both with and without the pericardium.
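For concreteness, one contracting-path stage of such a modified U-Net (two 3 × 3 zero-padded, stride-1 convolutions, each followed by BN and ReLU, then 2 × 2 max pooling, with a 2 × 2 transposed convolution for the corresponding up-sampling) can be sketched in PyTorch. The channel counts below are illustrative only and do not reproduce the exact architecture of Fig. 2:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two (3x3 conv, zero-padding, stride 1) -> BN -> ReLU stages,
    the repeated unit along both U-Net paths."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

x = torch.randn(1, 1, 512, 512)                    # one N = 512 contrast CT slice
f = DoubleConv(1, 32)(x)                           # feature extraction, spatial size preserved
p = nn.MaxPool2d(2)(f)                             # contracting path: 2x2 max pooling
u = nn.ConvTranspose2d(32, 16, 2, stride=2)(p)     # expansive path: 2x2 transposed conv
```

Note how zero-padding keeps the spatial size fixed within each stage, so only the pooling and transposed-convolution layers change the resolution; this is what allows the up-sampled outputs to be combined with same-sized feature maps from the contracting path.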
In the training process, \(P_\theta\) is trained over 200 epochs by minimizing the \(L^2\) loss function.

Figure 2. The architecture of \(P_\theta\). Each green and orange item represents a multi-channel feature map. The number of channels is shown at the top of each volume, and the length and width are provided at its lower-left edge. The arrows denote the different operations, which are explained at the lower-right corner of the figure.

Post-processing method

Inspired by the a priori knowledge of the integrity and continuity of the pericardium, the proposed post-processing method is mainly based on 1D and 2D connected component analysis28,29. To state our method clearly, it is necessary to introduce the notion of connectivity of a binary image (e.g., a segmentation result). Connectivity in a binary image can be defined in terms of adjacency relations among pixels. The pixel with coordinates \((i,j)\) in the image is denoted by \(p_{i,j}\). For binary images, the value of a pixel is either 1 (white) or 0 (black). Two pixels \(p_{i_0,j_0}\) and \(p_{i_1,j_1}\) are said to be neighbors if they share one edge, i.e., \(|i_0-i_1|+|j_0-j_1|=1\). Two white pixels \(p_{i_0,j_0}\) and \(p_{i_k,j_k}\) are said to be path-connected if there exists a sequence of white pixels \(p_{i_h,j_h}\ (1\le h\le k)\) such that \(p_{i_{h-1},j_{h-1}}\) and \(p_{i_h,j_h}\) are neighbors. Let \(R\) represent the connectivity relation defined on a binary image \(S\) as follows: for any pair of pixels \(p,q\in S\), we have \((p,q)\in R\) if and only if \(p\) and \(q\) are both white and path-connected in \(S\). It is clear that \(R\) is an equivalence relation, and the corresponding equivalence classes are called connected components. We note that the notion of a 1D connected component can be defined similarly.
To make the concept of connected components more intuitive, we provide a visualization of both the 2D and 1D cases in Fig. 3.

Figure 3. An illustration of connected components in the 2D and 1D cases. The white areas A, B, C, D in the 2D case denote different connected components in the binary image. Similarly, the white areas a, b, c, d in the 1D case denote different connected components in the binary sequence.

We denote the segmentation results for a patient given by the well-trained \(P_\theta\) as \(\{S_i\}\). In some cases, \(S_i\) tends to be chaotic and displays multiple connected branches, which violates the a priori knowledge of the integrity and continuity of the pericardium. However, it is difficult to explicitly encode such a priori information in \(P_\theta\) during the training stage. Based on this observation, we propose a post-processing method to improve the segmentations produced by \(P_\theta\). For a binary segmentation \(S\in\{S_i\}\), we write it as the sum of its connected components \(\{C_j\}\), i.e., \(S=\sum C_j\). Our post-processing method can be described by the following three steps.
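Concretely, connected components in both the 2D and 1D cases can be extracted with `scipy.ndimage.label`; a small sketch (the arrays below are toy examples, not CT data):

```python
import numpy as np
from scipy import ndimage

# 2D case: pixels are neighbors only if they share an edge (4-connectivity),
# matching the adjacency relation defined in the text.
img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 0, 0, 0]])
four_conn = ndimage.generate_binary_structure(2, 1)  # edge-sharing neighbors
labels2d, n2d = ndimage.label(img, structure=four_conn)
# n2d == 3: the top-left blob, the right-hand blob, and the isolated pixel

# 1D case: connected components are maximal runs of consecutive ones.
seq = np.array([1, 1, 0, 1, 0, 0, 1, 1, 1])
labels1d, n1d = ndimage.label(seq)
# n1d == 3: runs at indices 0-1, 3, and 6-8
```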

Step 1. The area where the true pericardium intersects a slice should in general be large, and thus small connected components are removed. Mathematically, we set

$$S^{(1)}:=\sum_{|C_j|>\epsilon}C_j,$$where \(|\cdot|\) denotes the total number of pixels and the parameter \(\epsilon\) will be chosen carefully in the numerical experiments. For simplicity, we again write \(S^{(1)}\) as the sum of its connected components \(\{C_j\}\), i.e., \(S^{(1)}=\sum C_j\).
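Step 1 can be sketched with `scipy.ndimage` as follows; the function name and the 4-connectivity choice are our assumptions for illustration, and the released code may differ:

```python
import numpy as np
from scipy import ndimage

def remove_small_components(S: np.ndarray, eps: int = 500) -> np.ndarray:
    """Step 1: keep only connected components of the binary slice S
    with more than `eps` pixels, yielding S^(1)."""
    four_conn = ndimage.generate_binary_structure(2, 1)   # edge (4-)connectivity
    labels, n = ndimage.label(S, structure=four_conn)
    # Number of pixels in each component 1..n.
    sizes = ndimage.sum(S, labels, index=range(1, n + 1))
    keep = np.zeros(n + 1, dtype=bool)                    # index 0 = background
    keep[1:] = sizes > eps
    return keep[labels].astype(S.dtype)
```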

Step 2. Since small noise components have been removed, it is expected that there is only one connected component in \(S^{(1)}\). At the least, the largest connected component \(C_\ast\) should dominate \(S^{(1)}\); otherwise, it is reasonable to deduce that there is no pericardium in the slice. More specifically, we set

$$S^{(2)}:=\begin{cases}C_\ast, & \text{if } m\le\delta \text{ and } \left|C_\ast\right|>\gamma\left(\left|S^{(1)}\right|-\left|C_\ast\right|\right),\\ 0, & \text{otherwise},\end{cases}$$where \(m\) represents the number of connected components in \(S^{(1)}\). The parameters \(\delta\) and \(\gamma\) will be chosen carefully in the numerical experiments.
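Step 2 can be sketched in the same style (again an illustrative sketch, not the released implementation):

```python
import numpy as np
from scipy import ndimage

def keep_dominant_component(S1: np.ndarray, delta: int = 5,
                            gamma: float = 3.0) -> np.ndarray:
    """Step 2: keep the largest component C_* only if there are at most
    `delta` components and C_* dominates: |C_*| > gamma * (|S1| - |C_*|).
    Otherwise conclude there is no pericardium on this slice."""
    four_conn = ndimage.generate_binary_structure(2, 1)
    labels, m = ndimage.label(S1, structure=four_conn)
    if m == 0:
        return np.zeros_like(S1)
    sizes = ndimage.sum(S1, labels, index=range(1, m + 1))
    largest = int(np.argmax(sizes)) + 1                 # label of C_*
    c_star = sizes[largest - 1]
    if m <= delta and c_star > gamma * (sizes.sum() - c_star):
        return (labels == largest).astype(S1.dtype)     # S^(2) = C_*
    return np.zeros_like(S1)                            # S^(2) = 0
```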

Step 3. Apply Steps 1 and 2 to each slice \(S\in\{S_i\}\) to obtain the segmentations \(\{S_i^{(2)}\}\). Then we define the indicator sequence \(\{\alpha_i\}\ (\alpha_i\in\{0,1\})\) for the presence of the predicted pericardium in \(\{S_i^{(2)}\}\), where \(\alpha_i=0\) signifies its absence and \(\alpha_i=1\) its presence. Since slices without the pericardium are distributed at the upper and lower ends of a series of CT slices, whereas slices with the pericardium are contiguous, we further extract the largest connected component \(\{\alpha_{i,\ast}\}\) from \(\{\alpha_i\}\) and define

$$S_i^{(3)}:=\alpha_{i,\ast}\,S_i^{(2)}.$$Finally, \(\{S_i^{(3)}\}\) is taken as the improved segmentation result and is the output of our post-processing method. The post-processing method is illustrated in Fig. 4, and its code is available at https://github.com/kxtang/MIDL-for-EAT/. In this way, our post-processing method not only improves the pericardium segmentation results but also accomplishes the classification of slices with and without the pericardium. Finally, we compute the 3D EAT volume of a patient based on the improved segmentation results.

Figure 4. The three steps of our post-processing method.

Numerical experiments

We present numerical experiments to demonstrate the effectiveness of our proposed algorithm for EAT quantification. The training process is performed on Colab (Tesla P100 GPU, Linux operating system) and implemented in PyTorch, while the proposed MIDL is implemented in Python 3.7 on a desktop computer (Intel Core i7-10700 CPU (2.90 GHz), 32 GB of RAM). For the selected 108 patients, we construct the whole dataset by pairing the original contrast CT matrices with the pericardium matrices labeled by the experts. We then divide the whole dataset into a training dataset (60 patients, 2205 contrast CT slices), a validation dataset (8 patients, 361 contrast CT slices) and a testing dataset (40 patients, 1862 contrast CT slices). The training dataset is used for training the modified U-Net. Throughout this training stage, we employ the Adam optimizer30 alongside Xavier initialization31, with a batch size of 5 and a learning rate of 0.001. To enhance the efficiency of data utilization, we also performed data augmentation on the original CCTA images by applying random rotations, flips, and translations.
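Returning to Step 3 of the post-processing method above: assuming the per-slice outputs \(\{S_i^{(2)}\}\) are stacked into a 3D array, the 1D connected-component filtering across slices can be sketched as follows (an illustrative sketch, not the released implementation):

```python
import numpy as np
from scipy import ndimage

def restrict_to_contiguous_slices(S2: np.ndarray) -> np.ndarray:
    """Step 3: alpha_i indicates whether slice i of the stack S2
    (shape: num_slices x H x W) contains a predicted pericardium; only
    the largest run of consecutive positive slices {alpha_{i,*}} is kept."""
    alpha = (S2.reshape(len(S2), -1).sum(axis=1) > 0).astype(int)
    labels, n = ndimage.label(alpha)               # 1D connected components
    if n == 0:
        return np.zeros_like(S2)
    sizes = ndimage.sum(alpha, labels, index=range(1, n + 1))
    alpha_star = labels == (int(np.argmax(sizes)) + 1)
    return S2 * alpha_star[:, None, None]          # S_i^(3) = alpha_{i,*} S_i^(2)
```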
Using a grid search on the validation dataset to maximize the Dice score, we choose the hyperparameters \(\epsilon=500\), \(\delta=5\), and \(\gamma=3\) for the proposed MIDL. Finally, the testing dataset is utilized to evaluate the overall performance of the proposed MIDL. Ablation experiments without the post-processing steps, as well as training and testing using nnU-Net with CCTA slices containing the pericardium, were conducted for comparison with MIDL.

Performance evaluation

To quantitatively evaluate the performance of our algorithms in EAT quantification, we introduce the Dice similarity coefficient (DSC) to quantify the overlap between the expert segmentation and the automatic output in 2D CT slices and in 3D volume. The DSC is defined as follows:$$DSC\left(EAT_{ex},\,EAT_{dl}\right)=\frac{2\left|EAT_{ex}\cap EAT_{dl}\right|}{\left|EAT_{ex}\right|+\left|EAT_{dl}\right|},$$where \(EAT_{ex}\) represents the fat tissue within the pericardium manually delineated by the experts on each CT slice, and \(EAT_{dl}\) represents the EAT segmented automatically by the algorithm. A DSC closer to one indicates better algorithm performance. Additionally, we compute the volume of an EAT segmentation using the following formula:$$Vol\left(EAT\right)=\left|EAT_{seg}\right|\times Spacing_{x}\times Spacing_{y}\times Spacing_{z},$$where \(EAT_{seg}\) is an EAT segmentation for a patient. We further define the relative error (RE) to evaluate the accuracy of the volume quantification:$$RE\left(EAT_{ex},\,EAT_{dl}\right)=\frac{\left|Vol\left(EAT_{ex}\right)-Vol\left(EAT_{dl}\right)\right|}{Vol\left(EAT_{ex}\right)}.$$Moreover, the volumes obtained from the MIDL measurements and the expert measurements are compared using the Pearson correlation coefficient, Bland–Altman analysis and a paired t-test.
The intraclass correlation coefficient (ICC) is employed to evaluate the consistency of the EAT measurements performed by the first two experts.
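The three evaluation formulas above can be written as a minimal NumPy sketch, assuming binary masks and voxel spacing in millimetres (so volumes are in mm³):

```python
import numpy as np

def dice(ex: np.ndarray, dl: np.ndarray) -> float:
    """DSC = 2|EAT_ex ∩ EAT_dl| / (|EAT_ex| + |EAT_dl|) for binary masks."""
    ex, dl = ex.astype(bool), dl.astype(bool)
    return 2.0 * np.logical_and(ex, dl).sum() / (ex.sum() + dl.sum())

def eat_volume(seg: np.ndarray, spacing: tuple) -> float:
    """Vol(EAT) = number of EAT voxels times the voxel size
    Spacing_x * Spacing_y * Spacing_z."""
    return float(seg.astype(bool).sum()) * spacing[0] * spacing[1] * spacing[2]

def relative_error(vol_ex: float, vol_dl: float) -> float:
    """RE = |Vol(EAT_ex) - Vol(EAT_dl)| / Vol(EAT_ex)."""
    return abs(vol_ex - vol_dl) / vol_ex
```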
