Unsupervised few-shot learning architecture for diagnosis of periodontal disease in dental panoramic radiographs

This study proposes an unsupervised learning framework that integrates few-shot learning for analyzing dental panoramic radiographs. The framework is specifically designed to address the scarcity of labeled medical image data for diagnosing periodontal disease. The proposed framework combines the UNet architecture with a convolutional variational autoencoder (CVAE) in a way that is suitable when only limited data are available. The method begins with the UNet model, which detects probable regions of interest (RoIs) within dental panoramic radiographs. The strength of UNet in medical image segmentation aids in the accurate identification and extraction of regions indicating the presence of periodontal disease. The extracted RoIs are then fed into the CVAE module of our proposed architecture. The CVAE plays a crucial role in extracting important latent features from these regions, reducing the dimensionality of the data while maintaining key diagnostic information using a small base training set. The combined UNet-CVAE architecture is specifically designed to handle the limited amount of image data commonly encountered in few-shot learning scenarios. After feature extraction, our method employs various unsupervised clustering algorithms to classify the dental imaging data into separate clusters, lessening the need for a substantial collection of labeled data. To enhance the accuracy of the diagnostic process, we use a specific collection of annotated images to assign diagnostic labels to the clustered images only during the validation phase, connecting unsupervised clustering with the practical needs of medical diagnostics. The proposed architecture demonstrates the potential of advanced artificial intelligence (AI) methodologies, namely few-shot learning, to transform medical image analysis, particularly in resource-constrained environments. The operational flow and components of our framework are depicted in Fig. 1.

Fig. 1 Proposed integrated framework for diagnosing periodontal disease in dental panoramic radiographs.

UNet
The UNet architecture, which was initially designed to segment biomedical images, serves as a crucial component of our framework for diagnosing periodontal disease. UNet has been widely adopted owing to its exceptional capability in localizing visual objects. We introduce the UNet architecture to handle the subtle characteristics inherent in dental panoramic radiographs21,22, since dental caries, periodontal disease, and periapical lesions occur in and around teeth, requiring tooth segmentation to highlight the surrounding areas as an important basis for automatically diagnosing tooth-related diseases23.

Fig. 2 UNet architecture.

UNet is an end-to-end fully convolutional network consisting of two paths, a contraction path (left side) and an expansion path (right side), which together form a 'U'-shaped structure, as shown in Fig. 2. Each blue box corresponds to a multi-channel feature map, with the number of channels listed on top of the box; the x–y sizes are provided at the lower-left edge of each box. White boxes represent copied feature maps and the arrows denote different operations. For more details on the UNet architecture, refer to Ronneberger et al.21. In the contraction path, the UNet design applies a series of two successive \((3 \times 3)\) convolutions, each immediately followed by a rectified linear unit (ReLU) activation. This path enables the retrieval of multi-resolution characteristics that are essential for detecting subtle dental structures.
After the convolutions, a \((2 \times 2)\) max pooling operation with a stride of 2 is performed. This step downsamples the feature maps, and at each downsampling step the number of feature channels is doubled. The downsampling procedure D at layer l can be mathematically represented as$$\begin{aligned} x^{l+1}=D(x^{l})=\underset{2\times 2}{\max }(\text {ReLU}(Conv_{3\times 3}(x^{l}))), \end{aligned}$$
(1)
for the convolution layer \(Conv_{3\times 3}(x^{l})\). In contrast, the expanding path uses \((2 \times 2)\) transposed convolutions to increase the size of the feature maps so that the locations of teeth and the surrounding structures can be identified accurately. Skip connections S from the contracting path are incorporated, concatenating (denoted by \(\oplus\)) the low-level feature maps from the corresponding contracting layer \(l_{down}\) with the upsampled output U, thereby preserving the spatial information that is crucial for precise RoI extraction. The upsampling process at the next layer \((l+1)\) can be defined as$$\begin{aligned} x^{l+1}=U\left( x^l\right) \oplus S\left( x^{l_{down}}\right) , \end{aligned}$$
(2)
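To make the contraction and expansion steps concrete, the following is a minimal PyTorch sketch of the two building blocks defined in Eqs. (1) and (2). It is an illustrative re-implementation rather than the authors' exact configuration; the padding, channel sizes, and returned skip tensors are assumptions made for brevity.

```python
# Minimal PyTorch sketch of the UNet building blocks in Eqs. (1) and (2).
# Channel sizes and padding are illustrative, not the authors' exact settings.
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two successive 3x3 convolutions, each followed by a ReLU activation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        for m in self.block:  # He initialization, suited to ReLU layers (see below)
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")

    def forward(self, x):
        return self.block(x)

class Down(nn.Module):
    """Eq. (1): x^{l+1} = max_{2x2}(ReLU(Conv_{3x3}(x^l)))."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = DoubleConv(in_ch, out_ch)          # channels doubled here
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        skip = self.conv(x)                            # kept as S for the skip connection
        return self.pool(skip), skip

class Up(nn.Module):
    """Eq. (2): x^{l+1} = U(x^l) (+) S(x^{l_down}), where (+) is concatenation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = DoubleConv(out_ch * 2, out_ch)     # concatenation doubles the channels

    def forward(self, x, skip):
        x = self.up(x)                                 # U(x^l)
        x = torch.cat([x, skip], dim=1)                # concatenate with S(x^{l_down})
        return self.conv(x)
```

Stacking several Down blocks, a bottleneck, and the matching Up blocks (with a final \(1 \times 1\) convolution producing the segmentation mask) yields the U-shaped segmentation network described above.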
The use of ReLU activations in UNet effectively addresses the issue of vanishing gradients, thereby accelerating training. The network schematic clearly shows its organization, with the number of channels and the spatial dimensions labeled for better understanding. In our application, we adopt He initialization24 for the convolutional layers to avoid activation saturation. This method draws normally distributed weights, \(W\sim \mathscr {N}\left( 0,\frac{2}{n_l}\right)\), where \(n_{l}\) is the number of inputs to a layer, and is well suited for layers followed by ReLU activations. It accounts for the nonlinear characteristics of the ReLU function, ensuring a well-scaled distribution of weights at the beginning of training.

Within our proposed few-shot learning architecture, UNet plays a multifaceted role that goes beyond simple image segmentation: it is tightly integrated with the CVAE component, and this integration aids in effectively retrieving and analyzing diagnostic features from limited data, helping to overcome the difficulties in diagnosing periodontal disease.

CVAE
The convolutional variational autoencoder (CVAE) serves as a fundamental element of our framework for diagnosing periodontal disease from sparse imaging data. The CVAE is an advanced version of the conventional autoencoder (AE), which reconstructs input signals using deep neural networks. The basic structure of the CVAE is shown in Fig. 3. The CVAE is designed to acquire efficient representations of the input data, with the primary purpose of reducing dimensionality25. In contrast to conventional AEs, the CVAE adopts a probabilistic approach to encoding. The encoder, also known as the inference network \(q_{\phi }\left( \textbf{z} \mid \textbf{x}_{i}\right)\), generates a probability distribution for each latent variable \(\textbf{z}\). Each latent variable is commonly modeled by a normal distribution, \(z \sim \mathscr {N}(\mu , \sigma ^2)\), where \(\mu\) and \(\sigma ^{2}\) are the mean and the variance of the normal distribution, respectively. The re-parameterization trick expresses the latent vector as \(\textbf{z}= \varvec{\mu }+\varvec{\sigma } \odot \varvec{\epsilon }\), where \(\varvec{\epsilon }\) follows a multivariate normal distribution with a zero mean vector and identity covariance matrix. This procedure guarantees that the latent space retains a certain level of randomness, which improves the robustness of the model.

Fig. 3 Structure of the convolutional variational autoencoder (CVAE).

The decoder, also known as the generative network \(p_{\theta }\left( \textbf{x}_{i} \mid \textbf{z}\right)\), reconstructs the ith input \(\textbf{x}_{i}\) from the latent variables \(\textbf{z}\). Training the CVAE entails maximizing the evidence lower bound (ELBO), which is described mathematically as$$\begin{aligned} \mathscr {L}\left( \phi ,\theta ; \textbf{x}_{i}\right) =-D_{K L}\left( q_{\phi }\left( \textbf{z} \mid \textbf{x}_{i}\right) \Vert p_{\theta }(\textbf{z})\right) +\mathbb {E}_{q_{\phi }\left( \textbf{z} \mid \textbf{x}_{i}\right) }\left[ \log p_{\theta }\left( \textbf{x}_{i} \mid \textbf{z}\right) \right] , \end{aligned}$$
(3)
where \(\phi\) and \(\theta\) are the variational and generative parameters, respectively, and \(\mathbb {E}_{q_{\phi }\left( \textbf{z} \mid \textbf{x}_{i}\right) }[\cdot ]\) denotes the expectation with respect to \(q_{\phi }\left( \textbf{z} \mid \textbf{x}_{i}\right)\). In this context, the ELBO maintains a trade-off between the fidelity of the reconstructed data and the smoothness of the latent space, the latter quantified by the Kullback–Leibler divergence \(D_{KL}(\cdot )\). Incorporating convolutional layers into the variational autoencoder (VAE) makes it possible to capture the spatial hierarchies present in dental image data, which is particularly important in medical imaging applications such as the diagnosis of periodontal disease26. The CVAE encoder compresses the input image into feature maps, which are then used to calculate the parameters of the latent-space distribution. In contrast, the decoder employs deconvolutional layers to rebuild the images from sampled latent variables.

In our proposed architecture, UNet extracts RoIs from dental panoramic radiographs, and the CVAE then analyzes these RoIs to extract essential latent features. This hybrid technique is essential for managing the scarcity of data under the few-shot learning paradigm. The latent vectors created by the CVAE capture distinctive characteristics of the dental radiographs and are used at the unsupervised clustering stage of our methodology. This enables the categorization of imaging data with subtle distinctions, so that periodontal disease can be identified accurately from a small number of dental panoramic radiographs.
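For concreteness, the sketch below shows how the re-parameterization step and the ELBO of Eq. (3) are typically implemented. It is a simplified PyTorch illustration, not the authors' exact model: the Bernoulli reconstruction term, and the assumption that the encoder and decoder are convolutional modules producing \(\varvec{\mu }\), \(\log \varvec{\sigma }^{2}\), and reconstructions in \([0,1]\), are ours.

```python
# Simplified CVAE objective: re-parameterization trick and negative ELBO (Eq. (3)).
# The encoder/decoder, latent size, and reconstruction likelihood are illustrative.
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """z = mu + sigma * eps, with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

def negative_elbo(x, x_recon, mu, logvar):
    """-ELBO = KL(q_phi(z|x) || N(0, I)) - E_q[log p_theta(x|z)]."""
    # Bernoulli reconstruction log-likelihood for RoIs scaled to [0, 1].
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL divergence between N(mu, diag(sigma^2)) and N(0, I).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Typical training step (encoder/decoder are assumed convolutional modules):
#   mu, logvar = encoder(roi_batch)
#   z = reparameterize(mu, logvar)
#   x_recon = torch.sigmoid(decoder(z))
#   loss = negative_elbo(roi_batch, x_recon, mu, logvar)
#   loss.backward(); optimizer.step()
```

Minimizing this quantity is equivalent to maximizing the ELBO in Eq. (3); after training, the encoded latent vectors of the RoIs are passed to the clustering stage described next.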
This study was conducted according to the principles of the Declaration of Helsinki and was approved by the Institutional Review Board (IRB) of Hanyang University Seoul Hospital (IRB number 2019-01-007-026). The requirement for informed consent was waived by the IRB because of the retrospective nature of the study.

Unsupervised clustering
Our dental panoramic radiograph classification framework introduces unsupervised clustering methods to support the few-shot learning scheme. The methods operate on the latent variables derived from the CVAE, which condense the information needed to detect patterns indicative of periodontal disease.

k-means clustering
k-means clustering is a popular technique for dividing data into several groups with similar characteristics27. The process partitions a set of n samples into k groups, each characterized by its centroid. The algorithm alternates between two main steps: first, it assigns each data point to its closest centroid, and second, it updates the centroid positions based on the points assigned to them. This process continues until convergence, usually when the centroids stabilize. Within our framework, the k-means algorithm categorizes radiograph images into separate clusters according to the characteristics of the latent space produced by the CVAE, facilitating the recognition of various phases or types of periodontal disease.

DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN) is a clustering technique developed by Ester et al.28. It forms clusters by evaluating the density of data points. The clusters are determined by two parameters: one specifying the size of the neighborhood around a point and one specifying the minimum number of points required to form a dense region. DBSCAN distinguishes core points, border points, and noise points, making it effective in handling outliers and identifying clusters of various shapes. In our study, DBSCAN is employed to identify intricate patterns in dental radiographs that may not form spherical clusters, allowing a more nuanced grouping that is well suited to the varied presentations of periodontal disease.

GMM
The Gaussian mixture model (GMM) is a statistical model based on the assumption that the data are generated from a number of Gaussian distributions with unknown parameters29. It is especially effective when the clusters exhibit different variances. The GMM employs the expectation-maximization (EM) algorithm to iteratively estimate the parameters of the Gaussian distributions, allowing the model to handle overlapping clusters of different sizes. The GMM is used in this work because radiographic features may overlap or vary greatly from one feature to another. Applied to the distribution of latent features recovered by the CVAE, the GMM helps us detect minor variations in radiographic images that indicate different stages of periodontal disease.

The efficacy of these clustering algorithms is evaluated by their ability to accurately classify radiographs into groups that indicate the presence or absence of periodontal disease, that is, by their precision and the extent to which the detected clusters align with clinical diagnoses. The selected algorithms are then incorporated into our few-shot learning architecture to improve the diagnostic process by offering an automated and resource-efficient way of classifying dental panoramic radiographs. Note that clustering is a critical stage in the early identification and treatment of periodontal disease.
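As a minimal sketch of how the three algorithms could be applied to the CVAE latent vectors, the scikit-learn snippet below clusters a latent matrix and compares the resulting partitions against a small annotated validation set. The variable names, hyper-parameter values, and the adjusted Rand index used for comparison are illustrative assumptions, not the study's actual settings.

```python
# Illustrative comparison of k-means, DBSCAN, and GMM on CVAE latent vectors.
# Hyper-parameter values are placeholders; in the proposed framework they are
# tuned by the Bayesian optimization procedure described in the next section.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

def cluster_latents(z, k=2, eps=0.5, min_samples=5):
    """Cluster latent vectors z of shape (n_samples, latent_dim)."""
    return {
        "kmeans": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(z),
        "dbscan": DBSCAN(eps=eps, min_samples=min_samples).fit_predict(z),  # -1 marks noise
        "gmm": GaussianMixture(n_components=k, random_state=0).fit_predict(z),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(100, 16))           # stand-in for CVAE latent vectors
    y_val = rng.integers(0, 2, size=100)     # stand-in labels (normal vs. periodontal disease)
    for name, labels in cluster_latents(z).items():
        # Agreement with the annotated validation subset, used only for evaluation.
        print(name, round(adjusted_rand_score(y_val, labels), 3))
```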
Bayesian optimization for hyperparameter tuning in few-shot learning
We introduce Bayesian optimization to determine the hyper-parameter values in our integrated framework. The procedure operates within the few-shot learning strategy for diagnosing periodontal conditions, since the hyper-parameters of both the UNet-CVAE model and the clustering algorithms affect the accurate and rapid detection of periodontal disease. Bayesian optimization aims to converge quickly to the optimal solution of a computationally expensive objective function, as in the framework proposed by Wu et al.30. It operates on the principle of Bayes' rule,$$\begin{aligned} p(w|D) = \frac{p(D|w)p(w)}{p(D)}. \end{aligned}$$
(4)
Here, p(w) is the prior distribution of an unobserved quantity w, p(D|w) is the likelihood, and p(w|D) is the posterior distribution of w given the data D. At each iteration, the method uses the results of previous evaluations, through an acquisition function \(\textbf{u}\), to select the next observation that is most likely to improve the objective function. Popular acquisition functions include the probability of improvement (PI), expected improvement (EI), and upper confidence bound (UCB)31. The optimal hyper-parameters \(\varvec{\lambda }^*\) are determined as \(\varvec{\lambda }^*=\arg \min _{\varvec{\lambda }\in A} f(\varvec{\lambda })\), where A denotes the hyper-parameter search space and \(f: A \rightarrow \mathbb {R}\) is a surrogate model of the objective. Typically, Gaussian process regression (GPR) serves as the surrogate, iteratively approximating the target function as$$\begin{aligned} f(\textbf{x}) \approx GP(m(\textbf{x}), k(\textbf{x},\textbf{x}')), \end{aligned}$$
(5)
where \(m(\cdot )\) and \(k(\cdot )\) represent the mean and covariance functions of the GPR, respectively. A commonly used covariance function is the squared exponential kernel, \(k(\textbf{x}_i,\textbf{x}_j) = \exp \left( -\frac{1}{2}\Vert \textbf{x}_i-\textbf{x}_j\Vert ^2 \right)\). Owing to its fast convergence compared with random sampling, Bayesian optimization is implemented in the proposed integrated architecture as a sequence of posterior updates followed by maximization of the acquisition function. The procedure is summarized in Algorithm 1, where \(D_{1:t-1}=\{x_n,y_n\}^{t-1}_{n=1}\) is the training dataset for the surrogate model f.

Algorithm 1 Bayesian optimization with prior and posterior updating.
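To illustrate the loop summarized in Algorithm 1, the sketch below runs Bayesian optimization with a GPR surrogate and the expected improvement (EI) acquisition function. The objective, search bounds, kernel, and candidate-sampling scheme are placeholders, not the study's actual configuration; in practice the objective would wrap training and evaluation of the UNet-CVAE and clustering pipeline for a given hyper-parameter vector.

```python
# Minimal Bayesian optimization loop: GPR surrogate + expected improvement (EI).
# The objective, bounds, and kernel below are illustrative placeholders.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI acquisition for minimizing the objective."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = y_best - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(objective, bounds, n_init=5, n_iter=20, seed=0):
    """Minimize `objective` over a box search space `bounds` of shape (d, 2)."""
    rng = np.random.default_rng(seed)
    d = bounds.shape[0]
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, d))
    y = np.array([objective(x) for x in X])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                    # posterior update from D_{1:t-1}
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(512, d))
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
        X = np.vstack([X, x_next])                      # evaluate and augment the data
        y = np.append(y, objective(x_next))
    return X[np.argmin(y)], y.min()

# Toy usage: minimize a quadratic over [0, 1]^2 (stand-in for a validation loss).
best_x, best_y = bayes_opt(lambda x: float(np.sum((x - 0.3) ** 2)),
                           np.array([[0.0, 1.0], [0.0, 1.0]]))
```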
Description of the data

Tufts dental database
The Tufts dental database (TDD)32 is a collection of one thousand digital panoramic radiographs that are not fully annotated. The data were collected with the approval of the Tufts University Institutional Research Board (IRB ID MODCR-01-12631, authorized on 7/14/2017). Images acquired between January 1, 2014, and December 31, 2016 were carefully selected on the basis of diagnostic quality, with a focus on minimizing technical faults. The radiographs were converted to standard image formats (TIFF/JPEG) and were annotated by both a dental specialist and a student using the Labelbox program. The annotations specifically targeted dental masks and maxillomandibular RoIs, which served as reference data for training the UNet model to recognize the structural components that are important for the automated detection of periodontal disorders. Figure 4 presents sample panoramic radiographs of a normal case (left panel) and periodontal disease (right panel) from the TDD.

Fig. 4 Sample panoramic radiographs from the Tufts dental database.

Hanyang University Seoul Hospital dental database
The second dataset, the Hanyang University Seoul Hospital dental database (HUSHDD), consists of 256 images collected in accordance with Hanyang University's ethical requirements (IRB 2019-01-007-026). Of these, 138 images illustrate different phases of periodontal disease, whereas 118 images show healthy dental conditions from patients aged over 20 years. Figure 5 presents sample panoramic radiographs of a normal case (left panel) and periodontal disease (right panel) from the HUSHDD.

Fig. 5 Sample panoramic radiographs from the Hanyang University Seoul Hospital dental database.

Noor Medical Imaging Center dental database
The Noor Medical Imaging Center dental database (NMICDD) comprises 116 panoramic dental X-rays collected at the Noor Medical Imaging Center in Qom, Iran, and anonymized to protect patient confidentiality33. The dataset reflects a broad spectrum of dental conditions, encompassing healthy individuals as well as partially and completely edentulous patients. The data were labeled in line with the principles applied to the HUSHDD: 70 images capture various stages of periodontal disease and 18 images depict healthy dental states. Images lacking teeth, images of patients under 20 years of age, and duplicated records were excluded (Fig. 6).

Fig. 6 Sample panoramic radiographs from the Noor Medical Imaging Center dental database.

Data pre-processing
The collected dental radiographic images were carefully reviewed by a dental professional, adhering to the classification standards established by the 2017 World Workshop. The images were categorized into control and chronic periodontal disease groups based on the observed patterns of alveolar bone resorption. Periodontal disease was diagnosed when generalized alveolar bone resorption exceeded 3 mm from the cementoenamel junction (CEJ) in the radiographs. To ensure data quality, images meeting any of the following exclusion criteria were removed from the dataset: mixed dentition (coexistence of primary and permanent teeth), pathologic lesions (e.g., tumors, osteomyelitis, cysts), localized periodontitis affecting one or two teeth, partially or completely edentulous alveolar ridges due to multiple missing teeth, sequelae and metal plates from maxillofacial trauma, and supernumerary teeth in the alveolar bone region.

In medical imaging, exposure plays a crucial role in radiograph interpretation. However, each manufacturer of radiographic imaging equipment has developed its own exposure-control mechanisms, resulting in different exposure indices (EIs) that can affect automated image classification tasks34. To mitigate the unintended effects of EIs, we performed image standardization as a prerequisite for the effective application of the UNet-CVAE framework in our few-shot learning architecture. Histogram standardization was employed to improve the uniformity of the three image datasets used for analysis: TDD, HUSHDD, and NMICDD. The standardization process involved calculating a global histogram representing the entire image collection by transforming and combining the individual image histograms into a uniform format. The global histogram was then normalized by dividing it by the total number of images in the dataset. Subsequently, histogram equalization35 was applied to each image to ensure uniform contrast and brightness across all radiographs. This approach uses a cumulative distribution function obtained from the global histogram to adjust the pixel values of each image to match the standardized distribution. The radiographs were then normalized to guarantee consistent scaling of pixel values within a pre-defined range. Histogram equalization was particularly important for the HUSHDD, in which pixel values differed markedly between the normal and periodontal disease images and could have unintentionally affected classification performance. After the global histogram equalization, the exposure of the HUSHDD images was consistent. To evaluate the performance of histogram equalization, the standard deviation of the average information content (entropy) was employed as an image quality measure. The entropy is defined as$$\begin{aligned} H = - \sum _{i=0}^{L-1} P(i) \log _2 P(i), \end{aligned}$$
(6)
where P(i) is the probability of intensity level i and L is the total number of grey levels; the entropy measures the richness of image detail. The standard deviation of the average information content of the radiographs was 0.2025 before normalization and was reduced to 0.1013 after normalization, effectively mitigating potential biases in model evaluation arising from these exposure differences.
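As a concrete illustration of this pre-processing step, the sketch below equalizes a batch of greyscale radiographs against a global cumulative distribution function and reports the spread of their entropies, following Eq. (6). The file handling is omitted and the 8-bit grey-level count (L = 256) is an assumption made for illustration.

```python
# Illustrative global histogram equalization and entropy check for radiographs.
# Assumes 8-bit greyscale images stacked as a (n_images, H, W) uint8 array.
import numpy as np

L = 256  # number of grey levels (8-bit assumption)

def global_cdf(images):
    """Average the per-image histograms, then build the global CDF."""
    hists = np.stack([np.bincount(img.ravel(), minlength=L) / img.size for img in images])
    global_hist = hists.mean(axis=0)              # normalized by the number of images
    cdf = np.cumsum(global_hist)
    return cdf / cdf[-1]

def equalize(img, cdf):
    """Map pixel values through the global CDF (histogram equalization)."""
    return np.uint8(np.round(cdf[img] * (L - 1)))

def entropy(img):
    """Average information content: H = -sum_i P(i) log2 P(i), Eq. (6)."""
    p = np.bincount(img.ravel(), minlength=L) / img.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Synthetic stand-in data for the radiograph collections (TDD, HUSHDD, NMICDD).
rng = np.random.default_rng(0)
images = rng.integers(0, L, size=(10, 64, 64), dtype=np.uint8)
cdf = global_cdf(images)
equalized = [equalize(img, cdf) for img in images]
print("entropy std before:", np.std([entropy(img) for img in images]))
print("entropy std after: ", np.std([entropy(img) for img in equalized]))
```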
