Orthopedic disease classification based on breadth-first search algorithm

Orthopedic diseases are widespread worldwide, impacting the body's musculoskeletal system, particularly diseases involving the bones or hips. They can cause discomfort and impair functionality, making routine daily tasks difficult. In this section, we describe the dataset used in this study, followed by the preprocessing steps applied to the data. We then present the feature selection approaches used to extract and select the most relevant features, followed by the proposed method in detail. Next, we outline the machine learning (ML) models used to evaluate our proposed model, and finally the fitness function that guides its optimization.

Dataset

The dataset utilized in this study is available online20. It comprises 310 instances, each described by six attributes. These attributes are biomechanical properties computed for each patient from the form and orientation of the pelvis and lumbar spine: pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius, and degree of spondylolisthesis. Each attribute is a column in the dataset, which has been exported as a Comma Separated Values (CSV) file.

Data preprocessing

This stage ensures that the data used is complete and well organized.

Clean null values

Handling missing and noisy values is a considerable problem that can consume substantial time. Null values commonly result from errors during data collection, such as leaving a field empty when a diagnostic attribute does not apply21. Missing values are commonly represented by NaN or other null indicators, and duplicate rows and columns must also be removed. We consider two strategies for this issue. The first is to remove the samples with missing values; however, this may discard important data. The alternative is to impute nulls by substituting the mean value of each attribute. We replaced nulls with known values from the dataset to preserve as much of the data's meaning as possible: eliminating missing data would shrink the dataset, possibly leading to inaccurate analysis, while retaining missing data could distort the variable distributions. The study uses the K-Nearest Neighbor (KNN) technique for null imputation22, identifying the 'k' samples nearest to each incomplete one via the Euclidean distance and imputing each null with the average of those 'k' neighbors. This method is robust to outliers and requires little computational time; we used a value of k = 10.

Normalization

Normalization is an essential preprocessing step, particularly when dealing with approaches that are sensitive to feature scales, such as support vector machines and K-nearest neighbors. We normalized our numerical features using the StandardScaler from scikit-learn, which adjusts them to have a mean \(\mu\) of zero and a standard deviation \(\sigma\) of one. This standardizes the scale of all features so that no single feature dominates the others during model training. Each training sample \(x\) is transformed into a standard score \(z\):
$$z=\frac{x-\mu}{\sigma}$$
(1)
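As a minimal illustration of the preprocessing steps above, the following sketch applies KNN imputation with k = 10 followed by standardization, assuming pandas and scikit-learn; the file name and column names are hypothetical placeholders, not the dataset's exact identifiers.

```python
# Sketch of the preprocessing stage: duplicate removal, KNN imputation,
# and standardization (Eq. 1). File and column names are hypothetical.
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("orthopedic_patients.csv")  # hypothetical file name
df = df.drop_duplicates()                    # remove duplicate rows

X = df.drop(columns=["class"])               # six biomechanical attributes
y = df["class"]                              # normal / abnormal label

# KNN imputation: each missing value is replaced by the mean of that
# attribute over the k = 10 nearest samples (Euclidean distance).
X_imputed = KNNImputer(n_neighbors=10).fit_transform(X)

# Standardization (Eq. 1): zero mean and unit standard deviation per feature.
X_scaled = StandardScaler().fit_transform(X_imputed)
```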
Feature selection

Feature selection helps identify the most informative attributes for the classifier to learn from. Table 2 shows the performance of the BBFS algorithm compared with other algorithms.

Table 2 The performance of the BBFS algorithm compared with other algorithms.

The study uses BBFS, BPSO, BGWO, and BWAO for feature selection, and BBFS achieves the lowest average error, as shown in Fig. 1. BBFS is explained in detail in the Binary breadth-first search (BBFS) section.

Fig. 1 Average error and selected subset size of BBFS compared with other optimizers.

$$Average\ Error\ \left(AE\right)=\frac{1}{n}\sum_{i=1}^{n}{e}_{i}$$
(2)
where \({e}_{i}=f\left({x}_{i}\right)-f\left({x}_{optimal}\right)\) is the difference between the function value at the current iteration, \(f\left({x}_{i}\right)\), and the optimal function value, \(f\left({x}_{optimal}\right)\), and \(n\) is the number of iterations or data points.

The entire dataset was split into two separate parts, each of which was used as input for our classification study. Table 3 shows the statistical investigation of the utilized dataset.

Table 3 Statistical investigation of the utilized dataset attributes.

Figure 2 displays the heatmap investigation of the dataset features. A heatmap is a frequently used method for visualizing the correlations between the variables in a dataset; we use it to assess the strength or weakness of the connections between variables. The color bar maps the correlation values, which are scaled to a range of 0 to 1, with brighter colors representing values near 1 and darker colors representing values near 0. The diagonal values are 1, since each feature correlates perfectly with itself; smaller values indicate weaker correlation between features. This statistical view supports the diagnosis and prediction of orthopedic diseases. Figure 3 displays the box plot visualization of the dataset features grouped by class label.

Fig. 2 Heatmap investigation for utilized dataset attributes.

Fig. 3 Boxplot visualization of the target attributes.

Figure 4 displays a box plot used to analyze the distribution of the features. A box plot is well suited to illustrating the distribution of numerical data: it divides the data into quartiles, with each part representing approximately 25% of the dataset. We illustrate the six major features of the orthopedic dataset in this graph. Box plots provide visual context for the data, enabling readers to quickly identify central values, the spread of the dataset, and skewness.

Fig. 4 Boxplot visualization of dataset attributes.

Figure 5 displays the distribution analysis of the features. It represents the dataset's statistical distribution by showing the frequency of data points within various intervals, which is helpful for illustrating the distribution of numerical data. We analyzed the histogram of each feature, a standard graphing method for both continuous and discrete data measured on an interval scale, commonly used to present the fundamental characteristics of a data distribution in an accessible way. The selected features are shown in Table 4.

Fig. 5 Histogram distribution analysis for the dataset features.

Table 4 Statistics of the selected features.
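A brief sketch of how such correlation and distribution plots can be generated, assuming seaborn and matplotlib and reusing the hypothetical DataFrame df from the preprocessing sketch:

```python
# Sketch of the exploratory plots discussed above (heatmap, box plots,
# histograms); assumes the DataFrame df from the preprocessing sketch.
import matplotlib.pyplot as plt
import seaborn as sns

features = df.drop(columns=["class"])

# Correlation heatmap (cf. Fig. 2); values near 1 indicate strong correlation.
sns.heatmap(features.corr(), annot=True, cmap="viridis")
plt.show()

# Box plots of the attributes (cf. Figs. 3 and 4): quartiles and outliers.
features.plot(kind="box", subplots=True, layout=(2, 3), figsize=(12, 6))
plt.show()

# Histograms (cf. Fig. 5): frequency of data points per interval.
features.hist(bins=20, figsize=(12, 6))
plt.show()
```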
The proposed methodology

This research employs a widely used dataset and a selection of algorithms with machine learning approaches for classifying patients in the field of orthopedics. Prior to evaluating the performance of a model, hyper-parameter optimization allows for precise adjustments. The study uses binary breadth-first search (BBFS), binary particle swarm optimization (BPSO), binary grey wolf optimizer (BGWO), and binary whale optimization algorithm (BWAO) for feature selection; BBFS yields the lowest average error, so we adopted it as the feature selection optimizer. Six machine learning models, i.e., random forest (RF) classifier, stochastic gradient descent (SGD) classifier, Naïve Bayesian classifier (NBC), dummy classifier (DC), quadratic discriminant analysis (QDA) classifier, and extra trees (ET) classifier, were then trained using the training set obtained through the feature selection optimizer (BBFS). Through experimentation, the RF model achieved the best results compared with the others. The parameters of the RF model were optimized using four optimization algorithms: BFS, PSO, WAO, and GWO. The dataset used contains 310 instances and six distinct features. The results showed that the developed BFS-RF improves the performance of the original classifier compared with other hybrid models, and that BFS-RF performs best on this dataset. Figure 6 shows the optimized RF model based on BFS for orthopedic disease classification (normal or abnormal).

Fig. 6 The optimized RF model based on BFS for orthopedic disease classification.

Throughout the study, we use a shared dataset to optimize the RF model with four optimization algorithms: BFS, PSO, WAO, and GWO, for classifying patients with orthopedic disease. We use separate parts of the whole dataset for training and testing. We build classifier models on the training data and then evaluate them on their ability to produce a successful classification model for orthopedic illness. Random forest with breadth-first search is chosen as the best method for adjusting variables. The first steps in creating an RF classification model are determining the predictor variables and the desired outcome; next, the hyperparameter settings of the RF are tuned. Finally, the optimized random forest algorithm is used for classification, and the model's efficacy is assessed on the test data. Experimental findings show that RF gave the highest accuracy of 91.4% before hyper-parameter adjustment, compared to 81.7%, 83.6%, 86.2%, 87.8%, and 89.3% for NBC, DC, SGD, QDA, and ET, respectively. After applying hyperparameter tuning, the RF with BFS achieved 99.41%, compared to 97.13%, 96.75%, and 93.95% for PSO-RF, WAO-RF, and GWO-RF, respectively. It is therefore an optimal method that utilizes the RF model for orthopedic illness categorization in contrast with other machine learning classifiers.
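A simplified sketch of this training-and-comparison step, assuming scikit-learn; here X_sel is a hypothetical name for the BBFS-selected feature matrix (produced as in the BBFS sketch of the next subsection), the split ratio and parameter grid are illustrative, and a plain grid search stands in for the BFS/PSO/WAO/GWO tuners used in the paper.

```python
# Sketch: train the six classifiers on the BBFS-selected features, compare
# test accuracy, then tune the best base model (RF). Assumes X_sel and y
# from the earlier sketches; split ratio and grid are illustrative.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.dummy import DummyClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.2, random_state=42)

models = {
    "RF": RandomForestClassifier(random_state=42),
    "SGD": SGDClassifier(random_state=42),
    "NBC": GaussianNB(),
    "DC": DummyClassifier(strategy="most_frequent"),
    "QDA": QuadraticDiscriminantAnalysis(),
    "ET": ExtraTreesClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))

# Hyperparameter tuning of the best base model (RF); the paper drives this
# step with BFS/PSO/WAO/GWO, whereas a grid search is shown here purely
# for illustration.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": [100, 200, 500], "max_depth": [None, 5, 10]},
    cv=5)
grid.fit(X_train, y_train)
print("Best RF:", grid.best_params_, grid.best_score_)
```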
Binary breadth-first search (BBFS)

BBFS is used in ML to identify the most relevant features in a dataset. It begins by considering all available features and iteratively removes those that are irrelevant or degrade the model's performance, continuing until a desired number of features remains. It simplifies the model, making it easier to train and to interpret. By deleting unnecessary features, BBFS may help reduce overfitting, where the model performs well on training data but poorly on unseen data. BBFS also speeds up training and improves the model's ability to generalize. It has constraints, however: it is computationally expensive, especially for big datasets with many features, and it depends on the chosen performance metric, so the metric should match the model's objective. Finally, since BBFS eliminates features individually, it may overlook informative relationships between features.

Binary Breadth-First Search (BBFS) is a common feature selection approach in binary labeling scenarios. The goal is to pinpoint and choose the most significant characteristics from a provided set of features in order to enhance the efficiency of a machine learning model. The algorithm uses the principles of Breadth-First Search (BFS), a graph traversal technique, to navigate the feature space effectively. Algorithm 1 presents the binary breadth-first search procedure used for feature selection. It examines the feature space methodically in a breadth-first manner, guaranteeing that the chosen subset is grounded in the model's performance, while the halting criterion limits the search space and prevents exhaustive exploration. Figure 7 displays the encoding mechanism of BFS.

Fig. 7 Encoding mechanism of BFS.

Machine learning models using hyperparameter optimization

The study presents an optimized RF model for orthopedic disease classification, using the BFS, PSO, WAO, and GWO algorithms to fine-tune hyperparameters. Biomechanical features from an orthopedic patient dataset20 were used to assess efficiency. BFS, PSO, WAO, and GWO are hyperparameter tuning methods that improve model accuracy by collecting observations carrying as much information as possible about the function and its optimal value. Each method efficiently investigates a wide variety of options by searching over different hyperparameter settings. Algorithm 2 presents the BFS procedure used for hyperparameter tuning. It examines the hyperparameter space methodically, guaranteeing that different settings are assessed and compared in a breadth-first manner, while the halting criterion limits the search space to avoid exhaustive investigation and to enhance the performance of hyperparameter adjustment.
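The following is a minimal sketch of a binary breadth-first search over feature subsets in the spirit of Algorithm 1, assuming scikit-learn; the fitness function (cross-validated RF accuracy), the empty-subset root, and the depth-based halting criterion are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch of binary BFS over feature subsets (cf. Algorithm 1). Subsets are
# encoded as binary masks; children are generated by flipping one bit, and
# a depth limit serves as the halting criterion. Illustrative assumptions.
from collections import deque
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def evaluate(mask, X, y):
    """Fitness of a binary feature mask: mean CV accuracy of an RF."""
    if not mask.any():
        return 0.0
    model = RandomForestClassifier(random_state=42)
    return cross_val_score(model, X[:, mask], y, cv=5).mean()

def bbfs(X, y, max_depth=3):
    n = X.shape[1]
    start = np.zeros(n, dtype=bool)        # root node: empty subset
    queue = deque([(start, 0)])
    visited = {start.tobytes()}
    best_mask, best_score = start, 0.0
    while queue:
        mask, depth = queue.popleft()      # FIFO order -> breadth-first
        score = evaluate(mask, X, y)
        if score > best_score:
            best_mask, best_score = mask.copy(), score
        if depth >= max_depth:             # halting criterion
            continue
        for i in range(n):                 # children: flip one feature bit
            child = mask.copy()
            child[i] = not child[i]
            key = child.tobytes()
            if key not in visited:
                visited.add(key)
                queue.append((child, depth + 1))
    return best_mask, best_score

# Usage with the earlier sketches:
# mask, score = bbfs(X_scaled, y.values)
# X_sel = X_scaled[:, mask]
```

The same breadth-first expansion, applied to a discretized grid of hyperparameter settings instead of feature masks, gives the tuning procedure of Algorithm 2.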
