Ensemble of hybrid model based technique for early detection of depression based on SVM and neural networks

We selected several ML classifiers, one DL model, a hybrid model, and an ensemble meta-model for the training procedure. The ML classifiers were the RFC, KNN, SVM, and XGB classifiers. We also propose a state-of-the-art hybrid model, DeprMVM, which incorporates both DL and ML architectural components. From the ML and DL models, we built an ensemble model that achieved greater accuracy in detecting depression.

Support vector machine

Support Vector Machine is an ML method based on statistical learning theory. It is a popular classifier used in many practical applications, particularly classification problems. The basic design philosophy of SVM is to maximize the classification boundary, i.e., to find the maximum-margin hyper-plane28. SVM can handle both linearly separable and non-linearly separable problems by mapping the data into a high-dimensional feature space. It uses a boundary-detection technique to retain the potential support vectors and improve the generalization ability of the learner. SVM minimizes the empirical risk while also minimizing the structural risk and the confidence range, making it a powerful tool for classification. The SVM classifier is expressed as follows:

$$\begin{aligned} f(x) = \text {sign}(W \cdot X + b) \end{aligned}$$

(3)

where \(f(x)\) is the classification function, \(X\) is the input data, \(W\) is the weight vector, and \(b\) is the bias term. For a linear SVM, the objective is to determine the \(W\) and \(b\) that maximize the margin between the two classes29.
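As a minimal sketch of Eq. (3), the snippet below fits a linear SVM with scikit-learn and reads back the learned \(W\) and \(b\); the data are synthetic placeholders, not the study's sociodemographic dataset.

```python
# Minimal linear-SVM sketch of Eq. (3); X and y are synthetic placeholders,
# not the study's sociodemographic data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 200 samples, 10 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # binary labels: 1 = depressed, 0 = not

svm = SVC(kernel="linear")                # learns f(x) = sign(W . X + b)
svm.fit(X, y)

W, b = svm.coef_[0], svm.intercept_[0]    # learned weight vector and bias term
decision = X @ W + b                      # Eq. (3): positive side => class 1
pred = (decision > 0).astype(int)
```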
DL model (multi-layer perceptron)

The MLP model in this study consisted of three types of layers: an input layer, one or more hidden layers, and an output layer, which together form a network of interconnected nodes or neurons. Through weighted connections and activation functions, each neuron processes information and forwards it to the subsequent layer. MLPs30 are well suited to a variety of applications, such as pattern recognition, regression, and classification, because they can represent intricate, non-linear relationships in the data. When training an MLP, the weights and biases are adjusted using optimization methods such as back-propagation to decrease the error between the predicted outputs and the real targets. The mathematical formulation of the MLP classifier consists of separate equations for each layer of the network. The forward pass of an MLP with a single hidden layer is represented by the following equations:

1. Input to the first hidden layer:

$$\begin{aligned} z_1 = W_1 \cdot X + b_1 \end{aligned}$$
(4)
Here, \(z_1\) is the weighted sum of the input features \(X\) for the first hidden layer, \(W_1\) is the weight matrix for the links between the input layer and the first hidden layer, and \(b_1\) is the bias vector of the first hidden layer.

2. Activation function for the first hidden layer:

$$\begin{aligned} a_1 = \sigma (z_1) \end{aligned}$$
(5)
Here, \(\sigma\) is typically a non-linear activation function, such as the sigmoid, rectified linear unit (ReLU), or hyperbolic tangent.

3. Input to the output layer:

$$\begin{aligned} z_2 = W_2 \cdot a_1 + b_2 \end{aligned}$$
(6)
Here, \(z_2\) is the weighted sum of the activations passed from the first hidden layer to the output layer, \(W_2\) is the weight matrix for the connections between the first hidden layer and the output layer, and \(b_2\) is the bias vector of the output layer.

4. Output of the MLP:

$$\begin{aligned} a_2 = \text {softmax}(z_2) \end{aligned}$$

(7)

In multiclass classification, the softmax function is often used to convert the raw scores \(z_2\) into class probabilities. The equations above describe a simple feed-forward neural network with one hidden layer; deeper networks are built by stacking further hidden layers, each with the same processing pattern. The goal of training an MLP is to determine the optimal weight matrices (\(W_1\) and \(W_2\)) and bias vectors (\(b_1\) and \(b_2\)) by adjusting them during training to minimize a chosen loss function, typically related to the difference between the predicted and actual target values. Training is often performed using techniques such as backpropagation and gradient descent31.
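The forward pass in Eqs. (4)-(7) can be sketched directly in NumPy; the layer sizes and random weights below are illustrative assumptions, not the configuration used in this study.

```python
# Sketch of the single-hidden-layer forward pass in Eqs. (4)-(7);
# layer sizes and weights are illustrative, not the paper's configuration.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())               # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 10, 32, 2
W1, b1 = 0.1 * rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = 0.1 * rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

x = rng.normal(size=n_in)                 # one input sample X
z1 = W1 @ x + b1                          # Eq. (4): input to the hidden layer
a1 = relu(z1)                             # Eq. (5): ReLU as the activation sigma
z2 = W2 @ a1 + b2                         # Eq. (6): input to the output layer
a2 = softmax(z2)                          # Eq. (7): class probabilities
```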
Fig. 4 The architecture of the hybrid model DeprMVM, combining the MLP and SVM classifiers. The MLP block and the detection block are shown.

Hybrid method (DeprMVM)

After using the models above, we combined the best-performing models into an ensemble-based hybrid model to determine whether the hybrid model outperformed the others. DeprMVM is a hybrid model that combines the MLP and SVM models to accurately detect depression. It offers a hybrid strategy that leverages the benefits of both ANNs32 and conventional ML methods. In this combination, the MLP performs feature extraction and learns complex data representations, while the SVM acts as the classifier, using the characteristics learned by the MLP to inform its conclusions. This combination is particularly advantageous when dealing with high-dimensional or non-linear data, because the SVM is good at cleanly separating classes, whereas the MLP is good at capturing intricate patterns.

The hybrid DeprMVM is shown in Figure 4. The technique uses three hidden layers in the NN, with the output of the last hidden layer serving as the input to the SVM, which detects depression more precisely. The method is slightly more complex than using the SVM and MLP classifiers independently, but it typically improves several metrics, such as accuracy, precision, recall, and the F1-score. The mathematical formulation of DeprMVM comprises separate equations for each layer of the network. An MLP with an input layer and two hidden-layer forward passes is represented by the following equations:

Step 1: Input to the output layer:

$$\begin{aligned} Z_1 = W_1 \cdot (a_1 + a_2) + b_1 \end{aligned}$$
(8)
Here, \(Z_1\) is the weighted sum of the activations passed from the hidden layers to the output layer, \(W_1\) is the weight matrix for the connections between the last hidden layer and the output layer, \(a_1\) and \(a_2\) are the hidden-layer activations, and \(b_1\) is the bias vector of the output layer.

Step 2: Output equation of the DeprMVM:

$$\begin{aligned} A_1 = \text {softmax}(Z_1) + \text {sign}(W_2 \cdot X_0 + b_2) \end{aligned}$$

(9)

In multiclass classification, the softmax function is often used to convert the raw scores \(Z_1\) into class probabilities. Here, \(X_0\) is the connection between the MLP output and the SVM input layer, \(W_2\) is the weight vector, and \(b_2\) is the bias term. We obtained more accurate results with the hybrid model because the final hidden layer of the MLP captured the most important characteristics of each training sample, which were then further classified by the SVM. Following a thorough evaluation of the various ML, hybrid, and DL models, we determined which model performed best and had the greatest impact on this type of dataset.
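The sketch below illustrates the MLP-plus-SVM idea behind DeprMVM under assumed layer sizes and synthetic data: an MLP with three hidden layers is trained, the last hidden layer's activations are extracted as features, and an SVM classifies them. It is an approximation of the described architecture, not the authors' implementation.

```python
# A sketch of the MLP-to-SVM idea behind DeprMVM: train an MLP with three
# hidden layers, take the last hidden layer's activations as features, and
# classify them with an SVM. Sizes and data are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, :3].sum(axis=1) > 0).astype(int)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=1000,
                    random_state=0)
mlp.fit(X, y)

def hidden_features(model, X):
    """Propagate X through all hidden layers (ReLU), skipping the output layer."""
    a = X
    for W, b in zip(model.coefs_[:-1], model.intercepts_[:-1]):
        a = np.maximum(0.0, a @ W + b)
    return a

svm = SVC()                                   # classifies the MLP-learned features
svm.fit(hidden_features(mlp, X), y)
pred = svm.predict(hidden_features(mlp, X))
```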
Fig. 5 In the proposed ensemble of hybrid model architecture, two base classifiers combine to create a new dataset, and the DeprMVM method serves as the meta-model that produces the final detection.

Algorithm 1 Proposed hybrid model ensemble.

Proposed ensemble model

The proposed ensemble33 approach combines two neural networks, MLP and DeprMVM, with an SVM. The depression framework has two layers: level 0 and level 1. The outputs of the classifiers, after they have been trained and tested at level 0 on out-of-sample examples, form both the dependent and independent variables of the new dataset used to train the meta-classifier.

In this research, the hybrid DeprMVM is the level-1 learner, whereas the SVM and MLP networks are the level-0 learners. The primary rationale for choosing MLP and SVM as base learners is their exceptional predictive performance in sequential data modeling34 and their resilience in such scenarios. In addition, because different base models are likely to make different types of errors, their differences guarantee diversity in the ensemble, which is crucial. Figure 5 shows the flowchart of the proposed process, and the approach is described in Algorithm 1.

First, the SVM and MLP networks were trained as base models. To ensure that there was no data leakage, the ensemble was built using two-fold cross-validation (CV). The second stage creates a new dataset by collecting the out-of-fold predictions produced by the two base models and attaching the actual labels. Specifically, in the new dataset, the predicted target labels are used as attributes, whereas the original class labels form the response variable. The base learners were trained separately on the training data using identical CV indices, whereas the hybrid meta-model was trained on the out-of-fold detections; the actual labels could be attached to these instances because they were not used to train the level-0 models. The base learners' outputs of 1 and 0 thus represent depressed and not depressed, respectively.

For instance, given a sociodemographic dataset \(D\) in which each sample is \((c_i, d_i)\), a new sample \((\hat{c}_i, \hat{d}_i)\) is created:

$$\begin{aligned} \hat{c}_i = P_1 (c_i), P_2 (c_i), \ldots , P_T (c_i) \end{aligned}$$
(10)
where {\(P_1, P_2, \ldots , P_T\)} are the base models and \(\hat{c}_i\) is the vector of sample detections. Finally, in the third stage, the hybrid-based meta-model combines the base models trained on the resulting dataset. The final ensemble detection for \(c\) is \(\hat{P}(P_1(c), P_2(c), \ldots , P_T(c))\), where \(\hat{P}\) represents the meta-model.
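As a sketch of this level-0/level-1 scheme, scikit-learn's StackingClassifier below combines SVM and MLP base learners whose out-of-fold predictions (two-fold CV, matching the leakage-avoidance step above) train a meta-learner; a plain SVC stands in for the hybrid DeprMVM meta-model, and the data are synthetic.

```python
# A generic stacking sketch of the level-0 / level-1 scheme: SVM and MLP as
# base learners whose out-of-fold predictions (two-fold CV) train a
# meta-learner. A plain SVC stands in for the DeprMVM meta-model here.
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

ensemble = StackingClassifier(
    estimators=[("svm", SVC()),
                ("mlp", MLPClassifier(max_iter=1000, random_state=0))],
    final_estimator=SVC(),   # stand-in for the hybrid DeprMVM meta-model
    cv=2,                    # two-fold CV yields out-of-fold level-0 predictions
)
ensemble.fit(X, y)           # trains base models, then the meta-model on
pred = ensemble.predict(X)   # the stacked out-of-fold predictions
```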
