Resting-state frontal electroencephalography (EEG) biomarkers for detecting the severity of chronic neuropathic pain

Participants

We used a publicly available EEG dataset released by the Monterrey Institute of Technology and Higher Education, as described in the study by Zolezzi et al.35. The dataset consists of individuals suffering from chronic neuropathic pain. In total, 36 patients (28 females, 8 males) were included in the study. The cohort encompassed various chronic neuropathic pain conditions, including spinal cord injury (N = 11), peripheral neuropathy (N = 10), diabetes (N = 6), trigeminal neuralgia (N = 3), central nervous system (CNS) disorders (N = 3), and other conditions (N = 3). Exclusion criteria encompassed debilitating conditions such as CNS or head tumors, neurological disorders, cerebral infarction, or severe mental illness.

The study exclusively enrolled adult participants, with a mean age of 44 ± 13.98 years, who had been experiencing chronic neuropathic pain for at least three months. To assess the severity of pain, participants completed the painDETECT questionnaire, with a score of 12 or more required for inclusion. The Brief Pain Inventory (BPI) was also used to evaluate pain severity36. We used the ‘Actual pain’ score from the BPI questionnaire, as it comprehensively evaluates the patient’s pain intensity at the time of the survey37.

EEG data acquisition and experimental paradigm

The EEG data were acquired using an mBrainTrain cap with Ag–AgCl electrodes and a Smarting mBrainTrain amplifier, following the international 10–20 system. A total of 24 electrodes (Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, Fz, Cz, Pz, CPz, and POz) were positioned on the scalp to capture electrical activity. The EEG signals were recorded at a sampling rate of 250 Hz, and a band-pass filter from 0.1 to 100 Hz was applied during acquisition to retain the relevant frequency components. The collected data were referenced to the Cz electrode, with the left and right mastoids used as ground electrodes.

The experimental paradigm consisted of two conditions: eyes open (EO) and eyes closed (EC). In the EO condition, participants gazed at a white fixation cross displayed against a dark background for five minutes. The fixation cross then disappeared, at which point participants closed their eyes and maintained this closed-eyes condition for an additional five minutes. A beep signaled the end of the recording. In this study, we analyze and discuss the significant findings observed in the EO condition.

EEG pre-processing

Pre-processing was performed using MATLAB R2020b (The MathWorks, Inc., U.S.A.) together with the EEGLAB38 and FieldTrip39 toolboxes. Independent component analysis (ICA) based on second-order blind identification (SOBI) was employed to remove artifacts such as muscle activity, eye movements, and line noise from the raw data38. Components identified by the automatic ICA labelling as muscle, eye, heart, channel noise, or line noise with a probability exceeding 70% were discarded. The signal was then re-referenced to a common average reference.

The raw signals exhibited considerable noise in the form of abnormal peaks. To denoise the data, we segmented the 5 min signal into 5 s slices and performed baseline correction on each segment using empirical mode decomposition (EMD)40. EMD iteratively decomposes a signal into intrinsic mode functions (IMFs) and a residual component, mitigating the non-stationary nature of EEG signals. It begins by identifying the local minima and maxima of the signal, constructs lower and upper envelopes from these extrema, and computes their mean, which is subtracted from the original signal to obtain the residual. The process then repeats: the residual is re-initialized as the input signal, the number of extrema and the energy ratio of the residual are checked, and the residual is sifted to extract the next IMF. The decomposition stops once the energy ratio exceeds a threshold or the number of IMFs reaches a maximum limit.
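As an illustration of this segmentation and baseline-correction step, the sketch below re-creates it in Python (the authors worked in MATLAB); it assumes the PyEMD package ("EMD-signal" on PyPI) as a stand-in EMD implementation, and `channel` is a random placeholder for one recorded EEG channel.

```python
import numpy as np
from PyEMD import EMD  # pip install EMD-signal; stand-in for the MATLAB EMD used by the authors

FS = 250                 # sampling rate (Hz)
SEG_LEN = 5 * FS         # 5 s slices cut from the 5 min recording

def emd_baseline_correct(segment):
    """Subtract the EMD residual (the slow drift) from one 5 s segment."""
    emd = EMD()
    emd(segment)                                # sift the segment into IMFs
    imfs, residue = emd.get_imfs_and_residue()  # the residue carries the baseline drift
    return segment - residue

channel = np.random.randn(5 * 60 * FS)          # placeholder for one EEG channel (5 min)
segments = channel.reshape(-1, SEG_LEN)         # 60 slices of 5 s each
cleaned = np.vstack([emd_baseline_correct(s) for s in segments])
```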
After baseline correction, outliers were identified: a band-pass filter with a range of 4–30 Hz was applied using a finite impulse response (FIR) filtering method, and any amplitude exceeding 100 μV was flagged as an outlier, channel by channel. On average, 0.50 ± 1.89% of trials were discarded, with up to 18.33% (N = 11).

Spectral feature

To quantify the power of the frequency bands, we filtered the pre-processed signals within each frequency range corresponding to the four bands: theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30–50 Hz). The power features were computed as the average of the absolute values of the filtered signals.

Furthermore, we examined the EEG power discrepancy between the left and right hemispheres of the frontal lobe. The asymmetry feature was calculated by comparing the power values at the Fp1 and Fp2 channels for each of the four frequency bands, as in Eq. (1).

$$\text{Asymmetry feature} = \log\left(\frac{\text{band power at Fp2}}{\text{band power at Fp1}}\right)$$

(1)
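A minimal sketch of the power and asymmetry features, assuming SciPy's `firwin`/`filtfilt` for the FIR band-pass (the exact filter design is not specified) and a natural logarithm in Eq. (1); the Fp1/Fp2 signals are random placeholders.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 250
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta": (12, 30), "gamma": (30, 50)}

def band_power(x, lo, hi, fs=FS, numtaps=251):
    """Average absolute value of the FIR band-pass-filtered signal."""
    taps = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)
    return np.mean(np.abs(filtfilt(taps, 1.0, x)))

def asymmetry(fp1, fp2, lo, hi):
    """Eq. (1): log ratio of Fp2 to Fp1 band power (natural log assumed)."""
    return np.log(band_power(fp2, lo, hi) / band_power(fp1, lo, hi))

fp1, fp2 = np.random.randn(2, 5 * FS)           # placeholder 5 s signals for Fp1 and Fp2
features = {name: asymmetry(fp1, fp2, lo, hi) for name, (lo, hi) in BANDS.items()}
```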
Phase-amplitude coupling feature

To examine the cross-frequency coupling between the phase of low-frequency oscillations and the amplitude of high-frequency oscillations, we employed the modulation index (MI) proposed by Tort et al.41. This measure quantifies the degree of phase-amplitude coupling (PAC) using Shannon entropy and the Kullback–Leibler (KL) divergence. The MI is known for its robustness against noise and its ability to handle short signals, making it a suitable approach for analyzing PAC in EEG data41.

The pre-processed signal was band-pass filtered to isolate the desired frequency ranges for the phase and amplitude components. Specifically, the phase frequencies were theta, alpha, low beta (12–17 Hz), and high beta (18–30 Hz), while the amplitude frequencies were low gamma (30–50 Hz) and high gamma (70–90 Hz). The phase of the signal and the amplitude envelope were then computed by applying the Hilbert transform to the filtered signals.
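The MI itself can be computed compactly. The sketch below follows the recipe of Tort et al., assuming 18 phase bins (the choice used in that paper) and the same hypothetical FIR helper as above; the input signal is a random placeholder.

```python
import numpy as np
from scipy.signal import firwin, filtfilt, hilbert

def tort_mi(sig, fs, phase_band, amp_band, n_bins=18):
    """Modulation index: KL divergence between the phase-binned amplitude
    distribution and the uniform distribution, normalised by log(n_bins)."""
    def bp(x, lo, hi):
        taps = firwin(251, [lo, hi], pass_zero=False, fs=fs)
        return filtfilt(taps, 1.0, x)

    phase = np.angle(hilbert(bp(sig, *phase_band)))   # instantaneous phase
    amp = np.abs(hilbert(bp(sig, *amp_band)))         # amplitude envelope

    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    bins = np.digitize(phase, edges[1:-1])            # phase bin of each sample
    mean_amp = np.array([amp[bins == b].mean() for b in range(n_bins)])
    p = mean_amp / mean_amp.sum()                     # amplitude distribution over bins
    return np.sum(p * np.log(p * n_bins)) / np.log(n_bins)

sig = np.random.randn(5 * 250)                        # placeholder 5 s signal at 250 Hz
mi = tort_mi(sig, 250, phase_band=(4, 8), amp_band=(30, 50))  # theta phase, low-gamma amplitude
```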
Regression analysis

The regression analysis using EEG features was conducted in Python with the scikit-learn toolbox. The analysis examined the relationship between pain and the quantified EEG features. Pain levels were determined from the questionnaire completed by the subjects, specifically the ‘Actual Pain’ score of the BPI questionnaire. We employed five regression models: linear regression, random forest regression, and support vector regression (SVR) with linear, polynomial, and radial basis function (RBF) kernels. Brief descriptions of the five models follow.

Linear regression: Linear regression is a statistical method used to investigate the relationship between a dependent variable and one or more independent variables42. Its objective is to estimate the parameters of the linear equation that best fits the observed data. The linear regression model can be expressed as Eq. (2).

$$y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{n} x_{n} + \varepsilon$$

(2)

Here \(y\) represents the dependent variable (the pain score), \(x_{1}, \ldots, x_{n}\) are the independent variables (the calculated features), and \(n\) is the number of features. \(\beta_{0}\) is the intercept and \(\beta_{1}, \ldots, \beta_{n}\) are the regression coefficients of the independent variables, estimated using ordinary least squares (OLS); \(\varepsilon\) is the error term, representing the unexplained variability in the dependent variable.
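A minimal scikit-learn sketch of this model; the feature matrix and pain scores below are random placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.randn(36, 5)                     # placeholder: 36 subjects x selected features
y = np.random.uniform(0, 10, 36)               # placeholder: 'Actual Pain' scores

model = LinearRegression().fit(X, y)           # OLS estimation of the coefficients
beta0, betas = model.intercept_, model.coef_   # beta_0 and beta_1..beta_n of Eq. (2)
```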
Random forest regression: Random forest regression is an ensemble method that averages multiple decision trees, each trained on a random subset of the data43. In this study, we used the ‘RandomForestRegressor’ class of the ‘sklearn.ensemble’ package in Python to perform the estimation. The random forest regression model can be expressed as Eq. (3).

$$\hat{f} = \frac{1}{B}\sum_{b = 1}^{B} f_{b}\left(x'\right)$$

(3)

In Eq. (3), \(B\) represents the number of iterations of randomly selecting samples and corresponds to the number of trees, \(f_{b}\) is the regression tree trained on the \(b\)-th randomly selected subset, and \(\hat{f}\) is the prediction value, the average of the predictions of all individual regression trees on the test sample \(x'\). We set the maximum depth of the trees to 3.
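A corresponding sketch with the class named above; max_depth = 3 is stated in the text, while all other hyperparameters are left at scikit-learn defaults (an assumption).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.randn(36, 5)                     # placeholder feature matrix
y = np.random.uniform(0, 10, 36)               # placeholder pain scores

rf = RandomForestRegressor(max_depth=3, random_state=0)  # B trees, each of depth <= 3
rf.fit(X, y)                                   # each tree is fit on a bootstrap sample
y_hat = rf.predict(X)                          # Eq. (3): average over all trees
```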
Support vector regression: Support vector regression is a modelling technique that captures nonlinear relationships between a dependent variable and independent variables by leveraging support vector machines44. In this study, we used the ‘sklearn.svm.SVR’ library to estimate the SVR model. The SVR model is represented as Eq. (4).

$$y = w^{T} \phi\left(x\right) + b$$

(4)

Here \(\phi(x)\) denotes a nonlinear function that projects the calculated features \(x\) into a high-dimensional feature space, \(w\) is the weight vector of the hyperplane, and \(b\) represents the bias of the hyperplane. The main advantage of SVR lies in its use of kernel tricks to identify hyperplanes within the transformed feature space, thereby effectively capturing complex nonlinear relationships between variables. We explored three kernel functions: linear, polynomial, and radial basis function. Each kernel provides a different measure of similarity between data points in the high-dimensional feature space, enabling SVR to handle various types of nonlinear relationships in the data and enhancing its predictive capability for regression tasks.
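A sketch of the three SVR variants using the class named above; kernel hyperparameters are left at scikit-learn defaults (an assumption).

```python
import numpy as np
from sklearn.svm import SVR

X = np.random.randn(36, 5)                     # placeholder feature matrix
y = np.random.uniform(0, 10, 36)               # placeholder pain scores

# one model per kernel, as in the study
models = {k: SVR(kernel=k).fit(X, y) for k in ("linear", "poly", "rbf")}
predictions = {k: m.predict(X) for k, m in models.items()}
```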
All 20 features were included in the analysis, consisting of eight band powers, four power asymmetries, and eight PAC asymmetries. Each model was evaluated using all possible combinations of features, resulting in a total of 1,048,575 \(\left(=\sum_{k=1}^{20}\binom{20}{k}\right)\) combinations.
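The exhaustive subset search can be written with itertools; the assertion reproduces the combination count given above (the feature names here are hypothetical).

```python
from itertools import combinations
from math import comb

FEATURES = [f"feat_{i}" for i in range(20)]    # hypothetical names for the 20 features

# every non-empty subset of the 20 features, smallest subsets first
subsets = (c for k in range(1, 21) for c in combinations(FEATURES, k))

assert sum(comb(20, k) for k in range(1, 21)) == 2**20 - 1 == 1_048_575
```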
To assess the performance of the regression models, R-squared was used as the evaluation criterion, as in Eq. (5)45. Performance was evaluated using a leave-one-subject-out approach, in which each subject’s data was in turn held out to test the model’s predictive ability.

$$R^{2} = 1 - \frac{\sum_{i}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i}\left(y_{i} - \bar{y}\right)^{2}}$$

(5)

Here \(y_{i}\) is the true pain score, \(\hat{y}_{i}\) is the predicted score, and \(\bar{y}\) is the mean of the true pain scores.
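A sketch of the evaluation loop; because each subject contributes a single pain score, scikit-learn's LeaveOneOut iterator matches the leave-one-subject-out scheme here.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

def loso_r2(model, X, y):
    """Leave-one-subject-out predictions, scored with Eq. (5)."""
    y_hat = np.empty_like(y, dtype=float)
    for train, test in LeaveOneOut().split(X):   # hold out one subject per fold
        y_hat[test] = model.fit(X[train], y[train]).predict(X[test])
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# e.g. r2 = loso_r2(LinearRegression(), X, y) for one feature combination
```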
Statistical analysis

We used Pearson’s linear correlation coefficient to examine the association between pain scores and EEG features, as pain scores are represented on a numeric scale ranging from 0 to 10. To compute reliable correlation coefficients, we removed outliers using Cook’s Distance46, with the threshold set at the 85th percentile of the Cook’s Distance values. This percentage was chosen with reference to the proportion of outliers observed in the feature distributions.
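A sketch of this outlier-trimmed correlation, assuming statsmodels' influence diagnostics for Cook's Distance (the original implementation is not specified in the text).

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import pearsonr

def robust_pearson(feature, pain, pct=85):
    """Pearson r between one EEG feature and the pain scores, after
    dropping points above the 85th percentile of Cook's Distance."""
    ols = sm.OLS(pain, sm.add_constant(feature)).fit()
    cooks = ols.get_influence().cooks_distance[0]   # one distance per observation
    keep = cooks <= np.percentile(cooks, pct)
    return pearsonr(feature[keep], pain[keep])      # (r, p-value)
```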
To validate the models, we generated surrogate data to estimate the chance level. We created 1000 surrogate datasets by randomly shuffling the label data while preserving the label proportions. With 36 subjects and repeated labels, \(3.46 \times 10^{29}\) distinct permutations are possible, which provides an ample pool for this procedure. We then recalculated the regression models on the surrogate datasets for each combination of EEG features in each model. The reported chance level is the average over the 1000 surrogate datasets, and the actual result was statistically significantly different from it (t-test, p < 0.001).
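A sketch of this permutation procedure, reusing the hypothetical `loso_r2` helper from the evaluation sketch above; the one-sample t-test at the end is one plausible reading of the comparison described in the text.

```python
import numpy as np
from scipy.stats import ttest_1samp
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

def chance_level(model, X, y, n_surrogates=1000):
    """R-squared scores of models refit on label-shuffled (surrogate) data."""
    scores = np.array([loso_r2(model, X, rng.permutation(y))
                       for _ in range(n_surrogates)])
    return scores.mean(), scores

# e.g. mean_chance, scores = chance_level(LinearRegression(), X, y)
#      t, p = ttest_1samp(scores, real_r2)   # compare against the actual R-squared
```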
