Three-layered sparse dictionary learning algorithm for enhancing the subject-wise segregation of brain networks

This section evaluates and validates the proposed algorithm by comparing it against existing state-of-the-art data-driven source retrieval algorithms. The comparison was carried out on two different fMRI datasets: (i) a simulated fMRI dataset for six subjects created using the SimTB toolbox54, and (ii) a block design experimental fMRI dataset for eight participants acquired from the quarter 3 release of the Human Connectome Project (HCP)55,56. Notably, the proposed algorithm is applicable to both task-related and resting-state fMRI data, but only the task-related results are reported here to limit the length of the paper; the resting-state results are provided in the Supplementary Material. The sparse group independent component analysis (sgICA)20,57, compressed online dictionary learning (CODL)22, adaptive consistent sequential dictionary (ACSD)58, sparse spatiotemporal blind source separation (ssBSS)28, subject-wise sequential dictionary learning (swsDL)19, and subject-wise block dictionary learning (swbDL)19 methods were included in the comparison alongside the proposed three-layered sparse dictionary learning (TLSDL) algorithm. The effectiveness of each algorithm was assessed by how successfully it retrieved the spatial and temporal ground truth from the synthetic and experimental fMRI datasets. All algorithms were implemented in MATLAB.

Synthetic dataset generation

In line with earlier fMRI research19,28, a dataset for six subjects was generated in MATLAB using the SimTB toolbox to replicate experimental fMRI data. The synthetic dataset was produced from twelve distinct temporal sources, each comprising 300 timepoints with a repetition time (TR) of 1 second, and twelve distinct spatial sources, each a grid of 50 by 50 voxels. The synthetic brain was mapped using the source IDs \(\{3, 6, 8, 10, 22, 23, 26, 30, 4, 12, 5, 29\}\) to produce the spatial components. For each subject, a total of seven spatiotemporal sources were used to generate the fMRI data from the source signals. As elaborated in25, the first six spatiotemporal sources were common to all six subjects, albeit with some intersubject variability, whereas the last six sources were distinct, with exactly one assigned to each subject.

Figure 4. Correlation values with respect to the GT-TCs/GT-SMs for (a,b) subject-wise analysis, computed over 7 source components, 6 subjects, 30 realizations, and 3 temporal noise variance cases, and (c,d) group-level analysis, computed over 7 components, 30 realizations, and 3 temporal noise variance cases. In both cases, the deviation from the mean is plotted as error bars. The \(\rho \) values along the x-axis indicate the extent of spatial overlap.

The HRF parameters, including delay, response dispersion, and undershoot, were varied with respect to the shared temporal sources to introduce variability across subjects. Similarly, intersubject variability for the common spatial maps was created by adjusting the parameters of the Gaussian distribution, namely the mean (\(\mu \)) and the standard deviation (\(\sigma \)), which allowed the direction, position, and spread of the activations to be manipulated. This was achieved by applying a random translation with a mean of 0 and a standard deviation of 1 along both the x and y axes, a random rotation with a mean of 0 and a standard deviation of 0.9, and a random scaling with a mean of \(\rho \) and a standard deviation of 0.05, as seen in the middle row of Fig. 3. The spread parameter \(\rho \), which determines the structural range of the activations, takes the values \(\{3,6,9,12,15,18,21\}\) (the mean of the Gaussian distribution), generating seven distinct levels of spatial overlap. The top row of Fig. 3 shows the resulting range of the spatial sources, demonstrating a degree of interdependence from moderate to large.

The first subject contributes the six common sources and one distinct source, while the next five subjects each provide one distinct spatiotemporal source. These sources were concatenated to create the ground truth time courses (TCs) and spatial maps (SMs) for the group-level analysis, as seen in the leftmost column and bottom two rows of Fig. 3, which serves as the primary ground truth case. The subsequent subfigures illustrate the temporal and spatial sources used to generate each of the six datasets according to \(\text {Y}_m=\sum _{i=1}^{7}(\text {tc}_i+\psi _i)(\text {sm}^i+\phi ^i)\). The noise matrices \(\Psi \in \mathbb {R}^{300\times 7}\) and \(\Phi \in \mathbb {R}^{7\times 2500}\) were drawn from Laplacian distributions with parameters \(\mathscr {L}(0,\,n_t)\) and \(\mathscr {L}(0,\,0.01)\), respectively. In addition to the Laplacian noise, outliers were manually added at timepoints 60, 120, 180, and 240. The datasets \(\text {Y}_{m=1}^M\) (with \(M=6\)) generated in this way were used by all source retrieval methods, contingent upon the values of \(\rho \), \(n_t\), and the trial number.
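For concreteness, the following MATLAB sketch illustrates this generative model for a single subject. It is a minimal sketch, not the SimTB pipeline: TC and SM are random stand-ins for the ground-truth sources, n_t is treated as the Laplacian scale parameter, and the outlier amplitude is an assumption, since it is not stated here.

```matlab
% Minimal sketch of Y_m = sum_i (tc_i + psi_i)(sm^i + phi^i)
nT = 300; nV = 2500; nS = 7;    % timepoints, voxels, sources per subject
n_t = 0.6;                      % temporal noise level (0.6, 0.9, or 1.2)
TC = randn(nT, nS);             % stand-in for SimTB ground-truth time courses
SM = randn(nS, nV);             % stand-in for SimTB ground-truth spatial maps

% Laplacian noise L(0, b) via inverse-CDF sampling
U = rand(nT, nS) - 0.5;
Psi = -n_t  * sign(U) .* log(1 - 2*abs(U));   % temporal noise, 300 x 7
V = rand(nS, nV) - 0.5;
Phi = -0.01 * sign(V) .* log(1 - 2*abs(V));   % spatial noise, 7 x 2500

Y = (TC + Psi) * (SM + Phi);    % one subject's 300 x 2500 dataset

% Outliers at the stated timepoints; the amplitude (5) is illustrative
Y([60 120 180 240], :) = Y([60 120 180 240], :) + 5*randn(4, nV);
```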
Figure 5. (A) Artificially generated spatial and temporal sources, (B) spatial and temporal sources retrieved using swsDL, and (C) spatial and temporal sources recovered using TLSDL. The absolute temporal and spatial correlations (\(\gamma \)) are shown for each source, with the total sum of the correlation values on the left side. The red circles on the temporal sources derived from swsDL indicate the outliers that were effectively suppressed by TLSDL but were not reduced in intensity by swsDL.

Synthetic dataset dictionary learning

Wherever feasible, the same parameter values were used across all algorithms to provide an impartial comparison. For sgICA and CODL, which were applied to the grouped/concatenated data, the number of components to be extracted was set to 12, with 18 components retained by the subject-wise PCA. For the other algorithms, it was acknowledged that the exact number of underlying sources in experimental fMRI data is unknown; consequently, more components were trained for the simulated dataset than the number of generating sources. The number of components was set to 12 on a per-subject basis for ACSD, ssBSS, swsDL, and TLSDL. These algorithms, including CODL, were executed for a total of 30 iterations. The best initialization for each method was determined by separately evaluating initialization from the data, from a random number generator, and from DCT bases: the ssBSS method used random numbers drawn from the standard normal distribution, the ACSD, swsDL, and TLSDL algorithms employed DCT bases, and CODL was initialized from the concatenated data.
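For reference, a DCT-based dictionary initialization can be constructed as in the sketch below. This assumes orthonormal DCT-II bases (the exact DCT variant is not specified in the text); the dimensions follow the synthetic setup.

```matlab
% Sketch: initialize a dictionary with the first K orthonormal DCT-II bases
nT = 300; K = 12;               % timepoints and atoms per subject
n = (0:nT-1)';
D0 = zeros(nT, K);
D0(:,1) = 1/sqrt(nT);           % constant (DC) atom
for k = 2:K
    D0(:,k) = sqrt(2/nT) * cos(pi*(2*n+1)*(k-1)/(2*nT));
end
% The columns are orthonormal, so every atom already has unit norm.
```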
We conducted several trials with various tuning parameter configurations and selected the ones that produced the most favourable outcomes in terms of similarity between the ground truth and the estimated sources. For sgICA, twelve components were preserved after the second PCA reduction, while the first PCA retained 18 components per subject. The best sparsity and smoothing values for sgICA were found to be 3 and 50000, respectively. The tuning parameters for ssBSS were set to \(\zeta _1=60\), \(\lambda _1=3\), \(K_p=150\), \(K=14\), and \(P=7\). The best sparsity parameter for ACSD was found to be 12. The best tuning parameters for swbDL were \(\lambda _2=8\), \(\alpha =100\), and \(K_1=12\). The optimal tuning parameters for swsDL were \(\zeta _2=\zeta _3=24\), \(\lambda _2=16\), and \(K=12\), and those for TLSDL were \(\zeta _2=\zeta _3=8\), \(\lambda _3=8\), \(\lambda _4=0.1\), \(\alpha =150\), and \(K_1=12\).

Synthetic dataset results

The process of generating the synthetic dataset and learning from it with the participating algorithms was repeated for various instances of noise and spatial overlap to demonstrate the resilience and reliability of the proposed algorithm. Specifically, the experiment was repeated under the following conditions: (i) three variance values (the square of the standard deviation) of the temporal noise, \(n_t=\{0.6, 0.9, 1.2\}\); (ii) seven values of \(\rho \) ranging from 3 to 21, gradually increased to enlarge the activation overlaps, as depicted in the top row of Fig. 3; and (iii) a total of 30 trials.

Furthermore, source recovery was conducted for both subject-level and group-level analysis. The underlying source TCs/SMs were obtained by correlating each algorithm's trained dictionary atoms/sparse code rows with the ground truth TCs/SMs and selecting the indices with the greatest correlation values. The correlation values, computed with respect to the ground truth TCs/SMs (GT-TCs/GT-SMs), were stored as cTC/cSM. For each participating algorithm and each of the seven spatial overlap instances, we calculated and saved mcTC/mcSM, the average of the cTC/cSM values over all seven spatiotemporal sources, three realizations of temporal noise, and thirty trials of the experiment. For the subject-wise analysis, the mean of the mcTC/mcSM values over the 6 subjects is plotted in Fig. 4a,b; for the group-level analysis, the mcTC/mcSM values are plotted directly, since no averaging over subjects is required.
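The correlation-based matching used for this evaluation can be sketched as follows. The matrices are random stand-ins with the dimensions of the synthetic experiment, and the corr function from the Statistics and Machine Learning Toolbox is assumed to be available.

```matlab
% Sketch: pair each ground-truth source with its best-correlated learned
% atom (temporal) and sparse-code row (spatial)
Dhat = randn(300, 12);  Ahat = randn(12, 2500);  % stand-ins: learned factors
gtTC = randn(300, 7);   gtSM = randn(7, 2500);   % stand-ins: ground truth

Ct = abs(corr(gtTC, Dhat));            % 7 x 12 temporal correlations
Cs = abs(corr(gtSM', Ahat'));          % 7 x 12 spatial correlations
[cTC, idxT] = max(Ct, [], 2);          % best temporal match per source
[cSM, idxS] = max(Cs, [], 2);          % best spatial match per source
mcTC = mean(cTC);  mcSM = mean(cSM);   % averages of the kind plotted in Fig. 4
```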
Figure 5 shows the results of the component-wise analysis for the experiment repeated with \(\rho =8\) and \(n_t=0.6\) using swsDL and TLSDL, highlighting the strength of the proposed algorithm. Figure 6 displays the pace at which the participating algorithms converge and the evolution of the average correlation values as a function of the number of iterations; it also shows the values recovered by the sparse matrix \(\Psi _m\) during the tenth iteration for the first subject.

Figure 6. The average of the (a) correlation values for all spatiotemporal sources and (b) convergence rate, both as a function of the number of algorithm iterations; the deviation from the mean is plotted as error bars. Subfigure (c) shows the temporal background noise recovered by the sparse matrix during the tenth iteration of the algorithm.

Based on the results shown in Fig. 4, the proposed TLSDL consistently achieved better results than all other algorithms across the various source recovery settings, covering spatial/temporal features, subject-wise/group-level analysis, and different degrees of spatial overlap. Taking Table 2 into account as well, TLSDL exhibited superior recovery performance for all spatiotemporal sources without exception, and its performance remained ahead of all competing algorithms despite the rise in noise intensity and spatial overlaps. Its competitor swsDL came in second place for both the subject-wise and group-level analyses. Surprisingly, for spatial source recovery, ssBSS outperformed ACSD, CODL, and sgICA once the spatial overlaps started to increase.

Table 2. Average correlation values for the 12 spatiotemporal sources (ST1–ST12) over 30 trials, 3 noise instances, and 7 \(\rho \) values, shown for group-level recovery.

Figure 4 further revealed that TLSDL had the highest recovery rate even in the presence of significant spatial dependency among sources and a high amount of temporal noise; although its retrieval capability decreased as the spatial overlaps increased, it retained the best recovery performance throughout. The runner-up swsDL showed somewhat smaller deviations from the mean correlation value than TLSDL for both the spatial and temporal recovery scenarios. The ssBSS method likewise displayed small standard deviations, but its mean correlation results were unremarkable. Overall, as the spatial overlaps kept increasing, the performance of TLSDL remained comparatively better.

Figure 7. The visual cue (A), left toe (B), left finger (C), right toe (D), right finger (E), and tongue (F) movement tasks of the block design dataset, analyzed using five different methods: (a) CODL, (b) ACSD, (c) swbDL, (d) swsDL, and (e) TLSDL. The extracted common activation maps were thresholded at a significance level of \(p<0.001\) using random field correction. The correlation values corresponding to these spatial maps are provided in Table 3.

Based on Fig. 5C, TLSDL effectively suppressed all outliers and, as a result, achieved better overall temporal correlation strength than swsDL. In addition, the SMs recovered by TLSDL show stronger correlations owing to its greater capability of segregating spatial maps. Figure 6c shows the background noise removed from the dictionary atoms during the tenth iteration of the algorithm; it clearly clipped the outliers from the TCs.
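The exact TLSDL update is given in the methods section; purely as a generic illustration of how a sparse term can absorb such outliers, elementwise soft-thresholding of the reconstruction residual keeps only the large entries. All names and the threshold value below are illustrative.

```matlab
% Generic illustration (not the exact TLSDL update): a sparse matrix Psi
% absorbs heavy-tailed residual energy via elementwise soft-thresholding
Y = randn(300, 2500); D = randn(300, 12); A = randn(12, 2500); % stand-ins
lambda = 1;                                  % illustrative threshold
R = Y - D*A;                                 % reconstruction residual
Psi = sign(R) .* max(abs(R) - lambda, 0);    % only outlier-sized entries survive
```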
It can be deduced from Fig. 6b that the TLSDL algorithm and the swsDL approach converged at an equal pace. This tendency is also seen in Fig. 6a, where the correlation strength for both TLSDL and swsDL practically stopped rising after the tenth iteration. In contrast, ACSD and ssBSS continued to improve their source recovery as the number of iterations increased, although the ssBSS technique converged slowly.

A source-wise group-level analysis was performed using 30 trials, 3 temporal noise instances, and 7 spatial dependence situations on 6 algorithms, and the findings are shown in Table 2. These data make clear that TLSDL emerged as the winner in the source recovery of every spatiotemporal component, with swsDL the runner-up.

Block design dataset

The raw 3T block design dataset for the motor task was acquired from the HCP's quarter 3 release; the experimental specifics can be found in19,55,56. The dataset, which covers the brain's motor cortex, was obtained during a 204-second experiment in which participants were instructed to react to visual stimuli by tapping either their right or left fingers, pinching their right or left toes, or moving their tongues. Each movement task lasted 12 seconds and followed a three-second visual cue. A total of ten movement blocks were considered: two tongue movements and two each of left finger, right finger, left toe, and right toe movements. There were 13 blocks in total, three of which were fixation blocks lasting 15 seconds each. Six model hemodynamic responses (MHRs) were generated using the standard hemodynamic response function and the task stimuli associated with the five distinct forms of movement, namely tongue (T), left toe (LT), right toe (RT), left finger (LF), and right finger (RF), together with the visual cue (VC). The purpose was to obtain accurate ground truth time courses (TCs). After normalisation, these six MHRs were stored in the matrix \(\text {T}\) to be used by the proposed method.
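The construction of one MHR can be sketched as below, assuming SPM is on the MATLAB path (spm_hrf is an SPM function). The block onsets here are illustrative placeholders, since the exact HCP stimulus timings are not listed in this section.

```matlab
% Sketch: build one model hemodynamic response (MHR) for a movement task
TR = 0.72; nVol = 279;              % scan parameters after discarding volumes
hrf = spm_hrf(TR);                  % canonical HRF sampled at TR (SPM function)
stim = zeros(nVol, 1);
onsets = [11 42 73];                % illustrative block onsets, in volumes
dur = round(12 / TR);               % 12-second movement blocks
for o = onsets
    stim(o:o+dur-1) = 1;            % boxcar for this movement type
end
mhr = conv(stim, hrf);              % convolve boxcar with the HRF
mhr = mhr(1:nVol);                  % trim to the scan length
mhr = (mhr - mean(mhr)) / std(mhr); % normalize before storing as a column of T
```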
The fMRI scan for each individual was acquired using a Siemens 3 Tesla (3T) scanner with the following acquisition parameters: 72 consecutive slices with isotropic voxels of \(2 \ \textrm{mm}\); an echo time (TE) of \(33.1 \ \textrm{ms}\); a repetition time (TR) of \(0.72 \ \textrm{s}\); a field of view (FOV) of \(208 \times 180 \ \textrm{mm}\); a flip angle (FA) of 52\(^\circ \); a matrix size of 104 by 90; a slice thickness of 2 mm; and an echo spacing of \(0.58 \ \textrm{ms}\). A total of 284 EPI volumes were collected, and the first five scans were discarded. The dataset used in our analysis comprised eight people, with ages spanning from 22 to 35.

Block design dataset preprocessing

The block design dataset was preprocessed using the SPM-12 toolbox4, as mentioned in19. The preprocessing steps, which include masking, realignment, normalization, and spatial smoothing, are extensively described in25,59. To mitigate motion artifacts, the functional images were realigned to the first image. After spatial normalization to a Talairach template and resampling to \(2 \times 2 \times 2\) \(\textrm{mm}^3\) voxels, all images were spatially smoothed using a Gaussian kernel with a full-width at half-maximum (FWHM) of \(6\times 6 \times 6\) \(\textrm{mm}^3\). During the masking phase, data beyond the scalp were excluded. The four-dimensional dataset was then reshaped and saved as a two-dimensional matrix \(\text {Y}\) for each subject, to be regarded as a whole-brain dataset. After this phase, the dataset \(\text {Y}\) for the m-th subject, where m ranges from 1 to 8, had a size of \(279 \times 236115\). Subsequently, temporal filtering was applied to the \(\text {Y}\) matrix of every subject: a low-pass filter based on an FWHM Gaussian kernel to eliminate high-frequency physiological noise, and a high-pass filter based on the DCT to eliminate low-frequency trends, with an FWHM cutoff of 1 second and a DCT filter cutoff of 1/150 Hz. Finally, following the previously explained methodology, \(\text {Y}\) was normalized to have a mean of zero and a variance of one.
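The temporal filtering and normalization steps can be sketched as follows. This is an illustration rather than the exact pipeline (which relied on SPM routines): the data matrix is a small stand-in, the number of DCT regressors follows SPM's usual rule of thumb, and the Gaussian kernel support is a choice of convenience.

```matlab
% Sketch: DCT high-pass (1/150 Hz), Gaussian low-pass (1 s FWHM), z-scoring
TR = 0.72;
Y = randn(279, 1000);                  % stand-in for the 279 x 236115 matrix
[nT, nV] = size(Y);

K = fix(2*(nT*TR)/150) + 1;            % number of low-frequency DCT regressors
n = (0:nT-1)';
X = [ones(nT,1)/sqrt(nT), ...
     sqrt(2/nT)*cos(pi*(2*n+1).*(1:K-1)/(2*nT))];  % orthonormal DCT set
Y = Y - X*(X'*Y);                      % high-pass: project out slow trends

sigma = (1/TR)/sqrt(8*log(2));         % 1 s FWHM expressed in TRs
g = exp(-0.5*((-4:4)'/sigma).^2); g = g/sum(g);
Y = conv2(g, 1, Y, 'same');            % low-pass smoothing along time

Y = (Y - mean(Y,1))./std(Y,0,1);       % zero mean, unit variance per voxel
```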
Figure 8. Group-level activation maps obtained using five different methods: (a) CODL, (b) ACSD, (c) swbDL, (d) swsDL, and (e) TLSDL, thresholded using random field correction at a significance level of \(p<0.001\). The activation maps were derived for six templates: (A) medial visual, (B) occipital pole visual, (C) lateral visual, (D) default mode network, (E) frontoparietal right, and (F) frontoparietal left. The correlation values corresponding to these spatial maps are reported in Table 3.

Block design dataset dictionary learning

We experimented with a diverse array of tuning parameter combinations and retained the ones that yielded the most accurate results in terms of similarity between the recovered sources and the ground truth. The dictionaries were initialized from the concatenated data for CODL, from random numbers for ssBSS, and from DCT bases for ACSD, swbDL, swsDL, and TLSDL; these initializations were agreed upon after testing each of them for all competing algorithms. The dictionary learning algorithms were executed for 15 iterations, except for CODL and ssBSS, which were executed for 30 iterations due to their slow convergence.

Table 3. The greatest correlation values obtained using the seven techniques, with the best values highlighted in bold, between the averaged spatial maps and the RSN templates and between the group-level dictionary atoms and the MHRs.

During the dimensionality reduction for sgICA, 100 components were preserved by the first PCA and 60 by the second, and the same number of components was then retrieved by sgICA. The sparsity parameter for sgICA was set to 5, and the smoothing value to 50000. For ssBSS, 60 components were selected from the PCA and 40 of them were retained; its remaining parameters were assigned the values \(\lambda _1=16\), \(\zeta _1=50\), \(K_p=60\), \(K=60\), and \(P=40\). The CODL training used 70 dictionary atoms, with a sparsity parameter of 6 and a batch size equal to the data dimension. ACSD was used to train 40 dictionary atoms per subject, with a sparsity parameter of 60. Both swbDL and swsDL were trained using a total of 40 atoms. The tuning parameters were \(\lambda _2=12\) and \(\alpha =3000\) for swbDL, and \(\zeta _2=\zeta _3=48\) and \(\lambda _2=25\) for swsDL. The optimal parameters for TLSDL were found to be \(\lambda _4=0.15\), \(\alpha =150\), \(\lambda _3=12\), \(\zeta _2=\zeta _3=8\), and \(K_1=40\).

Block design dataset results

In this section, the unavailability of task-related ground truth activation maps prompted us to base the temporal analysis on the six constructed MHRs. Similarly, because resting state networks (RSNs) do not include temporal profiles, we used Smith's templates for the spatial analysis. These MHRs and RSNs form the entries of the matrices \(\text {T}\) and \(\text {S}\), respectively. Both subject-wise and group-level techniques served as the basis for the study; however, for the sake of this article, only the group-level analysis is discussed. For the subject-wise analysis, the TCs and SMs of each subject obtained from ACSD, swbDL, swsDL, and TLSDL were considered, whereas for CODL and sgICA group-level TCs and SMs were trained and the individual TCs/SMs were obtained through back reconstruction. For the group-level evaluation, the group-level TCs and SMs produced by each competing algorithm served as the reference; the common TCs and SMs for ACSD, swbDL, swsDL, and TLSDL were extracted following the discussion in the subsection on group-level TLSDL. Ultimately, these common TCs were correlated with the MHRs, the averaged SMs were correlated with the RSNs, and the atoms and sparse codes with the most significant correlation values were retained. Further details on this extraction can be found in25.
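At the group level, the same correlation-based matching is carried out against the template matrices. The sketch below uses random stand-ins with illustrative sizes; T holds the six MHRs as columns, and S holds the RSN templates as rows.

```matlab
% Sketch: match group-level atoms/codes to the MHR and RSN templates
Dg = randn(279, 40);  Ag = randn(40, 236115);  % stand-ins: group TCs / SMs
T  = randn(279, 6);                            % six normalized MHRs (columns)
S  = randn(6, 236115);                         % stand-in for Smith RSN templates

Ctc = abs(corr(T, Dg));                        % 6 x 40, MHRs vs. group atoms
Csm = abs(corr(S', Ag'));                      % 6 x 40, RSNs vs. sparse codes
[bestTC, iT] = max(Ctc, [], 2);                % retained atoms (temporal)
[bestSM, iS] = max(Csm, [], 2);                % retained codes (spatial)
```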
Only a limited number of activation maps and temporal dynamics are shown, to avoid prolonging the study, even though the block design dataset produced quite a few. Figure 7 shows a series of two-dimensional slices blended to illustrate the three-dimensional volume for various group-level movement tasks. Similarly, only a subset of the resting-state results is presented; for example, Fig. 8 shows the SMs for the first four and the last two resting state network templates. According to Table 3, the proposed TLSDL algorithm surpassed all other algorithms in terms of overall performance: it generated the atoms and sparse codes with the greatest correlation with the ground truth, with swsDL coming in second place, and it successfully recovered all task-related and resting-state networks with the highest specificity.

The spatial maps in Fig. 7 make it abundantly evident that the activations discovered by swbDL, swsDL, and TLSDL are more exclusively confined to the motor area than those discovered by the other approaches. According to Table 3, TLSDL yielded superior correlation values compared to the other approaches, which is also supported graphically by Fig. 8, demonstrating that the resting state networks recovered by TLSDL are more specific and more consistent in terms of sensitivity than those generated by the other methods.

The correlation strength accumulation and convergence rate of the CODL, ACSD, swbDL, swsDL, and TLSDL algorithms are shown in Fig. 9 as a function of algorithm iterations. From this vantage point, it is clear that the TLSDL algorithm gradually gained an advantage over its contenders. The outliers recovered by TLSDL in the sparse matrix for the first subject during the fifteenth iteration are also shown in this figure.

Figure 9. The mean (a) correlation values and (b) convergence rate for subject-wise dictionary learning, calculated as a function of algorithm iterations. Subfigure (c) shows the temporal background noise recovered by the sparse matrix during the fifteenth iteration of the algorithm.
