Detection and quantification of groundnut oil adulteration with machine learning using a comparative approach with NIRS and UV–VIS

Physicochemical properties of laboratory adulterated samples

Average results for the physical and chemical analysis of the prepared adulterated samples are shown in Fig. 1. Peroxide value is frequently used to estimate the total hydroperoxide content and to monitor the rate of lipid oxidation during refining and preservation of oil [19]. It expresses the amount of peroxides in milliequivalents (meq) of active oxygen per kilogram of oil [20]. Iodine value expresses the degree of unsaturation in oils, which reflects the susceptibility of the oil to oxidation [21]. Free fatty acids, produced by hydrolysis of oils during oxidation, make the oils prone to oxidation and rancidity [22]. From Fig. 1A–C, the pure sample (0%) differed significantly from the other samples, recording a higher peroxide value (14.86 ± 3.41 meq/kg) and free fatty acid content (2.10 ± 0.20) than the other samples. A previous study [23] observed that groundnut oil and palm olein blends had decreasing free fatty acid, peroxide and iodine values with increasing concentrations of palm olein. Similar observations can be made for the prepared laboratory adulterated samples. Palm olein has less free fatty acid because of the rigorous refining process it undergoes, which explains the decrease in the free fatty acid content of the oil blends. The decrease in iodine value could be because increasing concentrations of palm olein lowered the level of unsaturation characteristic of groundnut oil. Lastly, the decreased peroxide values could be attributed to a decrease in the peroxide concentration of the oil blends.

Figure 1. Average values of physical and chemical properties of pure and laboratory adulterated samples. (A) Peroxide value. (B) Free fatty acid. (C) Iodine value. (D) Color coordinate L*. (E) Color coordinate b*. (F) Color coordinate a* (p < 0.05).
Different letters represent samples with significant differences.

The color parameters of the samples were examined based on the L*, a* and b* color coordinates. The L* value indicates brightness; a positive a* value indicates redness and a negative a* value greenness; a positive b* value indicates yellowness and a negative b* value blueness. From Fig. 1D, the pure sample had the lowest L* value (32.83 ± 0.86) and the sample with 50% adulteration the highest (51.62 ± 0.65); since a higher L* means a brighter sample, the 50% blend was the lightest and the pure oil the darkest (least bright). From Fig. 1E, the most yellow sample (+ b*) was the sample with 50% adulteration (32.22 ± 0.62), and the least yellow was the pure sample, with a b* value of 7.08 ± 0.86. From Fig. 1F, the reddest sample (+ a*) was the sample with 40% adulteration (3.79 ± 0.28) and the least red (2.20 ± 0.24) was the sample with 20% adulteration. Overall, the colour analysis showed that all the oil blends recorded lightness within acceptable levels, which consumers prefer.

Raw and pre-treated spectra analysis for NIR and UV–Vis

Figure 2 shows the raw spectra of the laboratory adulterated and market samples obtained with NIR spectroscopy (A) and UV–Vis (B). Figure 3 shows the pretreated (sgol) spectra of the laboratory adulterated and market samples for NIR spectroscopy (A) and UV–Vis (B).

Figure 2. Raw spectra of laboratory adulteration and market samples. (A) NIR raw spectra. (B) UV–VIS raw spectra.

Figure 3. Preprocessed spectra of laboratory adulteration and market samples. (A) NIR preprocessed spectra. (B) UV–VIS preprocessed spectra.

In the NIR raw spectra (Fig. 2A), the laboratory adulterated samples had absorbances up to 0.8 and the market samples had absorbances up to 1.0. Several key peaks can be observed, which correspond to various chemical bonds and functional groups in the samples.
NIR spectroscopy typically detects overtones and combinations of fundamental molecular vibrations, which are usually associated with C–H, O–H, and N–H bonds. Peaks around 1200–1500 nm can be attributed to the second overtone of C–H stretching vibrations [24]. These are indicative of the aliphatic hydrocarbon chains present in oils. Peaks observed between 1600 and 1800 nm are likely due to combination bands involving C–H stretching and bending vibrations. This region can provide insights into the overall hydrocarbon content. Although less prominent, any peaks around 1400–1450 nm may suggest the presence of moisture or hydroxyl groups [25]. The peaks were still visible after preprocessing (Fig. 3A).

In the UV–Vis raw spectra (Fig. 2B), the laboratory adulterated samples had absorbances up to 3.5 and the market samples had absorbances up to 4.0. In the preprocessed Ultraviolet–Visible (UV–Vis) spectra, distinct absorption peaks are observed which can be linked to various chromophores and conjugated systems within the samples. Peaks in the UV region (200–400 nm) are generally due to electronic transitions in aromatic compounds or conjugated dienes. These peaks are crucial for identifying the presence of specific unsaturated compounds or antioxidants within the oils [26]. Peaks around 400–500 nm can indicate non-bonding electrons transitioning to anti-bonding π orbitals (n → π* transitions), which are typical of compounds with lone pairs, such as carbonyls. Peaks beyond 500 nm into the visible range can be attributed to the color compounds in the oils, such as carotenoids and chlorophylls [27]. The peaks were still visible after preprocessing (Fig. 3B).

PCA scores for NIR and UV–VIS spectra

PCA, an unsupervised pattern recognition technique, extracts the important information from the data so that comparable samples are grouped closer together. This results in the visualization of data trends in a three-dimensional space as shown in Fig. 4.
In this sense, the graphical output can be used to determine the differences between the various sample categories [28]. Based on PCA of the pretreated NIR spectra of the differently adulterated and market samples, the first two principal components (PC1 and PC2) described a total variance of 99.76% (Fig. 4A). From the PCA score plot, samples from Aboabo market showed similarities with samples from Central market. Samples from Lamashegu market also showed similarities with samples containing 0%, 1%, 20% and 30% of adulterant. This indicates that the NIR spectra have distinct differences that can be effectively used to differentiate between the samples. The close grouping of samples from Aboabo and Central markets, as well as the overlap of Lamashegu market samples with adulterated ones, suggests similar chemical compositions and potential adulteration.

For Near-Infrared (NIR) spectra, wavelengths associated with overtones and combinations of molecular vibrations such as C–H, O–H, and N–H stretching are important. These wavelengths contain information about the oil’s fatty acid composition, moisture content, and other minor constituents that affect the oil’s quality and stability. The absorption bands around 1200–1500 nm are significant due to their sensitivity to changes in fatty acid profiles, which directly relate to properties like free fatty acid content and iodine value [29]. Also, regions around 1700–1900 nm are indicative of moisture and other volatiles in the oil, which influence its peroxide value and overall oxidative stability. In the UV–Visible (UV–VIS) spectra, wavelengths beyond 500 nm are particularly critical as they correspond to the visible range where color compounds such as carotenoids and chlorophylls absorb.
These compounds are indicators of the oil’s purity and quality, with specific peaks indicating the presence and concentration of these pigments. Wavelengths around 450–550 nm are associated with the yellow and green color components, which can differentiate between pure groundnut oil and adulterated samples due to their varying pigment contents. The clustering observed in the PCA plots reflects these differences, with distinct groupings based on the spectral signatures related to the oil’s physicochemical properties such as color, peroxide value, and fatty acid content.

Figure 4. PCA plot for laboratory adulterated and market samples. (A–C) UV–Vis PCA plot for laboratory adulterated and market samples spectra. (B) NIR PCA plot for laboratory adulterated and market samples.

It can then be said that samples from Lamashegu market could have some form of adulteration, as shown by their resemblance to the adulterated laboratory samples in the PCA score plot. For the pretreated UV–VIS spectra, the first principal component (PC1) explained 84.06% of the variance and PC2 explained 11.84%. From Fig. 4B, the laboratory adulterated samples (0 to 50%) fell on the negative side of PC1, signifying similarities between them and samples from Lamashegu market. This suggests that UV–Vis spectra also effectively differentiate the samples but with less variance explained compared to NIR. The similarities in PC1 between the laboratory adulterated samples and the Lamashegu market samples reinforce the potential adulteration observed in these samples. Scree plots and PCA loadings for both the UV–Vis and NIR datasets are provided in the supplementary document (Figs. S1 and S2).

LDA models for NIR and UV–VIS spectra

LDA for NIR

Classification plots and model performance parameters for the detection of groundnut oil adulteration can be seen in Fig. 5. The results of the LDA models built from the NIR spectra to classify the laboratory adulterated and market samples are shown in Fig. 5A. There was a degree of overlap among all laboratory samples, suggesting that they all contained groundnut oil and palm olein. Samples from Aboabo and Central markets appeared to be different from the laboratory adulterated samples but showed some overlap with each other and with samples from Lamashegu market. Furthermore, samples from Lamashegu market overlapped with laboratory adulterated samples at 1%, 3%, 5%, 10%, 20% and 30% w/w concentrations. Fig. 5C shows high sensitivity for the laboratory adulterated samples (close to 100% for 0%, 1%, 20%, 30%, 40%, and 50% adulteration). Specificity was also high for these samples, as they could be accurately classified without significant overlap. Precision was high as well, with most samples correctly identified without false positives.

Figure 5. LDA plot for laboratory adulterated and market samples. (A–C) LDA and model parameters for UV–Vis. (B–D) LDA and model parameters for NIRS.

LDA for UV–Vis

In the LDA model for the UV–VIS spectra, the laboratory adulterated samples showed a significant degree of overlap, except for the sample containing 1% adulteration, as seen in Fig. 5B. In addition, all market samples overlapped with each other, but this overlap did not extend to the laboratory samples as it did in the LDA model for the NIR spectra. Fig. 5D shows lower sensitivity for distinguishing between different levels of adulteration, owing to significant overlap among samples; lower specificity, as the model struggles to differentiate between similar samples; and lower precision, due to higher misclassification rates. Recall and F1 values can be found in Tables S1 and S2 in the supplementary sheet. For the NIR spectra, the significant wavelengths are those that can effectively distinguish between laboratory adulterated samples and market samples.
This includes the wavelengths where there is significant absorption due to the presence of groundnut oil and palm olein, as depicted in the raw spectra plots. For the UV–VIS spectra, the LDA model showed significant overlap among samples.

Confusion matrices from NIR and UV–VIS spectra

Table 2 shows the confusion matrix for the classification of the laboratory adulterated and market samples from the NIR spectra. There was an average recognition of 95.09% and an average prediction of 92.61%. The confusion matrix shows the rates of correct classification and misclassification. Laboratory adulterated samples with 0%, 1%, 20%, 30%, 40% and 50% concentrations could be classified with 100% accuracy. Higher levels of misclassification were observed for the 3%, 5% and 10% samples and for all market samples, meaning that they share similarities with each other. It is likely that they all contain groundnut oil and palm olein but in varying concentrations.

Table 2. Confusion matrix developed from NIR spectra for laboratory adulterated and market samples.

Table 3 presents the confusion matrix for the classification of both laboratory and market samples for the UV–VIS spectra. It shows an average prediction accuracy of 62.17% and an average recognition accuracy of 96.13%. Higher levels of misclassification were observed for all samples, making the LDA model built from the NIR spectra the better one, as it showed better accuracy and less misclassification.

Table 3. Confusion matrix for the classification of the laboratory and market samples for the UV–VIS spectra.

Partial least square regression for NIRS

Prediction of free fatty acid, iodine, peroxide value and colour

After 18 different pretreatments were applied (Tables S3 to S6, supplementary sheet), R² values ranged from 0.7735 to 0.94 while R²CV values ranged from 0.4712 to 0.7847. After cross validation, the errors (RMSECV) were all lower than 1 mL/100 mL for free fatty acid prediction (Table 4).
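Savitzky-Golay smoothing is one of the pretreatments applied to the spectra before regression. The sketch below illustrates the idea in plain Python; it is not the authors' implementation, and the study does not state the software or settings used. It assumes that "filter 17" means a 17-point window, that a quadratic polynomial is fitted within each window, and that the first and last eight points are left unsmoothed. The function names `savgol_weights` and `savgol_smooth` are hypothetical.

```python
from __future__ import annotations


def savgol_weights(window: int) -> list[float]:
    """Closed-form Savitzky-Golay smoothing weights for a quadratic fit.

    For an odd window of m points, the weight at offset i from the centre is
    3 * (3*m**2 - 7 - 20*i**2) / (4 * m * (m**2 - 4)).
    """
    if window < 5 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 5")
    half = window // 2
    denom = 4 * window * (window**2 - 4)
    return [3 * (3 * window**2 - 7 - 20 * i**2) / denom
            for i in range(-half, half + 1)]


def savgol_smooth(spectrum: list[float], window: int = 17) -> list[float]:
    """Smooth a spectrum; edge points (window // 2 at each end) are left unchanged.

    The default window of 17 is an assumed reading of "filter 17".
    """
    weights = savgol_weights(window)
    half = window // 2
    smoothed = list(spectrum)
    for centre in range(half, len(spectrum) - half):
        segment = spectrum[centre - half:centre + half + 1]
        # Weighted moving average equivalent to a local least-squares quadratic fit.
        smoothed[centre] = sum(w * x for w, x in zip(weights, segment))
    return smoothed
```

Calling `savgol_smooth(absorbances)` on a list of absorbance readings returns a list of the same length. Because the filter fits a local quadratic, a spectrum that is already exactly quadratic passes through unchanged, while point-to-point noise is averaged down.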
Savitzky-Golay smoothing (sgol) with filter 17 pretreatment produced the best accuracy for the prediction of palm olein in groundnut oil using NIRS (Fig. 6).

Table 4. Partial least square values obtained from Savitzky-Golay pretreatment applied to the NIR spectra for the prediction of free fatty acid, iodine, peroxide values and color.

Figure 6. PLSR plot for the prediction of palm olein concentration in groundnut oil using NIRS with Savitzky-Golay pretreatment (filter 17).

For iodine value prediction (Table 4), R² was 0.8775, while R²CV was 0.656. Again, the errors (RMSECV) after cross validation were below 1 mL/100 mL. Savitzky-Golay smoothing (sgol) with filter 17 pretreatment proved to have the best accuracy for predicting iodine value in groundnut oil regardless of the adulterant concentration.

In the case of peroxide prediction (Table 4), R² was 0.7691, while R²CV was 0.5169. The errors after cross validation (RMSECV) were all lower than 3 mL/100 mL. Savitzky-Golay smoothing (sgol) with filter 17 followed by detrending (deTr) produced the best accuracy for predicting peroxide value in groundnut oil irrespective of the adulterant concentration.

The coefficient of determination (R²), ranging between 0 and 1, is one of the indicators of a model’s quality. The nearer the value is to 1, the better the model; the farther it is from 1, the poorer the model [30]. The root mean square error (RMSE) measures the average prediction error made by the model in predicting the outcome for an observation. The lower the RMSE, the better the model, since it indicates the stability of the models. Conversely, higher RMSE values indicate larger errors and hence a poorer model. The preferred models were chosen based on the pretreatment that gave the highest R² value and the lowest RMSE values.
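As a rough illustration of the two quality indicators described above, the following sketch computes R² and RMSE from paired observed and predicted values. It assumes the common definitions R² = 1 − SS_res/SS_tot and RMSE = sqrt(mean squared error); the study does not state the exact formulas or software it used, and the function names and the numbers in the example are hypothetical, not data from the paper. Applying the same two functions to predictions obtained during cross validation would give the cross-validated counterparts (R²CV and RMSECV).

```python
from __future__ import annotations

import math


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: nominal adulteration levels (% w/w) and made-up predictions.
observed = [0.0, 1.0, 3.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [1.2, 0.4, 4.1, 6.3, 8.9, 21.5, 28.7, 41.0, 48.8]
print(f"R2 = {r_squared(observed, predicted):.4f}, RMSE = {rmse(observed, predicted):.4f}")
```

A model whose predictions match the observations exactly gives R² = 1 and RMSE = 0, which is why pretreatments were ranked by the highest R² and lowest RMSE.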
The best models, which produced the highest R² and R²CV and the lowest RMSE and RMSECV values, were obtained from the pretreatments that involved Savitzky-Golay smoothing (sgol) with filter 17 and Savitzky-Golay smoothing (sgol) with filter 17 followed by detrending (deTr). These pretreatments resulted in R² of 0.8973, R²CV of 0.7847, RMSE of 0.0731 and RMSECV of 0.1059 for free fatty acid prediction (Table 4); R² of 0.8775, R²CV of 0.656, RMSE of 0.0961 and RMSECV of 0.1611 for iodine value prediction (Table 4); and R² of 0.7691, R²CV of 0.5169, RMSE of 1.2348 and RMSECV of 1.7861 for peroxide value prediction (Table 4).

The PLS model for the prediction of colour in groundnut oil adulteration, with pre-processing involving Savitzky-Golay smoothing, showed R² of 0.9434, R²CV of 0.8799, RMSE of 1.1477 and RMSECV of 1.686 for lightness; R² of 0.727, R²CV of 0.3074, RMSE of 0.2494 and RMSECV of 0.3971 for redness; and R² of 0.9439, R²CV of 0.8819, RMSE of 1.6093 and RMSECV of 2.3357 for yellowness. These results show a better model for colour prediction among differently adulterated samples, since they had higher R² values while producing lower RMSE values.

Partial least square regression for UV–Vis

Overall, NIRS could predict all the parameters of interest better than UV–VIS (Table 5). Only the concentration of palm olein, brightness (L*) and yellowness (b*) of the samples could be predicted with UV–VIS. These were the same parameters that were best predicted using NIRS, but NIRS predicted them with higher accuracy. Models for all the other parameters were weak.
Figure 7 shows the PLSR plot for the prediction of palm olein in groundnut oil (the best model achieved with UV–Vis).

Table 5. Partial least square values obtained from Savitzky-Golay pretreatment applied to the UV–VIS spectra for the prediction of free fatty acid, iodine, peroxide values and color.

Figure 7. PLSR plot for the prediction of palm olein concentration in groundnut oil using UV–Vis with Savitzky-Golay pretreatment (filter 17).

The PLS models built from the UV–VIS pre-processed spectra for the prediction of free fatty acid, iodine and peroxide values and colour parameters in groundnut oil adulterated with palm olein were less satisfactory, since most of their R² values were farther from 1 and their RMSE values were larger than those of the PLSR models built from the NIR pre-processed spectra.

Limitations

The limitations of this study include the small study area and the small sample sizes used. Future studies could cover wider study areas to produce a more comprehensive understanding.
