Deep learning artificial neural network framework to optimize the adsorption capacity of 3-nitrophenol using carbonaceous material obtained from biomass waste

Characterization of CM-HC

Figure 1 shows that the point of zero charge of CM-HC is equal to 6.5. This result is similar to those obtained from other precursor materials such as Dipterocarpus alatus (pHpzc = 6.3)64 and rice husk (pHpzc = 6.8)65. Therefore, when the solution pH is above the pHpzc (pH > pHpzc), the surface of CM-HC is negatively charged and cationic species are preferentially removed, whereas when the pH is below the pHpzc (pH < pHpzc), the surface of CM-HC becomes positively charged and anionic species are preferentially attracted via electrostatic interactions66.

Fig. 1

The XRD diffraction pattern of CM-HC is presented in Fig. 2. It can be observed that CM-HC has a semicrystalline structure, with an amorphous region between 10 and 40° corresponding to the carbonaceous fraction, as reported in the preparation and characterization of activated carbons from different precursors67.

Fig. 2

Scanning electron microscope (SEM) analysis was carried out to investigate the physical surface morphology of CM-HC. The SEM micrographs of CM-HC (Fig. 3) show that the particles of the synthesized activated carbon have a rough surface and an irregular shape, with a variety of randomly distributed cavities that can provide easy transport toward the adsorption sites68. The elemental composition of the activated carbon, determined by energy dispersive X-ray spectroscopy (EDS), is also shown in Fig. 3. CM-HC consists predominantly of carbon and oxygen, these two elements together accounting for 97.9% by weight. The rest of the composition (2.1%) corresponds to metallic fractions (Ca, K, Na and Mg).

Fig. 3 SEM images and EDS analysis of CM-HC.

The mean pore diameter of CM-HC calculated by the BET equations was 2.1382 nm, with a surface area of 124.15 m2/g. The difference in surface properties can be attributed to the type of biomass precursor.
Beker et al. reported that the adsorption of phenols takes place in ultramicropores and micropores with diameters between 0.7 and 2 nm69. Therefore, the adsorbent is expected to be effective for the removal of 3-nitrophenol from aqueous solution, given the smaller molecular diameter of 3-Nph (0.6202 nm)70.

Sorption study

Adsorption isotherms at different solution pH

The effect of pH on the removal of 3-Nph by CM-HC was studied at ambient temperature by adjusting the solution pH of 3-Nph to 3, 6 and 8. For each solution, the initial concentration was varied from 25 to 1000 mg/L. Usually, at low pH values, anions are favorably adsorbed on the sorbent surface due to the high concentration of H+ ions, while at high pH values, cations are more readily adsorbed as a result of the high concentration of OH− ions71. In addition, it is well known that the degrees of dissociation and ionization of organic compounds, as well as the adsorbent surface charge, depend on the solution pH72; it is therefore important to study the effect of solution pH on the adsorption of 3-Nph on CM-HC.

Figure 4a–c show the adsorption isotherms of 3-Nph on CM-HC at initial pH 3, 6 and 8, respectively. It can be seen that the adsorption capacity increases between pH 3 and 6, and then decreases at pH 8. This suggests that the interaction of 3-Nph with CM-HC is more favorable in acidic than in alkaline medium. In previous studies, the uptake of phenols over a certain pH range presented a dome-shaped curve25,73, which is attributed to the change in the nature of the adsorbent (surface charge) and of the adsorbate species at different pH69.

Fig. 4 Plot of adsorption isotherms for 3-Nph on CM-HC at (a) pH = 3, (b) pH = 6 and (c) pH = 8.

At pH between 3 and 6, the surface of CM-HC is positively charged (pHpzc(CM-HC) = 6.5).
Furthermore, in this pH range, 3-Nph is mainly present as a neutral species (pKa(3-Nph) = 8.3 at 298 K); however, the concentration of its anionic form (C6H4NO3−) increases with increasing solution pH, and consequently higher uptakes are obtained at pH = 6 than at pH = 3 through electrostatic attraction between the positively charged surface and the anionic form of 3-Nph74,75. At pH = 8, the uptake of 3-Nph declined. This result can be attributed to the electrostatic repulsion between the negatively charged surface of CM-HC and the abundant anionic form of 3-Nph. Previous investigations have reported similar behavior for the adsorption of phenol43,69,73,76, nitrophenols (2,4-dinitrophenol, 3-nitrophenol, 4-nitrophenol)42,76,77 and chlorophenols (2,4-dichlorophenol, 2-chlorophenol, 4-chlorophenol)42,76 on activated carbons.

The nonlinear equations of the Langmuir, Freundlich, Temkin, and Redlich–Peterson isotherm models were used to investigate the adsorption mechanism. Table 2 shows the model parameters with their respective correlation coefficients (R2).

Table 2 Langmuir, Freundlich, Temkin, and Redlich-Peterson isotherm parameters for the adsorption of 3-Nph on CM-HC at different solution pH.

The results indicate that the Langmuir and Redlich–Peterson models described the adsorption data well for CM-HC over the pH range studied, with R2 values between 0.9805 and 0.9985, compared with the Freundlich (0.9454 ≤ R2 ≤ 0.9767) and Temkin (0.9768 ≤ R2 ≤ 0.9954) models. This result suggests monolayer adsorption on the adsorbent surface, where the interaction between phenols and the surface of carbonaceous materials involves van der Waals forces and π–π interactions42. As shown in Table 2, the maximum adsorption capacities qmax calculated from the Langmuir isotherm model were 100.523, 128.625, and 87.284 mg/g at pH = 3, 6, and 8, respectively.
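The nonlinear isotherm fits described above can be sketched with scipy's curve_fit. The equilibrium data below are synthetic (generated from a Langmuir curve plus noise) and the parameter values are illustrative only, not the paper's measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(Ce, qmax, KL):
    # Langmuir: qe = qmax*KL*Ce / (1 + KL*Ce)
    return qmax * KL * Ce / (1.0 + KL * Ce)

def redlich_peterson(Ce, KRP, aRP, beta):
    # Redlich-Peterson: qe = KRP*Ce / (1 + aRP*Ce^beta)
    return KRP * Ce / (1.0 + aRP * Ce**beta)

# Synthetic equilibrium data (illustrative, not from the paper)
rng = np.random.default_rng(0)
Ce = np.array([10, 25, 50, 100, 250, 500, 750, 1000], dtype=float)
qe = langmuir(Ce, 128.6, 3.9e-3) + rng.normal(0, 1.0, Ce.size)

popt_L, _ = curve_fit(langmuir, Ce, qe, p0=[100.0, 1e-2])
popt_RP, _ = curve_fit(redlich_peterson, Ce, qe, p0=[0.5, 4e-3, 0.9], maxfev=20000)

def r2(y, yhat):
    # coefficient of determination
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

print(f"Langmuir: qmax={popt_L[0]:.2f} mg/g, KL={popt_L[1]:.5f} L/mg, "
      f"R2={r2(qe, langmuir(Ce, *popt_L)):.4f}")
print(f"Redlich-Peterson: beta={popt_RP[2]:.3f}, "
      f"R2={r2(qe, redlich_peterson(Ce, *popt_RP)):.4f}")
```

Because the synthetic data follow a Langmuir curve, the fitted Redlich–Peterson exponent comes out close to unity, illustrating the reduction of the three-parameter model to Langmuir discussed below.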
The values of the Redlich–Peterson exponential constant β are close to unity (0.794–0.944), indicating that the Redlich–Peterson model reduces to the Langmuir model78. As can be seen from Table 3, the values of the error functions APE, SSE, ∆q (%), χ2, EABS and RMSE calculated from the experimental data were lower for the Langmuir and Redlich–Peterson models than for the Temkin and Freundlich isotherms. This result confirms the applicability of the Langmuir and Redlich–Peterson isotherm models to describe the adsorption of 3-Nph on CM-HC.

Table 3 Values of error functions of adsorption isotherm models of 3-Nph on CM-HC at different solution pH.

Effect of contact time on adsorption equilibrium

The adsorption kinetics were studied at ambient temperature without any adjustment of the solution pH. Figure 5a–c show the amount of 3-Nph adsorbed (mg/g) by CM-HC versus contact time (min) for initial 3-Nph concentrations (Ci) of 50, 100 and 250 mg/L, respectively. Within 120 min, 80.1%, 81.7%, and 65.3% of the 50, 100, and 250 mg/L solutions of 3-Nph were removed. The higher rate of 3-Nph adsorption in this first stage can be attributed to the availability of adsorption sites on the adsorbent surface79. After this time, the adsorption capacity increased gradually with contact time until reaching equilibrium, which varied from 180 to 360 min depending on the initial 3-Nph concentration (the fastest equilibrium was reached for Ci = 50 mg/L). This result can be attributed to the greater availability of uncovered surface area of the adsorbent at low solute concentrations76,80. The equilibrium time found in this study is shorter than the equilibrium times reported in other studies for the removal of phenols using different carbonaceous materials81. For Ci = 50, 100, and 250 mg/L, the adsorption capacity at equilibrium reached 8.97, 17.38, and 41.37 mg/g, respectively.
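The error functions used throughout this work for model discrimination can be sketched as follows. The definitions below are common forms of these statistics and are assumed to match the paper's Eqs. (7)–(12); normalizations may differ slightly:

```python
import numpy as np

def error_metrics(q_exp, q_cal):
    # Common error functions for isotherm/kinetic model comparison
    # (definitions assumed; normalizations may differ from the paper's)
    q_exp, q_cal = np.asarray(q_exp, float), np.asarray(q_cal, float)
    n = q_exp.size
    resid = q_exp - q_cal
    return {
        "SSE":  np.sum(resid**2),                               # sum of squared errors
        "EABS": np.sum(np.abs(resid)),                          # sum of absolute errors
        "RMSE": np.sqrt(np.sum(resid**2) / n),                  # root mean square error
        "APE":  100.0 / n * np.sum(np.abs(resid / q_exp)),      # average percentage error
        "chi2": np.sum(resid**2 / q_cal),                       # chi-square statistic
        "dq":   100.0 * np.sqrt(np.sum((resid / q_exp)**2) / (n - 1)),  # normalized dev. (%)
    }

# Experimental vs. PSO-calculated capacities quoted later in the text (mg/g)
metrics = error_metrics([8.97, 17.38, 41.37], [9.05, 17.26, 40.90])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The model with the smallest values across these six statistics is preferred, which is the selection rule applied in Tables 3, 5, 7 and 9.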
The experimental data were analyzed using the nonlinear equations of the pseudo-first-order (PFO), pseudo-second-order (PSO), and Elovich kinetic models. Table 4 displays the kinetic parameters of PFO (k1, qe), PSO (k2, qe), and Elovich (α, β) with their corresponding correlation coefficients R2.

Fig. 5 Variation of adsorption capacity of CM-HC against contact time (min) at (a) Ci = 50 mg/L, (b) 100 mg/L and (c) 250 mg/L.

Table 4 PFO, PSO, and Elovich kinetic parameters for different initial concentrations of 3-Nph.

The results shown in Table 4 indicate that for Ci = 50 and 100 mg/L, the pseudo-second-order model displays higher correlation coefficients (R2 = 0.9840, 0.9892) than the pseudo-first-order (R2 = 0.9043, 0.8744) and Elovich (R2 = 0.8182, 0.8839) models. The calculated adsorption capacities obtained from the PSO model (qe,cal = 9.05 and 17.26 mg/g for Ci = 50 and 100 mg/L, respectively) also agree with the experimental data (qe,exp = 8.97 and 17.38 mg/g, respectively). As seen in Table 4, the rate constants (k2) decrease with increasing initial concentration, confirming that the adsorption process was faster at lower initial concentrations. For Ci = 250 mg/L, the nonlinear Elovich curve passes close to the experimental data (Fig. 5c), and the Elovich model gives a higher R2 (0.9842) than PFO (R2 = 0.7834) and PSO (R2 = 0.9349), suggesting that the adsorption process is controlled by a chemisorption mechanism.

Elsayed et al.82 developed a biocomposite aerogel (Amf-CNF/LS) and investigated its efficacy in removing methylene blue (MB), rhodamine B (RhB), and cadmium ions (Cd2+) from synthetic wastewater. The study specifically explored the influence of contact time and stirring speed on the adsorption process.
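The nonlinear kinetic fits can be sketched in the same way as the isotherms. The kinetic data below are synthetic (generated from a PSO curve plus noise) and purely illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def pfo(t, qe, k1):
    # pseudo-first-order: qt = qe*(1 - exp(-k1*t))
    return qe * (1.0 - np.exp(-k1 * t))

def pso(t, qe, k2):
    # pseudo-second-order: qt = qe^2*k2*t / (1 + qe*k2*t)
    return qe**2 * k2 * t / (1.0 + qe * k2 * t)

def elovich(t, alpha, beta):
    # Elovich: qt = (1/beta)*ln(1 + alpha*beta*t)
    return np.log(1.0 + alpha * beta * t) / beta

# Synthetic kinetic data (illustrative, not the paper's measurements)
rng = np.random.default_rng(1)
t = np.array([5, 15, 30, 60, 120, 180, 240, 360], dtype=float)
qt = pso(t, 17.3, 4e-3) + rng.normal(0, 0.15, t.size)

(qe_pso, k2), _ = curve_fit(pso, t, qt, p0=[15.0, 1e-3])
(qe_pfo, k1), _ = curve_fit(pfo, t, qt, p0=[15.0, 1e-2])
print(f"PSO: qe={qe_pso:.2f} mg/g, k2={k2:.5f} g/(mg min)")
print(f"PFO: qe={qe_pfo:.2f} mg/g, k1={k1:.5f} 1/min")
```

Comparing the fitted qe with the experimental equilibrium capacity, as done in Table 4, is a quick consistency check on the chosen kinetic model.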
The results showed that contact time significantly impacts the adsorption capacity, with rapid increases observed within the initial minutes of exposure, suggesting a high affinity between the aerogel and the contaminants. Equilibrium was quickly reached, indicating the aerogel's efficiency in fast contaminant uptake, which is beneficial for practical wastewater treatment applications where quick removal is necessary. Stirring speed was another critical factor influencing the adsorption efficiency: higher stirring speeds improved the mass transfer of the adsorbate molecules to the aerogel's surface, enhancing the adsorption rate by minimizing the boundary layer around the adsorbent.

As shown in Table 5, the values obtained from the six error equations considered in this study (Eq. X to Eq. X) are lower for the PSO model (for Ci = 50 and 100 mg/L) than for the PFO and Elovich models, while for Ci = 250 mg/L the error function values are lower for the Elovich model than for PFO and PSO. This confirms the results obtained from the nonlinear fits of the PSO, PFO and Elovich models (R2), and supports the suitability of the PSO model (for Ci = 50 and 100 mg/L) and the Elovich model (for Ci = 250 mg/L).

Table 5 Values of error functions of PFO, PSO and Elovich kinetic models of 3-Nph (Ci = 50, 100 and 250 mg/L) on CM-HC.

Adsorption isotherms at different adsorbent dosage

The effect of adsorbent dosage on the adsorption capacity of 3-Nph was investigated at ambient temperature using isotherm experiments without modifying the solution pH. Adsorbent dosages of 2, 4, 8 and 10 g/L were used, and the initial concentration of 3-Nph was varied from 25 to 1000 mg/L. Figure 6a–d show the equilibrium relationships between the 3-Nph concentration in solution, Ce (mg/L), and the adsorption capacity at different dosages of CM-HC.
It can be seen that for all adsorbent doses, the adsorption capacity of 3-Nph, qe (mg/g), increases with increasing Ce (mg/L).

Fig. 6 Adsorption isotherms of 3-Nph on CM-HC with dosage (a) 2 g/L, (b) 4 g/L, (c) 8 g/L and (d) 10 g/L.

The experimental adsorption isotherm data were analyzed using the nonlinear equations of the Langmuir, Freundlich, Temkin, and Redlich–Peterson isotherm models. Table 6 presents the isotherm parameters for adsorbent doses from 2 to 10 g/L. The results indicate that the adsorption isotherm depends on the adsorbent dosage. According to the correlation coefficients shown in Table 6, the Freundlich and Redlich–Peterson isotherm models best described the sorption data for an adsorbent dose of 2 g/L, whereas for CM-HC dosages between 4 and 10 g/L, the Langmuir and Redlich–Peterson isotherm models described the adsorption process well. As seen in Table 6, the Langmuir constant KL increased from 0.309 × 10−2 to 3.9217 × 10−2 L/mg as the adsorbent dosage increased from 2 to 10 g/L, which indicates the high affinity at low dosage of CM-HC. Additionally, the Freundlich constant 1/n (adsorption intensity) at 2 g/L was lower than at dosages between 4 and 10 g/L, which means that the adsorption of 3-Nph is more favorable at a lower adsorbent dosage. The Redlich–Peterson parameters presented in Table 6 show that for all adsorbent dosages the exponent β lies between 0 and 1, which indicates favorable adsorption78.
In addition, for dosages of 4, 8 and 10 g/L, the values of β were close to unity (β = 1.003, 0.878, and 1.0267); the Redlich–Peterson model therefore reduces to the Langmuir model for the description of 3-Nph adsorption. For a dosage of 2 g/L, β < 1 (β = 0.421) and αRP, KRP >> 1 (αRP = 4.604 and KRP = 18.794), so the isotherm approaches the Freundlich form, where KRP/αRP and (1 − β) are related to the Freundlich parameters KF and n, respectively83.

Table 6 Langmuir and Freundlich isotherm parameters for the adsorption of 3-Nph on CM-HC at different dosages.

Based on the values of the error functions obtained from Eqs. (7), (8), (9), (10), (11), (12), it can be observed from Table 7 that for an adsorbent dosage of 2 g/L, the Freundlich and Redlich–Peterson models show the lowest values of APE, SSE, ∆q (%), χ2, EABS, and RMSE. For adsorbent dosages from 4 to 10 g/L, the Langmuir and Redlich–Peterson models present lower error function values than the Freundlich and Temkin isotherm models. These results agree with the correlation coefficients and validate the studied isotherm models.

Table 7 Values of error functions of adsorption isotherm models of 3-Nph on CM-HC at different dosage.

As seen in Fig. 7a, the adsorbed amount of 3-Nph decreases from 236.156 to 79.441 mg/g as the adsorbent dose increases from 2 to 10 g/L. This result can be attributed to the split in the flux, i.e., the concentration gradient between the solute in solution and the solute on the adsorbent84. Figure 7b shows that the removal percentage of 3-Nph increases with increasing sorbent dosage, which can be attributed to the larger number of available sorption sites85.

Fig.
7 Effect of sorbent dosage (g/L) on (a) the amount of 3-Nph sorbed, qm (mg/g), and (b) the percentage of 3-Nph removal; solution pH: 5.60; temperature: 297 K; agitation speed: 150 rpm; contact time: 24 h.

Adsorption isotherms at different temperatures

Adsorption isotherms of 3-Nph on CM-HC at 300.15, 313.15 and 330.15 K are shown in Fig. 8a–c, respectively. Temperature has a significant effect on the removal of 3-Nph: the adsorption capacity of CM-HC decreases with increasing temperature, confirming that the adsorption of 3-Nph on CM-HC is an exothermic process. Previous studies of the adsorption of phenolic compounds also reported an exothermic process using oil palm shell activated carbon86, carbon black87, cattail fiber-based activated carbon88, and anaerobic granular sludge89 as adsorbents. In these studies, it was suggested that increasing the temperature may break the attraction forces between the adsorbate molecules and the active sites on the surface of the carbonaceous materials, leading to a decrease in adsorption capacity90.

The equilibrium data at 300.15, 313.15 and 330.15 K were fitted using the nonlinear equations of the Langmuir, Freundlich, Temkin and Redlich–Peterson isotherm models. Table 8 shows the isotherm parameters with their correlation coefficients R2. For all temperatures studied, the R2 values of the Langmuir and Redlich–Peterson isotherm models are higher than those of the Freundlich and Temkin models.

Fig. 8 Adsorption isotherms of CM-HC at temperature (a) 300.15, (b) 313.15 and (c) 330.15 K.

Table 8 Langmuir and Freundlich isotherm parameters for the adsorption of 3-Nph on CM-HC at different temperatures.

The maximum adsorption capacities of 3-Nph on CM-HC were 128.625, 107.704, and 105.441 mg/g at 300.15, 313.15 and 330.15 K, respectively.
Likewise, the Langmuir constant KL decreases with increasing temperature, indicating higher affinity at lower temperature and confirming the exothermic nature of 3-Nph adsorption. Furthermore, the values of the parameter β are close to unity, confirming that the adsorption isotherms are better approximated by the Langmuir model than by the Freundlich model. Table 9 presents the values of the error functions. Among the four isotherm models studied, the Langmuir and Redlich–Peterson isotherms show the lowest values, confirming that the Langmuir model is the best-fitting model over the temperature range studied.

Table 9 Values of error functions of adsorption isotherm models of 3-Nph on CM-HC at different temperatures.

Adsorption mechanisms

The proposed adsorption mechanisms of 3-Nph on CM-HC depend on the functional groups present on the surface of the adsorbent. In a recent study, we found that the surface of CM-HC exhibits the following functional groups: OH, C=O, C–O, and C=C (aromatic ring)91. Therefore, electrostatic interactions can occur between the negatively charged OH, C=O and C–O groups and the positively charged nitrogen of 3-Nph. The OH functional groups may also interact with the oxygen of the phenol group through hydrogen bonding, and the C=C groups can interact with the benzene ring of 3-nitrophenol through π–π interactions. Additionally, the pores of CM-HC may accommodate 3-Nph molecules. Figure 9 illustrates electrostatic interactions, hydrogen bonding, π–π interactions, and pore adsorption as possible adsorption mechanisms of 3-Nph onto CM-HC.

Fig. 9 Adsorption mechanism of 3-Nph on CM-HC.

Adsorption and desorption cycles

Figure 10 shows the adsorption and desorption cycles of 3-Nph onto CM-HC. It can be seen that the adsorption and desorption capacities (qads and qdes) of 3-Nph decreased over successive cycles.
The decrease in adsorption capacity can be attributed to residual 3-Nph remaining on the surface of CM-HC, owing to the strong adsorbate–adsorbent interaction and the incomplete desorption of 3-Nph by NaOH. In other studies, moderate adsorption of nitrophenols was observed after the first cycle, which was attributed to the destruction of the porous structure of the activated biochar under alkaline experimental conditions. To improve the reusability of CM-HC, the use of other eluents, or a different temperature of the aqueous solution, is suggested.

Fig. 10 Adsorption and desorption cycles of 3-Nph on CM-HC.

Deep learning artificial intelligence framework to optimize the adsorption capacity of carbonaceous material

This section centers on the implementation of a deep learning92 artificial intelligence (AI) algorithm based on an artificial neural network93, followed by optimization with a genetic algorithm94. The procedure is executed methodically: the initial step encompasses data visualization and description, followed by AI implementation and optimization.

Data visualization and description

For data visualization and description, the entire experimental dataset is first formatted and consolidated into a single structure, as illustrated in Table 1095. Each column of this table contains one variable: time, initial concentration, pH, dosage, temperature, or the fitness function (removal percentage). Each row corresponds to an individual experimental run, mirroring the real experimental procedure, which can be resource-intensive. For instance, the execution of experimental run no. 8 required 1440 min. Despite the substantial time investment, consolidating the database at this stage proves beneficial.

Table 10 General format of the experimental data. Data publicly available at: https://dx.doi.org/10.6084/m9.figshare.24545587.

To elucidate the data description, violin plots (Fig.
11) are employed for each of the five variables and for the removal percentage, the latter being the fitness function. The violin plot is chosen for its ability to convey data properties akin to a box-and-whisker plot, while also providing insight into the data distribution. An alternative way to examine the data distribution is through histograms, as depicted in the scatter matrix of Fig. 12.

Fig. 11 Violin plot of the variables and the fitness function.

Fig. 12 Scatter matrix with histograms between the design variables and the fitness function.

The violin plot reveals distinctive patterns, such as a concentration of the time variable towards the upper end, in agreement with the scatter matrix histogram. There is also a noticeable concentration at the lower end, corresponding to durations of less than 200 min. Notably, there is an abundance of experiments at lower initial concentrations (< 200 mg/L), a trend consistent with the scatter matrix, which shows the highest data density in this range.

The pH variable exhibits its highest density just below the neutral point, with bottlenecks around pH values of 4–5 and 7. This aligns with the scatter matrix, which shows lower probability density at the extremes and a peak at 6. The dosage and temperature plots share a similar pattern, with a higher probability density at lower dosages (< 2 g/L) and temperatures (< 305 K); as dosage and temperature increase, the number of data points decreases.

The violin plot of the removal percentage correlates with the histogram distribution in the scatter matrix. The highest frequency of removal percentage occurs at approximately 90%, with a noticeable reduction in data points beyond this threshold.
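The violin-plot and scatter-matrix views described above can be reproduced with pandas and matplotlib. The data frame below is a synthetic stand-in for the 87-run dataset of Table 10 (column names follow the paper's variables; the values are illustrative, not the actual measurements):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for script use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic stand-in for the 87-run dataset (illustrative values only)
rng = np.random.default_rng(42)
n = 87
df = pd.DataFrame({
    "time": rng.uniform(15, 1440, n),
    "initial_concentration": rng.uniform(50, 250, n),
    "pH": rng.normal(6.0, 1.0, n).clip(3, 8),
    "dosage": rng.uniform(1, 10, n),
    "temperature": rng.uniform(300.15, 330.15, n),
})
df["removal"] = (95.0 - 0.15 * df["initial_concentration"]
                 + 2.0 * df["dosage"] + rng.normal(0, 3.0, n)).clip(0, 100)

# One violin per standardized variable, mirroring the layout of Fig. 11
fig, ax = plt.subplots(figsize=(8, 4))
standardized = (df - df.mean()) / df.std()
ax.violinplot([standardized[c] for c in df.columns], showmedians=True)
ax.set_xticks(range(1, len(df.columns) + 1), labels=df.columns, rotation=45)

# Pairwise scatter plots with histograms on the diagonal, mirroring Fig. 12
axes = scatter_matrix(df, diagonal="hist", figsize=(9, 9))
```

Standardizing before the violin plot puts variables with very different scales (minutes vs. kelvin) on a comparable axis; the scatter matrix is left in raw units so the histograms match the original ranges.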
The upper end of the 90% range exhibits a denser data distribution than the lower end, suggesting that some combination of the five design variables yields the maximum removal percentage.

It should be noted that the dataset used for the ANN analysis comprises 87 entries, which may be considered small for robust ANN applications. This limitation could affect the generalizability of the model. To address this issue in future studies, we propose to expand the dataset through additional batch experiments to enhance the training process and improve the model's accuracy and generalizability. Employing cross-validation methods or increasing the diversity of the data points could also contribute to more reliable ANN predictions. These steps will help mitigate the effects of the current dataset size and provide a more robust framework for ANN applications in adsorption batch experiments.

In this phase, correlation heat maps based on the Pearson96, Spearman97, and Kendall98 coefficients were computed and are shown in Fig. 13. All three heat maps exhibit a correlation coefficient of 1 along the diagonal, corresponding to self-correlations such as time with time or initial concentration with initial concentration. The entries above and below the diagonal are mirror images, so it suffices to interpret one side of each heat map. The primary emphasis of this discussion is on the magnitude of the correlation coefficients and on the differences arising from the three correlation methods. In all cases, a positive correlation with the removal percentage is observed only for time and dosage, with Pearson correlation coefficients of 0.26 and 0.38, respectively.
The relationship between time and the removal percentage is therefore not strongly positive, whereas dosage shows a more robust positive correlation (0.38). This interpretation changes under the Kendall correlation, which suggests that time has a stronger correlation with the removal percentage than dosage, an observation also made with the Spearman method. Consequently, whether time or dosage more strongly influences the removal percentage remains inconclusive, although all three methods consistently indicate a positive relationship. The correlation coefficient of pH is negative under all three methods, but its value is close to zero, indicating that pH is not among the most influential variables for the removal percentage. A similar trend is observed for temperature, whose negative correlation coefficient suggests that lower temperatures are preferable for achieving higher removal percentages. The strongest correlation is found between the initial concentration and the removal percentage, with values of −0.59 for both the Pearson and Spearman correlations and −0.45 for the Kendall correlation. This indicates that the initial concentration is the most influential parameter for the removal percentage, with lower initial concentrations being more conducive to favorable outcomes.

Fig. 13 Coefficient of correlation of Pearson, Spearman, and Kendall.

Deep learning artificial neural network

To enhance accuracy and mitigate multicollinearity issues within the dataset, artificial neural networks (ANNs)99 were implemented for deep learning100,101.
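The three correlation heat maps of Fig. 13 can be sketched with pandas, which supports all three methods directly through `DataFrame.corr`. The data frame below is again a synthetic stand-in for the experimental table:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; the real analysis uses the 87-run dataset of Table 10
rng = np.random.default_rng(0)
n = 87
conc = rng.uniform(50, 250, n)
dosage = rng.uniform(1, 10, n)
df = pd.DataFrame({
    "time": rng.uniform(15, 1440, n),
    "initial_concentration": conc,
    "pH": rng.uniform(3, 8, n),
    "dosage": dosage,
    "temperature": rng.uniform(300.15, 330.15, n),
    "removal": (95 - 0.15 * conc + 2 * dosage + rng.normal(0, 3, n)).clip(0, 100),
})

fig, axs = plt.subplots(1, 3, figsize=(15, 4))
corrs = {}
for ax, method in zip(axs, ("pearson", "spearman", "kendall")):
    corr = df.corr(method=method)   # pandas computes all three coefficients
    corrs[method] = corr
    im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_title(method.capitalize())
    ax.set_xticks(range(len(corr)), labels=corr.columns, rotation=90)
    ax.set_yticks(range(len(corr)), labels=corr.columns)
fig.colorbar(im, ax=axs.tolist())
```

As in the paper, each matrix has unit diagonal and is symmetric, so only one triangle needs to be interpreted.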
Previous attempts using traditional machine learning techniques, such as multivariate regression analysis and support vector machines, yielded unsatisfactory results, prompting the adoption of ANNs.

The initial ANN configuration consisted of a single hidden layer of 10 neurons. Architecture optimization was pursued with the Adam102 optimizer in Python103, within the Google Colab environment104. Despite multiple iterations, the mean square error (MSE)105 ranged between 75.8409 and 156.8976, indicating suboptimal performance. To address this, manual iterations were conducted by varying the number of hidden layers and neurons, transitioning from a 5-10-10-1 structure to 5-100-100-1; unfortunately, this did not yield satisfactory results either. The implementation was then transferred to MATLAB106, where the scaled conjugate gradient method107, known for its suitability in regression fits for small and noisy datasets, was adopted.

The optimized architecture was 5-14-14-1 (Fig. 14): an input layer with five variables, two hidden layers of 14 neurons each with a tansig activation function, and an output layer representing the removal percentage with a linear activation function.

Fig. 14 Optimal deep learning architecture.

The tansig function, or hyperbolic tangent sigmoid transfer function, is a popular activation function in artificial neural networks (ANNs). Mathematically, it is defined as \(tansig\left(x\right)=\frac{2}{1+{e}^{-2x}}-1\), where x is the input to the function. It outputs values in the range −1 to 1, making it particularly useful for modeling data normalized to this range. The tansig function is an S-shaped curve, similar to the logistic sigmoid function, but with outputs spread over a wider range on the y-axis.
This characteristic allows the function to handle negative values more naturally and is beneficial for problems where symmetry around zero can help the learning algorithm converge faster. The tansig function is advantageous in neural networks because of its non-linear nature, which enables the network to learn complex patterns that linear models cannot capture. Additionally, its gradients are strongest for inputs close to zero, which can lead to more effective and efficient training, especially during backpropagation, where the gradients are used to update the weights. In the ANN model used here to analyze the adsorption batch experiments, the tansig activation helps the network handle the varying dynamics of the data while maintaining stable learning and convergence behavior, a choice that is important given the relatively small dataset.

The iterative adjustment of the bias and weight matrices in each epoch led to the optimal values at epoch 228 (Fig. 15), resulting in MSE values of 4.07, 18.406, and 6.2122 for the training, testing, and total datasets, respectively (Table 11).

Fig. 15 Convergence criteria of the Mean Squared Error for the best architecture.

Table 11 Statistical performance indicators for the best architecture.

The coefficients of determination (R-squared values) for the training, testing, and total datasets were 0.98759, 0.94280, and 0.98108, respectively, confirming the efficacy of the selected architecture and convergence criteria.
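The tansig definition above and the 5-14-14-1 layer structure can be made concrete with a few lines of numpy. The weights here are random (untrained) placeholders, so the output is meaningless numerically; the sketch only shows the shapes and the activation chain:

```python
import numpy as np

def tansig(x):
    # MATLAB's tansig: 2/(1 + exp(-2x)) - 1, algebraically identical to tanh(x)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

x = np.linspace(-5.0, 5.0, 101)
assert np.allclose(tansig(x), np.tanh(x))  # equivalence check

# Forward pass through a 5-14-14-1 network with random (untrained) weights
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(14, 5)), rng.normal(size=14)
W2, b2 = rng.normal(size=(14, 14)), rng.normal(size=14)
W3, b3 = rng.normal(size=(1, 14)), rng.normal(size=1)

def forward(x_in):
    h1 = tansig(W1 @ x_in + b1)   # hidden layer 1 (tansig)
    h2 = tansig(W2 @ h1 + b2)     # hidden layer 2 (tansig)
    return W3 @ h2 + b3           # linear output: predicted removal %

# One illustrative input [time, conc, pH, dosage, temperature], min-max style scaling
x_raw = np.array([120.0, 100.0, 6.0, 4.0, 310.15])
y = forward(x_raw / np.array([1440.0, 250.0, 8.0, 10.0, 330.15]))
print("untrained network output:", y)
```

Because tansig equals tanh, the function is bounded in (−1, 1), which is why inputs are scaled before the forward pass, as noted in the text.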
The output generated by this optimized model was saved for further analysis and interpretation. Having established the architecture and statistical characteristics of the proposed artificial neural network (ANN) model, the next phase involves a comprehensive graphical assessment of the model's performance. While statistical indicators provide crucial insights, graphical representations reveal nuances that numerical metrics alone may not capture.

In Fig. 16a, the regression fit graph portrays the relationship between the target variable (experimental removal percentage) and the model output (simulated removal percentage). Ideally, a perfect linear correlation between these variables signifies an impeccable goodness of fit. The linear fit has a slope of 1.00, indicating a consistent rate of change between target and output, but a bias of −0.56, implying a slight deviation of the y-intercept from the origin; this is explained by the scarcity of data points below a removal percentage of 50. The graph includes both training and testing data. Notably, the training dataset aligns more closely with the linear regression line than the testing data. Instances of substantial deviation from the regression line in the testing data caution against over-reliance on this subset for optimization, as poorly predicted points may lead to local optima47.

Fig. 16 (a) Regression fitting for the optimal architecture. (b) Verification of the assumptions of the regression fit, including normality, independence, and homoscedasticity from left to right.

Furthermore, adherence to key regression assumptions is imperative for robust model validation108. Figure 16b examines the normality, independence, and homoscedasticity of the errors.
The normality of the errors is scrutinized using a QQ-plot, where the majority of data points closely follow the theoretical normal distribution line. However, six outlier points at the extremes of the distribution warrant careful consideration during the optimization phase. Independence of the errors is assessed through a plot of the residuals against the experimental run (row number); the absence of a discernible pattern affirms the independence of the errors, although the same six outliers again stand out. Finally, homoscedasticity of the residuals is checked via a plot of predicted values against residuals; the absence of a defined pattern confirms the homogeneity of the residuals, with the six outlier points once more exhibiting distinctive behavior.

In summary, this multifaceted graphical analysis provides a comprehensive evaluation of the ANN model's performance, ensuring a thorough understanding of its strengths and potential limitations with respect to the research objectives.

The validation of the deep learning artificial neural network (ANN) is a critical step to ensure the reliability and accuracy of its predictions. We employed a validation strategy that splits the data into distinct sets: training, testing, and validation. This separation ensures that the model is not only trained but also fine-tuned and tested against unseen data. During the training phase, the model parameters are adjusted to minimize the error on the training set; the validation set is then used to tune the hyperparameters and prevent overfitting, which is critical given the relatively small dataset.
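The three residual checks described above (normality, independence, homoscedasticity) also have simple numerical counterparts that complement the visual plots. The residuals and predictions below are random stand-ins; in practice these would be the ANN's residuals (experimental minus simulated removal percentage):

```python
import numpy as np
from scipy import stats

# Stand-in residuals and predictions (illustrative; the real check would use
# the trained ANN's residuals)
rng = np.random.default_rng(7)
predicted = rng.uniform(40.0, 100.0, 87)
residuals = rng.normal(0.0, 2.5, 87)

# Normality: Shapiro-Wilk test complements the visual QQ-plot
W, p_normal = stats.shapiro(residuals)

# Independence: Durbin-Watson statistic (values near 2 suggest uncorrelated errors)
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity: |residual| should show no trend against the predicted value
rho, p_homo = stats.spearmanr(np.abs(residuals), predicted)
print(f"Shapiro-Wilk p={p_normal:.3f}, Durbin-Watson={dw:.2f}, "
      f"|residual|-vs-prediction rho={rho:.3f}")
```

A large Shapiro–Wilk p-value, a Durbin–Watson statistic near 2, and a near-zero rho correspond to the three assumptions holding, matching the visual conclusions drawn from Fig. 16b.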
The final model’s performance is assessed on a separate test set, which provides an unbiased evaluation of its predictive power. The mean square error (MSE) and the coefficient of determination (R2) are calculated for each set to quantify the model’s accuracy and predictive performance. These metrics confirm whether the model can generalize effectively beyond the training data.

Ensuring the repeatability of our ANN model involves detailed documentation of the model architecture, including the number of layers, the activation functions used (e.g., Tansig), and the optimization algorithm (e.g., the scaled conjugate gradient method). The iterative training process is set to reproducible conditions, with fixed seeds for the random number generators and consistent training–validation–test splits. This practice is vital to achieving consistent results when the model is re-run under the same conditions or by other researchers. Additionally, the robustness of the model is tested through repeated runs, where the stability of the results (such as the MSE and R2 values) is checked across iterations to ensure that the outcomes are not anomalies but reproducible findings.

Optimization using genetic algorithm

The employment of deep learning has facilitated the establishment of an empirical correlation between the various tuning parameters and the fitness function, represented by the removal percentage. This correlation serves as a fitness-function equation, enabling the formulation of an optimization problem. The objective of this optimization is to maximize the removal percentage by adjusting the decision variables: time, initial concentration, pH, dose, and temperature. The decision-variable bounds are defined as lower bound = [15, 50, 3, 1, 300.15] and upper bound = [1440, 250, 8, 10, 330.15]. To conform to the usual convention of minimization, the removal percentage is multiplied by −1.
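The reproducibility practices above (a fixed seed, a consistent train–validation–test split, and MSE/R2 on the held-out test set) can be sketched as follows. The dataset and the "model" (an ordinary least-squares fit standing in for the ANN) are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)                    # fixed seed for repeatability

# Hypothetical dataset: 5 inputs (time, C0, pH, dose, T) -> removal %, synthetic stand-in
X = rng.uniform(size=(60, 5))
y = 50 + 40 * X[:, 0] + rng.normal(0.0, 2.0, 60)

# Consistent 70/15/15 train/validation/test split from a seeded permutation
idx = rng.permutation(len(y))
n_tr, n_va = int(0.70 * len(y)), int(0.15 * len(y))
train, val, test = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

# Stand-in "model": linear least squares instead of the trained ANN
A = np.c_[X[train], np.ones(len(train))]
coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
pred = np.c_[X[test], np.ones(len(test))] @ coef

# Test-set metrics: MSE and coefficient of determination R^2
mse = np.mean((y[test] - pred) ** 2)
ss_res = np.sum((y[test] - pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

Re-running this script reproduces identical splits and metrics, which is exactly the property fixed seeds are meant to guarantee.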
A noteworthy concern arises from the lack of a continuous dataset for digital-twin modeling. Despite the high performance demonstrated during empirical modeling, the optimization process operates with a potential error margin because the removal percentage is bounded (with a maximum value of 100). The optimization problem is expressed as follows:

$$\begin{aligned} &\text{Minimize: } -1 \times \text{removal percentage} \\ &\text{subject to: } \text{lower bound} \le [\text{time},\ \text{initial concentration},\ \text{pH},\ \text{dose},\ \text{temperature}] \le \text{upper bound}, \\ &\qquad\qquad\quad 0 \le \text{removal percentage} \le 100. \end{aligned}$$

Prior to mathematical optimization, a critical step is to visualize the optimization data to identify potential regions for improvement. Given the multivariate nature of the data (six dimensions), a parallel coordinate plot is adopted (Fig. 17). In this plot, red lines signify a higher tendency toward improved removal percentage. Analysis of the red lines indicates that lower temperatures, reduced dose, moderate pH values, moderate initial concentration, and extended time are favorable for optimization. However, a formal algorithmic application is required to substantiate these observations.

Fig. 17 Parallel coordinate plot.

To address this, a single-objective genetic algorithm109 implemented in MATLAB, via the ‘ga’ function, is employed for optimization. The chosen parameters include a population size of 50, a maximum of 100 generations, and parallel computing. The results of this optimization are tabulated in Table 12.

Table 12 Optimality based upon simulation and experimentation.

In Table 12, simulation results stemming from data-driven optimization through the deep learning algorithm are presented. Notably, the simulated maximum removal percentage reaches 100.001, a theoretically ideal value with a tolerance level of 0.001.
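The study runs this optimization with MATLAB's `ga` function; as an illustration of the same scheme, the sketch below implements a minimal real-coded genetic algorithm in pure NumPy with the paper's bounds, population size, and generation count. The fitness function is a hypothetical surrogate that merely mimics the trends seen in Fig. 17, not the authors' trained ANN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Decision-variable bounds from the paper:
# [time (min), initial concentration, pH, dose, temperature (K)]
LB = np.array([15.0, 50.0, 3.0, 1.0, 300.15])
UB = np.array([1440.0, 250.0, 8.0, 10.0, 330.15])

def fitness(x):
    """Hypothetical surrogate for the ANN's removal %, NOT the authors' model.
    It rewards long time, low dose, and low temperature (the Fig. 17 trend),
    and is negated so the GA minimizes, as in the paper's formulation."""
    t, c0, ph, dose, temp = x
    removal = (40 * (t - LB[0]) / (UB[0] - LB[0])
               + 30 * (UB[3] - dose) / (UB[3] - LB[3])
               + 30 * (UB[4] - temp) / (UB[4] - LB[4]))
    return -min(removal, 100.0)

# Minimal GA: tournament selection, blend crossover, Gaussian mutation, elitism
pop_size, generations = 50, 100
pop = LB + rng.random((pop_size, 5)) * (UB - LB)
for _ in range(generations):
    scores = np.array([fitness(ind) for ind in pop])
    new_pop = [pop[scores.argmin()]]               # elitism: keep the current best
    while len(new_pop) < pop_size:
        i, j = rng.integers(pop_size, size=2)      # binary tournament, parent 1
        p1 = pop[i] if scores[i] < scores[j] else pop[j]
        i, j = rng.integers(pop_size, size=2)      # binary tournament, parent 2
        p2 = pop[i] if scores[i] < scores[j] else pop[j]
        a = rng.random(5)
        child = a * p1 + (1 - a) * p2              # blend crossover
        child += rng.normal(0.0, 0.02, 5) * (UB - LB)  # Gaussian mutation
        new_pop.append(np.clip(child, LB, UB))     # keep within bounds
    pop = np.array(new_pop)

best = pop[np.array([fitness(ind) for ind in pop]).argmin()]
best_removal = -fitness(best)
```

With the surrogate, the GA drives the removal percentage toward its 100% ceiling, analogous to the 100.001 plateau reported in Table 12.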
While this represents an idealized outcome, the simulation aids in identifying the decision-variable combination leading to this optimum. However, acknowledging the inherent error margin in the digital-twin modeling and optimization process, it is crucial to validate the true optimality through experimentation. A subsequent experiment, conducted under the optimal conditions identified in simulation, yields a removal percentage of 98.77%. The difference between this experimentally verified optimum and the simulated result is a mere 1.24%, demonstrating the success of coupling deep learning with genetic-algorithm optimization. Consequently, it is recommended to prioritize the experimental optimum for robust validation.
