Cloud computing load prediction method based on CNN-BiLSTM model under low-carbon background

Experimental platform and parameter settings

The experiments were programmed in Python on the PyCharm platform, mainly using the numpy, pandas, tensorflow and scikit-learn packages. The hardware was a personal desktop computer running Windows 10 with an Intel(R) Core\(^\text{TM}\) i7-8700 CPU.

Several parameters of the model architecture were fixed for the experiments. Following the proposed hybrid model, the parameters of the CNN and BiLSTM networks are shown in Table 1.

Table 1 Parameter settings.

Data set

This paper uses the Google cluster data set, which contains load data for about 12,000 machines running more than 670,000 applications and about 4000 tasks. The collected information includes disk input and output time, page cache, CPU utilization, memory usage and so on. Four characteristics of cloud scheduling are analyzed: machine and workload heterogeneity and variability; highly dynamic resource demand and availability; predictable but poorly predicted resource needs; and resource class preferences and constraints.
CPU utilization is the percentage of time the CPU spends processing non-idle tasks, and it reflects how busy the CPU currently is. Three typical scenarios appear in the data set when describing the cloud computing load, as shown in Fig. 8. First, CPU utilization is low. Second, CPU utilization is at an average level, staying in the range of 10\(\%\)–20\(\%\) for a period of time. Third, CPU utilization is high, and the machine stays busy for a period of time. This also shows that the cloud platform load has both temporal and spatial characteristics.

In this paper, CPU utilization is the quantity to be predicted. The first 80\(\%\) of the collected data is used as the training set and the last 20\(\%\) as the test set.

Figure 8 Different CPU busyness scenarios.

Prediction and analysis of cloud computing load

The server load is first analyzed in the time dimension. The prediction results of the four models (BP, LSTM, BiLSTM and CNN-BiLSTM) are shown in Fig. 9.

Figure 9 The effect of load prediction under different models.

In Fig. 9, the x-axis is the index of the sample in the test set and the y-axis is the CPU utilization; the test set contains about 3000 samples. In the selected test set, CPU utilization is concentrated in the range 0–20\(\%\), which is a low level of activity, and it changes irregularly over time. The predictions of the models generally track the real values, but at moments when server utilization jumps abruptly, that is, at the peaks in the plot, the errors of the BP and LSTM models are larger. The BP model in particular still produces many negative values, which leads to a large error in its results. To make the comparison easier, part of the results of the four models and the real values is drawn in Fig. 10.

Figure 10 Comparison of cloud load prediction results of different models in time dimension.

Figure 10 shows 300 consecutive test samples, with CPU utilization on the y-axis. Plotting the four models together makes the prediction gap of each model easier to see. Taking the data in the left blue frame as an example, the true value at the left point is slightly higher than at the right point. Within a short period, the utilization drops from about 5\(\%\) to nearly 0 and then returns to about 5\(\%\). Among the four models, only the CNN-BiLSTM model approximates both points; for the other three models the two points differ considerably, and the right point is higher than the left point. This shows that only the CNN-BiLSTM model predicts the actual behavior well when CPU utilization changes rapidly over a short time.

Further, we compare the models at a peak. Taking the peak of the true value closest to sample 300 (the orange dot) as an example, the CPU utilization is in the range of 8\(\%\)–10\(\%\). Among the four models, only the CNN-BiLSTM prediction falls in the range of 8\(\%\)–10\(\%\); the predictions of the other three models fall at 4\(\%\) or less, a large error. Therefore, the CNN-BiLSTM model predicts better.

To further show the spatial behavior of the models, the four models and the real values are drawn as 3D images in Fig. 11. The x-axis represents the five series (the four models and the real values), the y-axis represents the task execution time, and the z-axis represents the CPU utilization. Most task durations are within 5 minutes, and task duration is not necessarily proportional to CPU utilization.
As the duration increases, CPU utilization may decrease, which also makes prediction more difficult. Among the four models, the CNN-BiLSTM model performs better at the abrupt change points and peaks of the curve.

Figure 11 Comparison of cloud load prediction results of different models in spatial dimension.

Further, evaluation indexes are used to describe the model results. The three evaluation indexes are mean square error (MSE), mean absolute error (MAE) and goodness of fit (\(R^2\)). MSE describes the squared error between the predicted value and the true value, MAE describes the average absolute size of the prediction error, and \(R^2\) describes how well the model fits the data. In this experiment the values of the three indicators lie in the range 0–1. The smaller the MSE and MAE, the smaller the error; the larger the \(R^2\), the better the fit.

For comparison, reference 12 uses the CEEMDAN-ConvLSTM model, which first decomposes the data sequence with a modal decomposition technique and then uses the ConvLSTM combination model to learn and predict CPU utilization. It selects the state of one host over 3 days from the Google load data set, 835 data points in total, and evaluates the prediction results with RMSE, MAE and \(R^2\). For ease of comparison, the RMSE reported in reference 12 is converted into MSE using the following relation, where \(y_{i}\) is the true value, \(\hat{y}_{i}\) is the predicted value and \(m\) is the number of samples.

$$\begin{aligned} MSE &= \frac{1}{m}\sum _{i=1}^{m}(y_{i}-\hat{y}_{i})^{2} \\ RMSE &= \sqrt{\frac{1}{m}\sum _{i=1}^{m}(y_{i}-\hat{y}_{i})^{2}} = \sqrt{MSE} \end{aligned}$$
(11)
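The three indexes and the RMSE-to-MSE conversion in formula (11) can be illustrated with a short sketch. This is only an illustration with numpy, not the authors' evaluation code; the function names `regression_metrics` and `rmse_to_mse` are hypothetical, and \(R^2\) is computed with the usual definition (one minus the ratio of residual to total sum of squares), which the paper does not spell out.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))        # mean square error, formula (11)
    mae = float(np.mean(np.abs(err)))     # mean absolute error
    ss_res = float(np.sum(err ** 2))      # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # Assumption: y_true is not constant, otherwise ss_tot is zero.
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": mae, "R2": r2}

def rmse_to_mse(rmse):
    """Convert a reported RMSE into MSE (MSE = RMSE squared)."""
    return rmse ** 2

# Toy values chosen only to show the calls; they are not results from the paper.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```

Because RMSE is the square root of MSE, squaring a reported RMSE recovers the MSE exactly, which is all the conversion of the reference 12 results requires.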
The results from reference 12 are shown in Table 2. The \(R^2\) values of the three methods are relatively high, which may be related to the small number of samples selected. In addition, the three models have high RMSE and MAE values, indicating a large prediction error; the models perform especially poorly on bursts where CPU utilization reaches an extreme value.

Table 2 Evaluation of load forecasting results in reference 12.

The evaluation of the prediction results of the CNN-BiLSTM model proposed in this paper is shown in Table 3. The prediction performance improves progressively across the four models. Compared with the evaluation indexes of the CEEMDAN-ConvLSTM model of reference 12 in Table 2, the \(R^2\) value decreases slightly, but the MSE and MAE values decrease significantly. This indicates that the error between the predicted and true values is small and that the model is more stable, including in the face of sudden extreme values. Meanwhile, the test set in this paper contains more than 3000 data points, nearly 4 times that of reference 12, and \(R^2\) is still maintained at 95\(\%\), indicating that the model predicts well.

Table 3 Evaluation of load forecasting results of the model in this paper.

Prediction and analysis of cloud computing carbon emissions

According to formula (8), the power of the server is calculated first. The static and peak power of the three types of machines summarized by Gao et al. (reference 19) are shown in Table 4. Here we take the Intel E5345 server as an example, with a peak power of 335 W and a static power of 223 W.
Based on the predicted CPU utilization \(U_n\), the power P and energy consumption W are calculated with formulas (8) and (9), as shown in Table 5.

Table 4 Three types of server power statistics.

Table 5 Server power and energy consumption prediction within 1 h.

Figure 12 Changes in power and energy consumption of different machines within an hour.

Table 5 shows that the machine performs a total of 17 sub-tasks within one hour. Each sub-task has a different execution time and a different CPU utilization, so the resulting power P and energy consumption W also differ. On the whole, the longer the execution time and the higher the CPU utilization, the greater the values of P and W; the relationship is a positive correlation. To analyze the power further, the power and energy consumption of this machine and two other machines are shown in Fig. 12.

Fig. 12 plots the changes in power and energy consumption of the three machines within an hour. The x-axis is the execution time of each server's sub-tasks. The machine in Fig. 12a performs a total of 17 sub-tasks with short intervals between tasks; the machine in Fig. 12b performs a total of 7 sub-tasks with longer execution times; the machine in Fig. 12c performs a total of 13 sub-tasks. The y-axis is the energy consumption of the machine. Energy consumption increases with time, but the rising curves of the three machines differ slightly. The final energy consumption of the three servers is 0.233 kWh, 0.254 kWh and 0.236 kWh. For the machine with ID 3232617906, the growth in energy consumption briefly stagnates in the time range 0.86–0.87 because the sub-tasks in this period have short execution times. At the top of each sub-graph of Fig. 12, the server's power is drawn as blue columns.
The machine with ID 3232617906 shows little change in overall power except for the third sub-task, which also explains the plateau in its energy curve. For the machine with ID 266853480, the last two sub-tasks have higher power and longer execution times, resulting in a sharp increase in energy consumption. The machine with ID 4820236868 is affected by both execution time and power, so its overall energy consumption follows a nearly linear increasing curve, which is an ideal pattern of energy consumption change.

The analysis shows that even for the same type of equipment, the actual energy consumption differs within the same execution time. Taking the three machines in Fig. 12 as an example, the hourly energy consumption of high-energy and low-energy equipment can differ by 0.024 kWh. Assuming that there are 30 servers in the data center, this amounts to a difference of 17.28 kWh in one day (0.024 kWh × 30 servers × 24 h). The traditional calculation method assumes that the power of the equipment is constant, so it cannot describe the actual situation correctly. Therefore, the prediction method proposed in this paper has certain guiding significance.

Furthermore, taking the machine with ID 4820236868 as an example, the carbon emissions of its sub-tasks over 24 hours of work are calculated according to formula (10) and shown in Fig. 13. Fig. 13a shows that the carbon emissions of sub-tasks are concentrated in the range 0.01–0.012, and that they differ considerably between sub-tasks and are irregularly distributed. Taking the tenth hour in Fig. 13b as an example, the carbon emission is close to 0, indicating that the equipment is in a low energy consumption state during this time.

Figure 13 Changes in carbon emissions of the server within 24 h.

To analyze the carbon emissions of the server further, the calculation method proposed in this paper is compared with the traditional method. In common practice, when carbon emissions are not linked to CPU utilization, the power is assumed to remain constant. Taking the Intel E5345 server used in this experiment as an example, its peak power is 335 W and its static power is 223 W. The default daily power is assumed to be the mean of the two, that is, 279 W. Its carbon emissions within 24 h are calculated in formula (12). The growth of carbon emissions of the same equipment under the two calculation methods is compared in Fig. 14.

$$\begin{aligned} C_{m} &= 279*24*0.5703/1000 = 3.8187\quad (kg) \end{aligned}$$
(12)
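The arithmetic of formula (12) can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' code: the names `constant_power_emissions` and `hours_to_reach` are hypothetical, and the factor 0.5703 is read as kilograms of carbon emissions per kWh, which is what the units in formula (12) imply.

```python
EMISSION_FACTOR = 0.5703  # assumed unit: kg per kWh, as implied by formula (12)

def constant_power_emissions(power_w, hours, factor=EMISSION_FACTOR):
    """Carbon emissions (kg) of a machine drawing a fixed power for `hours`."""
    energy_kwh = power_w * hours / 1000.0  # W * h -> kWh
    return energy_kwh * factor

def hours_to_reach(threshold_kg, power_w, factor=EMISSION_FACTOR):
    """Hours of constant-power operation needed to emit `threshold_kg`."""
    return threshold_kg / (power_w * factor / 1000.0)

peak_w, static_w = 335.0, 223.0       # Intel E5345 values quoted in the text
mean_w = (peak_w + static_w) / 2.0    # 279 W, the assumed default power
daily_kg = constant_power_emissions(mean_w, 24)
print(round(daily_kg, 4))             # 3.8187, matching formula (12)

# Time for the constant-power estimate to reach the 2.5 kg level marked in Fig. 14.
print(hours_to_reach(2.5, mean_w))
```

Under the constant-power assumption the 2.5 kg level is reached after roughly 15.7 hours, that is, during the 16th hour, which agrees with the reading of the traditional curve in Fig. 14.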
Figure 14 The change of carbon emissions within 24 h of the server under different algorithms.

In Fig. 14, the x-axis is time and the y-axis is carbon emissions; the red curve is the result of the calculation method proposed in this paper, and the yellow curve is the result of the traditional calculation method. With the proposed method, the carbon emission of the equipment over 24 hours is 3.1217 kg, while the traditional method gives 3.8187 kg, a difference of 0.697 kg for one machine in 24 hours. The machine selected here (ID 4820236868, shown in Fig. 12c) has an almost linear growth curve. For a machine whose growth has a plateau or a sudden increase, the difference in carbon emissions would be greater.

In practice, when a data room uses the method proposed in this paper to predict carbon emissions, it can adjust dehumidification, refrigeration and other equipment effectively. Figure 14 also marks a carbon emission level of 2.5 kg. If the carbon emissions exceed 2.5 kg, the data center cooling system is strengthened. According to the traditional prediction, cooling needs to be strengthened at the 16th hour. According to the prediction method proposed in this paper, the threshold is not reached until the 20th hour. Acting on the traditional estimate would therefore strengthen cooling about four hours too early, which further increases the carbon emission burden of the data center.

The curves also show that the dynamic carbon emission prediction model established in this paper can predict flexibly, whereas with the traditional method carbon emissions grow as a linear function of task execution time.
Therefore, the carbon emission formulas of the two methods can be summarized as follows.

$$\begin{aligned} C_{m} &= P*T_{i}*0.5703/1000 \\ C_{m} &= \begin{cases} \sum \limits _{i=1}^{n} \left[ (P_{n}^{max}-P_{n}^{min})*U_{n}^{CPU}+P_{n}^{min}\right] *T_{i}*0.5703/1000, & U_{n}^{CPU}>0 \\ 0, & \text{otherwise} \end{cases} \end{aligned}$$
(13)
In formula (13), \(C_m\) is the carbon emission over a certain period of time. In the traditional method, P is a fixed value, and \(C_m\) is affected only by the time \(T_i\). In the carbon emission prediction method proposed in this paper, the total is built up from small sub-tasks and depends jointly on the static power \(P_{n}^{min}\) and peak power \(P_{n}^{max}\) of the machine, the real-time CPU utilization \(U_{n}^{CPU}\), and the task execution time \(T_i\). The formula makes the dynamic nature of the model clear, whereas the traditional method cannot describe the actual state of the machine. Therefore, the method proposed in this paper has certain guiding significance and practical value.
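The dynamic branch of formula (13) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, CPU utilization is expressed as a fraction between 0 and 1, execution time is in hours, each sub-task carries its own utilization, and a sub-task with zero utilization is taken to contribute nothing, which is one reading of the "otherwise" branch. The sub-task list is made up for demonstration and is not data from Table 5.

```python
EMISSION_FACTOR = 0.5703  # assumed unit: kg per kWh, as in formulas (12) and (13)

def task_power_w(utilization, p_min_w, p_max_w):
    """Linear power model: static power plus a utilization-scaled dynamic part."""
    return (p_max_w - p_min_w) * utilization + p_min_w

def dynamic_emissions_kg(tasks, p_min_w, p_max_w, factor=EMISSION_FACTOR):
    """Sum the carbon emissions over (utilization, hours) sub-tasks."""
    total = 0.0
    for utilization, hours in tasks:
        if utilization > 0:  # assumption: idle sub-tasks contribute zero
            power_w = task_power_w(utilization, p_min_w, p_max_w)
            total += power_w * hours * factor / 1000.0
    return total

# Hypothetical sub-tasks for one hour: (CPU utilization, execution time in hours).
example_tasks = [(0.05, 0.25), (0.10, 0.50), (0.0, 0.25)]
P_MIN_W, P_MAX_W = 223.0, 335.0  # Intel E5345 static and peak power from the text

print(dynamic_emissions_kg(example_tasks, P_MIN_W, P_MAX_W))
```

With a utilization of 1 for the whole period the function reduces to the constant-power formula evaluated at peak power, so the traditional method appears as a special case in which utilization never changes.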
