Improving crop production using an agro-deep learning framework in precision agriculture

Real-world agricultural data was used to test the effectiveness of the ADLF in predicting crop yield and improving farming practices. The results in this section report the model's performance on the major metrics: accuracy, precision, recall, and F1-score. These metrics give insight into the model's capability to assist farmers in decision-making, reduce resource wastage, and improve crop management.

The proposed ADLF is implemented using Jupyter Notebook, a popular tool for interactive computing and data analysis. Jupyter Notebook supports an iterative development process in which code is executed cell by cell, facilitating real-time experimentation and visualization of results. The implementation language is Python, which is widely recognized for its extensive libraries and frameworks suited to ML and deep learning tasks; libraries such as TensorFlow, PyTorch, Pandas, and NumPy play a crucial role in developing, training, and evaluating the deep learning models within the ADLF framework.

The development environment runs Windows on an Intel® Core™ Ultra 7 processor 155U. This processor, with 12 MB of cache and speeds of up to 4.80 GHz, provides the computational power needed to handle intensive data processing and model training efficiently. The system is equipped with 8 GB of RAM, which supports multitasking and smooth operation of the Jupyter Notebook environment and associated libraries. Together, Jupyter Notebook, Python, and this hardware configuration provide a robust environment for developing and testing the ADLF model and for efficient processing and analysis of agricultural data.

The performance of the proposed ADLF is compared with four existing approaches: the deep learning-based computer vision approach (DLCVA), the improved agro deep learning model (IADLM), deep learning-based optimization (DLBO), and the improved deep learning-based classifier (IDLBC). The crop yield prediction dataset [47] is used for the experiments, and the results are produced with a Python-based simulator. The dataset consists of 56,717 data points, split into training and testing sets at an 80:20 ratio: 45,373 data points (80%) are used to train the model and 11,344 (20%) to test its performance. This split provides ample data for training while reserving a separate testing set to assess the model's accuracy and generalization capability. Finally, to establish the robustness and generalizability of the proposed model, we applied five-fold cross-validation: the dataset was randomly partitioned into five equal subsets, and in each iteration the model was trained on four subsets and validated on the fifth. This was repeated five times, with a different subset held out for validation each time, and the results were averaged across the folds to obtain a reliable estimate of model performance. By preventing the model from becoming unduly reliant on any one segment of the data, this cross-validation procedure guards against overfitting and supports generalization to other datasets. A minimal sketch of this evaluation protocol is given below.
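The following sketch illustrates the 80:20 split and five-fold cross-validation under stated assumptions: "crop_yield.csv" and its "yield" target column are hypothetical file and column names, and a linear model stands in for the ADLF network itself rather than the paper's actual implementation.

```python
# A minimal sketch of the evaluation protocol; "crop_yield.csv" and the
# "yield" column are hypothetical, and LinearRegression stands in for
# the ADLF network.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split

df = pd.read_csv("crop_yield.csv")              # hypothetical file name
X, y = df.drop(columns=["yield"]), df["yield"]

# 80:20 hold-out split (in the paper: 45,373 training / 11,344 testing points).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Five-fold cross-validation on the training portion: each iteration
# trains on four folds and validates on the held-out fifth.
scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(X_train):
    model = LinearRegression()                  # stand-in for the ADLF network
    model.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])
    scores.append(model.score(X_train.iloc[val_idx], y_train.iloc[val_idx]))

print(f"Mean CV score across the five folds: {sum(scores) / len(scores):.4f}")
```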
Computation of accuracy

Accuracy in crop production is computed by comparing the yield predicted by the ADLF with the actual yield data collected from the field. The comparison is expressed as a percentage:

$$ \text{Accuracy} = \left( 1 - \frac{\left| \text{Actual Yield} - \text{Predicted Yield} \right|}{\text{Actual Yield}} \right) \times 100 $$

(30)
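As a hedged illustration of Eq. (30), the helper below computes the mean percentage accuracy over a batch of predictions; the yield values are invented for the example.

```python
# Illustrative implementation of Eq. (30); the values are made up.
import numpy as np

def yield_accuracy(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean percentage accuracy: 100 * (1 - |actual - predicted| / actual)."""
    return float(np.mean((1 - np.abs(actual - predicted) / actual) * 100))

actual = np.array([5.0, 4.2, 6.1])      # e.g. tonnes per hectare (hypothetical)
predicted = np.array([4.6, 4.4, 5.8])
print(f"{yield_accuracy(actual, predicted):.2f}%")   # ≈ 94.11%
```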
This yields the prediction accuracy as a percentage, i.e., 100 minus the absolute percentage error of the predicted yield. The ADLF leverages techniques such as remote sensing, weather data, and crop health monitoring systems to generate predictions from historical and real-time data. This enables more accurate and timely predictions, leading to improved decision-making for farmers and, ultimately, better crop production. Continuous evaluation and refinement of the framework further improve the accuracy of forecasts over time. Table 2 shows the comparison of accuracy for various models on different inputs.

Table 2 Comparison of accuracy for various models on different inputs

We compare the accuracy of the models, as shown in Fig. 7. The baseline models were chosen for their technical strengths in crop yield prediction and farm management optimization, and for their relevance to precision agriculture. DLCVA represents convolutional neural networks (CNNs) applied to tasks such as crop health monitoring; IADLM demonstrates how multi-source data and optimization can enhance accuracy; DLBO integrates deep learning into optimization, making it suitable for evaluating performance in agricultural applications; and IDLBC is well suited to the classification tasks this benchmark measures. We also included non-deep-learning models, random forest (RF) and support vector machine (SVM), to provide a broader comparison. Across varying input sizes, the proposed ADLF model consistently outperforms all others, especially at larger input sizes. For instance, the proposed ADLF model achieved its highest accuracy of 92.44%, significantly exceeding the other models, including DLCVA (87.02%) and DLBO (77.60%), as well as the traditional ML models RF (84.21%) and SVM (86.39%). The same trend holds as the number of inputs increases: at 200 inputs, ADLF remains ahead with 90.45%, compared with DLCVA's 85.53% and RF's 83.78%. SVM and RF are consistent but far less successful than the deep learning models across all input sizes. When the number of inputs reaches 700, ADLF still leads with 85.41%, while the other models, including DLCVA (80.09%), DLBO (66.87%), and IADLM (61.32%), degrade progressively.

Fig. 7 No. of inputs vs accuracy comparison for various models

We conducted a t-test to evaluate the statistical significance of the proposed model's performance against the baseline results. The comparison with DLCVA yields p-values below 0.05 for accuracy, precision, recall, and F1-score, confirming that the proposed ADLF's improvements on these metrics are statistically significant and unlikely to be due to chance. In contrast, the p-values for the IADLM, DLBO, and IDLBC comparisons exceed 0.05, so those performance differences are not statistically significant. An illustrative version of this test is sketched below.
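The snippet below shows how such a paired significance test can be run with SciPy; the per-fold score vectors are invented for demonstration and are not the paper's measured values.

```python
# Illustrative paired t-test between two models' per-fold accuracies;
# the score vectors below are hypothetical, not the paper's results.
from scipy import stats

adlf_scores  = [92.1, 91.8, 92.6, 92.3, 92.0]   # five-fold results (invented)
dlcva_scores = [87.3, 86.8, 87.1, 86.5, 87.4]

t_stat, p_value = stats.ttest_rel(adlf_scores, dlcva_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
```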
Computation of precision

Precision for crop production in an agro-deep learning framework refers to the exactness of the crop yield prediction model. It is computed by dividing the number of correctly predicted crop yield values by the total number of predicted values. This metric evaluates how precise the model is in identifying the correct yield values, which directly affects the accuracy of crop production estimates. To compute precision, the model's predicted values are compared with the actual crop yield values from historical data. A higher precision score indicates a more accurate model, which can be used to make informed decisions and optimize crop production strategies. It is a critical evaluation metric for monitoring and improving the performance of crop production models. Table 3 shows the comparison of precision for various models on different inputs.

Table 3 Comparison of precision for various models on different inputs

Figure 8 shows the comparison of precision. At the largest input size, the existing DLCVA obtained 68.32% precision, IADLM 61.29%, DLBO 67.86%, and IDLBC 63.84%, while the proposed ADLF reached 84.87%. The proposed framework uses CNNs to extract meaningful features from the input data, such as satellite images and crop yield records, and models crop growth patterns sequentially to capture temporal dependencies effectively. Pre-trained models were fine-tuned for the specific task of crop production, and combining these techniques across multiple models yields a more accurate and robust prediction.

Fig. 8 No. of inputs vs precision comparison for various models

Computation of recall

Recall is a performance metric used to evaluate the effectiveness of the agro-deep learning framework for crop production. It measures the proportion of relevant data points the model correctly identifies: the number of correctly predicted crop yield values divided by the total number of actual crop yield values in the dataset. This computation accounts for both true positives (correctly identified relevant data points) and false negatives (missed relevant data points), providing a comprehensive picture of the model's ability to predict crop yields accurately, which is crucial for efficient and successful crop production. Table 4 shows the comparison of recall for various models on different inputs.

Table 4 Comparison of recall for various models on different inputs

Figure 9 shows the comparison of recall. At the largest input size, the existing DLCVA obtained 66.52% recall, IADLM 60.35%, DLBO 68.17%, and IDLBC 64.91%, while the proposed ADLF reached 84.24%. The framework integrates CNNs with a clustering algorithm to identify regions of interest in the images, which are then used to generate training data. This data trains the deep learning model to recognize and classify different crops accurately. In addition, a pre-trained model is fine-tuned for specific crop types, further improving recall.

Fig. 9 No. of inputs vs recall comparison for various models

Computation of F1-score

The F1-score is a metric commonly used to evaluate the performance of classification models in ML. It is calculated from precision and recall, which measure the exactness and completeness of the model's predictions: precision is the percentage of correctly predicted crop types out of all predicted crop types, while recall is the percentage of correctly predicted crop types out of all actual crop types. The F1-score is the harmonic mean of precision and recall, providing a combined and balanced measure of the model's performance in accurately predicting crop types. A minimal sketch of computing these three metrics is given below.
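The sketch below computes precision, recall, and F1-score with scikit-learn. It assumes the yield-prediction task has been discretised into classes (here low/medium/high yield, an assumption for illustration, not the paper's labeling scheme).

```python
# Illustrative precision/recall/F1 computation; the labels are invented
# and the low/medium/high discretisation is an assumption.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = ["high", "low", "medium", "high", "low", "medium", "high"]
y_pred = ["high", "low", "medium", "medium", "low", "medium", "high"]

precision = precision_score(y_true, y_pred, average="macro")
recall    = recall_score(y_true, y_pred, average="macro")
f1        = f1_score(y_true, y_pred, average="macro")
print(f"precision={precision:.3f}  recall={recall:.3f}  F1={f1:.3f}")
```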
Table 5 shows the comparison of F1-score for various models on different inputs.

Table 5 Comparison of F1-score for various models on different inputs

Figure 10 shows the comparison of F1-score. At the largest input size, the existing DLCVA obtained an 84.32% F1-score, IADLM 72.50%, DLBO 79.03%, and IDLBC 78.45%, while the proposed ADLF reached 88.91%. The proposed framework collects a large amount of data related to crop production, such as weather patterns, soil conditions, and historical yields. This data is pre-processed and fed into a deep learning model, which learns patterns and makes accurate predictions. The framework also applies transfer learning and data augmentation to improve its performance, and employs feedback mechanisms to update its predictions continuously. This constant improvement results in a high F1-score for crop production prediction.

Fig. 10 No. of inputs vs F1-score comparison for various models

Computation of false negative rate

The false negative rate measures the percentage of incorrect classifications in which a crop is present but is predicted as absent. It is calculated by dividing the number of false negative predictions by the total number of positive cases in the dataset. To reduce this rate, the deep learning framework uses backpropagation, adjusting the network weights based on the difference between the predicted and actual outputs. This allows the model to improve continuously, lowering the false negative rate and leading to more accurate predictions and better crop production results.

Table 6 and Fig. 11 present a comparison of the false negative rate (FNR) across the models (DLCVA, IADLM, DLBO, IDLBC, and ADLF) with varying numbers of inputs. As the number of inputs increases from 100 to 700, the FNR generally rises for all models. Among them, ADLF consistently shows the lowest FNR across all input sizes, indicating superior performance in minimizing false negatives, whereas DLCVA exhibits the highest FNR, highlighting potential limitations in its ability to handle larger datasets effectively.

Table 6 Comparison of false negative rate for various models on different inputs

Fig. 11 No. of inputs vs false negative rate comparison for various models

Computation of false positive rate

The false positive rate is calculated by comparing the number of incorrect positive predictions with the total number of negative cases: the number of false positives (cases predicted positive that are actually negative) is divided by the sum of false positives and true negatives (cases correctly predicted as negative). This measures the model's ability to identify negative cases correctly, which is essential for crop production because it helps avoid unnecessary interventions in regions that do not need them. The lower the false positive rate, the more reliable and efficient the ADLF is in predicting crop production. A sketch of computing both rates from a confusion matrix follows.
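The snippet below derives the FNR and FPR from a binary confusion matrix; the label vectors are invented for illustration (1 = crop present, 0 = absent).

```python
# Illustrative FNR/FPR computation from a binary confusion matrix;
# the labels are invented for the example.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]   # 1 = crop present
y_pred = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fnr = fn / (fn + tp)   # missed positives over all actual positives
fpr = fp / (fp + tn)   # false alarms over all actual negatives
print(f"FNR={fnr:.3f}  FPR={fpr:.3f}")
```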
Table 7 and Fig. 12 compare the false positive rate (FPR) across the same models (DLCVA, IADLM, DLBO, IDLBC, and ADLF) with different input sizes. As the number of inputs increases from 100 to 700, all models exhibit a rising FPR. ADLF consistently shows the lowest FPR, indicating superior performance in reducing false positives, whereas DLCVA demonstrates the highest FPR across all input sizes, suggesting it is less effective at managing false positives as the dataset grows. This variation highlights differences in model robustness and accuracy. Table 8 shows the overall performance comparison of the various models.

Table 7 Comparison of false positive rate for various models on different inputs

Fig. 12 No. of inputs vs false positive rate comparison for various models

Table 8 Overall performance comparison of various models

Figure 13 shows the overall performance comparison between the existing and proposed models. In this comparison, the proposed model achieved 85.41% accuracy, 84.87% precision, 84.24% recall, an 88.91% F1-score, and scores of 91.17% and 89.82% on the false negative rate and false positive rate measures, respectively. The proposed framework analyses large agricultural datasets covering soil composition, climate, and crop yield. By extracting high-level features from the data, the framework builds a predictive model that accurately forecasts crop production from multiple factors. It also incorporates transfer learning, fine-tuning models pre-trained on other domains for the specific agricultural data, which further improves results. Deep learning allows a more comprehensive analysis, capturing complex relationships and patterns that may not be apparent with traditional statistical methods, yielding more accurate and robust predictive models for crop production. Table 9 shows the comparison of the proposed system across different aspects.

Fig. 13 Overall performance comparison of various models

Table 9 Comparison across various aspects

To address the practical challenges of integrating the proposed ADLF with existing farm management systems, it is essential to explore specific integration strategies, since such integration can significantly enhance the framework's practical relevance and adoption by farmers. Data exchange and interoperability are crucial: APIs should be developed to facilitate seamless communication between the ADLF and existing farm management systems, and standardized data formats such as CSV or JSON ensure compatibility and smooth integration (a minimal example of such an exchange is sketched below). Real-time data integration can be achieved by connecting the ADLF with IoT sensors and data-streaming platforms, enabling continuous analysis and timely insights. Improving user accessibility involves a unified dashboard that combines data from the ADLF and existing systems into a comprehensive view, accessible through mobile and web applications so that farmers can interact with the system from various devices. Automated alerts and recommendations based on the ADLF's analysis should be incorporated into existing workflows to improve decision-making efficiency, and training programs and technical support are essential to help users operate the integrated system effectively.
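The snippet below is a minimal, hypothetical example of such a standardized exchange: a sensor reading is serialized to JSON so that the ADLF and a farm management system can share it. All field names are illustrative assumptions, not a published schema.

```python
# Hypothetical JSON payload for exchanging a sensor reading between the
# ADLF and a farm management system; field names are illustrative only.
import json
from datetime import datetime, timezone

reading = {
    "field_id": "plot-07",                           # hypothetical identifier
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "soil_moisture_pct": 23.4,
    "temperature_c": 29.1,
}

payload = json.dumps(reading)    # serialize for an HTTP API or message queue
print(payload)
restored = json.loads(payload)   # receiving system deserializes it back
```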
We also consider the practical challenges of deploying the proposed ADLF. Dependence on uninterrupted network connectivity in rural and remote agricultural zones is a significant issue. It can be mitigated by implementing offline data collection and processing through edge computing devices, so that the system maintains full performance even without internet access or good network coverage. Another major challenge is the large upfront investment required to deploy deep learning models and the associated hardware. We propose a multi-phased approach to make these costs manageable: start with a small-scale pilot and expand as confidence grows and resources allow. Government subsidies and collaboration with technology providers can further reduce the cost. Ongoing maintenance and updates to the system are laborious; to ease that burden, we suggest establishing regular maintenance schedules and using automated maintenance tools. Training local staff in the system and its management also supports long-term sustainability and reduces reliance on outside experts. By tackling these real-world environmental and practical implementation constraints, we aim to present a holistic solution for deploying the proposed framework in agricultural practice and to provide useful information for practitioners in this sector.

We propose incorporating feature importance analysis and visual explanations to make the model's predictions more interpretable (an illustrative analysis is sketched below). For example, farmers may receive a ranked list of the most influential factors, such as soil moisture levels, temperature, or pest prevalence, behind a predicted crop yield or health status. Farmers can then see how these variables are affecting their crops and take appropriate action. Beyond this, the model's output is delivered to a user-friendly dashboard that translates complex data into clear visuals and useful insights: predictions are presented as easy-to-read charts and graphs showing trends in crop yield, areas of potential risk (for example, likelihood of pest infestation), and recommended actions such as irrigation or fertilization. These visuals give farmers an immediate understanding of the outcomes so they can decide how to respond without relying heavily on deep technical knowledge. As a concrete example, a farmer might receive an alert that, given current moisture levels and the projected weather, their crop will begin to suffer water stress within the next week, along with a suggestion to irrigate. Such actionable information bridges the gap between model predictions and real-world farming decisions. By integrating these interpretability measures, the model's outputs become not only accurate but also meaningful and practical, enabling farmers to make timely and effective decisions.
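One possible realisation of the ranked-factor analysis mentioned above is scikit-learn's permutation importance; the sketch below uses synthetic data and a random forest as stand-ins for the ADLF, so both the data and the model are assumptions for illustration.

```python
# Illustrative feature-importance ranking via permutation importance;
# the data and the random forest below are synthetic stand-ins, not the ADLF.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
X = pd.DataFrame({
    "soil_moisture": rng.uniform(10, 40, 300),
    "temperature": rng.uniform(15, 35, 300),
    "pest_prevalence": rng.uniform(0, 1, 300),
})
y = 0.2 * X["soil_moisture"] - 0.1 * X["temperature"] + rng.normal(0, 0.5, 300)

model = RandomForestRegressor(random_state=42).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=42)

# Rank factors by mean importance, most influential first.
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{X.columns[idx]}: {result.importances_mean[idx]:.4f}")
```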
Scalability across different farming contexts

The ADLF is flexible and scalable, and can be applied to both small and large farming operations as well as to different crop types. For small-scale farms, where farmers may grow only one or two crop types, the model can be fine-tuned to inform optimized resource use, such as water or fertilizer, based on local data. For large-scale farms, the framework can draw on data from multiple sources, such as satellite images, IoT sensors, and drones, to monitor a wide area and deliver high-quality insights across different crop types and zones. Advanced analytics can help large farms detect patterns in pest and disease outbreaks and put preventative measures in place. Case studies show the promise of deep learning frameworks for significantly improving crop management, yield prediction, and resource optimization in precision agriculture settings such as large maize farms in the United States and small rice farms in Southeast Asia.

Computational resources and challenges in resource-constrained areas

The computational resources needed to implement this framework are a key consideration in areas with limited access to high-end technology and internet infrastructure. While the model is efficient, training and inference still require substantial processing power, which can hinder adoption by farmers in remote or resource-constrained areas. Where sufficient computational resources are available, cloud-based platforms can support real-time processing and data analysis; in areas without connectivity or access to technology, however, this may not be possible. To address these challenges, we propose the following strategies to make the framework more accessible to farmers with limited resources:

Edge computing solutions The framework is designed so that it need not rely on cloud platforms; it can instead be deployed on edge computing devices, such as low-cost microprocessors and local servers, that process data locally without a continuous internet connection. This lets farmers take advantage of the model's predictions in real time even without internet access (see the sketch after this list).

Simplified mobile applications The framework can be incorporated into lightweight mobile applications running on smallholder farmers' mobile phones, delivering simplified outputs and recommendations. These apps could run offline using preloaded data and simplified decision models, syncing with cloud-based systems for updates and enhanced functionality whenever internet access is available.

Collaborations with local agriculture centers Partnerships with local agricultural extension centers or cooperatives can serve as hubs that run the model on behalf of farmers, so that farmers with limited access to technology can also benefit. The deep learning framework can process the data collected by these centers and return actionable insights to the farmers. This approach has been implemented successfully in precision agriculture projects in regions of Africa and South Asia.
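As referenced in the edge computing item above, one practical route to on-device inference is model compression with TensorFlow Lite. The sketch below is illustrative rather than the paper's actual deployment pipeline; "adlf_model.keras" is an assumed file name for a previously trained network.

```python
# Hedged sketch: converting a trained Keras model to TensorFlow Lite for
# low-cost edge devices; "adlf_model.keras" is a hypothetical path.
import tensorflow as tf

adlf_model = tf.keras.models.load_model("adlf_model.keras")  # assumed artifact

converter = tf.lite.TFLiteConverter.from_keras_model(adlf_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize to shrink the model
tflite_model = converter.convert()

with open("adlf_model.tflite", "wb") as f:
    f.write(tflite_model)   # deploy this file to the edge device for offline use
```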

Model limitations

The main limitation of the proposed model is its dependence on large volumes of high-quality data for training and validation. Such data are often unavailable or inconsistent in real-world applications, especially in developing regions; for instance, poorly maintained soil records or inconsistent weather data for a remote area can lead to inaccurate predictions. Another limitation is that training the model on a small or very narrow dataset risks overfitting, where the model performs well on the training data but poorly on new, unseen data. We have mitigated this with techniques such as cross-validation and regularization, though it remains an area for further optimization. A further limitation is that the model needs real-time data updates to stay accurate over time: since crop conditions and environmental factors can vary radically across regions and over time, the model must be updated frequently with new data or it risks becoming outdated and unreliable. Future enhancements based on unsupervised learning or transfer learning could make the model less dependent on large, high-quality datasets.

Although the model has relatively low false positive and false negative rates, their impact in a real-world agricultural environment deserves further discussion. A false negative occurs when the model underestimates the crop yield or fails to detect a disease or pest infestation; a farmer could then miss important intervention windows, causing unexpected yield loss or damage. In such cases, farmers might need more frequent manual crop inspections or complementary decision-support tools to ensure problems are spotted in time. A false positive, in which the model overestimates crop health or yield, can instead trigger unnecessary interventions: overly optimistic predictions can lead to overwatering, over-fertilization, or excessive pesticide use, wasting resources and potentially harming the environment. To address this, we suggest building an alert system that attaches a confidence level to each prediction, allowing farmers to weigh the risk before acting (a simple sketch of such confidence-gated alerting follows). Integrating additional real-time data sources, such as sensor-based field monitoring, could also help verify predictions before action is taken.
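The sketch below illustrates one way such confidence-gated alerting could work; the threshold, messages, and example predictions are assumptions for illustration, not part of the evaluated system.

```python
# Minimal sketch of confidence-gated alerting; the 0.8 threshold and the
# example predictions are illustrative assumptions.
def alert(prediction: str, confidence: float, threshold: float = 0.8) -> str:
    """Escalate high-confidence predictions; flag low-confidence ones for checking."""
    if confidence >= threshold:
        return f"ALERT: {prediction} (confidence {confidence:.0%})"
    return (f"ADVISORY: possible {prediction} (confidence {confidence:.0%}); "
            f"verify with field inspection or sensors before acting")

print(alert("water stress expected within 7 days", 0.91))
print(alert("pest infestation", 0.62))
```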
