Crop Yields Prediction Using Steepest Descent in Neural Networks

DOI : 10.17577/IJERTV13IS010084


Zeeshan Choudhary, Prof. Manisha Kadam, Prof. Sonal Yadav

Department of CSE, SDBCT, Indore, India

Abstract: Agriculture is undergoing a metamorphosis due to several environmental and social factors. Owing to challenges such as global warming, intermittent rainfall patterns and eroding soil nutrient values, crop yields have become more unpredictable in the last decade. This has resulted in famines, farmer suicides and deaths due to hunger. Thus, one of the key objectives of the World Health Organization is to provide food security globally and to help the agricultural community as a whole, with special emphasis on low-income countries. This has made crop yield forecasting extremely important. As crop yield depends on several factors which are highly uncorrelated in nature, machine learning based approaches have been employed for the purpose. In this paper, a deep neural network approach is proposed along with the discrete wavelet transform to forecast crop yields. The wavelet transform is used as a filtering technique to remove local disturbances from the data, and deep neural networks are used for pattern recognition and forecasting. The proposed system has been evaluated in terms of the mean absolute percentage error, accuracy and regression. It has been found that the proposed work outperforms existing baseline techniques in terms of forecasting accuracy…

Keywords: crop yield forecasting, discrete wavelet transform, deep neural networks, mean absolute percentage error, accuracy.


One of the main goals of the World Health Organization (WHO) is the food security programme, which aims at providing food to everyone in the world so as to eradicate deaths due to hunger. This is challenging due to factors such as explosive population growth, global warming, unprecedented climate change, eroding soil nutrient values, urbanization, conversion of farmland for industrial use, mass exodus to urban areas in search of livelihood, declining investment in staple crop production, rising food costs, etc. It thus becomes extremely challenging to ensure food security. As per WHO statistics, almost 25,000 people die of hunger each day (Holmes, 2020). It is estimated that a child dies of hunger every 10 seconds and that around 3.1 million children die of hunger and malnutrition each year. The worst-hit areas are the Sub-Saharan region of Africa and Asian countries, where deaths due to hunger are staggeringly high. This motivates the WHO's aim to eradicate hunger-related deaths by 2030 (Moseley and Battersby, 2020), which calls for meticulous planning and statistical analysis. Crop yield forecasting is one of the key components for this purpose, as it can render insights into expected yields, thereby helping authorities plan the storage, distribution and supply of surplus to the needy. Crop yield forecasting is, however, challenging because yield depends on several factors such as the time of year, crop type, amount of rainfall, temperature, and type and condition of soil (Klompenburg et al., 2020).

Attaining maximum crop yield at minimum production cost remains the main goal of crop production (Elavarasan and Vincent, 2020). Crop yield prediction involves many challenges. Artificial intelligence has helped in understanding and analyzing agriculture-based markets. Early detection of problems related to crop production can help in quick resolution and aid in increasing yield and profit. Predictive methods can be used to reduce losses under unforeseen circumstances and to identify the most favorable time for growing. Different weather conditions have different impacts on the overall crop yield of a particular area (Dang et al., 2020).

There has been tremendous growth in artificial intelligence and machine learning in recent years. Agro-based systems and industries have also witnessed an increased adoption of these technologies, and accurate crop yield prediction has become a prominent area of research (Nigam et al., 2019). Meteorological data can be used to predict the impact of weather and pests on crops. Crop yield and productivity are of vital importance to farmers. Weather conditions are among the key influencers of crop yield, and different types of crops are affected by different factors. Hence the motivation behind this research is to evaluate crop yield prediction techniques using the concepts of machine learning. In this paper, machine learning based techniques are analyzed and a model employing data pre-processing and ensemble learning is proposed for crop yield forecasting.


The main challenge in forecasting crop yield lies in the fact that crop yields are affected by several variables which often show very little correlation (Gopal and Bhargavi, 2019). Hence it is necessary to design a forecasting model which can identify patterns in seemingly random data, remove the noisy component around the baseline and forecast the crop yield with high accuracy and low error (Hird and McDermid, 2009). To attain high forecasting accuracy with a low or moderate number of training iterations, it is necessary to focus on two fundamental aspects:

1. Pre-process the data so as to remove the noisy component along the baseline.

2. Design an appropriate machine learning algorithm which can find patterns in complex time series data.

Thus the first part of the methodology focuses on pre-processing, which removes the noisy part and filters the data so as to facilitate training.

The pre-processing employs the discrete wavelet transform (DWT), which acts as a recursive filter to remove local disturbances and noise from the raw data. This step helps in pattern recognition (Khandelwal et al., 2015). The recursive filtration using the wavelet transform for the ith level scaling factor follows Nury et al. (2017), and the co-efficient value for the ith level follows Madan and Mangipudi (2018).

[Equations (1) to (5) are illegible in the source text.]

In these expressions, t is the time metric; the remaining symbols denote the data stream which needs to be filtered and the scaling metric.

The wavelet behaves like a multi-level recursive filter which decomposes the data, acting like a combination of low-pass and high-pass filters (Hajiabotorabi et al., 2019). It can be concluded from existing work that the low-pass output typically contains the baseline data, while the noisy component and disturbances are contained in the high-pass output. The data can thus be filtered as:

$$s \xrightarrow{\;\mathrm{DWT}\;} \mathrm{LPF}_L(s) + \mathrm{HPF}_L(s) \qquad (6)$$

where $s$ is the data to be filtered up to $L$ levels, $\mathrm{LPF}_L$ is the low-pass filtering operation at level $L$, $\mathrm{HPF}_L$ is the high-pass filtering operation at level $L$, and DWT stands for the discrete wavelet transform.

The filtering can be used to estimate the noise floor in the data and to filter it further using the co-efficient values of the data. The co-efficient representation of the data is given by:

$$s \xrightarrow{\;\mathrm{DWT}\;} \{\, C_{A,L},\; C_{D,L} \,\} \qquad (7)$$

where $C_{A,L}$ are the approximate co-efficient values of decomposition level $L$ and $C_{D,L}$ are the detailed co-efficient values of decomposition level $L$.

Retaining the approximate co-efficient values and discarding the detailed co-efficient values filters the data. An iterative process that retains $C_{A,L}$ and discards $C_{D,L}$ at each decomposition level generates a decomposition tree. That the raw data has been filtered can be validated by comparing the decomposition metrics of the raw and the filtered data. The decomposition metrics commonly chosen for data analysis are (Rhif et al., 2019):

1. Mean:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} s_i \qquad (8)$$

where $s$ is the original data, $\mu$ is the mean and $N$ is the total number of samples.

2. Standard Deviation:

$$\mathrm{s.d.} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (s_i - \mu)^2} \qquad (9)$$

where s.d. is the standard deviation, $N$ is the number of samples, $s$ is the original data and $\mu$ is the mean.

3. Median:

$$\mathrm{med}(X) = \frac{1}{2}\left[ X_{\lfloor (N+1)/2 \rfloor} + X_{\lceil (N+1)/2 \rceil} \right] \qquad (10)$$

where $X$ is the ordered list, $N$ is the number of samples, $\lfloor \cdot \rfloor$ represents the floor function and $\lceil \cdot \rceil$ represents the ceiling function.

4. Mean Absolute Deviation (MAD):

$$\mathrm{M.A.D.}(s) = \frac{1}{N}\sum_{i=1}^{N} |s_i - \mu| \qquad (11)$$

where M.A.D. represents the mean absolute deviation, $N$ is the number of samples, $s_i$ is the individual sample value of $s$ and $\mu$ is the mean value of $s$.

5. L1 Norm:

$$\|A\|_1 = |A_1| + |A_2| \qquad (12)$$

where $\|A\|_1$ is the L1 norm and $A_1$ and $A_2$ are the comprising vectors.

6. L2 Norm:

$$\|A\|_2 = \sqrt{A_1^2 + A_2^2 + \dots + A_n^2} \qquad (13)$$
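As an illustration of the filtering step and the decomposition metrics listed above, the following is a minimal sketch in Python rather than the MATLAB implementation used for the experiments. It assumes the Haar wavelet (the wavelet family is not specified in this paper), pads odd-length inputs by repeating the last sample, uses the population (1/N) form of the standard deviation, and computes the L1 and L2 norms over the individual samples. The function names `haar_step`, `haar_inverse`, `dwt_filter` and `decomposition_metrics` and the sample values are hypothetical.

```python
import math

ROOT2 = math.sqrt(2.0)

def haar_step(signal):
    """One Haar DWT level: return (approximate, detailed) co-efficient lists."""
    if len(signal) % 2:                      # assumption: pad odd lengths with the last sample
        signal = signal + [signal[-1]]
    approx = [(signal[i] + signal[i + 1]) / ROOT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / ROOT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one Haar level; zip() drops any surplus (padded) approximation value."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / ROOT2)
        out.append((a - d) / ROOT2)
    return out

def dwt_filter(signal, levels):
    """Retain the approximate co-efficients, discard (zero) the detailed ones, reconstruct."""
    approx = list(signal)
    detail_lengths = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        detail_lengths.append(len(detail))
    for length in reversed(detail_lengths):
        approx = haar_inverse(approx, [0.0] * length)
    return approx[:len(signal)]

def decomposition_metrics(values):
    """Mean, standard deviation, median, MAD, L1 norm and L2 norm of a data stream."""
    n = len(values)
    mean = sum(values) / n
    ordered = sorted(values)
    low = ordered[(n + 1) // 2 - 1]              # 1-based position floor((N+1)/2)
    high = ordered[math.ceil((n + 1) / 2) - 1]   # 1-based position ceil((N+1)/2)
    return {
        "mean": mean,
        "sd": math.sqrt(sum((v - mean) ** 2 for v in values) / n),
        "median": 0.5 * (low + high),
        "mad": sum(abs(v - mean) for v in values) / n,
        "l1": sum(abs(v) for v in values),
        "l2": math.sqrt(sum(v * v for v in values)),
    }

raw = [10.0, 12.0, 11.0, 13.0, 30.0, 12.0, 11.0, 10.0]   # made-up values with one spike
filtered = dwt_filter(raw, levels=2)
print(decomposition_metrics(raw))
print(decomposition_metrics(filtered))
```

Comparing the two printed dictionaries is one way to check that a baseline statistic such as the mean survives the filtering while the spread (standard deviation, MAD) shrinks.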

If the decomposition metrics of the approximate co-efficients match those of the raw data while those of the detailed co-efficients do not, the decomposition implies that the noisy part is contained in the detailed co-efficients and can be removed by discarding the detailed co-efficient values (Fernandez-Ordoñez et al., 2017).

The next critical stage is training, in which the pre-processed data is applied to a machine learning model for pattern recognition. In this case, an ensemble deep neural network has been used to forecast crop yield. The output of the neural network is given by (Tealab, 2018):

$$y = f\left( \sum_{i=1}^{n} x_i w_i + b \right) \qquad (14)$$

where $x$ represents the inputs, $y$ represents the output, $w$ represents the weights, $b$ is the bias term and $f$ represents the activation function.

To analyze the data in this case, a deep neural network with 10 hidden layers is designed. Each of the modules of the ensemble neural network is fed with the $C_A$ and $C_D$ values of the decomposition. The outputs of the individual modules are summed to obtain the final output (Bhosale et al., 2018). The summation of the individual outputs is given by:

$$Y = \sum_{j=1}^{m} y_j \qquad (15)$$

where $Y$ denotes the total output of the ensemble neural network, $y_j$ denotes the individual outputs of the ensemble modules and $m$ is the number of modules in the ensemble.

The number of modules has been taken as 4, corresponding to the 3 detailed and 1 approximate co-efficient sets. The training rule employed is back-propagation-based gradient descent, with the objective function taken as the mean square error (Islam et al., 2018). The training rule is given by:

$$w_{t+1} = w_t - \alpha \frac{\partial e_t}{\partial w_t} \qquad (16)$$

where $w_{t+1}$ is the weight of iteration $(t+1)$, $w_t$ is the weight of iteration $t$, $e_t$ stands for the error in iteration $t$ and $\alpha$ is the learning rate.

The stopping condition for the machine learning algorithm to reach convergence is the successive stability of the cost function or objective function, which is the mean square error in this case. The mean square error (mse) is defined as:

$$mse = \frac{1}{N}\sum_{i=1}^{N} (p_i - a_i)^2 \qquad (17)$$

where $p_i$ denotes the predicted value, $a_i$ denotes the actual value and $N$ is the number of samples.

The mean absolute percentage error (MAPE) for the system has been computed as (Huang et al., 2017):

$$MAPE = \frac{100}{N}\sum_{i=1}^{N} \left| \frac{a_i - p_i}{a_i} \right| \qquad (18)$$

where $p_i$ denotes the predicted value, $a_i$ denotes the actual value and $N$ is the number of samples.

The simulations have been performed on MATLAB 2020a on an i5 9300H CPU with a clock speed of 2.4 GHz and 8 GB of available RAM. The first part of the experiment entails data pre-processing so as to remove noise and disturbance effects from the raw data. For this purpose, the wavelet transform has been employed. A three-level decomposition of the raw data has been performed, in which the detailed co-efficient values are discarded and the approximate co-efficient values are retained so as to filter out the noise effects. The approximate co-efficient values along with the detailed co-efficient values are used to train an ensemble neural network. The data is split in the ratio of 70:30 for training and testing. The parameters used for training are time, rainfall, moisture, humidity, temperature and soil type. The data has been acquired from Kaggle.
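The neuron output, the gradient descent training rule with its stopping condition, and the two error measures described in this section can be sketched as follows. This is a minimal illustration in Python, not the MATLAB implementation used for the experiments: it trains a single linear neuron rather than the 10-hidden-layer ensemble, assumes an identity activation, and the names `neuron_output`, `ensemble_output`, `mse`, `mape` and `train`, together with the learning rate, tolerance and sample data, are hypothetical choices.

```python
def neuron_output(inputs, weights, bias, activation=lambda z: z):
    """y = f(sum_i x_i * w_i + b); the identity activation is an assumption."""
    return activation(sum(x * w for x, w in zip(inputs, weights)) + bias)

def ensemble_output(module_outputs):
    """Total ensemble output as the sum of the individual module outputs."""
    return sum(module_outputs)

def mse(predicted, actual):
    """Mean square error between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def mape(predicted, actual):
    """Mean absolute percentage error; actual values must be non-zero."""
    total = sum(abs((a - p) / a) for p, a in zip(predicted, actual))
    return 100.0 * total / len(actual)

def train(samples, targets, learning_rate=0.05, tol=1e-9, max_epochs=10000):
    """Batch gradient descent on the MSE of one linear neuron.

    Training stops when two successive MSE values differ by less than tol,
    i.e. when the cost function has become stable.
    """
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    previous = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        predictions = [neuron_output(x, weights, bias) for x in samples]
        current = mse(predictions, targets)
        if abs(previous - current) < tol:
            break
        previous = current
        errors = [p - t for p, t in zip(predictions, targets)]
        # Gradient of the MSE: d/dw_j = (2/n) * sum(e * x_j), d/db = (2/n) * sum(e)
        for j in range(len(weights)):
            gradient = 2.0 / n * sum(e * x[j] for e, x in zip(errors, samples))
            weights[j] -= learning_rate * gradient      # w_{t+1} = w_t - alpha * de/dw
        bias -= learning_rate * 2.0 / n * sum(errors)
    return weights, bias, epoch

# Made-up data following y = 2x + 1, for illustration only.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, epochs = train(xs, ys)
fitted = [neuron_output(x, w, b) for x in xs]
print(w, b, epochs, mse(fitted, ys), mape(fitted, ys))
```

The stability check on successive MSE values mirrors the stopping condition stated above; in a multi-layer network the same update is applied to every weight, with the gradients obtained by back propagation.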

    Fig. 1. Importing raw data

    Fig. 2. Forecasting Performance

Fig. 3. Regression Analysis

The summary of results is presented in Table I.

Table I. Summary of Results

Parameter                         Value
Machine Learning Model            Neural Net
[label missing]                   Back Propagation
Hidden Layers                     [value missing]
Training Epochs                   [value missing]
Time to convergence               2 mins, 4 secs
MSE at convergence                [value missing]
Validation Checks                 [value missing]
Regression Training               [value missing]
Regression Testing                [value missing]
Regression Validation             [value missing]
Regression Overall                [value missing]
[label missing] at convergence    [value missing]
Accuracy Proposed Work            [value missing]

The critical parameters are listed in Table I. The comparative analysis with existing work is presented next:

Table II. Comparison with Previous Work

Author and Approach                                                           Forecasting Accuracy
Elavarasan et al., Deep Reinforcement Learning                                [value missing]
Dang et al., Support Vector Regression                                        [value missing]
Nigam et al., Random Forests                                                  [value missing]
Proposed Approach, DWT + Gradient Boost Based Ensemble Deep Neural Network    [value missing]


Figure 1 depicts the data in the MATLAB workspace after loading, where it is accessible for analysis. The independent variables (features) along with the target variable (yield) are decomposed to 3 levels of the DWT. This yields one set of approximate co-efficient values, denoted by a, and three sets of detailed co-efficient values, d1, d2 and d3. The metrics of decomposition are chosen as the maximum value, minimum value, mean, median, standard deviation, mean absolute deviation, L1 norm and L2 norm. Analysing these parameters helps in understanding the effect of the wavelet decomposition on the data cleaning process. S corresponds to the original data stream, C_A corresponds to the approximate co-efficient values and C_D corresponds to the detailed co-efficient values. Figure 2 shows the curves for the actual and the forecasted values: the red curve depicts the forecasted (predicted) values and the blue curve depicts the actual values. The value of the regression is closely related to the prediction accuracy of the system. Figure 3 depicts the regression for the training, testing, validation and overall cases. A summary of the obtained results is presented in Table I, and a comparative analysis of the previous and proposed work in terms of the evaluation parameters is given in Table II. The comparison shows that the proposed technique outperforms the existing techniques in terms of prediction accuracy. This can be attributed to the combined data filtration and ensemble learning approach adopted in this work.
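The 70:30 split and the regression value discussed in this section can be illustrated with a short Python sketch. It assumes that the split is random (the splitting rule is not stated in the paper) and that the regression value corresponds to the correlation coefficient R between predicted and actual values, as reported in typical neural-network regression plots. The function names `split_70_30` and `regression_r` and the sample numbers are hypothetical.

```python
import math
import random

def split_70_30(rows, seed=0):
    """Shuffle reproducibly (assumed random split), then return (training, testing)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def regression_r(predicted, actual):
    """Pearson correlation coefficient R between predicted and actual values."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

actual = [2.1, 2.9, 3.8, 5.2, 6.1]       # made-up yields, for illustration only
predicted = [2.0, 3.0, 4.0, 5.0, 6.0]    # made-up forecasts
train_rows, test_rows = split_70_30(list(zip(predicted, actual)))
print(len(train_rows), len(test_rows), regression_r(predicted, actual))
```

An R value close to 1 indicates that the forecasts track the actual yields closely, which is the sense in which the regression value relates to prediction accuracy.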


It can be concluded that crop yield prediction is a critically important forecasting problem for addressing food security in the world. However, it is challenging to forecast crop yields accurately, since the data is generally random and complex and the yield depends on multiple parameters. The proposed system uses a two-step approach in which the data is first filtered and then a deep neural network employing back propagation is used for pattern recognition. The performance of the system has been evaluated in terms of the mean square error, mean absolute percentage error, regression and accuracy. From the results, it can be observed that the system trains in a low number of iterations and also achieves a low MAPE value. A comparative analysis with respect to previous work also shows that the proposed technique outperforms existing techniques in terms of prediction accuracy.


  1. Bhosale S, Thombare R, Dhemey P, Chaudhari A (2018). Crop Yield Prediction Using Data Analytics and Hybrid Approach, 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), 1-5.

  2. Dang C, Liu Y, Yue H, Qian J (2021). Autumn Crop Yield Prediction using Data-Driven Approaches:-Support Vector Machines, Random Forest, and Deep Neural Network Methods, Canadian Journal of Remote Sensing, Taylor and Francis, 47(2), 162-181.

  3. Elavarasan D, Vincent P (2020). Crop Yield Prediction Using Deep Reinforcement Learning Model for Sustainable Agrarian Applications, IEEE Access 2020, 8, 86886-86901.

  4. Fernandez-Ordoñez Y, Soria-Ruiz J (2017). Maize crop yield estimation with remote sensing and empirical models, 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 3035-3038.

  5. Gopal P, Bhargavi R (2019). A novel approach for efficient crop yield prediction, Computers and Electronics in Agriculture, Elsevier, 165, 104968.

  6. Hajiabotorabi Z, Kazemi K, Samavati F, Ghaini F (2019). Improving DWT-RNN model via B-spline wavelet multiresolution to forecast a high-frequency time series, Expert Systems with Applications, Elsevier 2019, 138, 112842.

  7. Hird J, McDermid G (2009). Noise reduction of NDVI time series: An empirical comparison of selected techniques, Remote Sensing of Environment, Elsevier, 113(1), 248-258.

  8. Holmes J. (2020). Losing 25,000 to Hunger Every Day. Retrieved from UN Chronicle: 25000-hunger-every-day

  9. Huang X, Huang G, Yu C, Ni S, Yu L (2017). A multiple crop model ensemble for improving broad-scale yield prediction using Bayesian model averaging, Journal of Field Crops Research, Elsevier, 211, 114-124.

  10. Islam T, Chisty T, Chakrabarty A (2018). A Deep Neural Network Approach for Crop Selection and Yield Prediction in Bangladesh, 2018 IEEE Region 10 Humanitarian Technology Conference (R10-HTC), 1-6.

  11. Khandelwal I, Adhikari R, Verma G (2015). Time series forecasting using hybrid ARIMA and ANN models based on DWT decomposition, Procedia Computer Science, Elsevier, 48, 173-179.

  12. Klompenburg T, Kassahun A, Catal C (2020). Crop yield prediction using machine learning: A systematic literature review, Computers and Electronics in Agriculture, Elsevier, 177, 105709.

  13. Madan R, Mangipudi P (2018). Predicting Computer Network Traffic: A Time Series Forecasting Approach Using DWT, ARIMA and RNN, 2018 Eleventh International Conference on Contemporary Computing (IC3), 1-5.

  14. Moseley W, Battersby J (2020). The vulnerability and resilience of African food systems, food security, and nutrition in the context of the COVID-19 pandemic. African Studies Review, Cambridge Publications, 63(3), 449-461.

  15. Nigam A, Garg S, Agrawal A, Agrawal P (2019). Crop Yield Prediction Using Machine Learning Algorithms, 2019 Fifth International Conference on Image Information Processing (ICIIP), 125-130.

  16. Nury A, Hasan K, Alam M (2017). Comparative study of wavelet-ARIMA and wavelet-ANN models for temperature time series data in northeastern Bangladesh, Journal of King Saud University Science, Elsevier, 29(1), 47-61.

  17. Rhif M, Abbes A, Farah I, Martínez B, Sang Y (2019). Wavelet transform application for/in non-stationary time-series analysis: a review, Applied Sciences, MDPI, 9(7), 1-22.

  18. Tealab A (2018). Time series forecasting using artificial neural networks methodologies: A systematic review, Future Computing and Informatics Journal, Elsevier, 3(2), 334-340.