Production Analysis and Prediction of TOP using ARIMA Model

Tomato, Onion and Potato are the staple foods that form the essential part of meal in the contemporary world. Lack of co-ordination between the demand, supply and production induces price fluctuation in the developing country like India. In order to overcome this problem a synchronous model should be in place to reduce the price fluctuation. One of the solutions for the mentioned problem, is designed in the work carried out. ARIMA model has been utilized in the work to come up with the best model for predicting the amount of production based on the previous data. The work provides the best AR, MA and ARIMA model and also the prediction for 15 years based on the previous available data. In conclusion, the work provides the ARIMA model for tomato, onion and potato. Keywords—ARIMA, Time Serires Analysis, TOP, Agriculture,


INTRODUCTION
Agriculture plays an important role in Indian economy. It contributes to 17% of total GDP of India and provides employment over 60% of the population [1]. The success of ancient to modern civilization throughout the world comes from the cultivation of crops in different geographic conditions. The cultivation of crops depends upon versatile factors and these factors should be taken care such that studying these different aspects helps to produce a maximum yield for the farmers. Time Series Analysis is a type of a statistical analysis, which deals with the time series data, and its analysis for regular period of intervals [2]. It is mainly used for predicting the future values based on past observed values. The main objective of time series analysis is to develop the mathematical models which provides sample data and in order to provide a statistical setting for describing the character of data that seemingly fluctuate in a random fashion over time, we assume a time series can be defined as a collection of random variables indexed according to the order they are obtained in time. It is typically to show an example for time arrangement graphically by plotting the estimations of the random factors on the vertical pivot, or ordinate, with the time scale as the abscissa. It is normally helpful to associate the qualities at nearby time spans to recreate outwardly some unique theoretical consistent time arrangement that may have delivered these qualities as a discrete sample. Some of the works carried out using the time series analysis in various domains are mentioned: Forecasting Wi-Fi data network traffic is determined by using time series analysis, Box-Jenkins methodology and ARIMA model [3]. Sample data of correlated structure is taken and different time series analysis methods are observed for data traffic detection. Mean square, standard deviation and correlation co-efficient are calculated to know the relational difference between two values [3]. By using Time series approach, demand in food industry is modeled and forecasted. ARIMA model and Box-Jenkins methodology used to identify four performances criteria such as Akaike criterion, Schwarz Bayesian criterion, maximum likelihood, and standard error. If there exist any nonlinearity in the obtained data set then ANN (Artificial neural network) technique is imposed. The obtained results are again compared with available historical data to forecast the further demand in food industry [4]. The prediction of Sugarcane production in India is done by using Time series analysis, ARIMA model and Box-Jenkins methodology. The work has carried out in forecasting the production of sugarcane in five leading years. Model identification has done to check and identify the variable to be forecasted is stationary in time series. Augmented Dickey-Fuller (ADF) Test is carried for stationary test. Autocorrelations (ACF) and Partial Autocorrelations (PACF) are used for first order differenced series [5]. With the conclusion ARIMA model plays very important role, in order to forecast future successive values in the time series to the work. While exponential smoothing models are based on a description of the trend and seasonality in the data [2], ARIMA models aim to describe the autocorrelations in the data [2]. In this regard, Autoregressive Integrated Moving Average (ARIMA) model helps farmers to choose the future scope of agricultural crops in the best way by using the past values and the present results. Exponential smoothing and ARIMA models are the two methods used for different approaches in time series forecasting, and further provides complementary approaches This work focuses on forecasting the fluctuation in the production of TOP. The word TOP refers to "Tomato, Onion, Potato" which are one of the most essential vegetables which are grown all over the world [6]. These vegetables are used very prominently in daily life and are grown in all time irrespective of seasons [6]. This makes space for volatile supply of TOP in market. The production of these crops varies and due to which a specific model need to be set that provide information regarding stipulated time frame for crop cultivation. This provides information for setting up minimum selling price to farmers. Due to fluctuation in their prices the Indian Government has made 'Operation Greens' which is a project approved by the Ministry of Food Processing Industries with the target to stabilize the supply of tomato, onion and potato crops (TOP crops) in India [7], as well as to ensure their availability around the country throughout the year. These crops can be grown such that they are not affected by MSP rates [11].

II. MATERIALS AND METHODS
As mentioned in the introduction section TOP are considered as the essential part of the meal. The demand for TOP will be for throughout the year. The work was carried out in two phases. The first phase includes data acquisition and data structuring. The second phase comprises of data analysis. The steps carried out in the work are summarized in Fig 1. The acquired data was not present in the required format, thus it was structured in the required format. The ARIMA method was utilized for analysis of the previous data to predict the values. The values might help to take better decision-making for sowing, budgeting, expected production and insights into import/export of TOP. The ARIMA model is integration of two independent models Auto regressive (AR) and Moving Average (MA). This model does not assume any specific pattern on the time series data that is utilized for the prediction. The ARIMA model is based on the following building blocks: (i) Model identification (ii) Parameter estimation (iii) Model validation (i) Model identification: The model identification specifies the orders (p, d, q) of two components of ARIMA model. Typically, order's function is to determine the signal either stationary or non-stationary. If the signal is non-stationary the signal has to be converted to stationary.
(ii) Parameter estimation: The model cannot be structured until we obtain the stationary signal. The non-stationary signal can be converted to stationary signal by determining the dvalue. The optimal selection of the d-value is a challenge in itself. To address this issue it is advised to always start with minimum value. If a higher value is selected then it might lead to larger standard deviation. The optimal d value that provides with zero mean and standard deviation of the signal is considered as the stationary signal. In order to obtain stationary signal ADF test was carried out.  Fig 2(c, d, Fig 3(c, d, Fig 4(c, d, 2f. This provides the information that the production of tomato may go down. Hence the precautionary measures have to be taken care for the corresponding year to maintain the demand and supply chain. The downward trend also tries to provide inputs in decision making for importing the tomato into the country. This in turn helps to prevent fluctuation impinging the market. The statistical parameters as mentioned in the results section for tomato can be improved. The p-value obtained can be improved by incorporating more data.
The prediction curve obtained for onion shows upward trend as shown in fig 3f. This provides the information that the production of onion may rise. Measures can be taken for further improvement of production of onion within the country. This gives insight into the quantity of onion to be imported/exported based on the demand/supply for the corresponding year. The statistical parameters obtained in case of onion are not promising hence the accuracy of prediction can be improved by employing more historical data.
The prediction curve obtained in case of potato is optimal as compared to other prediction curves this is due to the ARIMA model employed incase of potato is optimal. The curve depicts the improved production of potato in the upcoming year as shown in the fig 4f. The accuracy of model prediction can be improved further by incorporating more data. The statistical parameters as mentioned in the results section are satisfactory.
The developed model do not provide insights on the other agricultural related parameters such as weather forecast, soil fertility and approximate time for sowing for improving the production of TOP. Incorporating the above mentioned parameters in the model can improve the region wise prediction and also production of TOP. The region wise prediction is necessary as India has 16 kinds of climate zones [10]. If prediction of the TOP is tailored region wise, it will help reduce the need of transportation from one region to other. It will stimulate the region wise production of TOP. This will help in maintaining the cost fluctuation and avoid transportation cost.
The data available from the source was limited and due to which the prediction has a lower accuracy. So the focus of future work is to improve the accuracy by incorporating more data. Along with improving the accuracy of the model a framework will be designed to provide more insights for optimizing the production of TOP. The outcome of the framework will be to provide information related to all the parameters [5].
In conclusion, the developed model can be utilized as a tool for analysis of budget allocation for importing the necessary crop in case of shortage. This in turn help to provide insight on the abrupt fluctuations in the market specifically for the onion. The insights might be used for better decision making in order to overcome the market fluctuation. On the other hand the data analysis from the forecast can also be utilized to optimize the production of the TOP and avoid the upcoming shortage of the crop at a particular time of the year.