Rainfall Prediction using Linear approach & Neural Networks and Crop Recommendation based on Decision Tree

— Rainfall is one of the most vital components of agriculture, and predicting it is among the most challenging tasks. Weather and rainfall are highly non-linear, complex phenomena that require sophisticated computer modeling and simulation for precise prediction. Numerous machine learning models are used to predict rainfall, including Multiple Linear Regression, Neural Networks, K-means, Naïve Bayes, and others. Such systems extract, train on, and test datasets in order to fit a model and predict rainfall. This study applies Multiple Linear Regression and Neural Networks to predict rainfall, and a Decision Tree algorithm to recommend crops. We find that rainfall can be predicted and crops recommended with reasonable accuracy.


I. INTRODUCTION
The rainfall prediction model is implemented using two algorithms: Multiple Linear Regression and Neural Networks. The linear regression model finds the correlation between the various features in the dataset that contribute to rainfall, while the neural network learns the weights and biases that lead to an accurate prediction. Initially, the dataset with multiple features is cleaned and pre-processed to make it suitable for feeding into the machine learning algorithms. A correlation matrix is created that shows the correlation between the different independent variables and the dependent (predictor) variable. Once the data is cleaned and processed, we build our Multiple Linear Regression and Neural Network models and fit them to the data. The main objective of our system is to predict rainfall from features such as humidity, temperature, and pressure with reasonable accuracy. Agriculture and rainfall are highly correlated. As technology evolved, developments were made in many sectors, including agriculture, and many IT companies now provide weather information such as temperature, rainfall, and humidity that farmers can use. We have therefore also developed a crop recommendation system that recommends crops to the user based on the inputs provided. This in turn helps farmers form a better idea of irrigation and of the types of crops to grow.

II. RAINFALL PREDICTION
A. Dataset Used
The dataset for rainfall prediction, known as the "Austin Weather Dataset", was collected from Kaggle. It contains many features, including temperature, humidity, pressure, dew point, and visibility. The data contained irregularities, which were removed in the data preprocessing step. The dataset for crop recommendation was taken from an open-source GitHub repository. It contains features such as temperature, humidity, rainfall, and pH value, together with the crops grown under particular values of these features.

A. Data Cleaning and Pre-Processing
Data cleansing is the process of detecting inaccurate or outlier records in a dataset and then replacing, modifying, or deleting the wrong data, which would otherwise affect the accuracy of our model. In our case, the data has a few days on which the required factors were not recorded, and the rainfall in centimeters was marked as 'T' when there was only trace precipitation. Since the algorithms require numbers, we cannot work with letters appearing in the data, and must clean it before fitting our model. We therefore converted these entries into values that our model can use and whose transformation does not affect the output.
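The cleaning steps described above can be sketched with pandas. The column names and values here are illustrative assumptions, not the actual dataset schema:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw data: trace precipitation is marked
# 'T' and unrecorded readings appear as '-'. Column names are assumed.
df = pd.DataFrame({
    "TempAvg": [60, 62, "-", 65],
    "PrecipitationSum": ["0.5", "T", "0.0", "T"],
})

# Treat trace precipitation as zero rainfall and '-' as missing.
df = df.replace({"T": 0.0, "-": np.nan})
df = df.apply(pd.to_numeric)  # everything must be numeric for the model
df = df.dropna()              # drop days with unrecorded factors

print(df)
```

After this pass every cell is numeric, so the data can be fed to the regression and neural network models.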

B. Finding Correlation and the Correlation Coefficient
Correlation is a statistical measure that shows the extent to which two or more variables depend on each other. A positive correlation indicates that as one variable increases the other also increases, while a negative correlation indicates that as one variable increases the other decreases. The correlation coefficient tells us which features in the dataset are most related to the output variable and thus helps in feature selection. For two variables x and y it is given by

rxy = Σi (xi − x̄)(yi − ȳ) / √( Σi (xi − x̄)² · Σi (yi − ȳ)² )   (1)

A correlation matrix helps us identify the independent variables that are highly correlated with the target and neglect those that are not, thereby decreasing the complexity of our model.
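The matrix-based feature selection described above can be sketched with pandas; the feature values below are synthetic stand-ins, and the 0.5 threshold is an illustrative choice, not the paper's:

```python
import pandas as pd

# Synthetic example data: temperature and humidity versus precipitation.
df = pd.DataFrame({
    "TempAvg":     [60, 65, 70, 75, 80, 85],
    "HumidityAvg": [30, 40, 45, 55, 60, 70],
    "Precip":      [0.0, 0.1, 0.2, 0.4, 0.5, 0.7],
})

corr = df.corr()  # pairwise Pearson correlation coefficients, eq. (1)

# Keep only features whose |r| with the target exceeds a threshold.
target_corr = corr["Precip"].drop("Precip")
selected = target_corr[target_corr.abs() > 0.5].index.tolist()
print(selected)
```

Features that fail the threshold would be dropped before model building, shrinking the input dimension.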

C. Normalization (Scaling of Data)
Scaling or normalization is a method used to normalize the range of the independent variables or features of the data, and is usually performed during the preprocessing step. Normalizing the data makes the model less complex, since all values are converted into a particular range. In our case we normalized the data to a range of −1 to 1 using the following formula.
z = (x − μ) / σ   (2)

where μ is the mean and σ the standard deviation of the feature. We scaled the training data with R's built-in scale() function. We could not apply scale() directly to the test data, since it would then be scaled with the mean and standard deviation of the whole dataset, giving the model prior knowledge of the test set. We therefore normalized the test data separately, computing its own mean and standard deviation and applying the formula above.
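A minimal sketch of formula (2), applied to train and test sets with their own statistics as described above (the arrays are toy values, and this mirrors what R's scale() computes):

```python
import numpy as np

def standardize(x):
    # z = (x - mean) / sd, per equation (2)
    return (x - x.mean()) / x.std()

train = np.array([10.0, 20.0, 30.0, 40.0])
test = np.array([15.0, 25.0, 35.0])

# Each split is scaled with its own mean and S.D., so the model never
# receives statistics computed from the test set.
train_scaled = standardize(train)
test_scaled = standardize(test)

print(train_scaled)
```

After scaling, each split has mean 0 and unit standard deviation.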

D. Building the Models (Multiple Linear Regression and Neural Networks)
i) Multiple Linear Regression
In statistics, linear regression is a linear approach to modeling the relationship between a dependent variable and one or more independent variables. With a single independent variable the process is called simple linear regression; with more than one, it is called multiple linear regression. This term is distinct from multivariate linear regression, in which multiple correlated dependent variables are predicted rather than a single one. Linear regression has many applications, such as prediction, forecasting, and error reduction. A predictive model can be fitted to the collected dataset or observed values, and once fitted it can be used to make predictions. The multiple linear regression equation is:

ŷ = b0 + b1X1 + b2X2 + … + bpXp   (3)

where ŷ is the predicted or expected value of the dependent variable (rainfall in our case) and X1, X2, …, Xp are the independent or predictor variables such as temperature and humidity. The intercept b0 is the value of ŷ when all the independent variables are zero, and b1, b2, …, bp are the estimated regression coefficients. Each coefficient denotes the change in rainfall (ŷ) for a one-unit change in the corresponding independent variable; in the multiple regression setting, b1 is the change in ŷ for a unit change in X1 with all other independent variables held constant. In our case the dependent variable is rainfall (precipitation) and the independent variables are humidity, temperature, pressure, and others. In the accompanying plot, the days are represented on the X-axis and the scale of the features or independent variables on the Y-axis.
In the above graph we can observe that rainfall tends to be high when the temperature is high.
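Equation (3) can be fitted with scikit-learn's LinearRegression, which estimates b0 and b1…bp by least squares. The feature values below are synthetic stand-ins for the Austin weather data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic rows of (temperature, humidity, pressure) and rainfall.
X = np.array([
    [70, 40, 1012],
    [75, 55, 1010],
    [80, 60, 1008],
    [85, 70, 1005],
    [90, 80, 1003],
])
y = np.array([0.0, 0.1, 0.2, 0.4, 0.6])  # rainfall

model = LinearRegression()
model.fit(X, y)  # estimates intercept b0 and coefficients b1..bp

# Predict rainfall for an unseen day's readings.
pred = model.predict([[82, 65, 1007]])
print(model.intercept_, model.coef_, pred)
```

Each entry of `model.coef_` is the change in predicted rainfall per unit change in that feature, holding the others constant, exactly as described for b1 above.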

ii) Neural Networks
Artificial neural networks are among the most popular machine learning and deep learning algorithms. They are inspired by the neurons of the human brain and can make human-like decisions through computation. In our case, we trained a neural network on features such as humidity, temperature, and pressure, and it learned to predict rainfall from these features using the training dataset. The simplest possible neural network contains only one input neuron, one hidden neuron, and one output neuron. It takes several input parameters (the independent variables), multiplies them by their weights (the coefficients), and passes the result through a ReLU activation function. Each neuron j computes

oj = f ( Σi wi,j ai + bj )   (4)

where ai are the inputs, wi,j the weights, bj the bias, and f the activation function. In our case the input layer contains as many neurons as there are input features. The inputs are multiplied by the weights and forwarded to the hidden layer for further computation, where an activation function, discussed below, is applied. We have a single output neuron, since we predict only one variable, rainfall.
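Equation (4) can be sketched as a single forward pass in NumPy. The weights and biases here are random stand-ins (in training they would be learned), and the hidden-layer size of 8 is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

a = np.array([0.7, 0.4, 0.9])      # normalized temp, humidity, pressure
W = rng.normal(size=(3, 8))        # input-to-hidden weights w_ij
b = rng.normal(size=8)             # hidden-layer biases b_j

def relu(z):
    return np.maximum(0, z)        # activation f

# Each hidden neuron j computes o_j = f(sum_i w_ij * a_i + b_j).
hidden = relu(a @ W + b)

w_out = rng.normal(size=8)         # hidden-to-output weights
rain_pred = hidden @ w_out         # single output neuron: rainfall
print(hidden.shape, rain_pred)
```

Training would adjust W, b, and w_out to minimize the prediction error on the training set; frameworks such as Keras automate exactly this loop.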

Activation Function (Rectified Linear Unit, ReLU)
ReLU stands for Rectified Linear Unit and is widely used in deep learning. The ReLU activation is half-rectified: f(z) is 0 when z is less than zero, and f(z) equals z when z is zero or positive, so all negative values are mapped to 0. It can be written as

R(z) = max(0, z)   (7)

and is applied to the weighted sum of a neuron's inputs, x1w1 + x2w2 + … + xnwn + bias.
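The definition R(z) = max(0, z) is one line in code; a quick check on a few values shows the half-rectification:

```python
import numpy as np

def relu(z):
    # R(z) = max(0, z): negatives become 0, non-negatives pass through.
    return np.maximum(0.0, z)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))  # negatives are zeroed
```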

III. CROP RECOMMENDATION
A. Decision Tree Regression
A decision tree is a supervised machine learning algorithm that uses a flowchart-like tree structure: a model of decisions and all their possible results, including outputs, input costs, and utility. Decision trees can handle both categorical and continuous output variables. A decision tree regressor learns from the features of the training objects, builds a model in a tree-like form, and predicts meaningful continuous output for future data. A continuous output is one that is not restricted to a known set of values or numbers. Discrete output example: a decision tree model that predicts whether or not it will rain tomorrow.

Continuous output example:
A decision tree regressor model which can predict the profit of a company from the sales of a particular kind of product.
Decision trees use the core ID3 algorithm, which takes a top-down, greedy approach: it searches through the branches of the tree with no backtracking. With the help of information gain (I.G.) and standard deviation (S.D.), ID3 can be used to implement a decision tree regressor.

Information Gain
Information gain, also known as the Kullback–Leibler divergence and written IG(S, A) for a set S, is the effective change in entropy after deciding on a particular attribute A. It measures the relative change in entropy with respect to the independent variables.

Standard Deviation
In a decision tree algorithm, the data is partitioned into smaller subsets containing instances with similar (homogeneous) values. Building starts from the root node, where the first partition happens. Standard deviation is used to measure the homogeneity of a sample: a completely homogeneous numerical sample has S.D. = 0.

Standard Deviation Reduction
The decrease in standard deviation (S.D.) after the data is split on an attribute is called the standard deviation reduction. The main aim when building a decision tree model is to find the attribute that returns the highest reduction in standard deviation.
Step 1: The S.D of the target is calculated.
Step 2: The dataset is divided on the basis of the different attributes, and the S.D. for each branch is calculated. Subtracting the resulting S.D. from the S.D. before the split gives the S.D. reduction:

SDR(T, X) = S(T) − S(T, X)
Step 3: The attribute having the largest standard deviation (S.D) reduction is chosen for the decision node.
Step 4: The dataset is partitioned based on the values of the selected attribute. This process runs recursively on the non-leaf branches until all the data is processed.
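Steps 1 and 2 can be sketched for a single candidate split. The rainfall values and the binary "humidity high" attribute below are synthetic, and the split S.D. is weighted by subset size, a common convention:

```python
import numpy as np

# Synthetic target T and a binary candidate attribute X.
rainfall = np.array([0.1, 0.2, 0.15, 0.8, 0.9, 0.85])
humidity_high = np.array([0, 0, 0, 1, 1, 1], dtype=bool)

# Step 1: S.D. of the target before the split, S(T).
sd_total = rainfall.std()

# Step 2: size-weighted S.D. of the branches after the split, S(T, X).
subsets = [rainfall[humidity_high], rainfall[~humidity_high]]
sd_split = sum(len(s) / len(rainfall) * s.std() for s in subsets)

# SDR(T, X) = S(T) - S(T, X): larger is a better split attribute.
sdr = sd_total - sd_split
print(round(sdr, 4))
```

Because the attribute separates the low-rainfall days from the high-rainfall days, the branches are nearly homogeneous and the reduction is large; ID3 would pick the attribute with the largest such reduction as the decision node (Step 3).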

IV. SOFTWARE DESIGN
A. Python
We used Python as our programming language to implement the machine learning algorithms, primarily for its simple syntax and its rich ecosystem of machine learning libraries, several of which are described below.

B. TensorFlow
We took help from TensorFlow, one of the most used and popular machine learning libraries. The TensorFlow workflow has three parts:
• Preprocessing the data
• Building the model
• Training and estimating the model
The TensorFlow library has numerous machine learning and neural network computation functions that make complex tasks easier. Reasons to use TensorFlow include:
• In neural machine translation, it has been reported to reduce errors by up to 50%.
• It provides flexibility and multi-level abstraction.
• It helps train models faster and aids in running more experiments.
The TensorFlow library provides numerous APIs to build deep learning architectures such as CNNs and RNNs.

C. Keras
We used Keras to build our neural network model. It is a high-level neural network API that runs on top of libraries such as TensorFlow or Theano. The main advantages of using Keras for building neural networks are:
• Pre-trained models such as VGG16, ResNet, and Inception networks are available.
• It reduces cognitive load.
• It has simple and consistent APIs.
• It reduces the number of user actions required for common tasks.

D. NumPy and Pandas
For scientific computing and to work with high-performance arrays and matrices we used NumPy, an open-source library that is very useful for processing multidimensional arrays. Additionally, we used the Pandas package for data manipulation.

E. Sklearn
Scikit-learn (sklearn) is a free, open-source machine learning library for Python that provides diverse classification, regression, and clustering algorithms. In our case we use one of them, the DecisionTreeRegressor.
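The crop recommendation setup can be sketched with sklearn's DecisionTreeRegressor. The feature rows, crop labels, and integer encoding below are illustrative assumptions rather than the actual dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic rows of (temperature, humidity, rainfall, pH); crops are
# encoded as integers since the regressor needs numeric targets.
X = np.array([
    [25, 80, 200, 6.5],   # rice-like conditions (hypothetical)
    [26, 82, 210, 6.4],
    [20, 30, 60, 7.5],    # wheat-like conditions (hypothetical)
    [19, 28, 55, 7.6],
])
y = np.array([0, 0, 1, 1])  # 0 = rice, 1 = wheat

model = DecisionTreeRegressor()
model.fit(X, y)  # splits are chosen to minimize variance in each branch

# Recommend a crop for a new set of field conditions.
pred = model.predict([[24, 78, 195, 6.6]])
print(int(round(pred[0])))
```

The fitted tree partitions the feature space exactly as described in the SDR steps above: each internal node tests one attribute threshold, and each leaf holds the (mean) crop label of its subset.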

F. R and RStudio
R is another popular programming language in the field of machine learning. Using its built-in libraries and packages we can implement a variety of linear and non-linear operations, classification, and neural networks. We used R to develop the neural network model that predicts rainfall; libraries such as Keras and TensorFlow are available in R and were used accordingly. We used RStudio, an IDE for R that can be run either as a desktop application or via a web browser; we used the desktop version to build our model.

V. RESULTS
We fed multiple inputs such as temperature, humidity, and wind speed into our two machine learning models, Multiple Linear Regression and Neural Networks, and computed the outputs shown below, passing our manual input to each model as an array. The output for rainfall prediction using Multiple Linear Regression is shown in the corresponding figure. Our neural network model achieved a mean absolute error of 0.1459216. The accuracy of the decision tree came out to be 90–92%; its crop recommendation output is shown in Figure 10 (Output of Decision Tree).

VI. CONCLUSION
Using machine learning techniques such as Multiple Linear Regression and Neural Networks, we can predict rainfall with considerable accuracy. However, the accuracy of a model depends on the data it is fed, and modifications are needed when data of a different structure or type is used. Additionally, using input features such as rainfall, humidity, and temperature, we recommended crops that can be grown, with another popular machine learning technique, the decision tree. The accuracy of the decision tree was quite satisfactory and could help many farmers make better decisions.