**Open Access**-
**Authors :**Sahil Rane -
**Paper ID :**IJERTV9IS100054 -
**Volume & Issue :**Volume 09, Issue 10 (October 2020) -
**Published (First Online):**10-10-2020 -
**ISSN (Online) :**2278-0181 -
**Publisher Name :**IJERT -
**License:**This work is licensed under a Creative Commons Attribution 4.0 International License

#### Improving the Accuracy of Temperature Forecasting in Mumbai, Colaba

Abstract

Sahil Rane

Dhirubhai Ambani International School

Background: Temperature predictions are of great importance due to their implications for human activities. Extreme heat can lead to dangerous, even deadly, health consequences, including heat stress and heatstroke. Thus, there is a need to predict temperature accurately so that people can be warned about such conditions so that they take the appropriate precautions.

Methodology: In this paper, we look at temperature data in Mumbai (Colaba) from 2008 to 2020 and attempt to come up with predictive models for the maximum temperature. We carry out feature selection through filter methods first in order to efficiently use a variety of algorithms to develop predictive models. We use various mathematical techniques such as: Multiple Linear Regression (MLR), Simple Exponential Smoothing (SES), Artificial Neural Networks (ANN), and Auto-Regressive Integrated Moving Average (ARIMA) models to predict the temperature. The experimental results are evaluated and compared using the Root Mean Square Error (RMSE).

Results: On experimenting with all four models, it was discovered that the ARIMA model yields the best predictive model having a RMSE of 0.2587777 on testing data by removing some noise and an RMSE of 0.8213 with white noise. This model is also optimal as the residuals of this model are a gaussian white noise (which cannot be predicted). Furthermore, the poor performance of MLR indicates that temperature cannot be accurately modelled through a linear function of the variables considered.

ACKNOWLEDGEMENTS:

I would like to thank Smt. Shubhangi Bhute, Scientist-E (Indian Meterological Department) for guiding me with the project, helping me ideate, and find data for the research paper. Furthermore, I am grateful to Dr. Mahendra Mehta for helping me explore various areas in statistical predictions helping me acquire the knowledge base I needed to work on this paper.

BACKGROUND

This paper attempts to come up with a reliable predictive model for the maximum temperature in Mumbai. Although there are several research papers based on temperature predictive models in general, there were very few papers that aimed to predict temperatures in Mumbai specifically. The models cited in several research papers, for temperature prediction in areas all around the world, were not able to improve on an RMS error of approximately 1 in most cases. Thus, this paper aims to improve the RMS error of these previous models for the dataset of Mumbai (Colaba). In many papers, multiple linear regression and neural networks have been commonly used for temperature predictions with artificial neural networks being the most common. This paper also explores a variety of time series analysis techniques such as exponential smoothing and auto-regressive integrated moving average models.

Often temperatures in Mumbai are underestimated and necessary precautions are not taken while exiting ones house. Due to the adverse effects that this heat can have on the health of individuals, it is necessary that steps be taken at an organisational level to issue warnings to citizens so that they can take the necessary precautions and steps before leaving their house. This idea was inspired by the IMDs Heat Action Plan, which has been implemented in Ahmedabad. While the temperatures in Mumbai may not be as severe, precautions are necessary in order to protect and inform individuals about the adverse effects of heat.

METHODOLOGY:

Feature Selection

Pearson Correlation Test for feature selection:

Variable | Pearson Correlation Coefficient (r) |

maxlag1 | 0.8872607 |

minlag1 | 0.2079691 |

avglag1 | 0.6158288 |

DAY | 0.02438063 |

MONTH | -0.03973674 |

YEAR | 0.01284328 |

totalprecipMM | -0.3852286 |

windspeedKmph | -0.162895 |

humidity | -0.432934 |

visibilityKm | 0.2798653 |

pressureMB | 0.2311805 |

cloudcover | -0.5144175 |

HeatIndexC | 0.595783 |

DewPointC | -0.009764557 |

WindChillC | 0.7967794 |

We use the Pearson Correlation Coefficient (r) as a feature selection mechanism. In order to avoid overfitting of the data we will only be using the features that have || > 0.5

The pearson correlation co-efficient is used to measure correlation between different sets of data to understand how strong the correlation is between two variables. Above we have calculated and tabulated the pearson correlation co-efficients between the maximum temperature of the day with a variety of variables such as total precipitation, pressure, sun hours etc. with a lag of 1 day.

( )( )

=

( )2 Ã— ( )2

By the above criteria we select the variables: maxlag1 (Maximum temperature of the previous day), avglag1 (Average temperature of the previous day), cloudcover (fraction of the sky obscured by clouds when observed from a particular location), HeatIndexC (Index that combines relative humidity and actual temperature), WindChillC (Combination of windspeed and temperature)

= 13.12 + 0.6215 Ã— 11.37 Ã— (0.16) + 0.3965(0.16)

T = Temperature in degree Celsius

V = Wind velocity in kilometres per hour

T = Temperature in degree Celsius R = Relative Humidity

HI = HeatIndexC

Multivariate Linear Regression:

First, in order to perform a multiple linear regression on our data it is necessary to carry out feature scaling as the range for each variable differs significantly. If feature scaling was not done then we variables with maximum range will dominate in the training of the regression model. We will bound all our variables in the interval [0,1]. We use min-max normalisation technique on our data.

=

We will use Ordinary least Squares (OLS) regression to get the best linear unbiased estimators. Lets call the temperature on the ith day some Ti

1 = 0 + 111 + 221 + 331 + 441 + 551 + 1

2 = 0 + 112 + 222 + 332 + 442 + 552 + 2

3 = 0 + 113 + 223 + 333 + 443 + 553 + 3

= 0 + 11 + 22 + 33 + 44 + 55 +

When we vectorise these equations we get:

We can write this as:

1 11 21

= [

1 1 2

31 41 51

3 4 5

0

] Ã— [ ] +

5

= + =

Since we are using OLS regression we want to minimise || | 2 = 2

| =1

2 = Ã— = ( ) Ã— ( )

=1

We use gradient descent with simultaneous update to minimise the error function.

(

)2

= [( ) Ã— ]

Algorithm:

=1

=1

Repeat using simultaneous update

{

}

= Ã— [( ) Ã— ]

=1

We use the R programming language to run this program nd obtain the multivariate linear regression analysis

Code for Multivariate Linear Regression Used:

#First we need to read the data from the csv file with headers

data <- read.csv("weather_data_24hr_master1.csv", header = T)

#remove null row

data <- data[-4265,]

#remove columns based on feature selection

data <- data[,c(4,7,9,17,18,20)]

# we need to normalize the data (max-min normalization)

data$maxlag1 <- (data$maxlag1-min(data$maxlag1))/(max(data$maxlag1)-min(data$maxlag1)) data$avglag1 <- (data$avglag1-min(data$avglag1))/(max(data$avglag1)-min(data$avglag1)) data$cloudcover <- (data$cloudcover-min(data$cloudcover))/(max(data$cloudcover)-min(data$cloudcover))

data$HeatIndexC <- (data$HeatIndexC-min(data$HeatIndexC))/(max(data$HeatIndexC)-min(data$HeatIndexC)) data$WindChillC <- (data$WindChillC-min(data$WindChillC))/(max(data$WindChillC)-min(data$WindChillC))

# Data Partition

# Partitioning the Data into testing and training data

# If we repeat the learning, we get the same result

set.seed(222)

ind <- sample(2,nrow(data), replace = T, prob =c(0.7,0.3)) training <- data[ind==1,]

testing <- data[ind==2,]

# Training the Linear regression model MLR <- lm(maxtempC~., data = training) summary(MLR)

# Prediction on training

output <- predict(MLR, training[,-1]) df <- data.frame(output, training[,1]) df

sum((df[,1]-df[,2])^2)/nrow(training)

# Prediction on testing

output <- predict(MLR, testing[,-1]) df <- data.frame(output, testing[,1]) df

sum((df[,1]-df[,2])^2)/nrow(testing)

Explanation of code:

We first import the comma-splitted values containing the raw data for our analysis. We then keep only the columns that were deemed statistically significant by the pearson correlation test. We then carry out the max-min normalisation of the data as specified above. We then partition 70% of the data for training our regression model and 30% of the data to test the regression model. We create the multivariate ordinary least squares linear regression model using gradient descent as outlined above. We then calculate the sum of squared error for both the training and the testing data using the regression model found.

Results:

The regression equation obtained by our program is:

() = 24.2541 + 11.1606 Ã— 1 + 0.8512 Ã— 2 0.8588 Ã— 3 + 0.4494 Ã— 4 + 0.4872 Ã— 5

Metric of Analysis for results:

We will now be analysing the results produced by the multivariate linear regression model. For analysis we will be finding the square root of mean sum of squared errors (RMS errors) returned by the program. We define the function of our prediction model (hypothesis function) as (). The formula for the RMS error is

2

( ()

=

=1

)

For the training data the sum of squared errors was: 1.004442 For the testing data the sum of squared errors was: 1.062815

Since the behaviour of our model is similar for the training and testing data we can conclude that our model is not over or underfitting the dataset.

Overall, we can conclude that our data with lag 1through a linear relationship does not yield ideal results. This Multiple linear regression was also autoregressive in nature but did not give us the desired RMS error.

Simple Exponential Smoothing

We will also attempt to forecast temperature using time series analysis techniques. We will begin with simple exponential smoothing as a technique because there is no clear trend or seasonality in the data when the time series is plotted. We dont use Holts Exponential smoothing or Holt-Winter Exponential Smoothing as our data does not have clear increasing/decreasing trends or any visible seasonality based on the plot. In simple exponential smoothing, we say the predicted observation is a weighted average of previous observations. While calculating the weighted averages, the weights decrease exponentially as lag increases, that is, the smallest weights are associated with the oldest observations:

Notation:

0:

:

: 0 1

Weighted average form of exponential smoothing:

+1 = + (1 )

Lets try to arrive at the above generalisation by considering smaller examples.

1 = 0

2 = 1 + (1 )0

3 = 2 + (1 )2

4 = 3 + (1 )3

= 1 + (1 )1

+1 = + (1 )

When we substitute each equation in the next we get:

3 = 2 + (1 )(1 + (1 )0)

= 2 + (1 )()1 + (1 )20

4 = 3 + (1 )(2 + (1 )()1 + (1 )20)

= 3 + (1 )()2 + (1 )2()1 + (1 )30

1

+1 = ((1 )) + (1 ) 0

=0

This is the weighted average form of the simple exponential smoothing model. We want to find the value of that minimises the sum of squared errors (SSE), therefore, we use the gradient descent algorithm to minimise the squared error.

= ( )2 = 2

=1 =1

We use the R programming language to run gradient descent in order to minimise error.

Code for Simple Exponential Smoothing Used:

#first read the comma splitted values containing the dataset

data <- read.csv("weather_data_24hr_master1.csv", header = T)

#filter only maxtempC out which is relevant to our timeseries

data <- data[,4]

#remove NA values data <- data[-4265] data

#Let's partition the data into testing and training data.

training <- data[1:2990] testing <- data[2991:4264]

#Create a timeseries using the maxtempC data

timeseries <- ts(training, frequency = 365, start = c(2008,183)) timeseries

#plot the timeseries plot.ts(timeseries) install.packages("TTR") library("TTR")

#Simple exponential Smoothing

timeseriesforecasts <- HoltWinters(timeseries, beta=FALSE, gamma=FALSE, l.start = 27) timeseriesforecasts

timeseriesforecasts$fitted plot(timeseriesforecasts)

#calculate the sum of squared errors for our forecasts

a<-(timeseriesforecasts$SSE)

#calculate the mean of the SSE

MSE<- a/length(training)

#calculate the RMS error

sqrt(MSE)

Explanation of code:

We first import the comma-splitted values containing the raw data for our analysis. We then remove all columns except the maxtempC for our timeseries. We partition our dataset into training data and testing data. We train our simple exponential smoothing model on the first 70% of our data. We test this model using the remaining 30% of the data. We then convert the vector into a timeseries. We then plot the timeseries and run simple exponential smoothing on our timeseries. This gives us the minimum value of . We also find the RMS error of our model on the training data. We then use the same value of to run our simple exponential smoothing model on the testing data and calculate the RMS error for the same.

Results:

By running gradient descent we get the result that the error is minimised when the learning parameter = 0.8400815 This makes out weighted average exponential smoothing equation to be as follows:

1

+1 = ((0.8400815)(0.1599185)) + (0.1599185) 27

=0

Metric of Analysis for results:

We will now be analysing the results produced by the simple exponential smoothing model. One of the main issues that arises in the use of this model is that it does not perform well on sudden fluctuations in the data whereas it performs very well for gradual increases or decreases in our data. The high alpha value indicates high dependence on previous day values of the max temperature. The high alpha also indicates that as lag increases data beyond lag= (0.8400815)(0.1599185)3 = 0.00343571 becomes too small to have relevance to our forecasted value. To determine how well our model fits the data we calculate the RMSE for the testing data and the training data. We define our prediction model function for simple exponential smoothing to be: (). The temperature on the tth day is defined as ().

Therefore we get that the RMS error is:

2

( () ())

= =1

For the training data the RMS error was: 1.048192 For the testing data the RMS error was: 1.0522970

The model gives very similar values for the RMS training and testing data, thus overfitting of data is not an issue for our model.

Since simple exponential smoothing doesnt help and the data cannot be expressed as a linear relationship we look at artificial neural networks to explore non-linear hypotheses functions for the data.

Time Series plot

Artificial Neural Networks

A neural network is inspired from the model of a human brain consisting of neurons. When neural networks learn they independently find a variety of connections in the data which helps with complicated predictions when we have large datasets with several variables. Each neuron receives the values of the variables (features) from the training set and calculates a weighted average of these values. The result of this calculation is passed through a non-linear activation function.

For the mth neuron we supply the vector x of training examples and it calculates the weighted average and returns a zm value. In this scenario b is a bias constant that is added to the outcome of each neuron.

= () + ( = 1, 2, 3, , )

=1

The activation function g(z) is applied on each z value to give our forecast value (y) . Thus the output of each neuron is passed to the activation function.

Without an activation function, the neural network would simply return a linear function thus not being able to model complex data with small error.

The way a neural network is trained is by the value of the loss function. We use the sum of squared error as our error function thus during the training our model minimises the error of the neural network. In our learning the values of the weights and the bias parameter are changed in order to minimise the error function. We calculate the partial derivative of the loss function in order to arrive at a minimum value for the error. We use the backpropagation algorithm in order to train our neural network.

For our regression problem the loss function would be Mean Squared Error, which squares the difference between actual (y) and predicted value (y).

() = ( )2

=1

Using chain rule we get the following:

= Ã—

Ã—

First lets calculate :

=

Ã— (( )2) = 2 Ã— (

)

=1

=1

Now lets calculate the

=

Ã— ()

= (

1

1

+ )

= (1 + )2

1 1

= (1 + ) Ã— (1 1 + )

= () Ã— (1 ())

Finally lets calculate the :

=

Ã— ()

=

() +

=1

=

Therefore, we get:

= Ã— () Ã— (1 ()) Ã— 2 Ã— ( )

=1

We also calculate which we can get from the above formula as the input for the bias operator is 1.

= () Ã— (1 ()) Ã— 2 Ã— ( )

=1

After backpropagation is carried out we focus on optimisation which is done through gradient descent. For this we define our learning rate as .

Repeat until convergence:

{

= ( Ã— )

= ( Ã—

)

}

Code for Artificial Neural Networks Used:

#First we need to read the data from the csv file with headers

data <- read.csv("weather_data_24hr_master1.csv", header = T)

#remove null row

data <- data[-4265,]

#remove columns based on feature selection

data <- data[,c(4,7,9,17,18,20)]

# we need to normalize the data (max-min normalization)

data$maxlag1 <- (data$maxlag1-min(data$maxlag1))/(max(data$maxlag1)-min(data$maxlag1)) data$avglag1 <- (data$avglag1-min(data$avglag1))/(max(data$avglag1)-min(data$avglag1)) data$cloudcover <- (data$cloudcover-min(data$cloudcover))/(max(data$cloudcover)-min(data$cloudcover))

data$HeatIndexC <- (data$HeatIndexC-min(data$HeatIndexC))/(max(data$HeatIndexC)-min(data$HeatIndexC)) data$WindChillC <- (data$WindChillC-min(data$WindChillC))/(max(data$WindChillC)-min(data$WindChillC))

#Data partition to divide our data into training anad testing data (70% training 30% testing)

#we set seed in order to be able to repeat the learning

set.seed(222)

ind <- sample(2,nrow(data), replace = T, prob =c(0.7,0.3))

training <- data[ind==1,] testing <- data[ind==2,]

#install the neural network packages in R install.packages("neuralnet") library(neuralnet)

#We create a neural network n trained on the training data

#This neural network has the error function as the sum of squared error

#It has the activation function as the sigmoid function

#It has 2 neurons

n <- neuralnet(maxtempC~.,

data = training, hidden = 2,

stepmax = 9999999, err.fct ='sse',

act.fct = 'logistic', linear.output = T)

n

#We plot our trained neural network

plot(n)

# We calculate the RMS error for our neural network on the training and testing dataset

output <-compute(n, training) p1 <- output$net.result

sqrt(sum((training$maxtempC-p1)^2)/nrow(training)) max((training$maxtempC-p1))

output <- compute(n, testing) p2 <- output$net.result

sqrt(sum((testing$maxtempC-p2)^2)/nrow(testing))

Explanation of code:

We first import the csv file containing the raw data for our analysis. We then retain all the feature columns from our feature selection and remove all NA values. We first carry out feature scaling of our data through min-max normalisation. We then partition our dataset into training data (70%) and testing data (30%). We train an artificial neural network with 2 neurons to fit our training data. We find the RMS error of our model on the training data. We then run our model on the testing data and calculate the RMS error.

Results:

The neural network plot has been included below with the weights on each wire connecting our input to the neurons and our neurons to the output layer.

The value of the bias operator for the hidden layer is 2.44763 and the bias for the output layer is 12.68663.

Neural network plot

Metric of Analysis for results:

We will now analyse the results of the neural network model. The use of neural networks allows us to predict the data using a non-linear hypothesis function. The high weights on the WindChillC, maxlag1, and avglag1 indicate that these are the dominant features in our learning. A variety of number of neurons were tried out. 2 neurons were chosen as larger number of neurons seemed to overfit our data and would have an overly complicated hypothesis function. To determine how well our model fits the data we calculate the RMSE for the testing data and the training data. We define our prediction model function for our neural network to be: (). The actual temperature is represented by .

Therefore we get the formula for the RMS error to be:

(() )2

= =1

For the training data the RMS error was: 0.9906021 For the testing data te RMS error was: 1.012613

Since the RMS error for the training and testing data using our model is similar we can conclude that our model does not overfit our data. It also performs better than the exponential smoothing model as well as multivariate regression but only marginally. Since expressing the forecast value as a linear/non-linear hypothesis of explanatory variables does not give us the desired results we try to use time series analysis using ARIMA.

Auto-regressive Integrated Moving Average (ARIMA)

In order to apply the ARIMA technique we require our timeseries data to be stationary. This means that our data has a constant mean, constant variance (deviation from the mean), and no seasonality. It is necessary to confirm whether our time series is stationary using the Augmented Dickey-Fuller test (ADF test) and the KwiatkowskiPhillipsSchmidtShin (KPSS test).

The ADF test is a unit root test. The mathematics and proofs related to the ADF test will not be described in this paper but can be found in [1]. For our purposes we only need to know the null and alternative hypothesis as defined by the test.

0: (null hypothesis) The given data is non-stationary

1: (alternative hypothesis) The given data is stationary

On running the ADF test on our time series we get a p-value 0.01

Since the p-value is less than the significance level of 0.05, we can safely reject the null hypothesis and conclude that our data is stationary. However, for large datasets the adf test can be erroneous rejecting the null hypothesis a vast majority of the times. Hence, we use the KPSS test to confirm that our data is stationary. We use the KPSS test for level stationarity.

For the KPSS test the hypotheses are as follows:

0: (null hypothesis) The data is level stationary

1: (alternative hypothesis) The given data is level non-stationary On running the KPSS test we get the p-value 0.1

Since the p-value is greater than the significance level of 0.05 we cannot reject the null hypothesis which supports our previous conclusion that our data is stationary. Since our data is stationary we can proceed by using auto-regressive integrated moving average processes on our data to come up with an accurate forecasting model for the data.

Since our data is stationary we do not require an ARIMA model and we can directly use an ARMA model (Auto-Regressive Moving Average).

Before we can define an ARMA model we will define white noise.

Definition 4.1: White noise: It is an identically and independently distributed stochastic process { , } with mean zero such that there is no serial correlation between values of stochastic process in the present and past.

A gaussian white noise is when our stochastic process is:

~ (0, 2)

White noise timeseries cannot be predicted as they are a sequence of random numbers. If the series of forecast errors are white noise it suggests that our model cannot be improved.

Figure: Plotting the residuals of our ARIMA(2,0,2) model

We used the ARIMA model on our timeseries and the residuals of our model were a gaussian white noise with mean 0 and variance 1. However, we want to improve our model despite this white noise. To do this we will attempt to smooth our timeseries in order to reduce the white noise in our timeseries. We can ignore some random fluctuations in our data and thus it is justified to take the moving average of our data. We will use a triangular moving average smoothing technique in order to minimise the error caused by this white noise. For this we will defined our new smooth timeseries to be and our original time series to be y.

Mathematically our smoothed time series of order 2 will be defined as:

+ 1

= 2

To get the triangular moving average ( )to smooth our time series we apply the moving average of order 2 again.

=

+ 1 2

Blue: modified timeseries with triangular moving average Green: original time series

An auto-regressive model is one in which we assume that the values of the time series in the future depends on past values of the time series. It is a linear model relating values of the time series to past values of the time series. Let this time series be . Then, for this timeseries the kth order auto-regression model (AR(k)) is:

= 0 + 11 + 22 + + +

= + 0 +

=1

A moving average model is one in which we assume that the values of the time series in the future depend on the previous residual terms of the time series. It is a linear model relating the value of the time series to past values of the error. Therefore, the kth order moving average model (MA(k)) is:

= + 0 + 11 + 22 + 33 + +

= + 0 +

=1

An ARMA model simply combines an auto-regressive model and a moving average model. Thus, an ARMA model is such that a value of the time series can be predicted based on previous values of the time series. Thus an ARMA(p,q) model is:

= + 0 + 11 + 22 + + + 0 + 11 + 22 + +

= + 0 + + 0 +

=1 =1

The algorithm for choosing the best ARMA(p,q) model is not outlined in this paper. However, we use maximum likelihood estimation to arrive at the best ARMA model. We use the R programming language along with the Hyndman-Khandakar algorithm

to minimise Akaikes Information Criterion (AIC) using maximum likelihood estimation.

= 2 log() + 2( + + + 1)

:

= 1 0 + 0 0

= 0 0 + 0 = 0

Code for ARIMA Used:

#first read the comma splitted values containing the dataset

data <- read.csv("weather_data_24hr_master1.csv", header = T)

#filter only maxtempC out which is relevant to our timeseries

data <- data[,4]

#remove NA values

data <- data[-4265]

#looking at the first few values of our vector

head(data)

#sequential data partition into training (70%) and testing data (30%)

training <- data[1:2990] testing <- data[2991:4264]

#convert the data into a time series

ts <- ts(data, frequency = 365, start = c(2008,183))

#smooth the time series using the triangular moving average

tst2 <- ts(rollmean(rollmean(ts,2),2),frequency = 365, start = c(2008,183))

#plotting the original time series and the smoothed time series

plot(ts, col = 'green') lines(tst2, col = 'blue')

#carrying out the augmented dickey-fuller test on our time series for stationarity

adf.test(ts)

#carrying out the kpss test to further confirm level stationarity of the data

kpss.test(ts, null = "Level")

#converting the training data into a time series

tstrain <- ts(training, frequency = 365, start = c(2008,183))

#smoothing the training time series using triangular moving average

ts1 <- ts(rollmean(rollmean(tstrain,2),2))

#converting the testing data into a time series

tstest <- ts(testing, frequency = 365, start = c(2016,252))

#smoothing the testing time series using triangular moving average

ts2 <- ts(rollmean(rollmean(tstest,2),2))

#we fit the training data to the arma(2,2) model

fit<-arima(ts1, c(2,0,2))

#finding out the RMS error of the arma(2,2) model

accuracy(fit) fitted(fit)

#applying our previously derived model on the testing data

refit<-Arima(ts2, model = fit)

#calculating the accuracy of our model on the testing data

accuracy(refit)

#plotting the residuals of our model

plot(residuals(fit))

Explanation of code:

We first import the comma splitted values and filter only the relevant maxtempC data from the dataset. We then create multiple time series: one containing the entire datast, one containing the training data, and one containing the testing data. We then use triangular moving averages to smooth these time series. We also run the ADF test and KPSS test on the data to test for stationarity. We then fit the ARMA(2,2) model to our data and calculate the RMS error on the training and testing data. We plot our residuals to observe whether it is a white noise time series to ensure optimality of our time series.

Results:

According to the ADF and KPSS test we get that our time series is stationary. We get that an ARMA(2,2) model that best fits our data.

Co-efficients of ARMA model

Metric of Analysis for results:

We will now analyse the results of the ARMA(2,2) model. To determine how well our model fits the data we calculate the RMS error for the testing data and the training data. We define our prediction model function for our ARMA model to be: . The actual temperature is represented by .

( )2

= =1

The plot of our residuals is also a white noise time series. Thus, we can conclude that our ARMA model is optimal as white noise cannot be predicted.

For the training data the RMS error (after removing white noise) was: 0.2593737 For the testing data the RMS error (after removing white noise) was: 0.2587777

For the training data the RMS error (without removing white noise) was: 0.8115352 For the testing data the RMS error (without removing white noise) was: 0.8213673

Since the RMS error for the training and testing data using our model is similar we can conclude that our model does not overfit our data and is a reliable predictor even on unseen testing data. It also performs significantly better than the exponential smoothing model, multivariate regression model as well as the neural network model.

Figure: plot of residuals of the ARMA(2,2) model

Results:

The results of the research are that the ARMA(2,2) model works best for prediction of data in Mumbai (Colaba). After the removal of noise this model gives an RMS error of 0.2587777, which is much lower than the previous values. To outline the reliability of the model the actual temperature recorded and model predictions were compared for both maximum and minimum temperature in Colaba.

To test the model the predictions for 15 days were carried out everyday. Maximum Temperature:

Accurate: 9

Usable: 4

Incorrect: 2

Minimum Temperature:

Accurate: 13

Usable: 2

Incorrect: 0

Date

Actual max temp

Actual min temp

model prediction max

error

scale

model prediction min

error

scale

17/07/20

28.4

25.5

29

1

accurate

25.2

0

accurate

18/07/20

27.8

25.3

28.8

1

accurate

25.5

0

accurate

19/07/20

30.2

23.4

29

1

accurate

25.3

2

usable

20/07/20

32.0

25.5

30.2

2

usable

23.8

2

usable

21/07/20

31.8

27

31.5

0

accurate

25.5

1

accurate

22/07/20

31.5

27

31.2

0

accurate

26.6

0

accurate

23/07/20

32.2

27.2

31.1

1

accurate

26.6

1

accurate

24/07/20

31.8

25.5

31.8

0

accurate

26.8

1

accurate

25/07/20

28.4

26

31.4

3

incorrect

25.5

0

accurate

26/07/20

30.8

25.5

28.8

2

usable

26

0

accurate

27/07/20

31

25.5

31

0

accurate

25.6

0

accurate

28/07/20

28

25.5

30.9

3

incorrect

25.6

0

accurate

29/07/20

26.8

25

28.4

2

usable

25.5

0

accurate

30/07/20

30

25

27.7

2

usable

25.4

0

accurate

31/07/20

31

25.5

30.2

1

accurate

25.1

0

accurate

REFERENCES:

http://debis.deu.edu.tr/userweb/onder.hanedar/dosyalar/1979.pdf

https://otexts.com/fpp2/arima-r.html

http://www.imdmumbai.gov.in

http://www.imdmumbai.gov.in/scripts/search.asp