 Open Access
 Authors : Aarati G Gangshetty , Gurpreet Kaur , Uttam Sitaram Malunje
 Paper ID : IJERTV10IS110075
 Volume & Issue : Volume 10, Issue 11 (November 2021)
 Published (First Online): 29-11-2021
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Time Series Prediction of Temperature in Pune using Seasonal ARIMA Model
Aarati Gangshetty 1
Gurpreet Kaur 2
Uttam Sitaram Malunje 3, Technical Officer A
Abstract – A SARIMA (Seasonal Autoregressive Integrated Moving Average) model is developed for temperature prediction using historical data from Pune, Maharashtra, covering the years 2009 to 2020. When a time series contains a repeating cycle, a popular alternative to decomposing it manually and fitting an ARIMA model is to use the SARIMA model, which handles the seasonal component directly. Time series analysis is becoming increasingly popular, partly because of the declining cost and growing processing capability of hardware. The model can be used to set the outlook for the upcoming year and to quantify the effects of unexpected changes or disruptions in the system. The seasonal ARIMA model is built in Python 3.7.4 on Anaconda Jupyter Notebook, with the package matplotlib 3.2.1 used for data visualization. The goodness of fit of the model was verified against the standardized residuals, the autocorrelation function, and the partial autocorrelation function. We find that SARIMA (1,1,1)(1,1,1)12 represents the data well, with an MAE of 0.60850 and an RMSE of 0.76233. According to the model diagnostics, the model is suitable for predicting temperature.
Keywords – SARIMA, Prediction, ARIMA, temperature.

INTRODUCTION
The primary aim of time series modeling is to collect and study historical values in order to develop suitable models that describe the essential structure and characteristics of the series. A time series forecasting model, in turn, uses the observed values to predict future values. Regression analysis is often used to test theories that the current values of one or more time series affect the current value of another time series [1]. Time series data occur in many areas, such as economic analysis, sensor monitoring, the analysis of scientific problems, and the mining of social interests. More recent fields focus on the subject and refer to it as time series forecasting. Forecasting involves fitting models to past data and using them to predict future observations. Descriptive models can borrow from the future (for example, to smooth out or remove noise), because they only seek to describe the data as well as possible. An important distinction in forecasting is that the future is completely unavailable and must be estimated only from what has already happened.
Time series prediction is the use of a given model to predict future values from historical values, as is done, for example, in numerical meteorological forecasting. Owing to the importance of time series forecasting in countless practical fields, researchers should take care to fit a suitable model to the time series. Over the years, many time series models have been developed in the literature to improve the accuracy and efficiency of time series forecasting. One of the most widely used and best known statistical forecasting models is the Autoregressive Integrated Moving Average (ARIMA) model. The ARIMA model is well known for its forecasting accuracy and its flexibility in representing various types of time series [3], as well as for its simplicity and the associated Box-Jenkins methodology for optimal model construction. For seasonal time series forecasting, Box and Jenkins [4] proposed a relatively effective variation of the ARIMA model called the Seasonal ARIMA (SARIMA) model. The main design objectives of this paper are set out below:

Plotting the data as a time series plot

Checking whether the data has any trend or seasonality

Estimating the orders (p, d, q) (P, D, Q)s of the SARIMA model

Applying SARIMA (p, d, q) (P, D, Q)s to predict future values.


LITERATURE SURVEY
Rios-Moreno et al. [5] used outside air temperature, relative humidity, air velocity, and global solar radiation flux as external variables in an autoregressive (AR) model and an autoregressive moving average (ARMA) model, and successfully forecasted the room temperature in a university room in Mexico. The results showed that external variables older than 20 minutes did not improve the performance of the model. De Felice et al. [6] used a non-seasonal time series method to predict electricity demand at the national and regional level in Italy, and established that using temperature as an external variable improved the prediction results. Rahman et al. (2013) [7] compared the ARIMA and ANFIS models and outlined how the ARIMA model can more efficiently capture the dynamic behavior of weather properties such as temperature, humidity and air pressure. The models were compared using evaluation measures such as the mean square error (MSE), the R-squared error and the sum of squared errors (SSE), to check whether ARIMA yields a more precise result than other models.
In addition, the authors of [8] undertook a study to examine the trend and forecast the maximum monthly temperature in Nigeria using the SARIMA model. According to the most suitable SARIMA model, the predicted maximum temperature over five years is fairly stable compared with that of the reference period. In another study, [9] fitted a SARIMA model to the average temperature of Dibrugarh for the period 1980-2010 using the automatic ARIMA function, auto.arima(), in R. Keeping these points in mind, an effort has been made here to develop a SARIMA model for temperature in Pune.
METHODOLOGY
Temperature data recorded from 2009 to 2020 at one-hour intervals were obtained for Pune city from the meteorology department [12]. The automatic weather station is located at longitude 73.856255° E and latitude 18.516726° N. The collected data include several parameters, such as date and time, temperature, humidity, moonrise, wind speed, wind direction, and pressure. From these, we eliminated features with large amounts of missing data and used temperature as the input parameter. The seasonal ARIMA model is implemented in Python 3.7.4 on Jupyter Notebook, with the package matplotlib 3.2.1 used for data visualization. The time series plot of temperature for the year 2018 is shown in Figure 1. The hourly temperature data from 2009 to 2018 are used as the training set, while the data from 2019 to 2020 are used as the testing set. To evaluate the forecast accuracy and to compare the results obtained from different models, the mean square error (MSE) is calculated.
Figure 1 Time series plot of temperature in Pune (year 2018)
Figure 2 Flowchart of the proposed model
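The data preparation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the monthly averaging step is an assumption, inferred from the seasonal period of 12 and the month-end timestamps reported in the results, and the synthetic hourly readings stand in for the real dataset. The helper names `monthly_means` and `split_by_year` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_means(records):
    """Average (timestamp, temperature) readings per calendar month."""
    buckets = defaultdict(list)
    for timestamp, temperature in records:
        buckets[(timestamp.year, timestamp.month)].append(temperature)
    # Sort the (year, month) keys so the result is in chronological order.
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


def split_by_year(series, last_train_year):
    """Chronological split: years up to last_train_year train, later years test."""
    train = {k: v for k, v in series.items() if k[0] <= last_train_year}
    test = {k: v for k, v in series.items() if k[0] > last_train_year}
    return train, test


# Synthetic stand-in for the hourly data: December 2018 and January 2019,
# one reading per hour with a simple daily ramp (assumed values, not real data).
start = datetime(2018, 12, 1)
records = [(start + timedelta(hours=h), 25.0 + 0.5 * (h % 24)) for h in range(24 * 62)]

series = monthly_means(records)
train, test = split_by_year(series, last_train_year=2018)
print("train months:", list(train))
print("test months:", list(test))
```

Splitting by year rather than at random keeps every test observation later in time than every training observation, which is what a forecasting evaluation requires.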

Check stationarity
If the time series is not stationary, it needs to be made stationary through differencing. We tested the stationarity of our dataset with the augmented Dickey-Fuller test and found that it is not stationary. We therefore take the first difference and repeat the augmented Dickey-Fuller test, differencing again if necessary, until the p-value is less than or equal to 0.05 (p ≤ 0.05). The order of differencing d is chosen such that it minimizes the standard deviation of the differenced series. The differenced series, which is now stationary, might still have autocorrelated errors, which can be removed by adding AR terms (p ≥ 1) and MA terms (q ≥ 1) to the forecasting equation.
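The differencing step and the minimum-standard-deviation rule for d can be illustrated with a short sketch. It covers only the differencing and the choice of d; the augmented Dickey-Fuller test itself is not implemented here and would normally come from a statistics library. The function names and the toy series are assumptions made for illustration.

```python
from statistics import pstdev


def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def choose_d(series, max_d=2):
    """Pick the differencing order (0..max_d) with the smallest standard deviation."""
    best_d, best_sd = 0, pstdev(series)
    current = list(series)
    for d in range(1, max_d + 1):
        current = difference(current)
        sd = pstdev(current)
        if sd < best_sd:
            best_d, best_sd = d, sd
    return best_d


# Toy series: a linear trend plus a small alternating disturbance.
trend = [2.0 * t + (1.0 if t % 2 else -1.0) for t in range(40)]
print("chosen d:", choose_d(trend))

# A seasonal difference uses the seasonal period as the lag, e.g. lag=12.
seasonal_diff = difference(list(range(24)), lag=12)
```

For the toy series, one difference removes the trend and a second difference only amplifies the disturbance, which is the overdifferencing that the standard-deviation rule is meant to avoid.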

Plot ACF and PACF
In this step, the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the data are plotted; these are used to identify candidate models. The ACF plot shows a gradually decreasing pattern, while the PACF cuts off immediately after lag 1. Thus, the plots suggest that an AR(1) term would be appropriate for the model. In general, if there is a sharp cutoff in the PACF of the differenced series and the series appears mildly underdifferenced, an AR term is added to the model. If there is a sharp cutoff in the ACF of the differenced series and the series appears mildly overdifferenced, an MA term is added to the model. The optimal model is then selected based on a performance metric such as the AIC (Akaike Information Criterion).
Figure 3 Autocorrelation Function
Figure 4 Partial autocorrelation Function
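To make the ACF and PACF reading concrete, the sketch below computes both functions for a simulated AR(1) series. It is an illustration under stated assumptions: the simulated series (coefficient 0.7, fixed seed) is not the Pune data, the function names are hypothetical, and the PACF is obtained with the Durbin-Levinson recursion, which is one standard way to compute it.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Simulated AR(1) series x[t] = 0.7 * x[t-1] + noise (assumed example data).
rng = random.Random(0)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))

print("ACF :", [round(v, 2) for v in acf(x, 4)])
print("PACF:", [round(v, 2) for v in pacf(x, 4)])
```

For such a series the ACF decays gradually while the PACF is large at lag 1 and close to zero afterwards, which is the pattern that points to an AR(1) term.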

Auto Regressive Integrated Moving Average Model
ARIMA models combine two distinct processes, namely autoregressive (AR) processes and moving average (MA) processes, with differencing of the series. The parameters are defined as:

p: the number of lag observations in the model; also known as the lag order.

d: the number of times the series is differenced to make it stationary.

q: the size of the moving average window; also known as the order of the moving average.
According to the Box-Jenkins methodology, an ARIMA model is generally written as ARIMA (p, d, q) [10].
The AR(p) model is defined by the equation:
$$X_t = \mu + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t \qquad (1)$$
Where,

$X_t$ = response variable at time t

$X_{t-1}, X_{t-2}, \dots, X_{t-p}$ = response variable at times t-1, t-2, ..., t-p respectively

$\mu$ = constant term

$\phi_1, \phi_2, \dots, \phi_p$ = coefficients to be estimated

$\varepsilon_t$ = error term at time t
The MA(q) model is defined by the equation:
$$X_t = \mu + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q} + \varepsilon_t \qquad (2)$$
Where,

$X_t$ = response variable at time t

$\mu$ = constant term

$w_{t-1}, w_{t-2}, \dots, w_{t-q}$ = forecast errors at time series lags t-1, t-2, ..., t-q

$\theta_1, \theta_2, \dots, \theta_q$ = coefficients to be estimated

$\varepsilon_t$ = error term at time t
By combining equations (1) and (2), the autoregressive integrated moving average model ARIMA (p, d, q) can be written mathematically as
$$X_t = \mu + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{j=1}^{q} \theta_j w_{t-j} + \varepsilon_t \qquad (3)$$
where $X_t$ here denotes the series after it has been differenced d times.

Seasonal ARIMA model
In addition to trend, time series quite commonly display seasonal behavior, where a certain basic pattern tends to be repeated at regular seasonal intervals. The seasonal ARIMA (SARIMA) model is formed by adding seasonal terms to the ARIMA model above. SARIMA models are written as
ARIMA (p, d, q) (P, D, Q) m (4)
where (p, d, q) and (P, D, Q) m are the non-seasonal and seasonal parts of the model, respectively. The parameter d specifies how many orders of differencing are used to make the series stationary, and m is the number of periods per season. Here m is set to 12.

EXPERIMENTAL RESULTS
Akaike's Information Criterion (AIC) is the most commonly used model selection criterion [10]. AIC measures the goodness of fit of a model while penalizing the number of parameters. It is calculated as [10]: AIC = -2 ln (maximum likelihood) + 2k
where k represents the number of independently estimated parameters. When comparing models, the one with the lowest AIC value is chosen. According to Table 1, SARIMA (1, 1, 1) × (1, 1, 1) 12 shows the lowest AIC value (AIC = 196085.724); all other AIC values are larger. Thus, this model is selected as the forecasting model.

SARIMA (p, d, q) (P, D, Q)s    AIC Values
SARIMA(0, 0, 0)x(0, 0, 0)12    204680.074
SARIMA(0, 0, 0)x(0, 0, 1)12    217984.649
SARIMA(0, 0, 0)x(0, 1, 0)12    200841.533
SARIMA(0, 0, 0)x(0, 1, 1)12    215339.952
SARIMA(0, 0, 0)x(1, 0, 0)12    197085.724
SARIMA(0, 0, 0)x(1, 0, 1)12    247760.254
SARIMA(0, 0, 0)x(1, 1, 0)12    196650.122
SARIMA(0, 0, 0)x(1, 1, 1)12    205623.637
SARIMA(0, 0, 1)x(0, 0, 0)12    245623.637
SARIMA(0, 0, 1)x(0, 0, 1)12    225623.637
SARIMA(0, 0, 1)x(0, 1, 0)12    235623.637
SARIMA(0, 0, 1)x(0, 1, 1)12    197623.637
..                             ..
SARIMA(1, 1, 1)x(1, 1, 1)12    196085.724
..                             ..
Table 1 AIC values of SARIMA models
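The AIC-based selection behind Table 1 can be sketched as follows. This is a hedged illustration: the grid of orders with each entry in {0, 1} is an assumption suggested by the rows shown in the table, only the rows actually listed are included (the elided rows stay elided), and the `aic` helper simply restates the standard definition AIC = -2 ln(L) + 2k.

```python
from itertools import product


def aic(log_likelihood, n_params):
    """Akaike Information Criterion from a maximized log-likelihood."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Assumed search grid: every (p, d, q) x (P, D, Q, 12) with entries 0 or 1.
orders = list(product(range(2), repeat=3))
candidates = [(order, seasonal + (12,)) for order in orders for seasonal in orders]

# AIC values copied from the rows listed in Table 1 (elided rows omitted).
table1 = {
    ((0, 0, 0), (0, 0, 0, 12)): 204680.074,
    ((0, 0, 0), (0, 0, 1, 12)): 217984.649,
    ((0, 0, 0), (0, 1, 0, 12)): 200841.533,
    ((0, 0, 0), (0, 1, 1, 12)): 215339.952,
    ((0, 0, 0), (1, 0, 0, 12)): 197085.724,
    ((0, 0, 0), (1, 0, 1, 12)): 247760.254,
    ((0, 0, 0), (1, 1, 0, 12)): 196650.122,
    ((0, 0, 0), (1, 1, 1, 12)): 205623.637,
    ((0, 0, 1), (0, 0, 0, 12)): 245623.637,
    ((0, 0, 1), (0, 0, 1, 12)): 225623.637,
    ((0, 0, 1), (0, 1, 0, 12)): 235623.637,
    ((0, 0, 1), (0, 1, 1, 12)): 197623.637,
    ((1, 1, 1), (1, 1, 1, 12)): 196085.724,
}

best = min(table1, key=table1.get)
print("grid size:", len(candidates))
print("lowest AIC among listed rows:", best, table1[best])
```

In a full search, each candidate in `candidates` would be fitted to the training data and its AIC recorded before taking the minimum.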
Diagnostic Test: The selected model is validated by applying a Dickey-Fuller test. According to Table 1, the AIC value of SARIMA (1, 1, 1) × (1, 1, 1) 12 is the lowest. Table 2 summarizes the results of the diagnostics test of the SARIMA (1, 1, 1) × (1, 1, 1) 12 model.
Term        Coef      std err   z          P>|z|   [0.025    0.975]
Const       28.1213   0.461     60.976     0.000   27.217    29.025
AR.L1       0.7704    0.003     222.732    0.000   0.764     0.777
MA.L1       -1.0000   0.006     -155.452   0.000   -1.013    -0.987
AR.S.L12    -0.6651   0.005     -147.274   0.000   -0.674    -0.656
MA.S.L12    -0.8680   0.002     -381.308   0.000   -0.873    -0.864
Table 2 Summary of the diagnostics test of the SARIMA (1, 1, 1) × (1, 1, 1) 12 model.
The Coef column gives the estimated weight of each term and shows how strongly each one affects the time series. The first set of AR and MA terms (AR.L1 and MA.L1, respectively) is lagged by one time step, while the second set (AR.S.L12 and MA.S.L12) is lagged by 12 time steps. Since all values of P>|z| are less than 0.05, the estimated coefficients are statistically significant.
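The relation between the coefficient, standard error, z statistic, p-value and 95% interval in Table 2 can be checked with a few lines. The sketch assumes the usual large-sample normal approximation for the estimates; the function name `z_test` is hypothetical, and small differences from the table are expected because the standard errors there are rounded.

```python
from math import erfc, sqrt


def z_test(coef, std_err, z_crit=1.959964):
    """Return the z statistic, two-sided normal p-value and 95% interval."""
    z = coef / std_err
    p_value = erfc(abs(z) / sqrt(2.0))  # P(|Z| > |z|) for a standard normal Z
    interval = (coef - z_crit * std_err, coef + z_crit * std_err)
    return z, p_value, interval


# Const row of Table 2: Coef = 28.1213, std err = 0.461.
z, p, (low, high) = z_test(28.1213, 0.461)
print(f"z = {z:.3f}, p = {p:.3g}, 95% interval = [{low:.3f}, {high:.3f}]")
print("significant at the 0.05 level:", p < 0.05)
```

A coefficient is treated as significant at the 5% level when this p-value is below 0.05, or equivalently when the 95% interval does not contain zero.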
Figure 5 Diagnostic tests on the residuals of the model: (5a) distribution of standardized residuals, (5b) normal Q-Q plot
The results of the diagnostic tests on SARIMA (1,1,1) × (1,1,1) 12 are shown in Figure 5. Figure 5a indicates that the standardized residuals approximately follow a normal distribution with mean 0 and standard deviation 1. In Figure 5b, the points of the Q-Q plot lie close to a straight line, which again indicates that the residuals are normally distributed. Table 3 shows the comparison between the actual and predicted values of temperature in °C.
Table 3 Actual values vs predicted values of temperature (°C)
DateTime               Actual Values   Predicted Values
2019-01-31 02:00:00    30.193548       28.765794
2019-02-28 02:00:00    32.642857       31.115332
2019-03-31 02:00:00    35.032258       35.282232
2019-04-30 02:00:00    35.466667       35.745039
2019-05-31 02:00:00    35.677419       35.375808
2019-06-30 02:00:00    30.033333       33.774278
2019-07-31 02:00:00    28.258065       27.734503
2019-08-31 02:00:00    29.129032       28.324608
2019-09-30 02:00:00    29.166667       29.434541
2019-10-31 02:00:00    28.580645       29.058096
2019-11-30 02:00:00    29.533333       28.366377
2019-12-31 02:00:00    29.516129       29.968087
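The residual checks shown in Figure 5 rest on two simple computations: standardizing the residuals and pairing their sorted values with theoretical normal quantiles. The sketch below illustrates both, using the twelve forecast errors from Table 3 as a stand-in for the model residuals, which is an assumption made only to have concrete numbers. The plotting position (i + 0.5) / n is one common convention, the function names are hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist, mean, pstdev

# Actual and predicted monthly values for 2019, copied from Table 3.
actual = [30.193548, 32.642857, 35.032258, 35.466667, 35.677419, 30.033333,
          28.258065, 29.129032, 29.166667, 28.580645, 29.533333, 29.516129]
predicted = [28.765794, 31.115332, 35.282232, 35.745039, 35.375808, 33.774278,
             27.734503, 28.324608, 29.434541, 29.058096, 28.366377, 29.968087]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def qq_points(values):
    """Pair theoretical normal quantiles with the sorted standardized values."""
    z = sorted(standardize(values))
    n = len(z)
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, z))


errors = [a - p for a, p in zip(actual, predicted)]
for theoretical, sample in qq_points(errors):
    print(f"{theoretical:7.3f}  {sample:7.3f}")
```

If the errors are close to normally distributed, the two printed columns increase together roughly along the line y = x; a point far from that line marks an unusually large error.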
Figure 6 shows the time series plot of the actual and predicted values of temperature using the SARIMA model. To evaluate the quality of the model, we first compare the predicted values with the actual values. Some variations are visible in the plot; such seasonal variations may be caused by climatic conditions and other external factors. From this figure, we observe that the predictions are almost equal to the actual data, so the seasonal ARIMA model performs well. Figure 7 shows the time series plot of the actual and predicted values of temperature using the ARIMA model. The results show that the ARIMA model does not fit as well as the SARIMA model. Figure 8 shows the future prediction of temperature using the SARIMA model.
Figure 6 Actual value vs predicted value using SARIMA model
Figure 7 Actual value vs predicted value using ARIMA model
Figure 8 Future prediction of temperature using SARIMA model
Figure 9 Time series plot of the future prediction (year 2021)
Figure 9 shows the time series plot of the predicted temperature. From June to August there is a sudden decrease in temperature, which we assume is due to the rainy season.


PERFORMANCE EVALUATION
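Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were used as performance evaluation metrics; the results are given in Table 4. MSE is the average of the squared errors and is calculated as [11]:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (5)$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and n is the number of observations. RMSE takes the square root of the MSE. Thus, it has the same unit of measurement as the data. It is calculated as [11]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (6)$$
Mean absolute error is the average of the absolute values of the deviations:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (7)$$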
Method    MAE       MSE       RMSE
ARIMA     6.052     56.187    7.496
SARIMA    0.60850   0.58114   0.762325
Table 4 Results of the performance evaluation of the model
The predicted temperature values are compared with the actual values for accuracy based on the error metrics. We obtained an MAE of 0.60850 and an RMSE of 0.76233 for the SARIMA model, and an MAE of 6.052 and an RMSE of 7.496 for the ARIMA model. From the above table, we conclude that the SARIMA model yielded the least error in the prediction of temperature.
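The three error metrics can be computed directly from paired actual and predicted values, as in the minimal sketch below. The sample inputs are the first three rows of the actual-versus-predicted comparison for 2019; since they are only a small subset, the printed values are not expected to reproduce Table 4. The last lines check that the RMSE values reported in Table 4 are the square roots of the reported MSE values.

```python
from math import isclose, sqrt


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the data."""
    return sqrt(mse(actual, predicted))


# First three rows of the 2019 comparison (a small sample only).
actual = [30.193548, 32.642857, 35.032258]
predicted = [28.765794, 31.115332, 35.282232]
print("MAE :", mae(actual, predicted))
print("MSE :", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))

# Consistency check on the values reported in Table 4.
reported = {
    "ARIMA": {"MSE": 56.187, "RMSE": 7.496},
    "SARIMA": {"MSE": 0.58114, "RMSE": 0.762325},
}
for name, m in reported.items():
    print(name, "RMSE equals sqrt(MSE):", isclose(sqrt(m["MSE"]), m["RMSE"], abs_tol=1e-3))
```

Because RMSE is expressed in the same unit as the data, it can be read directly as a typical error in °C, while MSE penalizes large errors more heavily.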
CONCLUSION
In this paper, temperature data collected at one-hour intervals in Pune from 2009 to 2020 were analyzed. The estimation and diagnostic analysis results revealed that the model fits the historical data adequately. Power load forecasting is basic to the optimization of power grid control and a significant part of power system transmission. In practical applications, capturing the non-linear relationship between environmental factors and load changes with improved optimization algorithms can effectively reduce the deviation between predicted and actual results. To maintain the production/consumption balance of an electrical grid, stochastic production forecasting must be implemented at multiple temporal horizons based on the level of utilization. Finally, the predicted values were compared with the actual values for both the ARIMA and SARIMA models. Forecast accuracy measures, including MAE, MSE, and RMSE, were calculated.
REFERENCES

[1] Imdadullah. "Time Series Analysis". Basic Statistics and Data Analysis. itfeature.com. Retrieved 2 January 2014.

[2] Raicharoen, T., Lursinsap, C., & Sanguanbhokai, P. (2018). Application of critical support vector machine to time series prediction. International Symposium on Circuits and Systems (Vol. 5, pp. V-741 to V-744). IEEE.

[3] Khandelwal, I., Adhikari, R., & Verma, G. (2015). Time series forecasting using hybrid ARIMA and ANN models based on DWT decomposition. In Procedia Computer Science (Vol. 48, pp. 173-179). Elsevier B.V. https://doi.org/10.1016/j.procs.2015.04.167

[4] Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: forecasting and control, rev. ed. Oakland, California, Holden-Day, 1976, 37 (2), 238-242.

[5] G. J. Rios-Moreno, M. Trejo-Perea, R. Castaneda-Miranda, V. M. Hernandez-Guzman, and G. Herrera-Ruiz, Modelling temperature in intelligent buildings by means of autoregressive models, vol. 16, pp. 713-722, 2014.

[6] M. De Felice, A. Alessandri, and P. M. Ruti, Electricity demand forecasting over Italy: Potential benefits using numerical weather prediction models, Electr. Power Syst. Res., vol. 104, pp. 71-79, 2013.

[7] Mahmudur Rahman, A.H.M. Saiful Islam, Sahah Yaser Maqnoon Nadvi, Rashedur M Rahman (2013). Comparative study of ANFIS and ARIMA model for weather forecasting in Dhaka. IEEE.

[8] Chisimkwuo, J., Uchechukwu, G. and Okezie, S.C. (2014). Time series analysis and forecasting of monthly maximum temperatures in south eastern Nigeria. International Journal of Innovative Research and Development. ISSN 2278-0211. 3(1), pp. 165-171.

[9] Roy, T. D. and Das, K. K. (2012). Time series analysis of Dibrugarh air temperature. Journal of Atmospheric and Earth Environment. 1(1), pp. 30-34.

[10] Box-Jenkins models, NIST handbook of statistical methods.

[11] C. Chatfield, Time-series Forecasting. Chapman & Hall/CRC, 2015.

[12] www.kaggle.com