Prediction of Air Quality in Urban Area, Chennai

Abstract: The air in Alandur, Chennai has been polluted by particulate matter (PM2.5) over the years. Reports show that particulates affect human health and the environment. Accurate forecasting models for the PM2.5 concentration in air support control measures, early warning and mitigation. This study examines the performance of a non-linear model (Feed Forward Back Propagation using the LEARNGD function) that takes meteorological data and gaseous pollutants from 2015 to 2019 as input parameters, at Alandur, an urban area with varied surrounding activities. The study focuses on predicting PM2.5 in the study area to assess the effects of harmful emissions. To predict PM2.5, an artificial neural network (ANN) prediction model is developed. Data from the monitoring station at the Alandur bus depot in Chennai are used as the input variables. The prediction model was validated and evaluated by statistical calculations and was found to perform well in predicting PM2.5. The performance of the developed model was evaluated by Mean Square Error (MSE) and R². Among the networks framed, the best prediction performance was observed for the Purelin transfer function, with an R² of 0.96 and an MSE of 0.094, and for the Tansig transfer function, with an R² of 0.97 and an MSE of 0.103.


I. INTRODUCTION
Technological advancement has increased the emission of air pollutants over the decades. Air pollution is a major concern in industrial cities, where it harms both the environment and human health. As a result, urban residents tend to prefer less polluted neighborhoods to avoid the health impacts of air pollution. Atmospheric pollution can be classified into three types based on its source: mobile, stationary and area sources. Mobile sources include motor vehicles, airplanes, locomotives and other engines and equipment that can move between locations. Stationary sources include foundries, fossil fuel burning, food processing plants, power plants, refineries and other industrial sources. Area sources are caused by certain local activities. Air pollution can be caused by pollutants emitted directly from a source or by pollutants that are not emitted directly as such but form in the atmosphere. Both degrade ambient air quality in industrial cities. Daily exposure to air pollution also leads to conditions such as asthma, wheezing and bronchitis.
Air quality monitoring data are used to check pollutant concentrations against the ambient air quality standards set by the government. Prediction supports the development of effective emission control strategies and helps identify the contribution of each pollution source.
There are two types of prediction methods: deterministic and stochastic. In this work, a deterministic method is used for prediction. This method uses mathematical models based on the physical and chemical transport processes of pollutants under the influence of meteorological variables.
Artificial neural networks help to forecast pollutants whose behaviour follows complicated non-linear functions. The prediction accuracy of artificial neural networks is often higher than that of other methods. The learning process of an ANN resembles that of an animal brain, and it can process non-linear and complex data. It can learn and identify patterns that relate input data sets to their corresponding target values. After training, the ANN is used to predict the output for new, independent input data.
In this research, a feed-forward back propagation neural network model is used to predict air quality, with data collected over the last five years used for prediction. The research is motivated by the lack of public awareness of the real-time air quality status. The ANN prediction model is built in MATLAB.
The objective is to collect the PM2.5 and meteorological data that play a major role in ambient air pollution and to predict the PM2.5 concentration using an ANN.

A. Study Area
Chennai, the capital of Tamil Nadu in India, is located on the Coromandel Coast off the Bay of Bengal, on the south-eastern coast of India. It is the economic and educational centre of south India.

International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181 http://www.ijert.org

The city's population is 7,088,000. It covers an area of 426 km² and is densely populated. Chennai has a tropical wet and dry climate, with hot summers from May to June and cooler weather in January with occasional rainfall. The waterways of Chennai are the Kortalaiyar river in the northern part, the Cooum river, the Buckingham Canal, which flows parallel to the coast, and the Otteri Nullah, an east-west stream.

Fig. 2. Alandur boundary map
Alandur is one of the zones of the Chennai Corporation and an urban node in the Guindy division of Chennai district in the state of Tamil Nadu, India. It is a densely populated urban area located at 13.03°N, 80.21°E, with an average elevation of 12 meters (39 feet) above mean sea level (MSL). Alandur had a population of 164,430 according to the 2011 Census and has a land area of 2 sq. km. The area contains State Highway SH-48, National Highway NH-45, the Kathipara grade flyover and the SIDCO industrial estate. It carries heavy vehicular traffic and is one of the congested areas in Chennai.

II. METHODS AND MATERIALS
A. Data Sets
The first and foremost step in modelling is to collect and group the relevant data, both past data and data from air quality monitoring. The data are collected from the website of the Central Pollution Control Board. It is important to collect both the required data and the factors that cause pollution. Daily 24-hour average data for five years (2015-2019) are collected for the following parameters: wind speed, relative humidity, wind direction, temperature, sulphur dioxide, oxides of nitrogen and PM2.5. The five-year mean of these parameters is given in Table I.

The model is developed in MATLAB, which is easy to work with and flexible. The Neural Network Toolbox of this software offers a wide variety of parameters for developing networks.

C. Modelling
The second step in the modelling process is implementation in the modelling software. This research uses an artificial neural network to relate the inputs and outputs of the model. The data must be preprocessed, cleaned of errors and divided into training, validation and evaluation sets to obtain better results. After this, the data are ready to be used in the ANN.
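The preprocessing steps named above can be sketched in a few lines. This is a minimal illustration in Python, not the MATLAB workflow used in the study. The use of `None` to mark a missing reading, the scaling range of [-1, 1], the 70/15/15 split and all function names are assumptions made for the example; the paper does not state these details.

```python
import random

def clean_records(records):
    """Drop any daily record that has a missing (None) reading."""
    return [r for r in records if all(v is not None for v in r)]

def min_max_scale(records):
    """Scale each column of the records to the range [-1, 1] (assumed range)."""
    columns = list(zip(*records))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for r in records:
        row = []
        for v, lo, hi in zip(r, lows, highs):
            # A constant column carries no information; map it to 0.
            row.append(0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0)
        scaled.append(row)
    return scaled

def split_records(records, train_pct=70, val_pct=15, seed=0):
    """Shuffle, then split into training, validation and evaluation sets.

    The percentages are hypothetical defaults, not values from the paper.
    """
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    evaluation = shuffled[n_train + n_val:]
    return train, val, evaluation
```

Each record here stands for one day of readings (for example wind speed, relative humidity, wind direction, temperature, sulphur dioxide, oxides of nitrogen and PM2.5). Fixing the shuffle seed keeps the split reproducible between runs.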
The methodology uses a Feed Forward Backpropagation neural network with three layers (input, hidden and output). The network has input, target and output files. The input layer receives the pollutant and meteorological data, which are multiplied by weight coefficients obtained through the training process. The meteorological data should have an influence on the output data.
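The way inputs are multiplied by weights and passed through the layers can be illustrated with a short forward pass. This is a sketch under stated assumptions, written in Python rather than MATLAB: `tansig` is taken as the hyperbolic tangent and `purelin` as the identity function, which match the standard definitions of those transfer functions, and the weights and biases in the usage note are made-up values, not trained coefficients from the study.

```python
import math

def tansig(x):
    """Hyperbolic tangent sigmoid transfer function."""
    return math.tanh(x)

def purelin(x):
    """Linear transfer function: returns its input unchanged."""
    return x

def forward(inputs, w_hidden, b_hidden, w_out, b_out,
            hidden_fn=tansig, out_fn=purelin):
    """Forward pass through one hidden layer and a single output neuron.

    w_hidden holds one weight list per hidden neuron; w_out holds one
    weight per hidden neuron. All names here are illustrative.
    """
    hidden = [
        hidden_fn(sum(w * x for w, x in zip(weights, inputs)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return out_fn(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

For example, `forward([0.2, -0.4], [[0.5, 0.1], [-0.3, 0.8]], [0.0, 0.1], [1.0, -1.0], 0.05)` evaluates a hypothetical network with two inputs and two hidden neurons. In the setting described here, the input list would hold the scaled meteorological and gaseous pollutant values for one day and the output would be the predicted PM2.5 value.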
Five years of past daily 24-hour average measurements form the input matrix. The matrix gives the model insight into the meteorological conditions at a given time. Two hidden layers consisting of ten neurons are adopted in this model. The output layer holds the target data to be predicted. The model uses FFANNBP; once it is set up, training, validation and evaluation of the ANN model can be carried out. The choice of activation functions and training algorithm influences the strength of the model's predictions. The activation function directs the data passed between the hidden layers. The training algorithm optimizes the weight coefficients at every iteration of the training process, which increases the accuracy of the model. Because of their ability to adapt to non-linear problems, non-linear activation functions and learning algorithms are widely used. Therefore, sigmoid activation functions and the variable learning rate back propagation learning algorithm were chosen. The schematic of the ANN model is shown in Figure 4.
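The weight optimization carried out at every training iteration can be sketched as plain gradient descent with back propagation. The Python code below is a simplified illustration, not the algorithm configured in MATLAB for this study: it assumes a single hidden layer of ten tanh neurons with a linear output, a fixed learning rate instead of a variable one, and per-sample updates. The function name `train` and all default values are hypothetical.

```python
import math
import random

def train(samples, targets, n_hidden=10, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer network (tanh hidden, linear output) by
    per-sample gradient descent on the squared error.

    Returns a function that predicts the output for a new input list.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def hidden_out(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(w1, b1)]

    for _ in range(epochs):
        for x, t in zip(samples, targets):
            # Forward pass.
            h = hidden_out(x)
            y = sum(w * hi for w, hi in zip(w2, h)) + b2
            # Backward pass: error at the linear output neuron.
            err = y - t
            for j in range(n_hidden):
                # Chain rule through tanh: d tanh(a)/da = 1 - tanh(a)^2.
                # Computed before w2[j] is changed.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err

    def predict(x):
        return sum(w * hi for w, hi in zip(w2, hidden_out(x))) + b2

    return predict
```

Calling `predict = train(inputs, targets)` returns a function that can be applied to unseen input rows. A variable learning rate scheme would adjust `lr` between epochs depending on whether the error decreased, which is omitted here for brevity.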

D. Validation
The third step is validation of the model. Validation establishes the quality of the model and of its response once the training process is complete. A prepared set of input and output data was used to validate the model, and the modelled response was compared with the measured data. The model response for the validation data set is presented graphically. Because the graphical result must be supported by numerical quality measures, Mean Squared Error (MSE) and the Coefficient of Determination (R²) are used.
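The two quality measures can be computed directly from the measured and modelled values. The sketch below uses the standard definitions of MSE and of the coefficient of determination, since the paper does not state its formulas; if R² was instead obtained by squaring the regression value R, the numbers may differ slightly. The function names are illustrative.

```python
def mean_squared_error(measured, predicted):
    """Average of the squared differences between measured and predicted."""
    n = len(measured)
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot
```

An MSE close to zero and an R² close to one both indicate that the modelled values follow the measured ones closely. Note that MSE depends on the scale of the data, so values are comparable only between models evaluated on the same (for example, equally normalized) data.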

E. Evaluation
The next step is to evaluate the model using the response from the training or validation process. The previously prepared set of input and output data is used to evaluate the model, and the modelled response is compared with the measured data. Evaluation and validation follow a similar process; the difference lies in the number of numerical quality measures used. In general,

From Figure 6 above, the regression value is R = 0.965. The regression plot shows the relationship between the targets and the ANN outputs, and the regression value R indicates how closely they match. When R is 1, there is an exact linear relationship between targets and outputs; when R is 0, there is no linear relationship. R = 0.965 shows that the ANN output closely matches the target. In this study, the training data show a strong relationship between targets and outputs, and the validation and test results also give R values greater than 0.965.
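The regression value R discussed here corresponds to the linear correlation between targets and network outputs. A minimal Python sketch of that calculation follows, assuming R is the Pearson correlation coefficient; the function name `regression_r` is hypothetical.

```python
import math

def regression_r(targets, outputs):
    """Pearson correlation coefficient between targets and outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)
```

Outputs that are an exact increasing linear function of the targets give R = 1, while outputs with no linear relationship to the targets give R = 0. The function is undefined when either series is constant, because the denominator is then zero.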

B. Performance of Purelin Transfer Function
Figure 7 shows that the best performance is achieved by the model using the purelin transfer function, with a minimum MSE of 0.094. The regression plot shows the relationship between the targets and the ANN outputs, and R = 0.964 shows that the ANN output closely matches the target. As before, R = 1 indicates an exact linear relationship between targets and outputs, and R = 0 indicates no linear relationship. In this study, the training data show a strong relationship between targets and outputs, and the validation and test results also give R values greater than 0.964. The TANSIG and PURELIN transfer functions behave similarly in terms of the correlation coefficient. Considering MSE, however, it can be concluded that PURELIN performs better than TANSIG, especially when the correlation is high.
From Figures 6 and 8, the correlation coefficient R is more or less the same for both models. The model showing good agreement between predicted and measured values was chosen as the best model, based on the value of the correlation coefficient R.
IV. CONCLUSION
In this paper, PM2.5 is predicted using an artificial neural network; prediction is one of the applications of such networks. The main objective of this research was to develop a model to predict PM2.5 at the Alandur location based on data from monitoring stations. The developed model can be used as a decision-making tool to give early warning of air pollution in the area. Based on the values of R and the prediction accuracy, the model with the PURELIN transfer function in the neural network structure gives better performance in predicting air quality than the network structure that uses the TANSIG transfer function. This model produces an R of 0.965, which shows good agreement between the targets and the predicted outputs. Overall, the model produced good results for air quality forecasting. This type of model is simple and cost-efficient, and its capability is reflected in its performance. The model is reliable for urban air quality characterization. It also supports further development toward an integrated air quality surveillance system for the Alandur area, since it reflects the problems arising from urban features such as traffic and industries.