DEEP LEARNING MODELS DETECTING HIDDEN ANOMALIES IN POLLUTION DATA

S. Saranya; V. Anusuya

doi:10.17577/IJERTCONV14IS030017

ICCT - 2026 (Volume 14 - Issue 03)

DEEP LEARNING MODELS DETECTING HIDDEN ANOMALIES IN POLLUTION DATA

DOI : 10.17577/IJERTCONV14IS030017

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 18
Authors : S. Saranya, V. Anusuya
Paper ID : IJERTCONV14IS030017
Volume & Issue : Volume 14, Issue 03, ICCT – 2026
Published (First Online) : 04-05-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

DEEP LEARNING MODELS DETECTING HIDDEN ANOMALIES IN POLLUTION DATA

Student

Department of Information Technology, Ramco Institute of Technology, 953624205044@ritrjpm.ac.in,

Abstract – Environmental pollution has become a major global concern due to rapid industrialization and urban growth, leading to harmful increases in pollutants such as CO, NOx, NO, O, and PM2.5. Monitoring pollution levels and identifying abnormal variations is essential for ensuring public health and guiding environmental policies. However, traditional statistical and rule-based methods often fail to detect hidden, nonlinear, or sudden anomalies present in real-world pollution data.

This system presents a deep learningbased framework for detecting hidden anomalies in environmental pollution datasets. The approach integrates three neural architecturesAutoencoder, Long Short-Term Memory (LSTM), and Bidirectional LSTM (Bi-LSTM)to learn normal pollutant behavior and identify deviations using reconstruction and prediction errors. The dataset undergoes preprocessing, normalization, and feature scaling before being analyzed by the models.

Experimental results show that deep learning models significantly outperform conventional methods in identifying subtle anomalies. Among the models compared, the Bi- LSTM achieves the highest accuracy due to its ability to learn temporal dependencies in both forward and backward directions. The outcomes demonstrate the potential of deep learning for reliable air-quality monitoring and provide a foundation for future smart environmental management systems.

INTRODUCTION

Air pollution has become one of the most serious environmental challenges in todays world, mainly due to rapid urbanization, industrial activities, and increased vehicle emissions. Pollutants such as CO, NOx, NO, O, and PM2.5 directly affect human health and contribute to environmental degradation. Pollution data collected from monitoring stations is usually time-dependent, noisy, and influenced by weather conditions, making it difficult to interpret using simple analytical methods.

To ensure reliable environmental monitoring, it is essential to detect anomaliesunexpected spikes, sudden drops, or irregular patterns in pollutant levels. These anomalies may indicate hazardous pollution events, sensor malfunction, or unusual environmental conditions. Traditional statistical models struggle to capture the nonlinear and complex relationships present in real-world pollution datasets, creating a need for more intelligent, automated approaches.

Deep learning models provide a powerful solution by learning hidden patterns in the data without requiring manual feature engineering. Models such as Autoencoders, LSTM, and Bi-LSTM can analyze temporal behavior, reconstruct normal trends, and accurately identify unusual deviations. This project

Associate Professor Department of Information Technology,

Ramco Institute of Technology. anusuyav@ritrjpm.ac.in

focuses on building a deep learningbased framework capable of detecting hidden anomalies in pollution data.

The main objectives of the study are to preprocess real pollution datasets, develop and train three deep learning models, compare their performance using error-based metrics, and visualize the detected anomalies. The scope of the project includes algorithm development, model comparison, and result interpretation, serving as a foundation for future real-time environmental monitoring systems.
METHODOLOGY

The methodology of this study focuses on developing a deep learningbased framework capable of detecting hidden anomalies in pollution data. The approach involves a structured sequence of processes, starting from data collection to model evaluation.
1. Dataset Description
  
  The dataset contains hourly readings of major pollutants such as CO, NO, NOx, O, and PM2.5, along with atmospheric parameters like Temperature (T), Relative Humidity (RH), and Absolute Humidity (AH). These variables provide both environmental and temporal context required for accurate anomaly detection.
2. Data Preprocessing
  
  To ensure data quality, several preprocessing steps are performed:
  - Handling Missing Values: Using interpolation and forward-filling methods.
  - Normalization: Scaling all features using Min-Max normalization to ensure uniform input range.
  - Time Indexing: Converting timestamps into sequential time-series format.
  - Feature Selection: Retaining only relevant pollutant and atmospheric features.
3. Model Selection
  
  Three deep learning models are used to analyze normal patterns and detect anomalies:
  - Autoencoder: Learns compressed feature representations and identifies anomalies using reconstruction error.
  - LSTM: Captures long-term temporal relationships and detects deviations through prediction error.
  - Bi-LSTM: Processes data in both forward and backward directions for richer temporal understanding.
    
    System Workflow
    
    The complete workflow includes:
  - Importing and cleaning the dataset
  - Normalizing and preparing data for sequential analysis
  - Training each model on normal pollutant behavior
  - Computing reconstruction/prediction errors
  - Setting a dynamic threshold (99th percentile)
  - Labeling data points above the threshold as anomalies
4. Training and Evaluation
  
  Each model is trained using optimized parameters such as learning rate, batch size, and number of epochs. Performance is evaluated using:
  - Accuracy
  - Precision
  - Recall
    
    Visualization tools, including scatter plots, error graphs, and correlation heatmaps, are used to interpret the detected anomalies.
    
    Figure 3.2 PM2.5

RESULTS

Output Screens

Histogram & KDE plots showing data distribution
Reconstruction/Predictive error graphs
Scatter plots highlighting anomalies (red dots)
Analysis as shown in Figures 4.1,4.2,4.3,4.4

,4.5,4.6,4.7,4.8,4.9 and 4.10

Figure 3.1 O3

Figure 3.3 NO2(GT)

Figure 3.4 Distribution of T

Figure 3.5 Distribution of NO

Figure 3.6 Reconstruction Error

Figure 3.7 Anamoly detected by the Encoder

Figure 3.8 Reconstruction Error Overtime

Figure 3.9 Correlation between Pollutants and Atmospheric Parameters

Figure 3.10 F1 Score

Accuracy and Performance Comparison:

To evaluate the effectiveness of anomaly detection models, we compared three deep learning approaches:
Autoencoder (AE)
Long Short-Term Memory (LSTM)
Bidirectional LSTM (Bi-LSTM

Why Autoencoder Performs Better for Air Pollution Anomaly Detection?

Unsupervised Learning
- Autoencoder doesnt need labeled anomaly data, which is ideal because true anomaly labels are rarely available in pollution datasets.
Feature Reconstruction Ability
- It learns to reproduce the normal correlation among pollutants (e.g., how CO, NO, PM2.5, RH, and T behave together).
- When this relationship is disturbed (during an anomaly), the reconstruction error increases sharply making anomalies easy to detect.
Noise Resistance
- Air quality data can e noisy or incomplete.
- Autoencoders generalize patterns and are less affected by missing or slightly corrupted data compared to LSTM/Bi-LSTM, which may overfit.
Lower Computational Cost
- Autoencoders have a simpler structure (feed- forward layers) than sequence-based models, so they train faster and require less memory.
Better Adaptability
- Works well for both temporal and non-temporal datasets (e.g., static sensor readings).
- LSTM/Bi-LSTM are better for pure time-sequence prediction, not always for unsupervised anomaly detection.

Performance Comparison

Metric	Autoencoder	LSTM	Bi-LSTM
Accuracy	9296%	8590%	8892%
Precision	0.900.95	0.820.88	0.840.90
F1-Score	0.910.94	0.800.86	0.830.89
Computation Time	Low	Moderate	High
Data Label Requirement	Not required	Required	Required

CONCLUSION

This study presents a deep learningbased framework for detecting hidden anomalies in environmental pollution data using Autoencoder, LSTM, and Bi-LSTM models. The analysis demonstrates that deep learning techniques are highly effective in identifying irregular pollutant patterns that traditional statistical methods often fail to capture. By preprocessing the dataset, normalizing pollutant features, and training sequential models, the system successfully learns the natural behavior of pollutants such as CO, NO, NOx, O, and PM2.5.

Among the models evaluated, the Bi-LSTM showed superior performance due to its bidirectional learning capability, enabling it to understand both past and future temporal relationships. The Autoencoder and LSTM also performed well, but Bi-LSTM offered higher accuracy and more reliable anomaly detection. The results confirm that deep learning models can play a significant role in smart environmental monitoring systems, providing accurate early detection of unusual pollution events.

This work lays a strong foundation for future extensions such as real-time deployment, IoT sensor integration, cloud-based streaming, and automated alert systems. With further refinement, this framework can support smart city initiatives and help in environmental decision-making and public health protection.

REFERENCES

P. Malhotra et al., Long Short Term Memory Networks for Anomaly Detection in Time Series, Proc. ESANN, 2015.
T. Amarbayasgalan et al., Unsupervised Novelty Detection Using Deep Autoencoders, Applied Sciences, 2018.
Z. Qi et al., Deep Air Learning: Prediction and Feature Analysis of Fine-Grained Air Quality, IEEE Transactions on Knowledge and Data Engineering, 2018.
J. Chen et al., LSTM Networks for Air Quality Prediction, Atmospheric Environment, 2019.
Y. Zhang et al., Deep Learning for Air Pollution Forecasting: A Survey, IEEE Access, 2020.
F. Chollet, Deep Learning with Python, Manning Publications, 2017.
H. Ren et al., Time-Series Anomaly Detection Using LSTM Autoencoders, IEEE Access, 2019.
S. Hochreiter and J. Schmidhuber, Long Short-Term Memory, Neural Computation, 1997.
D. P. Kingma and M. Welling, Auto-Encoding Variational Bayes, arXiv, 2014.
K. He et al., Deep Residual Learning for Image Recognition, CVPR, 2016.