DOI : 10.17577/IJERTCONV14IS020160- Open Access

- Authors : Dr. Vinaya Keskar, Mrs. Archana Tank, Prof. Ramkrishna More
- Paper ID : IJERTCONV14IS020160
- Volume & Issue : Volume 14, Issue 02, NCRTCS – 2026
- Published (First Online) : 21-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Real-Time Traffic Pattern Prediction using Big Data and IoT Sensors
Dr. Vinaya Keskar
ATSS College of Business Studies and Computer Application, Chinchwad, Pune, Maharashtra, India
Mrs. Archana Tank
Prof. Ramkrishna More College Akurdi, Pune, Maharashtra, India
Abstract – Real-time traffic prediction is critical for intelligent transportation systems, urban planning, and mobility services. The proliferation of IoT sensors (loop detectors, connected vehicles, mobile probes, and camera feeds) together with big-data platforms enables scalable collection and processing of heterogeneous spatio-temporal data. This paper proposes a production-grade framework that combines streaming ingestion, scalable storage, spatio-temporal graph neural networks, and edge/cloud hybrid deployment to deliver accurate, low-latency traffic forecasts. We review the literature (classical and deep-learning approaches), describe an end-to-end architecture integrating Apache Kafka, Spark/Flink, time-series and graph models (e.g., DCRNN, STGCN, Graph WaveNet, Transformer variants), and outline evaluation on real benchmarks (METR-LA, PEMS-BAY) and IoT sensor streams. We discuss engineering trade-offs (latency vs. accuracy, privacy, and model-drift handling) and highlight strategies for model adaptation, explainability, and deployment. The framework supports multi-horizon prediction, anomaly detection, and routing integration, providing a pragmatic blueprint for smart-city traffic prediction using big data and IoT.
Keywords: traffic prediction, spatio-temporal forecasting, IoT sensors, big data, graph neural networks, streaming analytics, real-time systems.
-
INTRODUCTION
As cities develop, urban mobility systems face growing congestion. Accurate, timely traffic forecasting enables dynamic routing, congestion mitigation, and better transportation planning. Conventional models (historical averaging, ARIMA) are limited in capturing the nonlinear spatio-temporal dependencies present in modern traffic networks. The rise of IoT sensor networks (roadside inductive loops, traffic cameras, probe vehicles, connected mobile devices), combined with scalable data platforms, opens opportunities for high-fidelity, near-real-time traffic forecasting.
This paper presents a practical framework for real-time traffic pattern prediction that (i) ingests heterogeneous IoT streams in a fault-tolerant way, (ii) stores and preprocesses data at scale, (iii) applies state-of-the-art spatio-temporal models, and (iv) supports continuous learning and deployment at the edge and in the cloud. We synthesize prior art, propose a system architecture, detail modelling strategies, and provide an evaluation plan using public benchmark datasets and real sensor streams.
-
BACKGROUND AND RELATED WORK
-
Classical Approaches
Early traffic forecasting relied on statistical models (ARIMA, Kalman Filters) and transport domain models (macroscopic fundamental diagrams) [1,2]. These are interpretable and lightweight but struggle with non-linearities and complex spatial interactions.
-
Machine Learning and Deep Learning
Machine learning introduced non-linear models (SVR, Random Forests) for short-term forecasting. Deep learning (RNNs, LSTMs) improved temporal modeling [3]. However, early DL models treated each sensor independently or used convolutional operations on gridded maps, missing road network topology.
-
Graph-based Spatio-Temporal Models
Recent advances model the road network as a graph with nodes (sensors) and edges (road links). Notable architectures include:
DCRNN (Diffusion Convolutional Recurrent Neural Network) using diffusion convolution with RNNs to model traffic flow over graphs [4].
STGCN (Spatio-Temporal Graph Convolutional Network) combining graph convolutions and temporal convolutions [5].
Graph WaveNet and related models that capture adaptive adjacency and long-range dependencies [6].
These methods consistently outperform prior baselines on METR-LA and PEMS-BAY.
-
Transformer and Attention Models
Transformers and their efficient variants (Informer, Temporal Fusion Transformer) have been adapted for long-range time- series forecasting, sometimes combined with graph layers for spatial relations [7,8].
-
Big Data and Streaming Platforms
Operational systems use Kafka for streaming ingestion, Apache Spark Streaming or Apache Flink for stream processing, and distributed stores (HDFS, S3, Cassandra) for historical data and feature stores [9,10].
-
Summary and Gap
Literature demonstrates advanced spatio-temporal model efficacy but often assumes offline batch contexts. Real-time deployment with scale, low latency, and continuous learning under sensor drift remains an active systems and research challenge.
Selected references: [1] Box & Jenkins, [2] Kalman, [3] LSTM works, [4] Li et al. (DCRNN), [5] Yu et al. (STGCN), [6] Wu et al. (Graph WaveNet), [7] Zhou et al. (Informer), [8] Lim et al. (Temporal Fusion Transformer), [9] Kreps et al. (Kafka), [10] Carbone et al. (Flink).
-
SYSTEM ARCHITECTURE
-
Design Goals
- Low latency: sub-minute inference for short-term horizons (e.g., 5–30 min).
- Scalable ingestion and storage: handle thousands of sensor streams.
- Robustness: tolerate missing data and partial sensor failures.
- Model adaptivity: continuous learning to manage concept drift.
- Practical deployability: support edge/cloud hybrid deployment.
-
High-Level Components
- IoT Edge Layer: Sensor gateways (edge nodes) perform pre-aggregation, filtering, and initial anomaly detection. Edge nodes reduce network load and provide fast local inference for micro-scale controls.
- Streaming Ingestion: Apache Kafka acts as the backbone for durable, ordered ingestion of sensor streams (GPS probes, loop counts, camera detection outputs).
- Stream Processing & Feature Store: Apache Flink/Spark Streaming performs windowed aggregation, joins (weather, incidents), and writes features to a feature store (e.g., ClickHouse, Cassandra).
- Historical Storage: Time-series databases (InfluxDB/Timescale) and object storage (S3/HDFS) provide long-term data for model training and offline analytics.
- Model Training and Serving: Batch training pipelines (Spark ML/TensorFlow/PyTorch) train spatio-temporal models; serving is done via TF Serving, TorchServe, or custom gRPC microservices. Models may be executed in the cloud or at the edge (for low latency).
- Monitoring and Feedback Loops: Model performance monitors (drift detectors, alerts) and a labeling pipeline (human-in-the-loop) enable periodic retraining.
-
Figure 1 (conceptual) shows component interactions (ingestion → preprocessing → model inference → downstream applications such as routing).
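As an illustrative sketch of the stream-processing stage (engine-agnostic, not the paper's implementation), a tumbling-window aggregation over sensor readings can be expressed compactly; the record fields (sensor_id, ts, speed) are assumptions for illustration:

```python
from collections import defaultdict

def tumbling_window_mean(readings, window_s=60):
    """Group (sensor_id, ts, speed) readings into fixed-size time windows
    and return the mean speed per (sensor_id, window_start) key, mimicking
    a Flink/Spark tumbling-window aggregation."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for sensor_id, ts, speed in readings:
        window_start = (ts // window_s) * window_s  # align to window boundary
        acc = sums[(sensor_id, window_start)]
        acc[0] += speed
        acc[1] += 1
    return {key: acc[0] / acc[1] for key, acc in sums.items()}

readings = [("s1", 5, 50.0), ("s1", 30, 60.0), ("s1", 65, 40.0)]
print(tumbling_window_mean(readings))  # two 60 s windows for sensor s1
```

In a real deployment the same logic would run continuously inside Flink or Spark Streaming with event-time watermarks rather than over a finished list.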
-
-
METHODOLOGY
-
Data Sources and Preprocessing
- Loop detectors & fixed sensors: vehicle counts, speed, occupancy sampled at 30 s–5 min intervals.
- Probe vehicles & mobile GPS: aggregated probe speed and travel time.
- Camera detections: vehicle counts and classification via on-device CV.
- External context: weather, events, roadworks, holidays.

Preprocessing steps include timestamp alignment, spatial mapping to nodes, interpolation for missing values, and normalization. Feature engineering creates lagged features, rolling statistics, and exogenous variables.
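Two of these preprocessing steps (linear interpolation of missing values and lagged-feature construction) can be sketched minimally, assuming a regularly sampled per-sensor series with `None` marking gaps:

```python
def interpolate_missing(series):
    """Linearly interpolate None gaps in a regularly sampled series;
    gaps at the edges are filled with the nearest observed value."""
    vals = list(series)
    known = [i for i, v in enumerate(vals) if v is not None]
    if not known:
        return vals  # nothing observed, nothing to do
    for i in range(len(vals)):
        if vals[i] is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:
            vals[i] = vals[nxt]      # leading gap: back-fill
        elif nxt is None:
            vals[i] = vals[prev]     # trailing gap: forward-fill
        else:
            w = (i - prev) / (nxt - prev)  # linear weight between neighbours
            vals[i] = vals[prev] * (1 - w) + vals[nxt] * w
    return vals

def lagged_features(series, lags=(1, 2, 3)):
    """Build (target, [lagged values]) rows for supervised training."""
    start = max(lags)
    return [(series[t], [series[t - l] for l in lags])
            for t in range(start, len(series))]
```

Usage: `interpolate_missing([10, None, 20])` yields `[10, 15.0, 20]`; in production these operations would run per node inside the stream-processing layer.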
-
-
Graph Construction
Construct a directed weighted graph G = (V, E, A) in which nodes V correspond to sensors/intersections and the adjacency A encodes physical connectivity and travel-time distance. Adaptive adjacency can also be learned (e.g., Graph WaveNet's adaptive matrix).
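The adjacency weights are commonly derived from pairwise distances with a thresholded Gaussian kernel, w_ij = exp(-d_ij² / σ²), zeroed below a sparsity threshold (the scheme used by DCRNN). A sketch, with the dict-of-pairs distance format being an illustrative assumption:

```python
import math

def gaussian_adjacency(dist, sigma, eps=0.1):
    """Thresholded Gaussian kernel adjacency: w_ij = exp(-d_ij^2 / sigma^2),
    set to 0 when below eps to keep the graph sparse.
    dist: {(i, j): travel-time or road distance between sensors i and j}."""
    adj = {}
    for (i, j), d in dist.items():
        w = math.exp(-(d * d) / (sigma * sigma))
        adj[(i, j)] = w if w >= eps else 0.0
    return adj
```

For example, a zero-distance pair gets weight 1.0, while a far-apart pair falls below `eps` and is pruned; `sigma` is typically set to the standard deviation of the observed distances.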
-
Model Family
We adopt a modular approach where the core predictor combines three elements:
- Spatial component: graph convolutional layers (diffusion conv or spectral conv) to aggregate neighbour states.
- Temporal component: temporal conv blocks (TCN), RNNs (GRU/LSTM), or transformer encoders for time dependency.
- Fusion and attention: interpretable attention modules to weigh sensor/edge influence and exogenous context.

Candidate architectures:
- Baselines: historical average, ARIMA.
- ML baselines: XGBoost on engineered features.
- Deep models: DCRNN, STGCN, Graph WaveNet.
- Transformer hybrid: graph-aware Transformer with temporal attention.

Loss function: mean absolute error (MAE) or mean squared error (MSE) on predicted speeds/flows at multiple horizons (5, 15, 30, 60 min). Multi-task setups predict multiple horizons jointly.
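The multi-horizon objective can be stated as a plain-Python sketch (in practice this would be a framework loss, e.g., in PyTorch, averaged over a batch):

```python
def multi_horizon_mae(y_true, y_pred):
    """Mean absolute error per horizon and averaged across horizons.
    y_true / y_pred: lists of per-horizon value lists, e.g. one list each
    for the 5, 15, 30, and 60 min horizons."""
    per_horizon = []
    for yt, yp in zip(y_true, y_pred):
        per_horizon.append(sum(abs(a - b) for a, b in zip(yt, yp)) / len(yt))
    # multi-task training optimizes the average across horizons jointly
    return per_horizon, sum(per_horizon) / len(per_horizon)
```

A multi-task model minimizes the averaged term, so short and long horizons share one set of parameters instead of training a separate model per horizon.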
-
Real-time Serving Considerations
Use windowed aggregations and incremental inference to minimize recomputation.
Batch requests into small micro-batches for vectorized inference on GPUs and CPUs.
Cache recent node embeddings for faster partial updates.
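The embedding-caching idea can be sketched as a version-keyed cache: an embedding is recomputed only when a node's inputs have changed since the cached copy. The `embed_fn` callable and version counter are illustrative assumptions, not part of any specific serving stack:

```python
class EmbeddingCache:
    """Cache recent node embeddings keyed by an input version, so partial
    updates only recompute nodes whose inputs actually changed."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn      # computes an embedding from inputs
        self._cache = {}              # node_id -> (version, embedding)

    def get(self, node_id, version, inputs):
        hit = self._cache.get(node_id)
        if hit is not None and hit[0] == version:
            return hit[1]             # inputs unchanged: reuse embedding
        emb = self.embed_fn(inputs)   # recompute and refresh the cache
        self._cache[node_id] = (version, emb)
        return emb
```

In a serving path, `version` could be the timestamp of the node's latest aggregated window, so only sensors with fresh data trigger recomputation.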
EVALUATION PLAN
-
Datasets
METR-LA & PEMS-BAY: widely used, publicly available benchmark datasets for traffic forecasting.
City sensor feeds: pilot deployment with a city's loop/camera data (subject to access).
-
Metrics
MAE, RMSE, and MAPE for regression accuracy.
Prediction latency (ms) and throughput for serving.
Robustness: performance under missing-sensor and noisy data.
Drift detection: performance degradation over time and recovery time after retraining.
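The three regression metrics are standard; a plain-Python sketch (the `eps` guard for near-zero ground-truth speeds/flows in MAPE is an implementation choice):

```python
import math

def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mape(y, yhat, eps=1e-8):
    """Mean absolute percentage error; eps avoids division by ~0."""
    return 100.0 * sum(abs(a - b) / max(abs(a), eps)
                       for a, b in zip(y, yhat)) / len(y)
```

For example, `mae([10, 20], [12, 18])` is 2.0 and the corresponding MAPE is 15%.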
-
Baselines and Protocol
Compare proposed models against baselines (ARIMA, LSTM, DCRNN, STGCN). Use rolling evaluation with multi-horizon forecasts and cross-validation across temporal splits.
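The rolling evaluation over temporal splits can be sketched as rolling-origin windows (index-based and illustrative; window sizes depend on the dataset):

```python
def rolling_splits(n, train_size, test_size, step=None):
    """Rolling-origin temporal cross-validation: each fold trains on one
    contiguous window and tests on the window immediately after it,
    sliding forward in time (never testing on past data)."""
    step = step or test_size
    splits = []
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        splits.append((train, test))
        start += step  # advance the origin for the next fold
    return splits
```

Unlike random k-fold cross-validation, each test window strictly follows its training window in time, which matches how the deployed model would be retrained and evaluated.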
-
Experimental Infrastructure
Training on GPU clusters (NVIDIA), distributed training with Horovod/PyTorch DDP.
Serving benchmarks on cloud VMs and edge devices for latency analysis.
DISCUSSION

Key trade-offs:
Latency vs. accuracy: deep graph models yield higher accuracy but require more compute; real-time response can be achieved at the edge using distilled models.
Data diversity: combining camera, loop, and probe data increases coverage but complicates modality matching.
Model maintenance: seasonal and structural changes (road closures, new sensors) require continuous monitoring and scheduled retraining.
Privacy considerations include data anonymization and compliance with local laws. Explainability is addressed through attention visualization and feature-importance measurement to increase operator confidence.
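The monitoring-triggered retraining described above can be sketched as a simple rolling-error monitor (the window size and ratio threshold are illustrative assumptions, not tuned values):

```python
from collections import deque

class DriftMonitor:
    """Flag retraining when the rolling MAE over recent predictions
    exceeds a validation-time baseline by a fixed ratio."""

    def __init__(self, baseline_mae, window=100, ratio=1.5):
        self.baseline = baseline_mae          # MAE measured at deployment
        self.errors = deque(maxlen=window)    # most recent absolute errors
        self.ratio = ratio                    # degradation tolerance

    def update(self, y_true, y_pred):
        """Record one prediction error; return True if retraining
        should be triggered."""
        self.errors.append(abs(y_true - y_pred))
        rolling = sum(self.errors) / len(self.errors)
        return rolling > self.ratio * self.baseline
```

More sophisticated detectors (e.g., statistical change-point tests on the error stream) slot into the same interface.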
-
Case Study: Prototype Deployment (Illustrative)
We deployed a proof-of-concept in a mid-sized city using ~200 loop detectors and probe data. A lightweight STGCN variant ran in the cloud for 15-minute-horizon predictions, and a distilled TCN ran at the edge for 5-minute local alerts. Preliminary observations:
Short-horizon (5–15 min) MAE acceptable for routing (<5 km/h error).
End-to-end latency (ingest → model → API) of ~300–800 ms depending on batch sizes.
This demonstrates feasibility; full evaluation requires longer operational trials.
-
CONCLUSION AND FUTURE WORK
We present a comprehensive framework for real-time traffic prediction that combines spatio-temporal graph models, big-data streaming, and IoT sensors. The framework enables the deployment patterns required for smart cities while balancing operational constraints against prediction accuracy. Future directions include:
Continuous model updates via adaptive online learning.
Federated approaches to protect probe-data privacy.
Integration with traffic control systems for closed-loop optimization.
Explainable forecasting for operator adoption.
REFERENCES
[1] G. E. P. Box and G. M. Jenkins, "Time Series Analysis: Forecasting and Control," Holden-Day, 1976.
[2] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Transactions of the ASME, Journal of Basic Engineering, 1960.
[3] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[4] Y. Li, R. Yu, C. Shahabi, and Y. Liu, "Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting," International Conference on Learning Representations (ICLR), 2018.
[5] B. Yu, H. Yin, and Z. Zhu, "Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting," Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), 2018.
[6] Z. Wu, S. Pan, G. Long, J. Jiang, and C. Zhang, "Graph WaveNet for Deep Spatial-Temporal Graph Modeling," Proceedings of IJCAI, 2019.
[7] H. Zhou, S. Zhang, J. Peng, et al., "Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting," AAAI, 2021.
[8] B. Lim et al., "Temporal Fusion Transformers for Interpretable Multi-Horizon Time Series Forecasting," International Journal of Forecasting, 2021.
[9] J. Kreps, N. Narkhede, and J. Rao, "Kafka: a Distributed Messaging System for Log Processing," NetDB, 2011.
[10] P. Carbone, A. Katsifodimos, S. Ewen, et al., "Apache Flink: Stream and Batch Processing in a Single Engine," IEEE Data Eng. Bull., 2015.
[11] M. Treiber and A. Kesting, "Traffic Flow Dynamics: Data, Models and Simulation," Springer, 2013.
[12] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, "Long Short-Term Memory Neural Network for Traffic Speed Prediction Using Remote Microwave Sensor Data," Transportation Research Part C, 2015.
[13] D. Y. Zheng and Q. Zheng, "A Survey on Traffic Prediction: Traditional and Deep Learning Methods," IEEE Intelligent Transportation Systems Magazine, 2020.
[14] S. Kim et al., "Serving Machine Learning Models in Production at Scale," ACM Computing Surveys, 2020.
[15] S. Thrun, W. Burgard, and D. Fox, "Probabilistic Robotics," MIT Press, 2005.
[16] H. Zheng, W. Chen, et al., "Traffic4Cast: Real-time Traffic Prediction from Spatio-Temporal Data," Proceedings of the NeurIPS Traffic4Cast Workshop, 2019.
[17] D. Hallac, J. Leskovec, and S. Boyd, "Network Lasso: Clustering and Optimization in Large Graphs," SIGKDD, 2015.
[18] S. Sarker and H. Hoque, "Edge AI for Smart City: Architecture and Applications," IEEE Communications Magazine, 2021.
