
Traffic Volume Prediction using a Hybrid AutoML Framework with Contextual Intelligence

DOI : 10.5281/zenodo.21096306

Sandeep K, Sudhanbalaji M, Thakshin Kumar T, Sibi Senthil, Praveen G, Mr. Veerakumar S

Department of Computer Science and Engineering, PSG College of Technology

Coimbatore, Tamil Nadu, India

Abstract – Urban traffic congestion poses significant challenges affecting economic productivity, environmental sustainability, and quality of life. This paper presents a comparative study of machine learning and deep learning approaches for hourly traffic volume forecasting. We implement XGBoost with engineered lag features and LSTM networks for sequential pattern learning. The dataset encompasses hourly traffic volumes enriched with temporal features, meteorological data, and contextual information. Performance evaluation demonstrates that LSTM achieves MAE of 298.70, RMSE of 455.93, and R² of 0.9121, while XGBoost achieves MAE of 315.54, RMSE of 480.21, and R² of 0.9815. A web application demonstrates practical deployment with real-time prediction and crowd alert capabilities. The study provides insights into trade-offs between model complexity, interpretability, and predictive performance for intelligent transportation systems.

Index Terms – Traffic volume prediction, XGBoost, LSTM, time-series forecasting, intelligent transportation systems, machine learning, deep learning

  1. Introduction

    1. Background and Motivation

      Urban traffic congestion has emerged as a critical challenge in modern cities, causing economic losses, environmental degradation, and reduced quality of life [1]. Traditional infrastructure expansion proves financially prohibitive and provides only temporary relief. This has shifted focus toward Intelligent Transportation Systems (ITS) that optimize existing infrastructure through data-driven approaches [2].

      Real-time traffic volume forecasting enables proactive management strategies including dynamic signal timing, variable speed limits, and predictive travel information [3]. Modern cities generate massive traffic data through sensors, cameras, and GPS devices. Combined with contextual information such as weather and events, this data enables sophisticated pattern recognition. Machine Learning (ML) and Deep Learning (DL) have proven exceptionally suited for uncovering complex, non-linear traffic patterns [4].

    2. Problem Statement and Objectives

      Traffic flow is a complex stochastic process influenced by temporal patterns (daily/weekly cycles), stochastic events (accidents, closures), environmental factors (weather), and socio-economic factors (holidays, events). Traditional time-series models struggle to capture non-linear relationships between these factors.

      This research sets out to design, implement, and evaluate machine learning and deep learning models for accurate hourly traffic volume prediction, leveraging historical data with temporal and meteorological features, and to compare their effectiveness for short-term forecasting.

      Key objectives include: (1) comprehensive data preprocessing and feature engineering, (2) implementing XGBoost with lag features, (3) implementing LSTM for sequential learning, (4) rigorous comparative analysis, and (5) developing a practical deployment framework.

  2. Related Work

    Traditional statistical methods including ARIMA and SARIMA assume linear correlations and stationarity, limiting effectiveness with non-stationary traffic data [4]. Machine learning approaches using Random Forests, SVM, and XGBoost demonstrate strong performance but require careful feature engineering [5].

    Deep learning architectures, particularly LSTM networks, revolutionized sequence modeling through automatic feature learning [6]. Studies combining CNN-BiLSTM with attention achieve notable accuracy under adverse weather [7]. Graph neural networks model spatial-temporal dependencies across road networks [8]. Integration of contextual data (weather, social media, events) significantly enhances prediction accuracy [9].

    Despite individual model demonstrations, systematic comparisons with identical preprocessing frameworks remain limited. This work addresses this gap through rigorous comparative analysis with complete implementation details and practical deployment.

  3. Methodology

    1. System Architecture

      Figure 1 illustrates the pipeline from raw data ingestion to deployment. Key stages include: data ingestion, comprehensive preprocessing, feature engineering, chronological train-test splitting (80/20), parallel model training (XGBoost and LSTM), evaluation, and iterative 24-hour forecasting.
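The chronological 80/20 split named in the pipeline can be sketched as below. This is a minimal illustration rather than the authors' code: the function name `chronological_split` and the placeholder series are assumptions. The point it shows is that rows are cut by position in time and never shuffled, so the test set always lies strictly after the training set.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], train_fraction: float = 0.8
) -> Tuple[Sequence[T], Sequence[T]]:
    """Split time-ordered rows without shuffling: earliest part trains, latest part tests."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Placeholder series standing in for 100 time-ordered hourly records.
hourly_records = list(range(100))
train, test = chronological_split(hourly_records)
```

A random split would leak future observations into training, which is why time-series work usually prefers the positional cut shown here.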

    2. Dataset Description

      The dataset contains hourly traffic records with comprehensive features:

      Temporal: datetime, hour, day, month, weekday, isweekend

      Meteorological: temperature, humidity, windspeed, winddirection, visibility, dewpoint, rainph, snowph, cloudsall

      Contextual: isholiday, airpollutionindex, weathertype, weatherdescription

      Target: trafficvolume (vehicles/hour)

      Fig. 1. System architecture flowchart.

    3. Data Preprocessing Pipeline

      1. Data Cleaning and Feature Extraction: Missing isholiday values were imputed with a "None" category. Timestamps were converted to datetime objects, enabling extraction of explicit temporal features: hour (0-23), day (1-31), month (1-12), weekday (0-6), and isweekend (binary).
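The cleaning and extraction step can be illustrated with the standard library alone. The sketch below is hedged: the timestamp format string, the Monday = 0 weekday convention, and the helper names `impute_holiday` and `temporal_features` are assumptions, since the paper gives only the feature ranges.

```python
from datetime import datetime
from typing import Dict, Optional


def impute_holiday(value: Optional[str]) -> str:
    """Replace a missing holiday label with the explicit category "None"."""
    return value if value else "None"


def temporal_features(timestamp: str) -> Dict[str, int]:
    """Derive hour, day, month, weekday and weekend flag from a timestamp string."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")  # assumed input format
    weekday = dt.weekday()  # Python convention (assumed here): Monday = 0 ... Sunday = 6
    return {
        "hour": dt.hour,
        "day": dt.day,
        "month": dt.month,
        "weekday": weekday,
        "isweekend": int(weekday >= 5),
    }


features = temporal_features("2024-03-16 23:00:00")  # a Saturday evening
holiday = impute_holiday(None)
```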

      2. Cyclical Feature Encoding: Temporal features are cyclical: hour 23 is close to hour 0. Sine-cosine transformations preserve these cyclical relationships:

        $$X_{\sin} = \sin\left(\frac{2\pi X}{X_{\max}}\right), \qquad X_{\cos} = \cos\left(\frac{2\pi X}{X_{\max}}\right) \tag{1}$$

        where $X_{\max} = 24$ for hour and 7 for weekday. This maps features onto unit circles, preserving proximity.

      3. Encoding and Scaling: Categorical features (isholiday, weathertype, weatherdescription) were label-encoded. StandardScaler normalized numerical features to zero mean and unit variance:

        $$z = \frac{x - \mu}{\sigma} \tag{2}$$

        For LSTM, an additional MinMaxScaler compressed values into the [0, 1] range.

      4. Model-Specific Preparation: XGBoost: 24 lag features were created, representing traffic volume at previous hours ($t-1$ through $t-24$):

        $$\text{lag}_k(t) = \text{trafficvolume}(t-k), \quad k = 1, \dots, 24 \tag{3}$$

        LSTM: A sliding-window approach created 3D sequences [samples, timesteps=24, features], where each sequence contains 24 hours of features used to predict the 25th hour.

    4. Model Architectures

      1. XGBoost Configuration: XGBoost builds sequential decision-tree ensembles in which each tree corrects the errors of the previous ones. Configuration: n_estimators=300, learning_rate=0.05, max_depth=8, subsample=0.8, colsample_bytree=0.8, with early stopping (50 rounds).

      2. LSTM Architecture: LSTM networks use gating mechanisms to capture long-range dependencies:

        Forget Gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

        Input Gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

        Output Gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

        Cell State: $C_t = f_t \odot C_{t-1} + i_t \odot \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

        Hidden State: $h_t = o_t \odot \tanh(C_t)$

        where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.

        Our architecture: LSTM(64 units, return_sequences=True) → Dropout(0.2) → LSTM(32 units) → Dense(1). Compiled with the Adam optimizer and MSE loss, trained for 20 epochs with batch size 32.

        Fig. 2. LSTM stacked architecture with gating mechanisms.

    5. Evaluation Metrics

    Three complementary metrics assess performance:

    MAE: Average absolute deviation

    $$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{4}$$

    RMSE: Penalizes larger errors

    $$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{5}$$

    R²: Proportion of variance explained

    $$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{6}$$

    C. Visualization Analysis

    Figure 3 shows XGBoost predictions closely tracking actual values, successfully capturing daily cycles including morning peaks, midday plateaus, and overnight lows. Figure 4 displays the 24-hour iterative forecast, exhibiting realistic patterns: low early-morning traffic, morning peak around 06:00, sustained daytime volume, and evening decline.

    Fig. 3. XGBoost predictions closely tracking actual traffic patterns.

    Fig. 4. 24-hour forecast showing realistic traffic patterns.
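The 24-hour iterative forecast can be read as a recursive one-step procedure: predict the next hour, append the prediction to the lag window, and repeat. The sketch below assumes that reading. `iterative_forecast`, `predict_next`, and the stand-in `mean_model` are hypothetical names, and a real pipeline would also need to supply future values of the weather and calendar features, which are omitted here.

```python
from typing import Callable, List, Sequence


def iterative_forecast(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    horizon: int = 24,
    n_lags: int = 24,
) -> List[float]:
    """Roll a one-step model forward, feeding each prediction back in as the newest lag."""
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags observations")
    window = list(history[-n_lags:])  # oldest value first, newest last
    forecasts: List[float] = []
    for _ in range(horizon):
        y_hat = predict_next(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest lag, append the new prediction
    return forecasts


def mean_model(window: List[float]) -> float:
    """Stand-in predictor (hypothetical): the mean of the lag window."""
    return sum(window) / len(window)


history = [100.0] * 24
forecast = iterative_forecast(history, mean_model)
```

Because every step consumes earlier predictions, errors can compound along the horizon, so later hours of such a forecast are typically less reliable than the first few.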

  4. Results and Analysis

    A. Performance Comparison

    Data was split chronologically (80% training, 20% testing). Table I shows comparative results:

    TABLE I
    Model Performance Comparison

    Model         MAE      RMSE     R²
    XGBoost       315.54   480.21   0.9815
    LSTM          298.70   455.93   0.9121
    Improvement   5.3%     5.1%     1.1%

    LSTM achieves marginally superior accuracy across all metrics. The MAE of 298.70 indicates typical errors are small relative to peak volumes (3000-5000 vehicles/hour). Both models explain over 91% of variance, demonstrating excellent predictive capability.
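For reference, the metrics of Eqs. (4)-(6) and the relative improvement reported for MAE and RMSE can be reproduced with a few lines of plain Python. This is an illustrative sketch: the function names and the four-point toy series are assumptions, and only the MAE and RMSE values from Table I are used in the improvement check.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, Eq. (4)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, Eq. (5)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, Eq. (6)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage reduction of an error metric relative to the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Toy series (not from the paper) to exercise the three metrics.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

# Error reductions of LSTM over XGBoost, using the MAE and RMSE values in Table I.
mae_gain = relative_improvement(315.54, 298.70)
rmse_gain = relative_improvement(480.21, 455.93)
```

Rounded to one decimal place, `mae_gain` and `rmse_gain` come out at 5.3 and 5.1, matching the corresponding entries of the Improvement row.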

    B. Trade-off Analysis

    Beyond raw accuracy, practical considerations include:

    Training Efficiency: XGBoost trains 4× faster (3.2 vs 12.7 minutes), enabling rapid iteration.

    Interpretability: XGBoost provides feature importance rankings; LSTM is a black box.

    Deployment: XGBoost has a lighter footprint with fewer dependencies.

    Accuracy: LSTM's roughly 5% lower MAE and RMSE may justify the added complexity for high-stakes applications.

    Feature importance analysis showed recent lags (lag1 to lag6) and temporal features (hoursin, hourcos) to be the most influential for XGBoost, validating the design choices.
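To make the two influential feature groups concrete, the sketch below builds lag features following Eq. (3) together with the sine-cosine hour encoding of Eq. (1). It is a pure-Python illustration under assumptions: the key names `lag_1` ... `lag_24`, `hour_sin`, `hour_cos`, and `target`, and the helper names, are chosen here for readability and are not taken from the authors' code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cyclical_encode(value: float, period: float) -> Tuple[float, float]:
    """Map a periodic value onto the unit circle as (sin, cos), as in Eq. (1)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def build_lag_rows(
    volumes: Sequence[float], hours: Sequence[int], n_lags: int = 24
) -> List[Dict[str, float]]:
    """Create one row per hour t >= n_lags with lag_k = volume(t - k), as in Eq. (3)."""
    rows: List[Dict[str, float]] = []
    for t in range(n_lags, len(volumes)):
        hour_sin, hour_cos = cyclical_encode(hours[t], 24)
        row = {f"lag_{k}": volumes[t - k] for k in range(1, n_lags + 1)}
        row.update(hour_sin=hour_sin, hour_cos=hour_cos, target=volumes[t])
        rows.append(row)
    return rows


# Synthetic two-day series (not from the paper): volume equals the hour index.
volumes = [float(v) for v in range(48)]
hours = [t % 24 for t in range(48)]
rows = build_lag_rows(volumes, hours)
```

The first 24 hours produce no rows because their lag window is incomplete. The encoding also places hour 23 exactly as close to hour 0 as hour 1 is, which a raw 0-23 integer would not.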

    D. Discussion

    Success factors include comprehensive feature engineering (cyclical encoding, meteorological data), appropriate model selection, proper preprocessing, sufficient data volume, and hyperparameter tuning.

    LSTM's advantage stems from automatic feature learning and long-range dependency capture. XGBoost's strengths include training efficiency, interpretability, and deployment simplicity. Model selection should consider accuracy requirements, computational constraints, interpretability needs, and deployment complexity.
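The long-range dependency capture mentioned above comes from the gated cell update described in the LSTM Architecture subsection. A single time step can be sketched in plain Python as follows. This assumes the standard LSTM formulation, including an output gate $o_t$ with its own weights; the parameter dictionary keys (`W_f`, `b_f`, and so on) are hypothetical, and the sketch is not the Keras implementation the authors trained.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: List[Vector], z: Vector, b: Vector) -> Vector:
    """Compute W z + b for a row-major weight matrix."""
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(
    x: Vector, h_prev: Vector, c_prev: Vector, params: Dict[str, list]
) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]    # forget gate
    i = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]    # input gate
    o = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]    # output gate
    g = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]  # candidate values
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# Toy setup (hypothetical): hidden size 2, input size 1, all parameters zero.
hidden, inputs = 2, 1
zero_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {
    "W_f": zero_W, "b_f": zero_b, "W_i": zero_W, "b_i": zero_b,
    "W_o": zero_W, "b_o": zero_b, "W_c": zero_W, "b_c": zero_b,
}
h, c = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all parameters at zero every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved. In a trained network, a forget gate near 1 lets the cell state pass through many steps almost unchanged, which is what supports long-range memory.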

  5. Web Application Implementation

    To demonstrate practical deployment, a web application was developed with a three-tier architecture: an HTML/CSS/JavaScript frontend, a Flask REST API backend, and a model-serving layer.

    1. API Endpoints

      /api/traffic-lags (GET): Retrieves the last 24 traffic volumes and current environmental data (weather, AQI, holiday status) for prediction context.

      /predict (POST): Generates predictions for specified timestamps. Inputs include datetime, isholiday, 24 lag values, and weather parameters. Processing includes temporal feature extraction, cyclical encoding, scaling, and model inference. Outputs the predicted traffic volume with a confidence interval.

      /api/crowd-alert (GET): Detects nearby crowd events using the NewsData.io API. Fetches recent news, extracts locations, calculates distances, filters events within a 50 km radius, and generates alerts for potential traffic disruption.
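The distance calculation and 50 km filter behind the crowd-alert endpoint could look like the sketch below. The paper does not say how distances are computed, so the haversine great-circle formula is an assumption, as are the function names, the event dictionary layout, and the synthetic coordinates.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_events(
    events: List[Dict], lat: float, lon: float, radius_km: float = 50.0
) -> List[Dict]:
    """Keep only the events located within radius_km of the monitored site."""
    return [e for e in events if haversine_km(lat, lon, e["lat"], e["lon"]) <= radius_km]


# Synthetic example (hypothetical events on the equator, site at the origin).
events = [
    {"name": "A", "lat": 0.0, "lon": 0.3},  # roughly 33 km away
    {"name": "B", "lat": 0.0, "lon": 1.0},  # roughly 111 km away
]
alerts = nearby_events(events, lat=0.0, lon=0.0)
```

In the described service, the event coordinates would come from locations extracted from news items; that geocoding step is outside this sketch.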

      Fig. 5. Web application dashboard with real-time predictions and alerts.

    2. User Interface

    The interface provides: (1) Dashboard with real-time traffic display and 24-hour forecast, (2) Prediction form for custom scenarios, (3) Historical analysis visualizing past predictions versus actual traffic, and (4) Alert panel displaying crowd/event warnings with map integration (Figure 5).

    Production deployment requires containerization, API authentication, rate limiting, monitoring, and integration with live sensor feeds.

  6. Conclusion and Future Work

This research presented a comprehensive comparative study of XGBoost and LSTM for traffic volume prediction. Both models achieve excellent performance: LSTM with roughly 5% lower MAE and RMSE through automatic temporal learning, and XGBoost with competitive accuracy plus advantages in efficiency, interpretability, and deployment simplicity.

Key findings: (1) Comprehensive feature engineering is critical for both approaches, (2) LSTM provides marginal accuracy improvements at higher computational cost, (3) XGBoost remains highly competitive with careful feature design, (4) Model selection should consider full operational requirements beyond accuracy metrics, and (5) Practical deployment is feasible with appropriate system architecture.

Limitations include single-location focus without spatial correlations, missing unpredictable event data, static model assumptions, and limited generalizability assessment.

Future directions include: spatial-temporal graph neural networks modeling traffic propagation, attention mechanisms for interpretable temporal focus, transfer learning for cross-city adaptation, online learning for continuous updates, uncertainty quantification with confidence intervals, integration with traffic control systems, and multimodal data fusion incorporating social media and mobile data.

Accurate traffic prediction enables environmental benefits (reduced emissions), economic gains (decreased travel times), improved safety, and enhanced quality of life. Responsible deployment requires attention to privacy, fairness, and system resilience considerations.

Acknowledgment

We thank Dr. K Prakasan, Principal, Dr. G Sudha Sadasivam, Head of Department, Mr. VeeraKumar S, Faculty Guide, Dr. Arul Anand N, Program Coordinator, and Dr. Anisha C D, our tutor, for their invaluable support throughout this research.

References

  1. A. Kashyap et al., "Traffic flow prediction models: A review of deep learning techniques," Cogent Engineering, vol. 9, no. 1, 2022.

  2. A. Sayed and S. Al-Ghamdi, "Artificial intelligence-based traffic flow prediction for smart cities," Journal of Engineering and Applied Science, vol. 70, no. 1, 2023.

  3. M. Yuan and Y. Li, "A survey of traffic prediction: From spatio-temporal data to intelligent transportation," Data Science and Engineering, vol. 6, no. 1, 2021.

  4. B. Peng, L. Zhao, and T. Wang, "An overview based on the overall architecture of traffic forecasting," Data Science and Engineering, vol. 9, no. 2, 2024.

  5. Y. Bai et al., "AST-GCN: Attribute-augmented spatiotemporal graph convolutional network for traffic forecasting," arXiv:2011.11004, 2020.

  6. S. Rahman and N. Kumar, "Traffic flow prediction using machine learning techniques: A systematic literature review," Sustainable Computing: Informatics and Systems, vol. 41, 2025.

  7. "Forecasting freeway traffic volumes with adverse weather via a CNN-BiLSTM-attention model," J. Transportation Engineering, Part A: Systems, 2025.

  8. "Traffic volume prediction: A fusion deep learning model considering spatial-temporal correlation," Sustainability, vol. 13, no. 19, 2021.

  9. "From Twitter to traffic predictor: Next-day morning traffic prediction using social media data," Transportation Research Part C, 2021.