International Academic Publisher
Serving Researchers Since 2012

Implementation of a Demand Forecasting System Using LSTM, XGBoost, and Temporal Fusion Transformer with a Distributed Data Pipeline

DOI: https://doi.org/10.5281/zenodo.19878506

Yashraj Umesh Panhalkar, Harsha Peshave, Naveed Malik

Department of Computer Science, SCTR's Pune Institute of Computer Technology, Pune, Maharashtra, India

Prof. Pranali Navghare

Department of Computer Science, SCTR's Pune Institute of Computer Technology, Pune, Maharashtra, India

Abstract – This paper presents the end-to-end implementation of a demand forecasting system for a regional FMCG distributor in Maharashtra, India. The system comprises a Node.js/Express REST API backed by MongoDB, a Python Flask ML sidecar, and a React dashboard. Three machine learning models (Long Short-Term Memory (LSTM), XGBoost, and Temporal Fusion Transformer (TFT)) were trained on real SKU-level quarterly sales data. The ML pipeline covers outlier clipping, categorical encoding, lag feature construction, festival-proximity scoring, and chronological train/validation/test splitting. On the held-out test set, LSTM achieved RMSE = 4,457.49 and MAPE = 18.34%; XGBoost achieved RMSE = 3,526.88 and MAPE = 14.87%; TFT produced RMSE = 724.44, MAE = 381.67, MAPE = 6.32%, and WAPE = 4.50%. The backend exposes authenticated REST endpoints for CSV ingestion, feature preprocessing, model dispatch, and forecast retrieval. This paper documents each implementation layer (repository structure, data pipeline, model design, API routes, and deployment strategy) as a reproducible blueprint for distributor-scale forecasting.

Index Terms – Demand forecasting, LSTM, XGBoost, Temporal Fusion Transformer, Node.js, Express, MongoDB, REST API, festival seasonality, FMCG, inventory optimization

  1. Introduction

    Regional FMCG distributors in India record every invoice but rarely use that data for forward planning. Replenishment decisions follow manufacturer targets rather than actual sell-through, causing warehouse build-up at quarter ends and stock-outs during festival peaks [1]. Invoice records contain enough signal to build accurate SKU-level quarterly forecasts, but realizing that potential requires a complete, deployable path from raw CSV to a running inference service.

    This paper documents that path as built and deployed. The system is organized into four concrete layers: (1) a Node.js/Express REST API with JWT authentication and MongoDB persistence; (2) a Python preprocessing pipeline producing lag features, rolling statistics, and a continuous festival-proximity score; (3) three fully trained and serialized models served from a Flask sidecar; and (4) a React dashboard for SKU selection and forecast visualization.

    1. Why Three Models

      LSTM captures sequential dependencies across the 4-quarter lookback window without hand-crafted interaction terms. XGBoost exploits the explicit lag and festival columns through regularized tree splits and achieves sub-5 ms inference. TFT routes known future inputs (festival calendar, quarter indicators) through a dedicated encoder, enabling multi-head self-attention over prior-year festival quarters. All three share the same feature set, so accuracy differences are attributable to architecture alone.

    2. Contributions

    Three aspects distinguish this work. First, the dataset is operational rather than synthetic, carrying real-world noise, return transactions, and irregular festival timing. Second, festival proximity is encoded as a continuous score rather than a binary flag, capturing the gradual pre-festival demand ramp-up. Third, the full system (preprocessing, training, inference, and API) is documented as a reproducible implementation blueprint at the code and schema level.

  2. Related Work

    ARIMA and Holt-Winters exponential smoothing [2] have anchored supply chain forecasting for decades. Both assume linear demand structure and are ill-suited to the compound seasonality introduced by Indian festivals overlapping quarterly cycles.

    XGBoost [3] shifted the standard for tabular prediction. Its regularized tree objective and second-order gradient approximation achieve competitive accuracy on modest-volume datasets typical of distributor scale. Grinsztajn et al. confirmed that tree-based models consistently match or exceed deep learning on tabular tasks [4], motivating XGBoost as the primary baseline here.

    LSTMs [5] address vanishing gradients through gated cell states. Applied globally across SKU groups, LSTM has shown competitive retail forecasting accuracy [6]. The limitation for quarterly data is that year-ago festival signals must be compressed into a fixed-size hidden state rather than attended to directly.

    The Temporal Fusion Transformer [7] resolves this by combining LSTM encoding with multi-head self-attention and a dedicated future-input pathway. Probabilistic extensions such as DeepAR [8] motivate quantile outputs for safety stock sizing. Retail-specific evaluations [9] confirm that model selection requires empirical comparison, directly motivating the three-model design.

  3. Repository and System Architecture

    A. Repository Layout

    The backend repository Demand-Forecasting-Backend follows a strict separation of concerns. index.js bootstraps Express, registers middleware, and mounts routers. config/ holds the MongoDB connection module and serialized encoder/scaler parameters (encoders.json, scalers.json) written during training and reloaded at inference time. middleware/ provides JWT authentication (auth.js), centralized error handling, and rate limiting. routes/ defines the seven REST endpoints. schema/ contains Mongoose document models. utils/ holds the Python preprocessing pipeline and festival calendar. ml_sidecar/ contains the Flask application, ModelFactory class, and three training scripts. models/ stores serialized weights (lstm_model.p, xgb_model.json, tft_checkpoint/). files/ stores uploaded CSVs.

    B. End-to-End Architecture

    Fig. 1 shows the full system. A React dashboard communicates with the Node.js/Express API over HTTPS. The API authenticates requests via JWT middleware, stores records in MongoDB, and dispatches inference to the Python ML sidecar over a local HTTP call. The sidecar wraps all three trained models behind a single /predict endpoint and returns point estimates and prediction intervals as JSON.

  4. Data Pipeline and Feature Engineering

    A. Dataset

    Sales records from a Maharashtra-based packaged goods distributor span approximately three fiscal years at quarterly granularity. Each record carries: SKU identifier, product category, manufacturer, geographic zone, quarter label, sold quantity, and invoice value. After deduplication, SKUs with fewer than six quarters of history are dropped, retaining several hundred SKUs with at least two full annual cycles including multiple Diwali and Holi quarters.
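To make the retention rule concrete, the dedup-and-filter step can be sketched in pandas as below; the column names (sku_id, quarter, qty) are illustrative, since the paper does not publish its exact CSV header:

```python
import pandas as pd

# Hypothetical schema; real column names are not published in the paper.
df = pd.DataFrame({
    "sku_id":  ["A"] * 7 + ["B"] * 3,
    "quarter": [f"Q{i}" for i in range(7)] + ["Q0", "Q1", "Q2"],
    "qty":     [10, 12, 9, 30, 11, 10, 14, 5, 6, 7],
})

# Drop exact duplicate rows, then aggregate to one row per SKU-quarter.
df = df.drop_duplicates().groupby(["sku_id", "quarter"], as_index=False)["qty"].sum()

# Keep only SKUs with at least six quarters of history.
counts = df.groupby("sku_id")["quarter"].nunique()
df = df[df["sku_id"].isin(counts[counts >= 6].index)]

print(sorted(df["sku_id"].unique()))  # SKU B (only 3 quarters) is dropped
```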

    B. Preprocessing Pipeline

    Fig. 2 illustrates the five-stage preprocessing pipeline implemented in utils/preprocessor.py. All fitting is performed exclusively on training rows; the resulting artefacts (encoders.json, scalers.json) are serialized to config/ and reloaded identically at inference time to guarantee feature-space parity.
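The fit-on-train / serialize / reload contract can be sketched as follows; the JSON layout of scalers.json is an assumption, not the repository's actual format:

```python
import json

def fit_minmax(train_vals):
    """Fit per-SKU min-max parameters on the training split only."""
    lo, hi = min(train_vals), max(train_vals)
    return {"min": lo, "max": hi if hi > lo else lo + 1.0}

def transform(val, params):
    """Apply a previously fitted scaler; never re-fit at inference time."""
    return (val - params["min"]) / (params["max"] - params["min"])

# Training time: fit on the train split and serialize (cf. config/scalers.json).
scalers = {"SKU123": fit_minmax([40.0, 90.0, 65.0])}
blob = json.dumps(scalers)

# Inference time: reload the identical parameters (feature-space parity).
reloaded = json.loads(blob)
assert transform(65.0, reloaded["SKU123"]) == 0.5
```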

[Fig. 2 diagram. Raw CSV upload feeds a five-stage pipeline: 1. Outlier Clipping (per-SKU IQR clip, 1st–99th pctl); 2. Categorical Encoding (config/encoders.json; OHE: category, manufacturer; LabelEnc: zone); 3. Min-Max Normalisation (config/scalers.json; per-SKU scale fitted on train split only); 4. Temporal Feature Construction (utils/festivalCalendar.py; lag 1–4, rolling MA/CV, q_sin/q_cos, festival proximity); 5. Chronological Split (70% train / 15% val / 15% test, no shuffle), yielding feature matrices for model training and inference.]

Fig. 2. Five-stage preprocessing pipeline (utils/preprocessor.py). Orange dashed arrows indicate serialised artefacts written during training and reloaded at inference time to guarantee feature-space parity.

[Fig. 1 diagram. Web Dashboard (React; SKU picker, forecast charts) communicates over HTTPS REST with the Node.js/Express API (index.js, /routes, JWT, CORS), its middleware (auth.js, errorHandler.js, rateLimiter.js), the feature pipeline (utils/: lag, festival proximity), a file store (files/, CSV), and MongoDB (schema/, ODM). The API dispatches to the Python ML Sidecar (Flask; ModelFactory, /predict, quantile outputs) hosting XGBoost (.json), LSTM (Keras, .p), and TFT (PyTorch Lightning). Forecast response (JSON): point, P10, P90, MAPE, WAPE per SKU/quarter.]

Fig. 1. End-to-end system architecture. The Node.js API authenticates requests, delegates feature preparation to the Python utility layer, and dispatches inference to the ML sidecar. All three models share a single /predict interface; MongoDB stores SKU metadata and forecast history.

    Outlier clipping suppresses probable entry errors by clamping per-SKU quantities to the 1st–99th percentile range while preserving legitimate demand spikes. Categorical encoding applies one-hot encoding to category and manufacturer and label encoding to zone; fitted encoders are serialized and reloaded at inference time. Normalization applies min-max scaling per SKU using training-only statistics stored in config/scalers.json. Temporal features include four lag columns, a 4-quarter trailing moving average, quarter-over-quarter growth rate, trailing coefficient of variation, and cyclical quarter encoding as sine/cosine pairs:

    q_sin = sin(2πq/4),  q_cos = cos(2πq/4),  q ∈ {1, 2, 3, 4}    (1)

    Chronological splitting partitions data 70/15/15 by time index; random shuffling is explicitly prohibited to prevent look-ahead contamination.

    C. Festival Proximity Score

    The festival proximity score is the key design choice over a binary flag:

    prox(q) = |[q_s, q_e] ∩ [f_s − 45d, f_e + 45d]| / |[q_s, q_e]|    (2)

    where [q_s, q_e] is the quarter's date interval and [f_s, f_e] the festival window. A quarter entirely containing a festival window receives a score of 1.0; adjacent quarters with partial overlap receive proportional scores, capturing the gradual pre-festival demand build-up that a binary flag misses. utils/festivalCalendar.py encodes Diwali, Holi, and Navratri dates for the three-year study period. For TFT, festival flags and quarter indicators are designated as known future inputs and routed through TFT's dedicated future encoder. For LSTM they are appended to the input sequence vector; for XGBoost they appear as standard columns.

    D. Engineered Feature Set

    Table I summarizes the seven feature groups shared across all three models.

    TABLE I
    Engineered Feature Set (shared across all models)

    Group          | Features
    Lag demand     | qty_{t-1}, qty_{t-2}, qty_{t-3}, qty_{t-4}
    Rolling stats  | 4-qtr trailing MA; QoQ growth; trailing CV
    Cyclical time  | q_sin, q_cos; Q1–Q4 dummies
    Festival flag  | Binary: 1 if quarter overlaps Diwali/Holi/Navratri
    Festival prox. | Continuous: fraction of quarter in ±45-day festival band
    Context aggs   | Category- and manufacturer-level aggregate demand
    SKU metadata   | Encoded category, manufacturer, zone

  5. Model Implementation

    A. LSTM

    Two stacked LSTM layers (128 units, then 64 units) with inter-layer dropout of 0.3 receive a 4-quarter lookback window. A linear dense output head produces the scalar point forecast. Training uses Adam (lr = 10⁻³) on MSE loss for up to 100 epochs with early stopping (patience = 10) on validation MSE, batch size 32. Each training example is a sliding window of 4 consecutive quarters constructed in strict left-to-right temporal order. The model is serialized to models/lstm_model.p and loaded at sidecar startup. The gated recurrence is governed by:

    f_t = σ(W_f [h_{t-1}, x_t] + b_f)    (3)
    i_t = σ(W_i [h_{t-1}, x_t] + b_i)    (4)
    c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c [h_{t-1}, x_t] + b_c)    (5)
    h_t = σ(W_o [h_{t-1}, x_t] + b_o) ⊙ tanh(c_t)    (6)

    B. XGBoost

    XGBoost fits an additive ensemble of regression trees on pseudo-residuals of the current ensemble [3]. The regularized objective is:

    L = Σ_i ℓ(y_i, ŷ_i) + Σ_k (γ T_k + ½ λ ‖w_k‖²)    (7)

    Configuration: 1,000 estimators, max depth 7, learning rate 0.05, subsample 0.8, column subsample 0.8, λ = 1.0. Early stopping (patience = 50) on validation RMSE prevents overfitting; final hyperparameters were selected via 5-fold cross-validation on the training set. The model serializes to models/xgb_model.json and loads in under one second.

    C. Temporal Fusion Transformer

    TFT [7] handles three input streams simultaneously: static covariates (fixed SKU attributes), known future inputs (festival flags and quarter indicators, available for all forecast steps at call time), and past observed inputs (historical demand and derived features). Variable Selection Networks built from Gated Residual Networks produce instance-wise soft attention weights over each stream:

    GRN(a, c) = LayerNorm(a + GLU(W_1 ELU(W_2 a + W_3 c)))    (8)

    Separate LSTM encoders process historical and future sequences; multi-head self-attention (H = 4 heads) then attends directly to relevant prior quarters without hidden-state compression. TFT is trained with quantile (pinball) loss over Q = {0.1, 0.5, 0.9}:

    L_q = Σ_{q∈Q} { q (y − ŷ_q)  if y ≥ ŷ_q;   (1 − q)(ŷ_q − y)  if y < ŷ_q }    (9)

    Point metrics are evaluated against ŷ_0.5; ŷ_0.1 and ŷ_0.9 are returned as prediction interval bounds for direct use in safety stock calculations. Training configuration: embedding dimension 64, 4 attention heads, 2 transformer layers, Adam (lr = 10⁻³), gradient clipping at norm 1.0, batch size 64, up to 80 epochs with early stopping (patience = 15). Weights are saved to models/tft_checkpoint/.
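The shuffle-free chronological split described above can be sketched as follows (ratios from the paper; the list-of-rows layout is an assumption):

```python
def chrono_split(rows):
    """70/15/15 split by time index; rows must be pre-sorted oldest-first."""
    n = len(rows)
    n_train = int(n * 0.70)
    n_val = int(n * 0.15)
    # No shuffling: every validation/test row is strictly later than training rows,
    # which prevents look-ahead contamination.
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

train, val, test = chrono_split(list(range(20)))  # 14 / 3 / 3 quarters
```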
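A minimal sketch of the proximity computation in Eq. (2), assuming calendar dates for quarter and festival windows (the function name and signature are illustrative, not taken from utils/festivalCalendar.py):

```python
from datetime import date, timedelta

def festival_proximity(q_start, q_end, fest_start, fest_end, band_days=45):
    """Fraction of the quarter covered by the festival window widened by ±band_days."""
    band = timedelta(days=band_days)
    lo = max(q_start, fest_start - band)
    hi = min(q_end, fest_end + band)
    overlap = max((hi - lo).days, 0)   # zero when the intervals do not intersect
    return overlap / (q_end - q_start).days

# Q4 2023 vs. Diwali 2023 (approx. Nov 10-14): most of the quarter lies in the band.
q = festival_proximity(date(2023, 10, 1), date(2023, 12, 31),
                       date(2023, 11, 10), date(2023, 11, 14))
```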
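The quantile loss of Eq. (9) is straightforward to implement directly; a stdlib sketch, with the quantile set from the paper:

```python
def pinball(y, y_hat, q):
    """Quantile (pinball) loss for a single prediction at quantile q."""
    return q * (y - y_hat) if y >= y_hat else (1 - q) * (y_hat - y)

def tft_loss(y, preds):
    """Sum of pinball losses over Q = {0.1, 0.5, 0.9}; preds maps quantile -> forecast."""
    return sum(pinball(y, preds[q], q) for q in (0.1, 0.5, 0.9))

# At q = 0.9, under-forecasting is penalized nine times more than over-forecasting,
# which pushes y_hat_0.9 up toward a usable upper bound for safety stock.
loss = tft_loss(100.0, {0.1: 80.0, 0.5: 100.0, 0.9: 120.0})
```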

  6. Backend API Implementation

    A. API Design

    The Node.js/Express backend exposes seven REST endpoints (Table II). All endpoints except /api/auth/login require a valid JWT in the Authorization header. The JWT middleware in middleware/auth.js verifies the token signature and attaches the distributor identity to the request context before any route handler executes.

    TABLE II
    REST API Endpoints

    Method | Endpoint              | Purpose
    POST   | /api/auth/login       | Authenticate; return JWT
    POST   | /api/upload/csv       | Ingest raw sales CSV
    GET    | /api/sku/list         | List distributor SKUs
    POST   | /api/forecast/run     | Trigger model inference
    GET    | /api/forecast/:id     | Retrieve saved forecast
    POST   | /api/admin/retrain    | Trigger model retraining
    GET    | /api/forecast/compare | Side-by-side model metrics

    B. Data Flow

    On a POST /api/forecast/run request the API: (1) verifies the JWT; (2) fetches ordered SKU history from MongoDB; (3) rejects requests with fewer than six quarters of history; (4) forwards the payload to the Python ML sidecar via an internal HTTP call; (5) persists the returned forecast using an upsert keyed on (skuId, model, quarter) to ensure idempotent re-runs; and (6) returns the saved document as JSON. CSV uploads land in files/ and are parsed with csv-parser. Each row is upserted into the SalesRecord collection keyed on (skuId, quarter), so re-uploads are safe and additive.

    C. Mongoose Schemas

    The ForecastResult schema persists model outputs with full provenance. Key fields include: skuId, model (enum: lstm / xgboost / tft), quarter, pointForecast, lowerBound (P10), upperBound (P90), actualDemand (backfilled post-quarter for MAPE tracking), and createdAt. A compound unique index on (skuId, quarter, model) enforces one forecast record per SKU-quarter-model combination.

    D. Python ML Sidecar

    The Flask sidecar exposes /predict and /reload endpoints. A ModelFactory class loads each model once at startup and caches it in memory, avoiding repeated disk reads per request. At inference time the factory selects the requested model, applies the feature pipeline from utils/preprocessor.py, runs inference, and returns a JSON payload containing point, p10, p90, and quarter. XGBoost and LSTM return point estimates; TFT returns all three quantiles. The /reload endpoint enables zero-downtime model hot-swap after retraining.

  7. Results and Analysis

    A. Evaluation Metrics

    Five metrics are computed on the chronologically held-out test partition. RMSE penalizes large errors and reflects sensitivity to demand spike mispredictions. MAE gives a unit-level average planning error. MAPE expresses scale-independent percentage error for cross-SKU comparison. WAPE, the ratio of total absolute error to total actual demand, is preferred for aggregate inventory planning as it is robust to near-zero denominators on low-volume SKUs. R² measures the proportion of demand variance explained.

    B. Performance Comparison

    TABLE III
    Model Performance on Held-Out Test Set

    Model   | RMSE  | MAE   | R²    | MAPE % | WAPE %
    LSTM    | 4,457 | 2,304 | 0.870 | 18.34  | 13.76
    XGBoost | 3,527 | 1,978 | 0.910 | 14.87  | 11.23
    TFT     | 724   | 382   | 0.997 | 6.32   | 4.50
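The load-once caching behaviour attributed to ModelFactory might look like the following sketch (names and fake loaders are illustrative; the real class loads Keras, XGBoost, and PyTorch artifacts from disk):

```python
import threading

class ModelFactory:
    """Load each model once and cache it in memory; illustrative sketch only."""

    def __init__(self, loaders):
        self._loaders = loaders          # model name -> zero-arg load function
        self._cache = {}
        self._lock = threading.Lock()    # guard concurrent requests

    def get(self, name):
        with self._lock:
            if name not in self._cache:  # only the first request pays the disk-read cost
                self._cache[name] = self._loaders[name]()
            return self._cache[name]

calls = []
factory = ModelFactory({"xgboost": lambda: calls.append("load") or "booster"})
factory.get("xgboost")
factory.get("xgboost")  # served from cache: the loader runs only once
```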
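The MAPE and WAPE definitions used above can be written directly; a sketch with a two-SKU toy example showing why WAPE is preferred for aggregate planning:

```python
def mape(actual, forecast):
    """Mean absolute percentage error; blows up as actuals approach zero."""
    return 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def wape(actual, forecast):
    """Weighted APE: total absolute error over total demand; robust on low-volume SKUs."""
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

y = [100.0, 10.0]   # one high-volume SKU, one low-volume SKU
f = [90.0, 15.0]
# MAPE is dominated by the 50% miss on the tiny SKU; WAPE weights by volume.
```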

    Fig. 3 visualizes RMSE, MAPE, and WAPE across the three models, making the TFT improvement immediately apparent.


    Fig. 3. Model performance comparison on the held-out test set. MAPE and WAPE are shown as percentages; RMSE is scaled by 10⁻² for axis compatibility. Lower is better on all three metrics. TFT achieves the largest gains through its dedicated future-input encoder and multi-head self-attention.

    C. Analysis

    LSTM achieves R² = 0.87, establishing a viable sequential baseline. Festival-quarter errors are disproportionately large: the hidden state must compress context from four or more quarters into a fixed-size vector, diluting prior-year festival signals.

    XGBoost improves R² to 0.91. Post-training feature importance by split gain identifies qty_{t-1}, festival proximity, and trailing MA as the three highest-ranked inputs, jointly accounting for over 50% of total gain. The tree structure exploits these directly without compression.

    TFT achieves R² = 0.997 and WAPE = 4.50%. Two structural mechanisms drive the gap. First, festival flags and quarter indicators pass through the dedicated known-future-input encoder, a pathway absent in both LSTM and XGBoost. Second, multi-head self-attention allows the decoder to weight the prior-year Diwali quarter directly rather than relying on hidden-state propagation. The WAPE of 4.50% is consistent across both festival and non-festival quarters, confirming accuracy does not degrade during the highest-stakes planning periods.

    Operationally, a WAPE of 4.50% means a distributor calibrating safety stock to TFT-level uncertainty holds roughly one-third the buffer required under LSTM-level uncertainty at equivalent fill rates, a material reduction in tied-up working capital.
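As a hedged illustration of this working-capital argument: under a normal-error approximation (a standard textbook sizing rule, not the paper's stated method), safety stock scales linearly with the width of the forecast interval, so a model with one-third the interval width needs roughly one-third the buffer:

```python
def safety_stock(p50, p90, z_target=1.645):
    """Approximate safety stock from forecast quantiles.

    Under a normal error assumption, (p90 - p50) ≈ 1.2816 * sigma, so the
    buffer for a target service level z scales linearly with interval width.
    """
    sigma = (p90 - p50) / 1.2816
    return z_target * sigma

# Hypothetical numbers: a model with a 3x wider P50-P90 interval needs a 3x buffer.
buffer_narrow = safety_stock(p50=1000.0, p90=1600.0)
buffer_wide = safety_stock(p50=1000.0, p90=2800.0)
```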

  8. Deployment

    1. Latency

      XGBoost inference completes in under 5 ms per SKU due to the serialized JSON booster and vectorized tree evaluation. LSTM requires approximately 20 ms due to sequence construction and Keras overhead. TFT requires 80–120 ms for the attention computation, acceptable for a quarterly planning workflow where sub-two-second responses suffice.

    2. Retraining Workow

      The /api/admin/retrain endpoint triggers the relevant Python training script, archives prior weights to a versioned directory, and signals the sidecar to hot-swap the new weights via /reload. The double-buffer approach ensures the old weights continue serving requests until new weights are fully loaded and validated, achieving zero-downtime model updates.
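The double-buffer swap can be sketched as below (illustrative; the real sidecar swaps loaded model objects from disk rather than lambdas):

```python
class HotSwapModel:
    """Double-buffered model holder: readers keep the old object until swap completes."""

    def __init__(self, model):
        self._active = model

    def predict(self, x):
        model = self._active      # grab a reference; safe even if a swap lands mid-request
        return model(x)

    def reload(self, load_fn):
        new_model = load_fn()     # load and validate fully before exposing
        self._active = new_model  # single reference swap: zero downtime

holder = HotSwapModel(lambda x: x + 1)
before = holder.predict(1)                      # old weights still serving
holder.reload(lambda: (lambda x: x * 10))
after = holder.predict(1)                       # new weights after the swap
```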

    3. Limitations and Future Work

    The system is validated on one distributor in one region; generalization requires retraining on new data. External demand drivers (competitor pricing, supply shortages, weather) are absent from the feature set. Distribution shift detection via rolling MAPE monitoring on newly observed actuals is planned as the next operational improvement. Containerizing both services via Docker Compose is the recommended next deployment step.

  9. Conclusion

This paper described the end-to-end implementation of a demand forecasting system for an Indian FMCG distributor. Three models were fully implemented and evaluated: LSTM as a sequential recurrent baseline (MAPE = 18.34%), XGBoost exploiting explicit lag and festival features (MAPE = 14.87%), and TFT leveraging known-future-input encoding and multi-head self-attention (MAPE = 6.32%, WAPE = 4.50%). TFT's advantage stems from its dedicated future-input encoder and attention-based direct access to relevant prior-year quarters. The Node.js/Express/MongoDB backend provides authenticated REST endpoints for the full data lifecycle, and the Python ML sidecar wraps all three models behind a unified prediction interface with quantile outputs for safety stock sizing. Together these components form a reproducible, deployable forecasting system that replaces intuition-based inventory ordering with data-backed quarterly demand estimates.

Acknowledgment

The authors thank Prof. Pranali Navghare for guidance and the Department of Computer Science at SCTR's Pune Institute of Computer Technology for institutional support.

References

  1. A. A. Syntetos, Z. Babai, and J. E. Boylan, "Supply chain forecasting in volatile environments," European Journal of Operational Research, vol. 299, no. 3, pp. 817–835, 2022.

  2. R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, 3rd ed. OTexts, 2021.

  3. T. Chen and C. Guestrin, "XGBoost: Scalable and accurate gradient boosting," ACM Trans. Intell. Syst. Technol., vol. 14, no. 1, pp. 1–26, 2023.

  4. L. Grinsztajn, E. Oyallon, and G. Varoquaux, "Why tree-based models still outperform deep learning on tabular data," in Proc. NeurIPS, vol. 35, pp. 507–520, 2022.

  5. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.

  6. K. Bandara, C. Bergmeir, and H. Hewamalage, "LSTM-MSNet: Leveraging forecasts on sets of related time series," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 4, pp. 1586–1599, 2022.

  7. B. Lim, S. O. Arik, N. Loeff, and T. Pfister, "Temporal fusion transformers for interpretable multi-horizon time series forecasting," International Journal of Forecasting, vol. 37, no. 4, pp. 1748–1764, 2021.

  8. D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski, "DeepAR: Probabilistic forecasting with autoregressive recurrent networks," International Journal of Forecasting, vol. 36, no. 3, pp. 1181–1191, 2020.

  9. R. Fildes, S. Ma, and S. Kolassa, "Retail forecasting: Research and practice," International Journal of Forecasting, vol. 38, no. 4, pp. 1283–1318, 2022.