DOI: https://doi.org/10.5281/zenodo.20124425
- Open Access
- Authors: Dr. Amarsinh B. Landage, Aniket P. Bhusari, Mahivish N. Mirkar, Naaz B. Nimbal
- Paper ID: IJERTV15IS050738
- Volume & Issue: Volume 15, Issue 05, May 2026
- Published (First Online): 11-05-2026
- ISSN (Online): 2278-0181
- Publisher Name: IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
XGBoost-Based Predictive Framework for High-Performance Concrete Compressive Strength with SHAP-Guided Explainability
Amarsinh B. Landage (1), Aniket P. Bhusari (2), Naaz B. Nimbal (3), Mahvish N. Mirkar (4)
(1) Assistant Professor, Department of Civil and Infrastructure Engineering, Government College of Engineering, Ratnagiri, 415612, India,
(2,3,4) Research Scholar, Department of Civil and Infrastructure Engineering, Government College of Engineering, Ratnagiri, 415612, India
Abstract – Accurate estimation of concrete compressive strength is central to structural safety, construction quality control, and sustainable material use. Conventional destructive testing mandates curing durations of 7 to 28 days, creating costly delays and obstacles to rapid mix design iteration.
This work presents a machine learning framework to estimate the compressive strength of high-performance concrete (HPC) from eight input parameters describing mix design and curing age. Six algorithms were evaluated: Linear Regression, Support Vector Regression (SVR), Random Forest, Gradient Boosting, eXtreme Gradient Boosting (XGBoost), and Deep Neural Networks (DNN), on 1,030 experimental specimens from Yeh's (1998) benchmark dataset. XGBoost consistently outperformed all alternatives on R², RMSE, and MAE. SHAP interpretability analysis was applied to identify dominant features and validate established concrete chemistry principles, including Abrams' law.
The final model was deployed as a cloud-hosted web application returning predictions in under 100 ms, providing a practical non-destructive alternative for real-time construction site decision-making.
Keywords: Compressive Strength Prediction, XGBoost, Gradient Boosting, SHAP Analysis, High-Performance Concrete, Mix Design Optimization, Feature Importance.
INTRODUCTION
Concrete is the most widely used structural material worldwide, with applications spanning foundations, pavements, bridges, and high-rise buildings. Its compressive strength is the principal design parameter governing load-bearing performance and must be reliably quantified for structural safety and economic efficiency. Conventional strength determination relies on destructive compression testing of specimens cured over defined intervals, typically 7, 14, or 28 days. While scientifically sound, this process delays formwork removal, increases testing expenditure, and cannot assess concrete already embedded in a completed structure.
Machine learning (ML) offers an effective alternative by learning complex, high-dimensional nonlinear relationships between mix parameters and resulting strength from experimental data, enabling real-time prediction without physical specimen preparation. However, the opacity of ensemble methods has restricted adoption in safety-critical civil engineering contexts. This study addresses both concerns by integrating a systematic multi-model comparison with SHAP-based interpretability, producing a transparent,
accurate, and practically deployable strength prediction framework for HPC.
DATASET AND EXPLORATORY ANALYSIS
The study used Yeh's (1998) benchmark HPC dataset [6], available from the UCI Machine Learning Repository, comprising 1,030 records spanning diverse concrete mix formulations. Eight continuous input features and one continuous output variable (compressive strength) describe each record. Table I summarizes the dataset statistics.
TABLE I. DESCRIPTIVE STATISTICS OF DATASET FEATURES [6]

| Feature | Min | Max | Mean | Std | Unit |
|---|---|---|---|---|---|
| Cement | 102.0 | 540.0 | 281.2 | 104.5 | kg/m³ |
| BF Slag | 0.0 | 359.4 | 73.9 | 86.2 | kg/m³ |
| Fly Ash | 0.0 | 200.1 | 54.2 | 63.9 | kg/m³ |
| Water | 121.8 | 247.0 | 181.6 | 21.4 | kg/m³ |
| Superplasticizer | 0.0 | 32.2 | 6.2 | 8.3 | kg/m³ |
| Coarse Aggregate | 801.0 | 1145.0 | 972.9 | 77.8 | kg/m³ |
| Fine Aggregate | 594.0 | 992.6 | 773.6 | 80.9 | kg/m³ |
| Curing Age | 1.0 | 365.0 | 45.7 | 63.4 | days |
| Comp. Strength | 2.33 | 82.60 | 35.3 | 16.8 | MPa |
The feature set captures all principal physicochemical parameters affecting concrete performance. Cement (102–540 kg/m³) is the primary binder; blast furnace slag and fly ash act as supplementary cementitious materials. Water content (121.8–247.0 kg/m³) governs hydration and capillary porosity. Superplasticizer enables workability at reduced water-to-cement (w/c) ratios. Coarse and fine aggregates form the inert skeletal volume. Curing age (1–365 days) reflects temporal strength gain. No missing values were identified. Correlation analysis confirmed expected trends: cement content and curing age correlated positively with strength; water content showed a clear inverse relationship aligned with Abrams' law. No physically unrealistic outliers were detected.
METHODOLOGY
Data Preprocessing and Feature Engineering
The dataset was partitioned into training/validation (85%, n = 875) and held-out test (15%, n = 155) subsets. Z-score standardization (z = (x − μ)/σ) via StandardScaler was fitted exclusively on training data to prevent leakage. An engineered water-to-cement (w/c) ratio feature was added to explicitly encode Abrams' law. Pairwise multicollinearity checks confirmed that no feature pair exceeded |r| = 0.95, so no dimensionality reduction was required. Five-fold stratified cross-validation with GridSearchCV was used for hyperparameter tuning, maximizing cross-validation R² as the selection criterion.
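The preprocessing steps above can be sketched as follows. This is a minimal illustration on synthetic stand-in data drawn from the Table I ranges (the actual Yeh dataset is not bundled here); column positions for cement and water follow the Table I ordering and are an assumption of this sketch.

```python
# Sketch of the preprocessing pipeline: 85/15 split, leakage-free
# z-score scaling, and the engineered w/c feature.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1030
# Synthetic stand-in features sampled within the Table I min/max ranges
X = rng.uniform([102, 0, 0, 121.8, 0, 801, 594, 1],
                [540, 359.4, 200.1, 247.0, 32.2, 1145, 992.6, 365],
                size=(n, 8))
y = rng.uniform(2.33, 82.6, size=n)

# Engineered water-to-cement ratio (assumed columns: 0 = cement, 3 = water)
wc_ratio = X[:, 3] / X[:, 0]
X = np.column_stack([X, wc_ratio])

# 85/15 partition, then fit the scaler on training data only (no leakage)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=42)
scaler = StandardScaler().fit(X_tr)              # z = (x - mu) / sigma
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
```

Fitting the scaler on the training fold and only transforming the test fold is what prevents test-set statistics from leaking into the model.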
Machine Learning Model Implementation
Six algorithms spanning the complexity spectrum were assessed. Ordinary Least Squares (OLS) regression provided a linear baseline. SVR with an RBF kernel was optimized over C ∈ {0.1, 1, 10, 100} and γ ∈ {0.001, 0.01, 0.1, scale}. Random Forest used bootstrap aggregation over n_estimators ∈ {100, 200, 500} and max_depth ∈ {10, 15, 20}. Gradient Boosting used n_estimators ∈ {100, 200}, learning_rate ∈ {0.05, 0.1}, and max_depth ∈ {3, 5}. XGBoost extended gradient boosting with second-order Taylor loss approximation, L1/L2 regularization, and column subsampling (max_depth ∈ {3, 5, 7}; learning_rate ∈ {0.01, 0.05, 0.1}). A DNN with three fully connected layers (128–64–32 neurons), ReLU activation, dropout, and early stopping completed the comparison. Model ranking used a composite score: R² (40%), RMSE (40%), MAE (10%), CV standard deviation (10%).
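The composite ranking can be sketched as a weighted sum of normalized metrics. The paper specifies the weights but not the scaling scheme, so the min-max normalization below (with error metrics inverted so higher is always better) is an assumption; the CV standard deviations are illustrative values, not results from the paper.

```python
# Hedged sketch of the 40/40/10/10 composite model-ranking score.
def composite_scores(metrics, weights=(0.4, 0.4, 0.1, 0.1)):
    """metrics: {model: (r2, rmse, mae, cv_std)} -> {model: score in [0, 1]}."""
    cols = list(zip(*metrics.values()))
    def minmax(col, higher_is_better):
        lo, hi = min(col), max(col)
        span = hi - lo or 1.0
        return [(v - lo) / span if higher_is_better else (hi - v) / span
                for v in col]
    # R2 rewards high values; RMSE, MAE, CV std reward low values
    norm = [minmax(cols[0], True)] + [minmax(c, False) for c in cols[1:]]
    names = list(metrics)
    return {m: sum(w * norm[j][i] for j, w in enumerate(weights))
            for i, m in enumerate(names)}

scores = composite_scores({
    "XGBoost":      (0.9099, 4.7499, 3.3963, 0.012),  # CV std illustrative
    "GradBoost":    (0.9079, 4.8005, 3.6269, 0.015),
    "RandomForest": (0.8807, 5.4648, 3.9008, 0.018),
})
best = max(scores, key=scores.get)
```

With Table II's test-set metrics, XGBoost dominates every component, so any monotone normalization yields the same winner.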
SHAP Interpretability Framework
To lift the black-box limitation of the optimal ensemble model, SHAP (SHapley Additive exPlanations) values were computed using cooperative game theory (Shapley, 1953). Each input feature receives a Shapley value representing its exact marginal contribution to a given prediction, averaged over all possible feature orderings. SHAP satisfies local accuracy, consistency, and missingness, qualifying it as the gold standard for ensemble interpretability. Summary plots visualize contribution magnitude and direction across all test samples, with color encoding representing feature value magnitude (red = high, blue = low).
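The Shapley definition used above can be made concrete on a toy model. In practice one would call `shap.TreeExplainer` on the fitted XGBoost model; the dependency-free sketch below enumerates all feature orderings exactly for a hypothetical two-feature mini-model (the coefficients and baseline are invented for illustration), which is feasible only for tiny feature counts.

```python
# Exact Shapley values by enumerating every feature ordering:
# each feature's marginal contribution, averaged over orderings.
from itertools import permutations

def shapley_values(predict, x, baseline):
    """predict maps a full feature vector to a scalar; features not yet
    added to the coalition are held at their baseline values."""
    n = len(x)
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        coalition = list(baseline)          # start from the baseline input
        for j in order:
            before = predict(coalition)
            coalition[j] = x[j]             # add feature j to the coalition
            phi[j] += predict(coalition) - before
    return [p / len(perms) for p in phi]

# Hypothetical mini-model: strength rises with cement, falls with water
model = lambda v: 0.1 * v[0] - 0.15 * v[1] + 20.0
phi = shapley_values(model, x=[300.0, 180.0], baseline=[281.2, 181.6])
# Local accuracy: the contributions sum to f(x) - f(baseline)
```

The local-accuracy property named in the text falls out directly: the per-ordering contributions telescope, so their average always sums to the difference between the prediction and the baseline prediction.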
Prediction System Architecture
The operational system consists of three layers: a User Input Interface with out-of-range validation; a Processing Core performing full preprocessing and XGBoost inference; and an Output Layer returning predicted strength (MPa) with bootstrap-estimated 95% confidence intervals. The system is implemented in Python using Streamlit; the trained model is serialized via Pickle and exposed through /predict and /model_info REST API endpoints.
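The Processing Core / Output Layer contract can be sketched without the web stack. The `MeanModel` class, its coefficients, and the fixed bias offsets below are hypothetical stand-ins: a real deployment would pickle the fitted XGBoost model and estimate the interval from many bootstrap resamples rather than a five-member ensemble.

```python
# Sketch: pickle round-trip plus a bootstrap-style prediction interval.
import pickle
import statistics

class MeanModel:
    """Hypothetical stand-in regressor (not the paper's XGBoost model)."""
    def __init__(self, bias):
        self.bias = bias
    def predict(self, cement, water):
        # Toy strength surrogate: rises with cement, falls with water
        return 100.0 * cement / (cement + 4.0 * water) + self.bias

# The differing biases stand in for bootstrap-resampling variability
ensemble = [MeanModel(b) for b in (-1.2, -0.4, 0.0, 0.5, 1.1)]
blob = pickle.dumps(ensemble)       # serialization, as in the deployed system
models = pickle.loads(blob)

def predict_with_ci(cement, water):
    preds = sorted(m.predict(cement, water) for m in models)
    point = statistics.mean(preds)
    # Interval from the ensemble spread; a real system would use many
    # resamples and the 2.5/97.5 percentiles for a true 95% interval
    return point, preds[0], preds[-1]

point, lo, hi = predict_with_ci(cement=300.0, water=180.0)
```

The /predict endpoint would simply deserialize the blob once at startup and call `predict_with_ci` per request, which is what keeps inference latency low.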
RESULTS AND DISCUSSION
Comparative Model Performance
Table II presents performance metrics for all six models on the held-out test set, ranked by descending R². Results confirm that predictive accuracy scales with algorithmic capacity to capture nonlinear physicochemical relationships.

TABLE II. COMPARATIVE PERFORMANCE METRICS ON HELD-OUT TEST SET

| Model | R² | RMSE (MPa) | MAE (MPa) | Time (s) |
|---|---|---|---|---|
| XGBoost | 0.9099 | 4.7499 | 3.3963 | 0.81 |
| Grad. Boosting | 0.9079 | 4.8005 | 3.6269 | 0.61 |
| Random Forest | 0.8807 | 5.4648 | 3.9008 | 1.53 |
| DNN | 0.8550 | 6.0238 | 4.0533 | 4.02 |
| SVR | 0.8354 | 6.4185 | 4.4472 | 0.09 |
| Linear Regression | 0.6041 | 9.9547 | 7.8809 | 0.03 |
XGBoost achieved the highest R² of 0.9099, accounting for approximately 91% of total strength variance, with the lowest RMSE (4.7499 MPa) and MAE (3.3963 MPa). Its marginal but consistent advantage over Gradient Boosting (ΔR² = 0.0020, ΔRMSE = 0.051 MPa) reflects the added benefit of second-order gradient optimization, built-in regularization, and feature subsampling. The linear baseline yielded an RMSE more than double that of XGBoost (9.9547 MPa), confirming that HPC strength development is governed by nonlinear interactions structurally incompatible with linear statistical models.
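For completeness, the three reported metrics are computed as follows; the sketch evaluates them on a tiny illustrative prediction set rather than the actual test fold.

```python
# How R², RMSE, and MAE in Table II are computed from predictions.
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    ss_res = float(resid @ resid)                       # sum of squared residuals
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())  # total variance * n
    return {
        "R2":   1.0 - ss_res / ss_tot,                  # fraction of variance explained
        "RMSE": float(np.sqrt((resid ** 2).mean())),    # in MPa, penalizes large errors
        "MAE":  float(np.abs(resid).mean()),            # in MPa, typical error size
    }

m = regression_metrics([20.0, 35.0, 50.0, 65.0], [22.0, 33.0, 52.0, 63.0])
```

R² is unitless (hence comparable across datasets), while RMSE and MAE carry the MPa units of the target, which is why the paper reports both families.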
Fig. 1. Actual vs. Predicted Compressive Strength (XGBoost, R² = 0.9099). Scatter of predictions against actual strength (MPa) with linear trend line.
Scatter analysis of XGBoost predictions revealed a tight cluster around the perfect prediction line (y = x) across the full operational range of 2.33–82.60 MPa, with no systematic directional bias and normally distributed residuals centered at zero, indicating well-calibrated model performance. A slight increase in prediction uncertainty was observed above 70 MPa, attributable to the relatively sparse representation of ultra-high-strength specimens in the training data.
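The calibration claims above reduce to simple residual diagnostics: a near-zero mean residual (no directional bias) and a comparison of error magnitude above and below the 70 MPa threshold. The arrays below are illustrative, not the paper's test fold.

```python
# Residual diagnostics behind the calibration discussion.
import numpy as np

def residual_report(y_true, y_pred, hi_threshold=70.0):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    hi = y_true > hi_threshold               # sparse high-strength region
    return {
        "mean_residual": float(resid.mean()),                     # ~0 => unbiased
        "rmse_low":  float(np.sqrt((resid[~hi] ** 2).mean())),
        "rmse_high": float(np.sqrt((resid[hi] ** 2).mean())) if hi.any() else None,
    }

r = residual_report([30.0, 40.0, 75.0, 80.0], [31.0, 39.0, 72.0, 84.0])
```

A larger `rmse_high` than `rmse_low` is the quantitative form of the "increased uncertainty above 70 MPa" observation.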
SHAP Feature Importance and Physical Validation
Table III presents the SHAP-derived feature importance rankings alongside material science interpretations, demonstrating that XGBoost independently recovered well-established concrete chemistry principles from experimental data without any domain knowledge encoded in the model architecture.
TABLE III. SHAP FEATURE IMPORTANCE RANKINGS

| Rank | Feature | Material Science Significance |
|---|---|---|
| 1 | Curing Age | Primary +ve driver; C-S-H gel accumulation, capillary porosity reduction |
| 2 | Cement Content | Strong +ve; governs total hydration potential and C-S-H volume |
| 3 | Water Content | Dominant −ve; validates Abrams' law: excess water forms load-reducing capillary pores |
| 4 | BF Slag | Moderate +ve; latent hydraulic reactivity beyond 28 days |
| 5 | Superplasticizer | Indirect enabler; reduces w/c at target workability |
| 6 | Fly Ash | Low-moderate; pozzolanic contribution at extended ages |
| 7–8 | Aggregates | Lowest; inert volumetric fillers, secondary role |
Fig. 2. SHAP Feature Importance (Mean |SHAP| values)
Curing age ranked first, reflecting self-limiting diffusion-controlled C-S-H gel kinetics that progressively densify the paste microstructure over time. Cement content ranked second, with the model correctly identifying diminishing strength returns at high dosages from heat-of-hydration and shrinkage effects. Water content produced the most physically significant negative SHAP values, independently validating Abrams' law (1918): excess water beyond hydration requirements (w/c ≈ 0.25) occupies paste volume and evaporates to form capillary pores that reduce the effective load-bearing cross-section and create crack initiation pathways. This autonomous rediscovery of a century-old empirical law from the data alone provides compelling evidence that model predictions are physically grounded rather than artifacts of overfitting. Blast furnace slag, superplasticizer, and fly ash ranked as secondary contributors, consistent with their delayed or indirect physicochemical roles. Aggregate components showed the lowest influence, as expected for inert volumetric fillers.
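For reference, Abrams' law expresses the inverse strength–water relationship the SHAP analysis recovers. In its classical form (A and B are empirical constants fitted for a given cement, age, and test condition):

```latex
f_c = \frac{A}{B^{\,w/c}}, \qquad
\frac{\partial f_c}{\partial (w/c)} = -\frac{A \ln B}{B^{\,w/c}} < 0 \quad (B > 1)
```

The strictly negative derivative with respect to w/c is precisely the sign pattern of the water-content SHAP values reported above.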
Engineering Implications and System Demonstration
The alignment between SHAP-attributed importances and established concrete science underpins practical deployability. As a forward predictor for sustainable mix designs incorporating industrial by-products, the system can substantially reduce destructive quality control testing, accelerating project delivery and lowering costs. The model processes queries in under 100 ms on standard cloud infrastructure, enabling seamless integration into real-time construction workflows.
The framework was operationalized as ConcretIQ v2.0, a cloud-hosted Streamlit web application. The interface accepts all eight mix parameters across two input tabs (Binders & Water; Aggregates & Admixtures), computes the w/c ratio in real time with bounds flagging, and presents results as predicted strength in MPa with grade classification per IS 456:2000, a mix composition chart, and a strength development projection from 1 to 365 days on a logarithmic time scale.
Fig. 3 (a) Binders & Water input tab
Fig. 3 (b) Aggregates & Admixtures tab
First, HPC strength development is empirically confirmed as fundamentally nonlinear: Linear Regression's R² of 0.6041 and RMSE of 9.9547 MPa demonstrate the structural inadequacy of linear statistical approaches. Second, XGBoost delivers the optimal solution for tabular concrete data, with superior generalization from second-order gradient approximation, regularization, and column subsampling at 0.81 s training time. Third, SHAP analysis conclusively establishes scientific grounding: curing age and cement content dominate as primary strength drivers; water content independently validates Abrams' law; supplementary cementitious materials and admixtures are correctly ranked as secondary contributors. Fourth, the deployed ConcretIQ v2.0 framework achieves sub-100 ms inference compatible with real-time construction decision support.
Future directions include integration of IoT sensor inputs (curing temperature, humidity), dataset expansion to geopolymer and recycled aggregate concrete, containerized enterprise deployment (Docker / AWS / Google Cloud), and BIM platform integration for automated structural design workflows.
Fig. 3 (c) Prediction results panel
Fig. 3. ConcretIQ v2.0 Deployed Web Application Interface
A disclaimer note at the base of the results panel explicitly states that predictions are indicative and that laboratory testing remains mandatory for structural design, ensuring responsible AI deployment in safety-critical contexts. All six validation criteria were fulfilled, with sub-100 ms inference latency confirmed on cloud infrastructure.
CONCLUSION
This study developed, validated, and deployed a machine learning framework for HPC compressive strength prediction, jointly addressing predictive accuracy and engineering interpretability. Table IV summarizes the principal quantitative outcomes.
TABLE IV. SUMMARY OF KEY RESEARCH OUTCOMES

| Parameter | Quantified Outcome |
|---|---|
| Optimal Algorithm | XGBoost, superior across all three metrics |
| R² (XGBoost) | 0.9099 (91% of total variance explained) |
| RMSE (XGBoost) | 4.7499 MPa (5–8% of mid-range strength) |
| MAE (XGBoost) | 3.3963 MPa (within ±5 MPa engineering tolerance) |
| Linear Baseline RMSE | 9.9547 MPa (confirms nonlinear nature) |
| Primary SHAP Finding | Age & cement dominate; water validates Abrams' law |
| Inference Latency | Sub-100 ms, compatible with real-time site use |
Four principal conclusions were drawn, as enumerated above.
REFERENCES
[1] W. B. Chaabene, M. Flah, and M. L. Nehdi, "Machine learning prediction of mechanical properties of concrete: Critical review," Constr. Build. Mater., vol. 260, p. 119889, 2020.
[2] O. O. Omotayo, C. Arum, and C. M. Ikumapayi, "Assessment of machine learning methods for concrete compressive strength prediction," J. Soft Comput. Civ. Eng., vol. 8, no. 4, pp. 116–140, 2024.
[3] Y. Gamil, "Machine learning in concrete technology: A review," Front. Built Environ., vol. 9, p. 1145591, 2023.
[4] P. G. Asteris et al., "Predicting concrete compressive strength using hybrid ensembling of surrogate ML models," Cem. Concr. Res., vol. 145, p. 106449, 2021.
[5] Q. Han, C. Gui, J. Xu, and G. Lacidogna, "A generalized method to predict HPC compressive strength by improved random forest," Constr. Build. Mater., vol. 226, pp. 734–742, 2019.
[6] I. C. Yeh, "Modeling of strength of high-performance concrete using artificial neural networks," Cem. Concr. Res., vol. 28, no. 12, pp. 1797–1808, 1998.
[7] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proc. 22nd ACM SIGKDD, 2016, pp. 785–794.
[8] L. Breiman, "Random forests," Mach. Learn., vol. 45, pp. 5–32, 2001.
[9] A. Ahmad et al., "Prediction of compressive strength of fly ash concrete using individual and ensemble algorithms," Materials, vol. 14, no. 4, p. 794, 2021.
[10] P. L. Ng and Y. Ding, "Machine learning prediction and SHAP analysis of concrete strength," J. Civ. Eng. Urban Planning, vol. 7, no. 2, pp. 122–128, 2020.
[11] M. A. DeRousseau, J. R. Kasprzyk, and W. V. Srubar, "Computational design optimization of concrete mixtures: A review," Cem. Concr. Res., vol. 109, pp. 42–53, 2019.
[12] I. B. Mustapha et al., "Comparative analysis of gradient-boosting ensembles for quaternary blend concrete strength," Int. J. Concr. Struct. Mater., vol. 18, no. 1, p. 20, 2024.
[13] L. Tang, "Machine learning-based prediction of concrete compressive strength and interpretability analysis," J. Civ. Eng. Urban Planning, vol. 7, no. 2, pp. 122–138, 2025.
[14] R. Cook et al., "Prediction of concrete compressive strength: Critical comparison of hybrid vs. standalone ML models," J. Mater. Civ. Eng., vol. 31, no. 12, p. 04019255, 2019.
[15] G. A. Lyngdoh et al., "Prediction of concrete strengths enabled by missing data imputation and interpretable ML," Cem. Concr. Compos., vol. 128, p. 104414, 2022.
