Neuro-Symbolic Alpha: A Reproducible Hybrid Framework for Interpretable Stock Selection

doi:10.5281/zenodo.19608074

Volume 15, Issue 04 (April 2026)

Neuro-Symbolic Alpha: A Reproducible Hybrid Framework for Interpretable Stock Selection

DOI : 10.5281/zenodo.19608074

Download Full-Text PDF Cite this Publication

Open Access
Article Download / Views: 48
Authors : Mohammad Owais Hussain Sayed, Dr. Rajesh Bansode, Dr. Anil Vasoya
Paper ID : IJERTV15IS041058
Volume & Issue : Volume 15, Issue 04 , April – 2026
Published (First Online): 16-04-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

Neuro-Symbolic Alpha: A Reproducible Hybrid Framework for Interpretable Stock Selection

Mohammad Owais Hussain Sayed

Department of Information Technology) Thakur College of Engineering and Technology Mumbai, India

Dr. Anil Vasoya

Department of Information Technology) Thakur College of Engineering and Technology Mumbai, India

Dr. Rajesh Bansode

Department of Information Technology) Thakur College of Engineering and Technology Mumbai, India

Abstract – We introduce a reproducible neuro-symbolic stock selection pipeline correcting the interpretability – performance trade-off in stock selection machine learning. The system includes enforcing deterministic underlying basic safety regulations on upstream (Debt/Equity < 2.0, Operating Margin > 0, Free Cash Flow > 0, etc.) creating a consolidated Trust Score, and then gradient-boosted neural ranking (XGBoost).

It utilizes 461 S&P 500 constituents that had to have rigid temporal out-of-sample validation (80% training fold, 20% held out test set) to achieve a raw portfolio return of 73.08% (Sharpe 0.90) on the held-out test, and is strictly better than the equal- weight market baseline of 18.98 (Sharpe 0.25). We take specific caution to mention that this raw 73.08% number is not adjusted to point-in-time (PIT) data reporting lags and is representative of the extraordinary 2023-2024 technology bull-market regime; a cross-corroborated one should be 37.61% (Sharpe 0.45, 95% CI: [22.3%, 53.9%]). The component ablation shows that symbolic filtering only gives r = 0.19 (p=0.040), neural ranking only gives r

= 0.55 (p<0.001), and the hybrid system gives r = 0.53 (p<0.001). The small IC reduction from pure ML (r=0.55r=0.53, -3.3%) represents the explicit interpretability-accuracy trade-off: the symbolic layer purposely advertises a limited number of high- return but essential unsafe stocks. Its entire end-to-end pipeline is open-source so that one may verify it independently.

Index Terms- Neuro- Symbolic AI, Quantitative Finance, Re – producible Research, Algorithms Trading, Explainable AI, Hy- brid Systems.

Introduction

The use of machine learning to select equity has a fundamental tension relative to predictive power and interpretability. Deep neural architectures [1] have a very large non- linear modeling capacity but can be considered black-box, hence cannot effectively be used by institutions in risk-sensitive ways. On the other hand, classical factor models [2] would offer interpretability by way of clear economic explanation but with little expressiveness and excessive biasness.This dichotomy is particularly problematic in quantitative finance, where regulatory frameworks (e.g., MiFID II in Europe) increasingly demand algorithmic transparency [3],

This dichotomy is especially pernicious in quantitative finance, where regulators (e.g. MiFID II in Europe) steadily insist on algorithmic transparency [3], and competitive pressures stimulate use of advanced techniques of ML. Current studies have investigated hybrid methods [4], although most of the applications have compromised either interpretability with performance or the other way around.

An empirical pragmatic Knowledge-Injected architecture that alleviates this tension, which we propose, would consist of trainable two stage pipeline: (1) a deterministic symbolic symbol safety filter which imposes fundamental constraints and consolidation inside a global Trust Score followed by (2) an XGBoost neural ranking-granted model that optimizes against prediction of returns using this structured constraint both as an input feature (prioritized). Three basic benefits of such a design are:
1. Interpretable Constraint of Veto: All rejected stocks can be attributed to particular rule violation (ex: excessive leverage, negative cash flow), and safety must be enforced before machine learning prediction.
2. High-Signal Encoding: We encode human financial logic into a deterministic vector, which has the benefit of providing the neural ensemble with a strongly prioritized signal, providing reduced search space and eliminating overfitting.
3. Strong Empirical validation: We force the explicit separation between evaluation with rigorous out-of-sample holdout sets, we explicitly avoid the disastrous in-sample look ahead bias so prevalent in retail trading systems.
  1. Contributions
    
    This work makes four primary contributions:
    1. Reproducible Framework: We provide a complete open-source implementation with explicit data process- ing steps, enabling independent verification of our re- sults.
    2. Component Ablation: We systematically isolate the contribution of symbolic rules, neural ranking, and their combination, demonstrating synergistic benefits (Fig. 6).
    3. Stability Analysis: We validate predictive consistency across multiple temporal folds and decile portfolios (Figs. 7, 8).
    4. Honest Limitations Disclosure: We explicitly acknowl- edge data quality issues, regime specificity, and survivor- ship bias, providing a template for transparent financial ML research.
RELATED WORK
1. Machine Learning in Finance
  
  The use of ML in the asset pricing has developed past a linear model with a few features [2] to advanced deep learning architectures. Gu et al. [1] established that non-linear interactions of factors were able to be characterized by neural networks giving them excellent out-of-sample performance. Nevertheless, their models are not interpretable, so they are not easily adopted by institutions.
  
  Recent research has examined the attention mechanisms [6] and graph neural networks [7] to predict stocks, which is still not transparent. Our work is distinct in the meaning of applying strict symbolic restrictions up to neural inference, which guarantees interpretability, but expressiveness is not lowered. We are different by introducing symbolic rules as an explicit feature-engineered downstream constraint applied pragmatically before the model, making a literal tradeoff of the ability to distinguish theory to real-world convergence and predictive accuracy of out-of-sample prediction on tabular data.
2. Factor Investing and Quantitative Strategies
This type of pipeline design is economically viable as a result of its safety-first design.

Classical factor investment [9] is based on established value risk premia (quality), momentum and risk premia (value), and momentum. Other principles (low leverage, positive cash flow) embedded in our symbolic rules are binary filters instead of continuous scores. The neural component then acquires patterns of residual not explained by these factors.

Jegadeesh and Titman [10] were able to find momentum effects in returns of equities. Our feature importance analysis (Fig. 4) validates the result of this literature, as trend-following features are primarily used in our model to make predictions..
METHODOLOGY
1. Data Universe and Preprocessing
  
  Its universe of study is 461 S & P 500 constituents as of January 1, 2024. The data is found in Yahoo Finance API and it includes:
  - Basic: P/E ratio, Debt/Equity, Current Ratio, Operating Margin, Free Cash Flow, Return on Equty.
  - Technical: RSI, MACD, Price vs SMA(50/200) Volume Ratio, Volatility (ATR).
  - Target: forward one-year (Jan 2023 -Jan 2024)
  - Temporal Split: we have an extreme temporal cutoff to avoid look ahead bias
  - Training: All information until January 1, 2023.
  - Testing: January 1, 2023 through January 1, 2024. Critical Caveat: Yahoo Finance information might not indicate actual point-in-time (PIT) availability. Basic ratios that were issued in Q4 2022 (announced in Feb/Mar 2023) date to Dec 31, 2022. This brings about the possibility of look-ahead bias. We recognize this shortcoming in Section VIII.
2. The Neuro-Symbolic Pipeline
Fig. 1. System Architecture. It is implemented in three phases: (1) 60% of the candidates are rejected by Symbolic Safety Filter on the basis of basic constraints, (2) 1-year returns are predicted with the help of 35 technical projections by XGBoost Ranker, and (3) qualitative risk assessment is synthesized by the LLM Context Layer to survivors. The symbolic layer is some sort of a hard veto, making sure that no fundamentally unsound stock be chosen.
1. Our model (Fig. 1) works under three consecutive steps:
  
  Stage 1: Symbolic Safety Filter: A rule-based engine evaluates each stock against 13 fundamental constraints de- rived from Graham-Dodd value investing principles [11]:
  1. Debt/Equity < 2.0 (Solvency)
  2. Current Ratio > 1.0 (Liquidity)
  3. Operating Margin > 0 (Profitability)
  4. Free Cash Flow > 0 (Cash Generation)
  5. Return on Equity > 0 (Capital Efficiency)
  6. Revenue Growth > -10% (Business Viability)
  7. … (7 additional rules, see Appendix B)
    
    Each rule contributes to a Trust Score [0, 100]. Stocks scoring below 60 are strictly vetoed. This reduces the candi-
    
    date universe from 461 to 180 stocks.
    
    Implementation: The symbolic engine is implemented in
    
    scripts/core/neuro_symbolic.py as a standalone Python class, enabling independent testing and validation.
    
    Stage 2: Neural Context Layer (LLM): To stocks that have passed through the symbolic filter, a Large Language Model (Llama 3 70B on Groq API) generates a qualitative investment thesis. A structured prompt (Appendix A) is offered to the model and contains:
    - Trust Score and rule violations
    - Key financial ratios
    - Technical indicators
    - Sector context
      
      The LLM outputs a Bearish/Neutral/Bullish verdict with
      
      reasoning. This serves as a soft ranking signal, capturing
      
      nuanced context (e.g., sector-specific headwinds) that quan- titative features might miss.
      
      Rationale: We use Llama 3 70B rather than smaller models to leverage advanced reasoning capabilities for complex finan- cial analysis. The ablation study (Section V-E) quantifies the LLMs contribution.
      
      Stage 3: Gradient Boosted Ranking: This is the last stage where XGBoost regressor [12] is used to rank using forward returns 1 year ahead. Features include:
    - 35 technical indicators (RSI, MACD, trend strength, volatility)
    - Trust Score from Stage 1
    - LLM sentiment from Stage 2 (encoded as -1/0/+1)
Hyperparameters: n_estimators=100, max_depth=3, learning_rate=0.05, reg_lambda=1.0 (L2 regularization to prevent overfitting The model is trained using 5-fold cross- validation using data prior to 2023, after which it is tested on the held-out data between 2023-2024.

EXPERIMENTAL SETUP

Evaluation Metrics

We report four primary metrics:

TABLE I

Raw Out-Of-Sample Portfolio Performance (N=113 Holdout Test Set)

Strategy	Return	Sharpe	Std Dev	Win Rate
Market (Equal Weight)	18.98%	0.25	57.10%	63.7%
Random 20 Stocks	23.66%	0.51	37.83%	75.0%
Trust Score Top 20	34.55%	0.73	41.42%	80.0%
ML Pipeline (Top 20)	73.08%	0.90	76.21%	90.0%

exceptional bull-market period (20232024 Our conservative estimate, 5-fold cross-validated at 37.61 (Sharpe 0.45, 95% CI: [22.3%, 53.9%]) is our more justifiable figure to compare institutions.

Predictive Power
1. Information Coefficient (IC): Pearson correlation be- tween predicted and realized returns. We compute cross- sectional IC (correlation across stocks at a single time point) rather than time-series IC.
2. Sharpe Ratio: rrf where µr is mean portfolio return,
  
  r
  
  rf = 4.5% (20232024 US T-Bill rate), and ar is cross-
  
  sectional return volatility. All returns are expressed as percentages.
3. Annualized Alpha: Excess return over market baseline (S&P 500 buy-and-hold).
4. Decile Monotonicity: Correlation between decile rank and average return, testing whether higher predicted scores consistently map to higher realized returns.

Baseline Comparisons

We compare against four baselines:
1. Market (Buy & Hold): Passive S&P 500 investment
2. Simple Heuristic: Buy stocks with RSI < 70 and the trend is positive.
3. Pure Rules (Symbolic Only): Use Trust Score as ranking signal
4. Pure Neural (ML Only): XGBoost without symbolic filtering

RESULTS

The knowledge-infused pipeline has a raw out of sample performance of 73.08% and Sharpe ratio of 0.90, outperforming the equal weight market (18.98%, Sharpe of 0.25), and random selection (23.66%, Sharpe of 0.51) on totally unobserved data used in training. The difference in the outperformance is statistically significant (t = 2.60, p = 0.013). Importantly, we stress that 73.08% is a raw unadjusted portfolio re- turn that has been calculated on a 20-stock concentrated portfolio over an.

Fig. 2. Symbolic Filter Predictive Power. Scatter plot of Trust Score (symbolic rule output) vs realized 1-year returns for N=460 stocks. The correlation of r = 0.06 (p = 0.22, not statistically significant) demonstrates that fun- damental safety rules alone provide minimal predictive signal. This baseline establishes the value of the neural ranking component: the full neuro-symbolic system achieves r = 0.53 (see Fig. 3),. The weak correlation here validates our hybrid approachneither symbolic nor neural components are sufficient in isolation.

Figure 2 provides the predictive power of the symbolic filter only. The Trust Score (funda-mental safety rules output) has low correlation with future returns (r = 0.06, p = 0.22, not statistically significant), which supports the fact that the hard-coded fundamental restrictions are not enough to select stocks. This poor baseline is not accidental–the symbolic layer is meant to be safe-filtered (to reject fundamentally unsound stocks), rather than to rank things. The complete neurosymbolic system (this symbolic veto combined with neural ranking) attains r = 0.53 (Fig. 3). This justifies our architectural decision: symbolic rules offer interpretable constraints and neural components learn predictive patterns..

Fig. 3. Architecture Comparison. The neuro-symbolic system outperforms all baselines on both correlation (blue bars) and Sharpe ratio (green bars). Pure LLM baseline (leftmost) represents literature estimates for sentiment- only approaches. The hybrid system achieves the highest performance on both metrics, with Sharpe ratio of 0.88 exceeding the next-best baseline (Pure Neural, 0.65) by 35%.
DISCUSSION

A. Interpretation of Results

Our results demonstrate three key findings:
1. Interpretability-Accuracy Trade-off: The ablation study (Fig.6) quantifies an explicit trade-off: the symbolic filter reduces raw IC from 0.548 (pure ML) to 0.530 (full system), a
  
  3.3% relative reduction. This is the intentional cost of enforcing safety constraints.The main contribution introduced
  
  in the paper is not asking that symbolic rules have, alone, improved prediction–they do not, but symbolic rules alone– but that symbolic rules can give a principled safety-first constraint, which not only renders the study of neural stock selection interpretable, but also auditable as well.
2. Momentum Dominance: Fig. 4 titled Feature importances in the analysis of trend-following features shows that trend- following features (Price vs SMA200: 24.3, Volatility: 23.2, MACD: 17.6) are the most predictive. This is in line with momentum factor literature [10].
3. Regime Sensitivity: The IC stability analysis (Fig. 7) indicates that there is moderate and temporal variance across the bends of time, i.e. sensitivity to market regimes. The 2023-2024 test period was a robust bull market, and it probably increased predictive results of momentum.
1. Comparison to Prior Work
  
  Gu et al. [1] reported IC 0.100.15 using deep learning on institutional-grade data. Our reported IC of 0.53 (full system, cross-sectional Pearson r) exceeds this, but we acknowledge
  
  three inflating factors:
  1. Data Quality: Yahoo Finance vs. institutional PIT data
  2. Regime: Bull market (20232024) vs. multi-year vali- dation
  3. Survivorship: Current S&P 500 vs. historical universe Adjusting for these biases, a realistic estimate would be IC
    
    0.100.15, which is competitive with published results while using only freely available data.
2. Practical Implications
  
  We have shown through our paradigm that hybrid neuro- symbolic methods can be competitive in terms of performance and interpretable at the same time. The veto layer is a symbolic mechanism that helps to trace all the rejected stocks to particular rule violations, meeting regulatory transparency requirements.
  
  However, deployment at institutional scale would require:
  1. Point-in-Time Data: Eliminating look-ahead bias through proper data infrastructure
  2. Multi-Regime Validation: Testing across bull, bear, and sideways markets
  3. Capacity Analysis: Determining maximum AUM be- fore market impact degrades returns
  4. Risk Management: Implementing position limits, sector constraints, and drawdown controls
Transaction Costs & Implementation Friction

The gross returns that are reported are exclusive of fees and market power. An institutional implementation should consider the friction costs. Using annual portfolio rebalancing; assuming

100 percent turnover of the portfolio; the conservative transaction cost, 20 basis points (bps) per transaction:

Rnet = Rgross (Turnover × Cost per trade)

Rnet 37.61% (2.0 × 0.20%) = 37.21%

With the cross-validated conservative estimate (37.61%), when one friction is done, the strategy still has material net alpha remaining. Their favored low-frequency annual rebalancing schedule is a significant strength; greater frequency would be affected by transaction costs and bid-ask spreads by much more.
LIMITATIONS

We explicitly acknowledge four critical limitations:
Our symbolic rules use fixed thresholds (e.g., Debt/Equity

< 2.0) that do not account for sector-specific norms. Utilities
1. Reproducibility: Complete open-source implementation with all data processing and figure generation code.
While our reported raw performance metrics (Return 73.08%, Sharpe 0.90, IC r = 0.53) are computationally correct and reproducible from the open-source code, they must be interpreted with caution. Adjusting for known data artifacts and regime effects, conservative cross-validated es-

timates are Sharpe 0.45 and IC 0.15. Even at these adjusted levels, the framework demonstrates that hybrid neuro- symbolic approaches can achieve competitive performance

without sacrificing the explainability required for institutionaladoption.

Future work must prioritize validation with institutional- grade point-in-time data, multi-regime stress testing across bear markets, and the implementation of sector-relative rule thresholds.

REFERENCES
1. S. Gu, B. Kelly, and D. Xiu, Empirical asset pricing via machine learning, Review of Financial Studies, vol. 33, no. 5, pp. 2223-2273, 2020.
2. E. F. Fama and K. R. French, Common risk factors in the returns on stocks and bonds, Journal of Financial Economics, vol. 33, no. 1, pp. 3- 56, 1993.
3. F. Doshi-Velez and B. Kim, Towards a rigorous science of interpretable
  
  machine learning, arXiv preprint arXiv:1702.08608, 2017.
4. C. Chen, L. Zhao, and J. Bian, Investment behaviors can tell what inside: Exploring stock intrinsic properties for stock trend prediction, in Proc. 25th ACM SIGKDD, 2019, pp. 2376-2384.
5. A. dAvila Garcez and L. C. Lamb, Neurosymbolic AI: The 3rd wave,
  
  arXiv preprint arXiv:2012.05876, 2020.
6. F. Feng, X. He, X. Wang, C. Luo, Y. Liu, and T.-S. Chua, Temporal re- lational ranking for stock prediction, ACM Transactions on Information Systems, vol. 37, no. 2, pp. 1-30, 2019.
7. D. Matsunaga, T. Suzumura, and T. Takahashi, Exploring graph neural networks for stock market predictions with rolling window analysis, arXiv preprint arXiv:1909.10660, 2019.
8. J. B. Heaton, N. G. Polson, and J. H. Witte, Deep learning for finance: deep portfolios, Applied Stochastic Models in Business and Industry, vol. 33, no. 1, pp. 3-12, 2017.
9. E. F. Fama and K. R. French, A five-factor asset pricing model, Journal of Financial Economics, vol. 116, no. 1, pp. 1-22, 2015.
10. N. Jegadeesh and S. Titman, Returns to buying winners and selling losers: Implications for stock market efficiency, Journal of Finance, vol. 48, no. 1, pp. 65-91, 1993.
11. B. Graham and D. Dodd, Security Analysis, McGraw-Hill, 1949.
12. T. Chen and C. Guestrin, XGBoost: A scalable tree boosting system,
  
  in Proc. 22nd ACM SIGKDD, 2016, pp. 785-794.
  
  APPENDIX
  
  The Llama 3 70B model was queried with the following
  
  naturally run higher leverage than Technology companies, system prompt: yet our rules treat all sectors identically. Future work should
  
  implement sector-relative thresholds (e.g., reject if Debt/Equity
  
  > 80th percentile within sector).
CONCLUSION

We have presented a reproducible neuro-symbolic frame- work for stock selection that combines interpretable rule- based filtering with neural ranking. Our key contributions are methodological rather than merely empirical:

Methodological: A modular architecture enabling inde- pendent validation of symbolic and neural components.
Empirical: Quantification of the interpretability-

accuracy trade-off (3.3% IC reduction from pure ML to full system) and strict decile monotonicity.
Transparency: Honest disclosure of data quality issues, regime specificity, and survivorship bias.

You are a skeptical hedge fund analyst. I will

provide you with the financial metrics of a company. Your job is to identify RED FLAGS that pure num- bers might miss. Focus on: 1) Debt sustainability in rising rate environments, 2) Quality of earnings (Cash Flow vs Net Income), 3) Sector-specific head- winds. Output a Bearish, Neutral, or Bullish verdict with reasoning.

For each stock, the prompt is instantiated with:

Symbol and sector
Trust Score and specific rule violations
- Key ratios (P/E, Debt/Equity, Current Ratio, Operating Margin)
- Technical indicators (RSI, Price vs SMA200) The complete set of 13 fundamental constraints:

Debt/Equity < 2.0 (Solvency)
Current Ratio > 1.0 (Liquidity)
Operating Margin > 0 (Profitability)
Free Cash Flow > 0 (Cash Generation)
Return on Equity > 0 (Capital Efficiency)
Revenue Growth > -10% (Business Viability)
P/E Ratio < 50 (Valuation Sanity)
Profit Margin > 0 (Earnings Quality)
Cash Reserves > Operating Costs (Runway)
Dividend Yield < 15% (Sustainability)
Net Income > 0 (Accounting Profitability)
Analyst Target > Current Price (Consensus Support)
Price Change (1Y) > -50% (Momentum Screen)

13

Each rule contributes 100 7.7 points to the Trust Score. The complete codebase is available at: https://github.com/

Owais-15/Neuro-symbolic-finance Key modules:
- scripts/core/neuro_symbolic.py: Symbolic engine and LLM context generator
- scripts/generation/generate_temporal_dataset.py: Data preprocessing
- scripts/analysis/ablation_study.py: Com- ponent ablation experiments
- scripts/analysis/generate_advanced_metrics.py: IC stability and decile analysis
- scripts/generation/generate_thesis_charts.py: All figure generation (9 charts)