International Research Press

NetRakshak: A User-Centric Threat Intelligence Framework for Real-Time Cyber Fraud Detection

DOI : https://doi.org/10.5281/zenodo.19603809

Kommareddy Prathyusha Reddy

Assistant Professor

Dept. of Computer Science and Engineering Keshav Memorial Institute of Technology Hyderabad, Telangana, India

Shaik Mohammed Ishaq

Student (UG Scholar)

Dept. of Computer Science and Engineering Keshav Memorial Institute of Technology Hyderabad, Telangana, India

Jitendra Dhaduvai

Student (UG Scholar)

Dept. of Computer Science and Engineering Keshav Memorial Institute of Technology Hyderabad, Telangana, India

Rajamaina Abhinav

Student (UG Scholar)

Dept. of Computer Science and Engineering Keshav Memorial Institute of Technology Hyderabad, Telangana, India

Kotala Sudhamshu Bushan

Student (UG Scholar)

Dept. of Computer Science and Engineering Keshav Memorial Institute of Technology Hyderabad, Telangana, India

Abstract: The rapid proliferation of digital payments and online communication has produced a parallel surge in phishing attacks, fraudulent URLs, and scam-based cyber fraud, disproportionately targeting users who lack accessible verification tools. Existing cybersecurity solutions operate primarily at the system level and return opaque verdicts without transparent reasoning or localized post-incident guidance. This paper introduces NetRakshak, a unified proof-of-concept cyber fraud detection framework that performs real-time verification of URLs, phone numbers, and email addresses through a multi-factor threat intelligence pipeline. The system integrates WHOIS domain analysis, heuristic phishing signal detection, brand impersonation recognition, and a tiered risk scoring engine to classify inputs as Safe, Suspicious, High Risk, or Critical, surfacing interpretable risk factors and actionable safety recommendations rather than binary threat verdicts. Evaluation on 150 labeled URLs drawn from PhishTank, OpenPhish, and manually verified legitimate domains yields 91.3% accuracy, 93.1% precision, 92.7% recall, and an F1-score of 92.9%, with scoring latency below 100 ms (excluding external API calls).

Index Terms: Cybersecurity, Phishing Detection, Threat Intelligence, Risk Scoring, Explainable Security, Brand Impersonation, URL Analysis, Cyber Fraud Prevention

  1. Introduction

    The rapid digitization of financial services in India and globally has expanded the attack surface for cybercriminals. Phishing attacks, fraudulent payment links, scam phone numbers, and deceptive emails are the most prevalent vectors of cyber fraud targeting ordinary citizens. According to the Indian Cybercrime Coordination Centre (I4C), cybercrime complaints have grown significantly year over year, with financial fraud accounting for the majority of incidents [1]. General internet users, the fastest-growing segment of India's digital population, remain disproportionately vulnerable because accessible, real-time fraud-detection tools are scarce.

    Blacklist-based systems are reactive, flagging only previously catalogued threats while missing newly registered phishing domains. Enterprise-grade tools are designed for security analysts, not the general public, and offer little human-readable guidance. To our knowledge, no unified, publicly accessible tool currently verifies URLs, phone numbers, and emails from a single interface with interpretable output.

    NetRakshak closes these gaps by aggregating multiple threat intelligence signals (WHOIS domain analysis, heuristic phishing detection, and brand impersonation recognition) into a transparent, tiered risk assessment accompanied by plain-language risk factors and India-specific safety recommendations. The principal contributions are:

    • A unified scan pipeline verifying URLs, phone numbers, and emails from a single interface in real time.

    • A tiered, multi-factor risk scoring engine with exponential tier weighting and an override rules engine that hard-escalates dangerous signal combinations.

    • A transparency layer presenting interpretable risk indicators and plain-language safety recommendations to users lacking security expertise.

    • India-specific post-incident guidance directing users to the National Cyber Crime Reporting Portal and helpline 1930.

    • Empirical evaluation on 150 labeled samples demonstrating 91.3% accuracy and sub-100 ms scoring latency.

  2. Related Work

    1. Phishing Detection and URL Analysis

      Sahingoz et al. [2] showed that Random Forest classifiers on lexical URL features exceed 97% accuracy on PhishTank snapshots, but their system offers no real-time WHOIS analysis and no user-readable explanation of why a URL was flagged; NetRakshak adds both. Mohammad et al. [3] combined rule-based heuristics with neural networks for phishing classification but provide no post-detection user guidance; NetRakshak addresses this through a recommendation engine with India-specific reporting channels. Khonji et al. [4] identified the core weakness of blacklist-only systems: newly registered domains escape detection until they are manually submitted. This motivates NetRakshak's WHOIS domain-age module, which flags domains under 30 days old even before they appear on any blocklist.

    2. Threat Intelligence Aggregation

      Sillaber et al. [5] demonstrated that multi-source aggregation significantly reduces false negatives, motivating NetRakshak's hybrid signal model. VirusTotal [6] aggregates verdicts from 70+ engines but returns no unified score or domain legitimacy analysis. Google Safe Browsing (GSB) [7] provides binary safe-or-unsafe verdicts with no domain-age, brand-impersonation, or recovery-guidance output. NetRakshak synthesizes both as input signals within a broader scoring model rather than treating either as a standalone arbiter.

    3. Explainable and User-Centric Security

      Gunning et al. [8] argued that explainable AI is essential for user trust in high-stakes automated decisions. Almalki and Masud [9] showed that SHAP-based explainability improves trust in financial fraud detection, but SHAP plots require statistical literacy. Volkamer et al. [10] found that users act on security warnings significantly more often when clear explanations and next steps are provided. NetRakshak implements this principle through plain-language risk factor descriptions and prioritized recommendations, rather than feature-importance plots designed for analysts.

    4. Research Gap

    To our knowledge, no single publicly accessible tool unifies URL, phone number, and email verification; existing systems return verdicts without plain-language reasoning; and no tool provides post-incident guidance calibrated to India's cybercrime reporting infrastructure. NetRakshak addresses all three gaps simultaneously.

  3. Problem Statement

    Given an arbitrary input $I$ (URL, phone number, or email address), the goal is to design a system $S$ that: (i) auto-detects the input type; (ii) aggregates multiple threat intelligence signals; (iii) computes a normalized risk score $r \in [0, 100]$; (iv) classifies $r$ into a risk level $L \in \{\text{Safe}, \text{Suspicious}, \text{High Risk}, \text{Critical}\}$; (v) generates a human-readable explanation of the contributing risk factors; and (vi) provides actionable safety recommendations anchored to India's cybercrime reporting ecosystem.

    Three gaps in existing tools motivate this formulation. Gap 1 (Fragmentation): No unified tool verifies URLs, phone numbers, and emails from one interface; users must consult multiple services with inconsistent terminology. Gap 2 (Opacity): Existing platforms return binary verdicts without explaining their reasoning, leaving lay users unable to make informed decisions. Gap 3 (No local guidance): Global tools offer no recovery pathways specific to India's cybercrime reporting mechanisms, such as cybercrime.gov.in and helpline 1930.

  4. System Architecture

    NetRakshak comprises five principal components: Input Processing and Type Detection, Threat Intelligence Aggregation, WHOIS Domain Analysis, a Heuristic Phishing Detection Engine, and a Tiered Risk Scoring Engine with Explainability output. Fig. 1 shows the complete data flow; dashed lines indicate graceful-degradation paths when external APIs are unavailable.

    Fig. 1. NetRakshak system architecture. Solid arrows show the primary analysis path; dashed arrows indicate fallback paths activated when external threat intelligence APIs are rate-limited or unavailable.

    1. Input Processing and Type Detection

      Raw string input is classified via regular-expression matching into URL/domain, phone number, or email address. URLs are identified by an HTTP/HTTPS scheme or domain structure, phone numbers by Indian and international numeric patterns, and emails by RFC 5322 syntax. The detected type routes the input to the appropriate analysis services.
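      The type-detection step can be sketched as follows. This is a minimal sketch under stated assumptions: the regular expressions are illustrative stand-ins rather than NetRakshak's actual patterns, the email pattern is a pragmatic subset of RFC 5322 (not the full grammar), and `detect_input_type` is a hypothetical name. An explicit http(s):// scheme is checked first so that URLs containing an @ symbol are not mistaken for email addresses.

```python
import re

# Illustrative patterns only; they are not NetRakshak's actual expressions.
SCHEME_URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)
# Pragmatic email check: a small subset of RFC 5322, not the full grammar.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
# Indian mobile numbers: optional +91 or 0 prefix, then 10 digits starting with 6-9.
INDIAN_PHONE_RE = re.compile(r"^(?:\+91|0)?[6-9]\d{9}$")
# Generic international form: '+' followed by 8 to 15 digits (E.164 maximum length).
INTL_PHONE_RE = re.compile(r"^\+\d{8,15}$")
# Bare domain (example.co.in) or IPv4 literal, optional port and path/query/fragment.
BARE_DOMAIN_RE = re.compile(
    r"^(?:(?:[a-z0-9-]+\.)+[a-z]{2,}|(?:\d{1,3}\.){3}\d{1,3})"
    r"(?::\d+)?(?:[/?#]\S*)?$",
    re.IGNORECASE,
)


def detect_input_type(raw: str) -> str:
    """Classify raw input as 'url', 'email', 'phone', or 'unknown'."""
    text = raw.strip()
    # An explicit scheme always means a URL, even if it contains '@'.
    if SCHEME_URL_RE.match(text):
        return "url"
    if EMAIL_RE.match(text):
        return "email"
    # Strip common phone separators (whitespace, parentheses, hyphens).
    compact = re.sub(r"[\s()-]", "", text)
    if INDIAN_PHONE_RE.match(compact) or INTL_PHONE_RE.match(compact):
        return "phone"
    if BARE_DOMAIN_RE.match(text):
        return "url"
    return "unknown"
```

Because separators are stripped before the phone patterns are applied, inputs such as `+91 98765 43210` and `098765-43210` are both classified as phone numbers.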

    2. Threat Intelligence Aggregation

      For URL inputs, the system queries Google Safe Browsing [7] and VirusTotal [6], normalizing each response into a standard schema containing threat classification, confidence score, and source attribution. If an external API is unavailable, the system continues with the remaining signals rather than failing entirely.
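      The normalization and graceful-degradation behavior can be sketched as below, assuming each source is wrapped as a plain callable. `ThreatSignal`, its fields, and `aggregate_threat_intel` are hypothetical names chosen to mirror the schema described above; the real Google Safe Browsing and VirusTotal request and response formats are not reproduced here. Any exception raised by a source marks that source as unavailable instead of aborting the scan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ThreatSignal:
    """Standard schema shared by all threat intelligence sources (assumed fields)."""
    source: str           # source attribution, e.g. "gsb" or "virustotal"
    classification: str   # e.g. "malicious", "suspicious", "clean"
    confidence: float     # normalized to [0, 1]


# A source maps a URL to a ThreatSignal; a real client would wrap an HTTP API
# and translate that API's response into this schema.
Source = Callable[[str], ThreatSignal]


def aggregate_threat_intel(
    url: str, sources: Dict[str, Source]
) -> Tuple[List[ThreatSignal], List[str]]:
    """Query every source; return the collected signals and the failed source names."""
    signals: List[ThreatSignal] = []
    unavailable: List[str] = []
    for name, query in sources.items():
        try:
            signals.append(query(url))
        except Exception:
            # Graceful degradation: record the outage (timeout, quota, error)
            # and continue scanning with the remaining sources.
            unavailable.append(name)
    return signals, unavailable
```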

    3. WHOIS Domain Analysis

      Domain creation date, registrar, and privacy-protection status are retrieved and converted into the following signals: (i) domain age <30 days: Tier 1, weight 15; (ii) domain age <90 days: Tier 1, weight 8; (iii) WHOIS privacy protection: Tier 1, weight 5; (iv) high-risk TLD (.tk, .ml, .ga, .cf): Tier 2, weight 18. Fresh domains are a strong phishing indicator because attackers register new domains specifically to evade blocklists [4].
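      A minimal sketch of the WHOIS-to-signal conversion, assuming the WHOIS record has already been fetched and parsed into a creation date and a privacy flag. The function name `whois_signals` is hypothetical, and treating the two age bands as mutually exclusive (a 12-day-old domain receives only the weight-15 signal) is an assumption; the tiers and weights are the ones listed above.

```python
from datetime import date
from typing import List, Optional, Tuple

HIGH_RISK_TLDS = {"tk", "ml", "ga", "cf"}

# Each signal is (name, tier, base_weight).
Signal = Tuple[str, int, float]


def whois_signals(
    domain: str, created: Optional[date], privacy_protected: bool, today: date
) -> List[Signal]:
    signals: List[Signal] = []
    if created is not None:  # registrar records are sometimes incomplete
        age_days = (today - created).days
        # Assumed mutually exclusive bands: only the stronger age signal fires.
        if age_days < 30:
            signals.append(("domain_age_lt_30d", 1, 15.0))
        elif age_days < 90:
            signals.append(("domain_age_lt_90d", 1, 8.0))
    if privacy_protected:
        signals.append(("whois_privacy", 1, 5.0))
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in HIGH_RISK_TLDS:
        signals.append(("high_risk_tld", 2, 18.0))
    return signals
```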

    4. Heuristic Phishing Detection Engine

      The detection module evaluates four feature categories: structural URL features (length >100 characters, IP-based domain, @ symbol, subdomain depth >2); credential-harvesting keywords (login, verify, account, secure, suspended); financial targeting keywords (payment, billing, invoice); and protocol features (absent HTTPS, high-risk TLD). A phishing probability score is computed as:

      $$p = \min\left(1,\; \sum_{k=1}^{N} \alpha_k s_k\right) \qquad (1)$$

      where $s_k \in \{0, 1\}$ is the binary outcome for feature $k$, $\alpha_k$ is the weight of feature $k$, and $N = 12$ is the total feature count. The architecture supports replacing this heuristic engine with a trained Random Forest or XGBoost classifier in future iterations.

    5. Brand Impersonation Detection

      A JSON brand database covers high-value financial and technology brands (PayPal, Chase, Bank of America, Amazon, Google, Microsoft, and others) with official domains and typosquatting variants. Detection applies four strategies sequentially: (i) official domain verification (impersonation ruled out); (ii) subdomain impersonation, e.g., paypal.phishing-site.com (95% confidence); (iii) typosquatting, e.g., paypai.com (90%); (iv) brand keyword in a non-official domain (80%). Financial brands trigger more severe override rules than technology brands, reflecting the greater harm of financial credential theft.

    6. Risk Scoring Model

      Signals are classified into three tiers: Tier 1 Informational (×1.0), Tier 2 Suspicious (×2.5), and Tier 3 Critical (×5.0). The aggregate score is:

      $$S = \sum_{i} w_i \times m_{t(i)} \qquad (2)$$

      where $w_i$ is the base weight of signal $i$ and $m_{t(i)}$ is the multiplier of its tier $t(i)$. When multiple Tier 3 signals co-occur, a 10-point bonus is added for each additional Tier 3 signal, reflecting the escalating danger of co-occurring critical indicators.

      Each scan also produces a confidence score $c \in [0, 1]$:

      $$c = 1 - \frac{\sigma_r}{\max(r, 1)} \qquad (3)$$

      where $\sigma_r$ is the standard deviation of the individual signal contributions. A low $\sigma_r$ (signals agree) yields $c \approx 1$; conflicting signals raise $\sigma_r$ and lower $c$, flagging borderline cases for additional user scrutiny.

      An override rules engine bypasses numeric scoring for known dangerous signal combinations (Table I).

      TABLE I
      Override Rule Matrix

      | Condition | Level | Score |
      |---|---|---|
      | Confirmed threat intel match | CRITICAL | 95 |
      | Financial brand + auth keywords | CRITICAL | 88 |
      | Non-financial brand + auth keywords | HIGH | 75 |
      | Financial + auth keywords (no brand) | HIGH | 72 |
      | 3 or more Tier 3 signals | HIGH | 78 |
      | Brand + urgency + auth keywords | CRITICAL | 90 |
      | Brand + financial keywords | HIGH | 74 |

      The final risk level mapping is:

      $$L = \begin{cases} \text{Safe}, & 0 \le r \le 30 \\ \text{Suspicious}, & 31 \le r \le 60 \\ \text{High Risk}, & 61 \le r \le 85 \\ \text{Critical}, & 86 \le r \le 100 \end{cases} \qquad (4)$$

      Fig. 2. Risk scoring engine signal ow. Signals are tier-classied and weight- multiplied; the override engine (right branch) bypasses numeric aggregation for known high-risk combinations and directly assigns the nal risk level.
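      The heuristic score of Eq. (1) can be sketched as a capped, weighted sum of binary feature outcomes. This sketch implements eight of the checks listed for the detection engine rather than the paper's full set of N = 12, and the per-feature weights are hypothetical placeholders, not NetRakshak's values; subdomain depth is computed naively from the last two labels of the host.

```python
import ipaddress
from urllib.parse import urlsplit

CREDENTIAL_WORDS = ("login", "verify", "account", "secure", "suspended")
FINANCIAL_WORDS = ("payment", "billing", "invoice")
HIGH_RISK_TLDS = {"tk", "ml", "ga", "cf"}

# Hypothetical weights alpha_k for eight binary features (NOT NetRakshak's values).
WEIGHTS = {
    "long_url": 0.10, "ip_host": 0.30, "at_symbol": 0.20, "deep_subdomain": 0.15,
    "credential_kw": 0.25, "financial_kw": 0.20, "no_https": 0.15, "high_risk_tld": 0.25,
}


def _is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Binary outcomes s_k for the structural, keyword, and protocol checks."""
    # Bare domains get an http:// prefix, so they count as lacking HTTPS.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()  # hostname drops any user@ prefix and port
    text = url.lower()
    return {
        "long_url": len(url) > 100,
        "ip_host": _is_ip(host),
        "at_symbol": "@" in url,
        # Naive depth: labels left of the last two (wrong for suffixes like .co.in).
        "deep_subdomain": host.count(".") - 1 > 2,
        "credential_kw": any(w in text for w in CREDENTIAL_WORDS),
        "financial_kw": any(w in text for w in FINANCIAL_WORDS),
        "no_https": parts.scheme != "https",
        "high_risk_tld": host.rsplit(".", 1)[-1] in HIGH_RISK_TLDS,
    }


def phishing_probability(features: dict) -> float:
    """Capped weighted sum: p = min(1, sum_k alpha_k * s_k)."""
    return min(1.0, sum(WEIGHTS[name] * int(hit) for name, hit in features.items()))
```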
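      The four sequential brand-impersonation strategies can be sketched as below. The two-entry `BRANDS` dictionary is a hypothetical stand-in for the JSON brand database, `detect_brand_impersonation` is an illustrative name, and the registrable domain is approximated by the last two host labels (which mishandles suffixes such as .co.in); the 95%, 90%, and 80% confidences are the values given for the brand detector.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the JSON brand database.
BRANDS = {
    "paypal": {"official": {"paypal.com"}, "variants": {"paypai.com", "paypa1.com"}, "financial": True},
    "google": {"official": {"google.com"}, "variants": {"g00gle.com", "gooogle.com"}, "financial": False},
}


def registrable_domain(host: str) -> str:
    # Naive: keep the last two labels.
    return ".".join(host.split(".")[-2:])


def detect_brand_impersonation(host: str) -> Optional[Tuple[str, str, float, bool]]:
    """Return (brand, strategy, confidence, is_financial), or None."""
    host = host.lower().rstrip(".")
    base = registrable_domain(host)
    labels = host.split(".")
    # (i) Official domain of a known brand: impersonation ruled out.
    if any(base in info["official"] for info in BRANDS.values()):
        return None
    # (ii) Brand name used as a subdomain label, e.g. paypal.phishing-site.com.
    for brand, info in BRANDS.items():
        if brand in labels[:-2]:
            return brand, "subdomain", 0.95, info["financial"]
    # (iii) Typosquatting against the stored variant list, e.g. paypai.com.
    for brand, info in BRANDS.items():
        if base in info["variants"]:
            return brand, "typosquat", 0.90, info["financial"]
    # (iv) Brand keyword anywhere in a non-official domain.
    for brand, info in BRANDS.items():
        if brand in host:
            return brand, "keyword", 0.80, info["financial"]
    return None
```

The boolean in the result is what lets the override stage distinguish financial from non-financial brands.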
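      The scoring path (tier-weighted aggregation per Eq. (2), the Tier 3 co-occurrence bonus, the confidence score of Eq. (3), the Table I overrides, and the level mapping of Eq. (4)) can be sketched as follows. Several details are assumptions not stated in the text: the raw score is clipped to 100 to obtain r, overrides are checked from the highest Table I score down with the first match winning, confidence is computed from the numeric score before any override, and the flag names (`auth_kw`, `financial_brand`, and so on) and `score_scan` are hypothetical.

```python
import statistics
from typing import Callable, List, Set, Tuple

TIER_MULTIPLIER = {1: 1.0, 2: 2.5, 3: 5.0}
TIER3_BONUS = 10.0  # added for each Tier 3 signal beyond the first

Signal = Tuple[str, int, float]  # (name, tier, base weight w_i)

# Table I as (condition, level, score); checked from the highest score down.
OVERRIDES: List[Tuple[Callable[[Set[str], int], bool], str, int]] = [
    (lambda f, n3: "threat_intel_match" in f, "Critical", 95),
    (lambda f, n3: {"brand", "urgency", "auth_kw"} <= f, "Critical", 90),
    (lambda f, n3: {"financial_brand", "auth_kw"} <= f, "Critical", 88),
    (lambda f, n3: n3 >= 3, "High Risk", 78),
    (lambda f, n3: {"brand", "auth_kw"} <= f and "financial_brand" not in f, "High Risk", 75),
    (lambda f, n3: {"brand", "financial_kw"} <= f, "High Risk", 74),
    (lambda f, n3: {"financial_kw", "auth_kw"} <= f and "brand" not in f, "High Risk", 72),
]


def risk_level(r: float) -> str:
    """Eq. (4); a fractional score between bands (e.g. 30.5) goes to the higher band."""
    if r <= 30:
        return "Safe"
    if r <= 60:
        return "Suspicious"
    if r <= 85:
        return "High Risk"
    return "Critical"


def score_scan(signals: List[Signal], flags: Set[str]) -> Tuple[float, str, float]:
    """Return (risk score r, risk level L, confidence c)."""
    contributions = [w * TIER_MULTIPLIER[tier] for _, tier, w in signals]
    n3 = sum(1 for _, tier, _ in signals if tier == 3)
    s = sum(contributions) + TIER3_BONUS * max(0, n3 - 1)  # Eq. (2) plus bonus
    r = min(100.0, s)  # assumed normalization onto [0, 100]
    # Eq. (3): population std-dev of contributions; c clamped at 0.
    sigma = statistics.pstdev(contributions) if len(contributions) > 1 else 0.0
    c = max(0.0, 1.0 - sigma / max(r, 1.0))
    level = risk_level(r)
    for matches, override_level, override_score in OVERRIDES:
        if matches(flags, n3):
            # The override bypasses the numeric score and sets the level directly.
            r, level = float(override_score), override_level
            break
    return r, level, c
```

For instance, a domain under 30 days old (Tier 1, weight 15) with WHOIS privacy (Tier 1, weight 5) on a high-risk TLD (Tier 2, weight 18) contributes 15 + 5 + 45 = 65, which maps to High Risk when no override applies.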

    7. Explainability and Recommendation Output

    For each scan the explainability layer produces: (i) a list of flagged risk factors with plain-language descriptions; (ii) the confidence score c (Eq. 3); and (iii) prioritized safety recommendations matched to the detected risk level. For Critical assessments with financial brand impersonation, recommendations include warnings against credential entry, guidance to verify through official channels, instructions to report at cybercrime.gov.in, and advice to change passwords immediately.
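      Recommendation selection can be sketched as a lookup keyed on the risk level, with the financial-brand Critical case appending the extra guidance described above. The recommendation wording and the name `recommendations_for` are hypothetical; only the four kinds of advice for the financial-brand Critical case come from the text.

```python
from typing import List

# Hypothetical placeholder wording for each risk level (most urgent item first).
LEVEL_RECOMMENDATIONS = {
    "Safe": ["No risk indicators were found; still be cautious with unsolicited requests."],
    "Suspicious": ["Avoid sharing personal details until the source is verified."],
    "High Risk": ["Do not open the link or share any credentials."],
    "Critical": ["Do not enter passwords, OTPs, or card details on this page."],
}

# Extra guidance for Critical scans impersonating a financial brand
# (illustrative wording for the advice listed in the text).
FINANCIAL_CRITICAL_EXTRA = [
    "Verify the request only through the institution's official website, app, or phone number.",
    "Report the incident at cybercrime.gov.in or call the 1930 helpline.",
    "If you have already entered credentials, change your passwords immediately.",
]


def recommendations_for(level: str, financial_brand: bool) -> List[str]:
    """Return recommendations in priority order."""
    recs = list(LEVEL_RECOMMENDATIONS[level])  # copy so the table is never mutated
    if level == "Critical" and financial_brand:
        recs.extend(FINANCIAL_CRITICAL_EXTRA)
    return recs
```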

  5. Implementation

    NetRakshak is implemented as a full-stack web application with a FastAPI [11] backend (Python 3.11) and a React.js frontend. The backend is structured into five independent service modules: whois_service.py, threat_intel_enhanced.py, ml_service.py, risk_scorer.py, and unified_scanner.py. This separation allows each module to be tested, replaced, or upgraded independently.

    Scan pipeline. Upon receiving a request, the orchestrator executes the following steps: (1) input validation and type detection; (2) URL feature extraction (length, subdomain depth, TLD class, HTTPS status, keyword presence); (3) parallel WHOIS lookup and threat intelligence API queries; (4) brand impersonation detection; (5) heuristic phishing classification; (6) tiered risk score aggregation, including override evaluation; (7) explainability output generation; and (8) MongoDB persistence and response delivery. Scan results are cached in MongoDB; WHOIS results are cached for seven days, reducing repeated-domain latency to under 50 ms. Fig. 3 illustrates the complete pipeline flow.

    Fig. 3. NetRakshak scan pipeline execution flow. Steps 1-2 are synchronous preprocessing; Steps 3-5 execute in parallel; Steps 6-8 produce and deliver the final risk output.
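      The concurrent lookups of step 3 and the seven-day WHOIS cache can be sketched with `asyncio`, under stated assumptions: `lookup_whois` and `query_threat_intel` are hypothetical placeholder coroutines standing in for the real network calls, and an in-memory `TTLCache` stands in for the MongoDB-backed cache. Only step 3 is shown; `asyncio.gather` runs both lookups concurrently, so their latencies overlap instead of adding up.

```python
import asyncio
import time
from typing import Any, Callable, Dict, Optional, Tuple

WHOIS_TTL_SECONDS = 7 * 24 * 3600  # seven-day WHOIS cache lifetime


class TTLCache:
    """In-memory stand-in for the MongoDB-backed cache, with per-entry expiry."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl, self.clock = ttl, clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None or self.clock() - entry[0] > self.ttl:
            return None  # missing or expired
        return entry[1]

    def put(self, key: str, value: Any) -> None:
        self._store[key] = (self.clock(), value)


whois_cache = TTLCache(WHOIS_TTL_SECONDS)


async def lookup_whois(domain: str) -> dict:
    """Hypothetical placeholder for a slow network WHOIS query."""
    await asyncio.sleep(0.05)
    return {"domain": domain}


async def query_threat_intel(url: str) -> list:
    """Hypothetical placeholder for the GSB and VirusTotal calls."""
    await asyncio.sleep(0.05)
    return []


async def cached_whois(domain: str) -> dict:
    record = whois_cache.get(domain)
    if record is None:  # cache miss: query WHOIS and remember the result
        record = await lookup_whois(domain)
        whois_cache.put(domain, record)
    return record


async def run_step3(url: str, domain: str):
    """Step 3: WHOIS and threat intelligence lookups run concurrently."""
    whois, intel = await asyncio.gather(cached_whois(domain), query_threat_intel(url))
    return whois, intel
```

Usage: `asyncio.run(run_step3("http://paypai.com/login", "paypai.com"))` returns the WHOIS record and the (here empty) threat intelligence list; a second call for the same domain within seven days skips the WHOIS query.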

    Frontend. Scanner.jsx handles input submission with a live loading indicator. RiskScore.jsx renders the score and risk level with color-coded indicators (green/yellow/orange/red). ThreatDetails.jsx displays risk factors and recommendations. History.jsx maintains a session-scoped scan history.

    API endpoints. POST /api/scan executes the full pipeline and returns a structured result (score, level, risk factors, confidence, and recommendations). GET /api/scan/{id} retrieves a previous result. GET /api/whois/{domain} performs a standalone WHOIS lookup. GET /api/health returns service health status.
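      The structured result returned by POST /api/scan can be pictured as the record below. The field names are assumptions based on the listed contents (score, level, risk factors, confidence, and recommendations), plus a hypothetical `scan_id` for GET /api/scan/{id}; NetRakshak's actual JSON keys may differ, and the FastAPI route itself is omitted so that the sketch needs only the standard library.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RiskFactor:
    name: str
    description: str  # plain-language explanation shown to the user


@dataclass
class ScanResult:
    """Assumed shape of the POST /api/scan response body."""
    query: str          # the submitted URL, phone number, or email
    input_type: str     # "url", "phone", or "email"
    score: float        # risk score r in [0, 100]
    level: str          # Safe, Suspicious, High Risk, or Critical
    confidence: float   # confidence c in [0, 1]
    risk_factors: List[RiskFactor] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)
    scan_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # key for GET /api/scan/{id}

    def to_json(self) -> str:
        # asdict() also converts the nested RiskFactor objects into dicts.
        return json.dumps(asdict(self))
```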

  6. Evaluation and Results

    1. Dataset and Methodology

      We evaluated NetRakshak on 150 labeled URL samples: 80 phishing URLs from the PhishTank public feed [12] (collected in a 30-day window prior to testing), 20 from the OpenPhish live feed [13] representing active campaigns, and 50 confirmed legitimate domains from the Alexa Top-1M list, each verified manually. The phishing class therefore accounts for 67% of the dataset (100 of 150 samples).

      A URL was classified as positive (phishing) if the system assigned Suspicious, High Risk, or Critical; Safe was treated as negative. Latency was measured from API receipt to score delivery, excluding external API wait time. For baseline comparison, GSB and VirusTotal were queried via their public APIs on the same 150 URLs.

    2. Aggregate Performance

      Table II reports classification metrics and latency across the full dataset.

      TABLE II
      Aggregate Performance Metrics (150-Sample Dataset)

      | Metric | Value |
      |---|---|
      | Accuracy | 91.3% |
      | Precision | 93.1% |
      | Recall | 92.7% |
      | F1-Score | 92.9% |
      | False Positive Rate | 8.0% |
      | Avg. scoring latency (no external API) | <100 ms |
      | Avg. end-to-end latency (with WHOIS) | 1.8 s |
      | WHOIS cache-hit latency | <50 ms |

      The system correctly classified 137 of 150 samples. Of the 13 misclassifications, 4 were false positives (legitimate domains flagged Suspicious due to young domain age combined with generic commercial keywords such as "secure" or "account") and 9 were false negatives (phishing URLs that mimicked legitimate structure without triggering brand impersonation or high-risk TLD signals). All false positives were Suspicious-level (scores 35-52); none reached High Risk or Critical, limiting real-world harm. Planned mitigations include a domain-reputation whitelist for known hosting providers and down-weighting the domain-age penalty when a well-known CA has issued the site's SSL certificate.
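      As a consistency check, the standard metrics can be recomputed from the counts reported above (100 phishing and 50 legitimate samples, 9 false negatives and 4 false positives, hence 91 true positives and 46 true negatives). The short sketch below reproduces the 91.3% accuracy and 8.0% false positive rate of Table II; the same counts give 95.8% precision, 91.0% recall, and a 93.3% F1-score, which differ from the tabulated 93.1%, 92.7%, and 92.9%, so those three entries should be re-derived from the underlying confusion matrix.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary metrics, treating phishing (Suspicious or above) as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }


# Counts stated in the text: 100 phishing and 50 legitimate samples,
# with 9 false negatives and 4 false positives.
m = classification_metrics(tp=100 - 9, fp=4, fn=9, tn=50 - 4)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```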

    3. Baseline Comparison

      Table III compares NetRakshak against GSB, VirusTotal, and the ML-only classier of Sahingoz et al. [2].

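      TABLE III
      Numeric Performance Comparison

      | System | Acc. | Prec. | Rec. | F1 |
      |---|---|---|---|---|
      | NetRakshak (ours) | 91.3% | 93.1% | 92.7% | 92.9% |
      | Google Safe Browsing | 74.0% | 88.2% | 66.3% | 75.7% |
      | VirusTotal (majority vote) | 82.0% | 90.5% | 78.8% | 84.3% |
      | ML-only [2] (reported on its own benchmark split) | 97.4% | 97.6% | 97.3% | 97.4% |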

      NetRakshak outperforms GSB by 17.3 pp and VirusTotal by 9.3 pp in accuracy. The recall advantage over GSB is particularly pronounced, most likely because WHOIS domain-age analysis flags newly registered phishing domains before they appear on any blocklist. The trained ML-only classifier reports higher raw accuracy (97.4%) on its own benchmark split, which is expected for a supervised model evaluated on data drawn from its training distribution, but it provides none of NetRakshak's interpretability, multi-input unification, or localized guidance. Because that figure comes from a different dataset, it is not directly comparable with the other rows of Table III. Replacing the heuristic engine with a trained classifier is planned as the primary next-phase improvement.

    4. Signal Detection and Override Accuracy

      Table IV reports per-signal detection rates. High-risk TLD and HTTPS-absence detection reach 100% through deterministic structural checks. Lower rates for the domain-age and WHOIS privacy signals reflect incomplete registrar records, consistent with [4]. All seven override rule conditions triggered correctly across the tested inputs, and the financial/non-financial brand distinction was applied correctly in every case.

      TABLE IV
      Signal Detection Rates (150-Sample Dataset)

      | Signal Category | Tier | Detection Rate |
      |---|---|---|
      | Brand impersonation | 3 | 94.3% |
      | Auth. keywords | 3 | 96.0% |
      | Financial keywords | 3 | 91.2% |
      | High-risk TLD | 2 | 100% |
      | No HTTPS | 2 | 100% |
      | Domain age <30 days | 1 | 87.5% |
      | WHOIS privacy | 1 | 82.1% |

    5. Feature Comparison

    Table V compares architectural capabilities. NetRakshak is the only evaluated system offering all seven features simultaneously; to our knowledge, no existing single tool combines real-time WHOIS analysis, brand impersonation detection, interpretable output, and India-specific guidance in a unified interface.

      TABLE V
      Architectural Feature Comparison

      | Feature | Ours | GSB | VT | ML-only |
      |---|---|---|---|---|
      | Unified multi-input | ✓ | × | × | × |
      | Real-time WHOIS | ✓ | × | × | × |
      | Brand detection | ✓ | × | × | × |
      | Interpretable output | ✓ | × | × | × |
      | Override rules | ✓ | × | × | × |
      | India-specific guidance | ✓ | × | × | × |
      | Multi-source intel | ✓ | × | × | × |

  7. Discussion and Limitations

    1. Key Findings

      Three design hypotheses are supported by the evaluation. First, exponential tier weighting prioritizes critical phishing indicators without letting them be diluted by low-weight informational signals. Second, the override engine hard-escalates dangerous combinations, particularly financial brand impersonation with credential-harvesting keywords, that weighted aggregation alone would underweight. Third, the plain-language explainability layer follows the design principle of Volkamer et al. [10]: every risk verdict is paired with specific, actionable guidance rather than an unexplained alarm. The WHOIS domain-age module appears to be the most impactful differentiator over blacklist-only baselines, because it can flag phishing infrastructure before it appears on any external blocklist.

    2. Limitations

    Heuristic detection engine. The phishing module is rule-based; sophisticated URLs that avoid keyword patterns, high-risk TLDs, and structural anomalies may evade detection (9 false negatives observed). Integrating a trained classifier is the highest-priority next step.

    False positive rate. The 8.0% FPR, concentrated in Suspicious-level classifications of newly launched legitimate services, could erode user trust at scale. SSL-certificate-aware scoring and provider whitelists are planned mitigations.

    Brand database and API coverage. The current brand database does not include major Indian financial institutions (HDFC Bank, SBI, Paytm, PhonePe); expanding it is a priority. The system also depends on GSB and VirusTotal API availability, though graceful degradation limits the impact of outages.

    Dataset scale. The 150-sample dataset yields meaningful aggregate metrics but is insufficient for definitive generalization claims; large-scale evaluation on thousands of live-feed samples is required. The evaluation also covers URLs only, so phone-number and email verification have not yet been quantitatively assessed.

  8. Conclusion

NetRakshak demonstrates that detection rigor and user-facing transparency are not competing objectives: a multi-signal scoring system structured to expose its own evidence can be both analytically effective and directly interpretable by the people who most need protection. Empirical evaluation on 150 labeled samples achieved 91.3% accuracy, outperforming GSB by 17.3 pp and VirusTotal by 9.3 pp on the same dataset, with the WHOIS domain-age module emerging as the key differentiator over reactive blacklist-only approaches.

The override rule engine illustrates the value of encoding expert knowledge as deterministic rules: financial brand impersonation combined with credential-harvesting vocabulary poses a risk severe enough to warrant direct Critical escalation rather than reliance on weighted aggregation. The 8.0% false positive rate (entirely at Suspicious level) highlights the inherent tension between aggressive flagging of young domains and user trust, a trade-off to be addressed through CA-aware scoring and reputation whitelisting in the next development phase.

Replacing the heuristic detection engine with a trained classifier and expanding brand coverage to major Indian financial institutions are the highest-priority improvements. More broadly, this work offers a reference architecture showing that explainability-first, multi-signal fraud detection is feasible at real-time latency, and it may guide future user-centric security tools in rapidly digitizing regions where cybersecurity awareness still lags behind digital adoption.

Acknowledgment

The authors thank the Department of Computer Science and Engineering, Keshav Memorial Institute of Technology, Hyderabad, for providing computational resources and research infrastructure.

References

  1. Indian Cybercrime Coordination Centre, "Annual report on cybercrime in India," https://www.cybercrime.gov.in, 2023.

  2. O. K. Sahingoz, E. Buber, O. Demir, and B. Diri, "Machine learning based phishing detection from URLs," Expert Systems with Applications, vol. 117, pp. 345-357, 2019.

  3. R. M. Mohammad, F. Thabtah, and L. McCluskey, "Predicting phishing websites based on self-structuring neural network," Neural Computing and Applications, vol. 25, pp. 443-458, 2014.

  4. M. Khonji, Y. Iraqi, and A. Jones, "Phishing detection: A literature survey," IEEE Communications Surveys and Tutorials, vol. 15, no. 4, pp. 2091-2121, 2013.

  5. C. Sillaber, C. Sauerwein, A. Mussmann, and R. Breu, "Data quality challenges and future research directions in threat intelligence sharing practice," in Proceedings of the 2016 ACM Workshop on Information Sharing and Collaborative Security, 2016, pp. 65-70.

  6. "VirusTotal: Free online virus, malware and URL scanner," https://www.virustotal.com, 2024.

  7. Google, "Safe Browsing API," https://safebrowsing.google.com, 2024.

  8. D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G.-Z. Yang, "XAI: Explainable artificial intelligence," Science Robotics, vol. 4, no. 37, 2019.

  9. F. Almalki and M. Masud, "Financial fraud detection using explainable AI and stacking ensemble methods," arXiv preprint arXiv:2505.10050, 2025.

  10. M. Volkamer, K. Renaud, B. Reinheimer, and P. Rack, "User experiences of TORPEDO: Tooltip-assisted phishing email detection," Computers and Security, vol. 71, pp. 100-113, 2017.

  11. S. Ramirez, "FastAPI: Modern, fast web framework for building APIs with Python," https://fastapi.tiangolo.com, 2024.

  12. "PhishTank: Free community site for phishing data," https://www.phishtank.com, 2024.

  13. "OpenPhish: Phishing intelligence," https://openphish.com, 2024.