DOI : https://doi.org/10.5281/zenodo.20268031
- Open Access

- Authors : Dr. K. Subbarao, Ambika Jyothi Devana, Nandini Meda, Madhu Kumar Nidrabingi, Rohith Sai Pasupuleti
- Paper ID : IJERTV15IS050546
- Volume & Issue : Volume 15, Issue 05 , May – 2026
- Published (First Online): 18-05-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
User and Entity Behavior Analytics for Cybersecurity Using Unsupervised Clustering Techniques
Dr. K. Subbarao (1), Ambika Jyothi Devana (2), Nandini Meda (3), Madhu Kumar Nidrabingi (4), Rohith Sai Pasupuleti (5)
Professor & HOD(1), Student (2,3,4,5)
Department of CSE Data Science, St. Ann’s College of Engineering & Technology, Chirala, Andhra Pradesh, India.
Abstract: With the increasing complexity of cyber threats, especially insider attacks and unknown anomalies, traditional rule-based security systems are often insufficient. This paper presents a User and Entity Behavior Analytics (UEBA) system that applies unsupervised machine learning techniques to analyze user and system activity logs and detect abnormal behavior. Since labeled attack data is rarely available in real-world scenarios, clustering algorithms are employed to learn normal behavior patterns and identify deviations automatically. Three clustering algorithms (K-Means, DBSCAN, and HDBSCAN) are implemented and evaluated on the E-shop Clothing Clickstream Dataset comprising over 165,000 records. Data preprocessing encompasses label encoding of categorical attributes and Z-score normalization to ensure uniform feature contribution to clustering. Performance is assessed using internal evaluation metrics: Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index. Experimental results demonstrate that density-based algorithms, particularly HDBSCAN, outperform partition-based methods in identifying irregular and rare behavior patterns. HDBSCAN automatically identifies anomalous sessions as noise points (Cluster -1) without requiring a predefined number of clusters. The system is deployed as an interactive Streamlit web application, providing security analysts with cluster visualization, anomaly export, and behavioral profiling capabilities. The proposed approach demonstrates a scalable and practical solution for proactive cybersecurity monitoring applicable across banking, e-commerce, healthcare, and enterprise security domains.
Keywords: UEBA; anomaly detection; HDBSCAN; unsupervised clustering; cybersecurity; clickstream analytics; insider threat; behavioral analytics
INTRODUCTION
Introduction to UEBA
User and Entity Behavior Analytics (UEBA) is a cybersecurity process that involves analyzing the behavior of users and entities such as devices, applications, and servers to detect anomalies that may indicate security threats. Unlike traditional security systems that rely on signature-based detection and rule engines, UEBA builds behavioral baselines from historical data and flags deviations in real time. The concept of UEBA evolved from earlier User Behavior Analytics (UBA) solutions and now extends monitoring to all entities within a network, combining data science, machine learning, and cybersecurity principles to create adaptive, intelligent monitoring systems [1].
Overview of Cybersecurity and Behavior Analytics
The cybersecurity landscape has undergone dramatic transformation over the past two decades. Early security measures focused on perimeter defense using firewalls and antivirus tools. As networks became increasingly complex, the industry evolved toward intrusion detection systems (IDS), security information and event management (SIEM), and ultimately behavior analytics. Table I summarizes this evolution across different eras, highlighting how defensive technologies have progressed alongside emerging threats.
TABLE I. Evolution of Cybersecurity Threat Landscape
| Era | Primary Threat | Dominant Defense | Limitation |
|---|---|---|---|
| 1990s | Viruses, Worms | Antivirus, Firewalls | Signature-dependent; unknown threats evade detection |
| 2000s | Network Intrusions, DDoS | IDS/IPS, SIEM | High false positive rates; cannot detect insider threats |
| 2010s | APTs, Ransomware, Insider Threats | IDS/IPS, SIEM | Lack of behavioral context; requires labeled threat data |
| 2020s | AI-Driven Attacks, Zero-Days | UEBA, Zero Trust Architecture | Requires continuous tuning; complex deployment |
Behavior analytics represents the fourth generation of cybersecurity defense, combining the best elements of all previous approaches while adding adaptive, data-driven intelligence. Modern UEBA platforms integrate with SIEM and SOAR (Security Orchestration, Automation and Response) platforms to provide comprehensive organizational protection.
Rule-Based vs. Behavior-Based Security
Traditional rule-based security systems operate on predefined signatures and thresholds. While effective against known threats, they fail to adapt to novel attack patterns, zero-day vulnerabilities, and sophisticated insider threats. Table II provides a comparative analysis illustrating how behavior-based security overcomes the fundamental limitations of rule-based approaches.
TABLE II. Rule-Based vs. Behavior-Based Security Comparison
| Criterion | Rule-Based Security | Behavior-Based Security (UEBA) |
|---|---|---|
| Detection Basis | Predefined signatures and static rules | Dynamic behavioral baselines learned from data |
| Adaptability | Cannot detect new/unknown threats | Automatically adapts to evolving threat patterns |
| Insider Threat Detection | Limited; rules rarely cover authorized users behaving maliciously | Highly effective; detects subtle deviations in authorized user behavior |
| False Positive Rate | High; any policy violation triggers an alert | Lower; anomalies must deviate significantly from baseline |
| Labeled Data Requirement | Requires known threat signatures | No labeled data required (unsupervised approach) |
| Scalability | Rule sets become unwieldy at scale | Models scale with data volume and dimensionality |
Problem Statement
Modern organizations generate massive volumes of security logs from servers, applications, network devices, and user workstations. The sheer volume of data makes manual analysis impractical: a typical enterprise may generate millions of log entries per day. The core challenges addressed in this work are: (1) insider threats and anomalous behaviors cannot be detected by rule-based systems that require predefined attack signatures; (2) existing UEBA solutions frequently rely on one or two clustering techniques without systematic comparative evaluation; (3) selecting the most appropriate clustering algorithm for different UEBA scenarios with varying data density and dimensionality remains an open research challenge [3].
Objectives of the Project
The primary objective of this work is to apply and evaluate clustering-based unsupervised learning techniques for UEBA. Specific objectives include: implementing and comparing the K-Means, DBSCAN, and HDBSCAN clustering algorithms for behavior-based anomaly detection; preprocessing the E-shop Clothing Clickstream Dataset using label encoding and Z-score normalization; evaluating clustering performance using Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index; and demonstrating the effectiveness of HDBSCAN for practical UEBA applications deployable in enterprise environments.
LITERATURE SURVEY
Existing UEBA and Anomaly Detection Approaches
UEBA and behavior-based anomaly detection have been studied from multiple angles: statistical modeling, machine learning, deep learning, and hybrid approaches. Table III summarizes key existing systems reviewed in this study.
TABLE III. Existing UEBA and Anomaly Detection Approaches
| Title / Authors | Methodology | Advantages | Disadvantages |
|---|---|---|---|
| Shashanka et al. User Behavior Analytics for Insider Threat Detection | ML-based behavioral analysis with SVD dimensionality reduction | Detects anomalies without predefined signatures; reduces data complexity | Limited interpretability; sensitive to data quality issues |
| Tang et al. Feature Engineering for Behavior-Based Anomaly Detection | Feature engineering with behavior-based anomaly detection models | Improves detection accuracy; extracts meaningful behavioral features | Requires manual feature selection; limited generalization to new domains |
| Singh et al. Clustering-Based Intrusion Detection in Network Traffic | K-Means and DBSCAN applied to network flow data | No labeled data required; detects novel attacks | Difficulty setting parameters; unstable across different traffic profiles |
Types of Inputs Used in UEBA Models
UEBA systems integrate multiple data sources to construct comprehensive behavioral profiles. Table IV categorizes the primary data types used in modern UEBA systems and their relevance to anomaly detection.
TABLE IV. Types of Inputs Used in UEBA Models
| Data Category | Specific Features | Source | Relevance to UEBA |
|---|---|---|---|
| Behavioral / Clickstream | Page views, session duration, click sequences, navigation paths | Web server logs, application logs | Captures user intent and browsing patterns; reveals unusual navigation |
| Technical Indicators | Login times, IP addresses, device fingerprints, access frequency | Authentication logs, AD/LDAP | Identifies suspicious access patterns and location anomalies |
| Network Activity | Connection volume, port access, data transfer size, protocol usage | Network flow data, firewall logs | Detects data exfiltration and lateral movement |
| Entity Metadata | Country, product category, user role, department | HR systems, directory services | Contextualizes behavior against peer group norms |
| Temporal Features | Time of access, day of week, session gap patterns | Derived from timestamps | Identifies off-hours access and irregular session patterns |
Feature Selection Methods
Feature selection is critical for improving model accuracy and efficiency. In high-dimensional behavioral datasets, many features may be redundant or correlated, leading to overfitting and increased computational cost. Table V summarizes key feature selection techniques applied in this study.
TABLE V. Feature Selection Methods
| Method | Description | Use Case in UEBA | Pros | Cons |
|---|---|---|---|---|
| Correlation Analysis | Removes highly correlated features to reduce redundancy | Eliminating redundant behavioral metrics | Simple; interpretable | Only detects linear relationships |
| PCA | Transforms correlated features into uncorrelated principal components | Dimensionality reduction for visualization and clustering | Reduces noise; improves clustering | Components lose original interpretability |
| Label Encoding | Converts categorical features to numerical format | Encoding country, product group, color attributes | Lightweight; compatible with all algorithms | Implies ordinal relationships in nominal data |
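The correlation-analysis row of Table V can be illustrated with a short sketch. The column names, the toy values, and the 0.9 threshold below are assumptions made for illustration; the paper does not state which threshold or implementation was used.

```python
import numpy as np
import pandas as pd


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9):
    """Drop one feature from every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Hypothetical toy features: price_copy is a linear function of price.
features = pd.DataFrame({
    "price": [10, 20, 30, 40, 50],
    "price_copy": [21, 41, 61, 81, 101],
    "clicks": [3, 1, 4, 1, 5],
})
reduced, dropped = drop_correlated(features)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

As Table V notes, a filter of this kind only captures linear relationships, so features related non-linearly would survive it.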
Related Work
Shashanka et al. introduced machine learning-based behavioral analytics for insider threat detection, applying Singular Value Decomposition (SVD) for dimensionality reduction. While effective, their approach lacked interpretability. Tang et al. proposed feature engineering pipelines demonstrating that carefully constructed behavioral features significantly improve detection accuracy. Singh et al. applied Long Short-Term Memory (LSTM) networks for sequential user behavior modeling, achieving high detection accuracy but requiring labeled training data and substantial computational resources [3]. Campello et al. [2] introduced HDBSCAN as an extension of DBSCAN based on hierarchical density estimates, providing the theoretical foundation for the primary algorithm adopted in this work. Chandola et al. [3] published a comprehensive survey on anomaly detection techniques, offering the conceptual taxonomy within which this study’s approach is situated. MacQueen introduced the K-Means clustering algorithm used for baseline comparison in this project [7].
PROPOSED METHODOLOGY
Dataset Description
The proposed system utilizes the E-shop Clothing Clickstream Dataset to model user-entity interactions within an online shopping environment. This dataset captures user browsing activities and interaction patterns on an e-commerce platform, including product views, session behaviors, and navigation activities. It comprises 165,473 records with 14 features spanning three primary attribute categories, as summarized in Table VI.
TABLE VI. Dataset Characteristics
| Attribute Type | Description | Purpose |
|---|---|---|
| Browsing Data | Page visits, navigation paths | Understand user behavior |
| Product Data | Product category, type | Analyze product interactions |
| Session Data | Duration, actions, entry/exit pages | Session-level analysis |
Data Preprocessing
Data preprocessing transforms raw clickstream data into a structured format suitable for machine learning. This stage encompasses three primary steps. First, attribute separation organizes mixed browsing, product, and session attributes into structured columns to improve clarity and processing efficiency. The page2 feature is parsed into two components: model_group (categorical: A, B, C, P) and model_id (numeric: 1-82). The constant year column is removed as it contributes no discriminative information.
Second, Label Encoding is applied to convert categorical attributes including product categories, page types, and country identifiers into numerical representations, enabling machine learning algorithms to process categorical data effectively. Third, Z-score Normalization is performed using StandardScaler, ensuring all features contribute equally to the clustering process regardless of their original measurement scale.
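The three preprocessing steps can be sketched as follows. This is a minimal illustration on a four-row hypothetical sample, not the project's UEBAPreprocessor code: the column subset, the string country codes, and the regular expression used to split page2 are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical four-row sample; the real dataset has 165,473 rows and 14 features.
raw = pd.DataFrame({
    "year": [2008, 2008, 2008, 2008],
    "country": ["PL", "PL", "CZ", "DE"],
    "page2": ["A13", "C20", "P77", "B4"],
    "price": [28, 33, 52, 38],
})

# Step 1: attribute separation. Split page2 into a letter group and a numeric id,
# then drop the constant year column and the now-redundant page2 column.
parts = raw["page2"].str.extract(r"^([A-Z])(\d+)$")
raw["model_group"] = parts[0]
raw["model_id"] = parts[1].astype(int)
df = raw.drop(columns=["year", "page2"])

# Step 2: label encoding of the categorical columns.
for col in ["country", "model_group"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Step 3: Z-score normalization so every feature has mean 0 and unit variance.
X = StandardScaler().fit_transform(df)
print(df)
print(X.mean(axis=0).round(6), X.std(axis=0).round(6))
```

LabelEncoder assigns integers in sorted order of the category values, which is why the encoding implies an ordering that nominal attributes such as country do not actually have.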
Feature Engineering
Meaningful behavioral features are extracted from the raw dataset and organized into three categories as described in Table VII. These features enhance clustering performance and support effective identification of abnormal activities in UEBA systems.
TABLE VII. Key Features Used
| Feature Category | Features | Description |
|---|---|---|
| Browsing Behavior | Pages visited, click frequency, navigation paths | Captures user browsing patterns |
| Product Interaction | Product category, type, interaction frequency | Tracks user-product engagement |
| Session-Based | Session duration, actions, entry/exit pages | Represents session dynamics |
Unsupervised Clustering Algorithms
Three unsupervised clustering algorithms are implemented and compared. K-Means partitions data into k clusters by minimizing intra-cluster variance. It is computationally efficient but requires specifying k in advance and assumes spherical, equally sized clusters. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points in dense regions and marks sparse outliers as noise, automatically detecting anomalies without requiring a cluster count. However, it requires careful tuning of the epsilon and min_samples hyperparameters [1]. HDBSCAN (Hierarchical DBSCAN) extends DBSCAN by building a cluster hierarchy and automatically selecting stable clusters using the excess-of-mass (EOM) method. It handles variable-density data and identifies noise points (interpreted as anomalies in the UEBA context) without requiring a predefined cluster count [2][4].
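The practical difference between partition-based and density-based clustering can be seen on a small synthetic example. The sketch below uses scikit-learn's KMeans and DBSCAN on made-up two-dimensional data; the eps and min_samples values are illustrative assumptions, not the settings used in this work.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.default_rng(0)
# Two dense groups of "normal" sessions plus three far-away outliers (synthetic data).
dense_a = rng.normal(loc=(0.0, 0.0), scale=0.2, size=(50, 2))
dense_b = rng.normal(loc=(5.0, 5.0), scale=0.2, size=(50, 2))
outliers = np.array([[10.0, -10.0], [-10.0, 10.0], [15.0, 15.0]])
X = np.vstack([dense_a, dense_b, outliers])

# K-Means: k is fixed up front and every point is forced into some cluster.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: no cluster count is given; points in sparse regions receive label -1.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

print("K-Means labels used:", sorted(set(kmeans_labels.tolist())))
print("DBSCAN labels used:", sorted(set(dbscan_labels.tolist())))
print("DBSCAN noise points:", int((dbscan_labels == -1).sum()))
```

K-Means has no noise label, so the three outliers are absorbed into regular clusters, whereas DBSCAN isolates them. HDBSCAN keeps this noise behavior while removing the need to choose a single eps.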
Cluster Evaluation Metrics
Internal validation metrics are employed to assess clustering quality without requiring ground-truth labels, as summarized in Table VIII.
TABLE VIII. Evaluation Metrics
| Metric | Description | Ideal Value |
|---|---|---|
| Silhouette Score | Measures similarity within clusters; ranges from -1 to 1 | Higher is better |
| Calinski-Harabasz Index | Ratio of between-cluster to within-cluster variance | Higher is better |
| Davies-Bouldin Index | Measures average similarity between each cluster and its most similar cluster | Lower is better |
The Silhouette Score quantifies cluster cohesion and separation by comparing, for each data point, the mean intra-cluster distance (a) against the mean nearest-cluster distance (b). The score equals (b – a) / max(a, b), with values closer to +1 indicating well-separated clusters [7]. The Davies-Bouldin Index measures average similarity between each cluster and its most similar cluster, with lower values indicating better-defined partitions [8]. The Calinski-Harabasz Index evaluates the ratio of between-cluster dispersion to within-cluster dispersion, with higher values representing well-defined, compact clusters [9].
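A minimal sketch of how the three metrics can be computed with scikit-learn is given below, together with a hand calculation of (b - a) / max(a, b) for a single point. The data are synthetic and the dictionary keys are hypothetical; how noise points (Cluster -1) were treated when scoring HDBSCAN results is not stated in this paper, so the sketch scores a plain K-Means partition.

```python
import json

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_samples,
    silhouette_score,
)

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(40, 2)),
    rng.normal(loc=(4.0, 4.0), scale=0.3, size=(40, 2)),
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

metrics = {
    "silhouette": float(silhouette_score(X, labels)),
    "davies_bouldin": float(davies_bouldin_score(X, labels)),
    "calinski_harabasz": float(calinski_harabasz_score(X, labels)),
}
print(json.dumps(metrics, indent=2))

# Manual silhouette for the first point, following (b - a) / max(a, b).
i = 0
dists = np.linalg.norm(X - X[i], axis=1)
same = labels == labels[i]
a = dists[same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding the point itself
b = dists[~same].mean()                   # only one other cluster exists, so it is the nearest one
s_manual = (b - a) / max(a, b)
print("manual:", s_manual, "sklearn:", silhouette_samples(X, labels)[i])
```

The overall Silhouette Score is the mean of these per-point values, which is why a few poorly placed points lower it only slightly on a large dataset.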
Behavioral Analysis for UEBA
Following clustering, the resulting groups are analyzed to understand user behavior patterns. Clusters represent either normal browsing behavior or anomalous activity. Behavioral analysis focuses on continuously monitoring user activities and comparing them against established baseline patterns derived from clustering results. The system examines factors such as session duration, page transition frequency, frequency of product interactions, and access timing to distinguish typical from atypical behaviors. This enables early detection of insider threats and compromised accounts while reducing false positives compared to traditional rule-based approaches.
SYSTEM DESIGN
System Workflow
The UEBA system processes data through a sequential pipeline of five stages. Stage 1 (Data Collection): Raw clickstream data is ingested from the E-shop Clothing Dataset, capturing page visits, click events, session duration, navigation paths, and interaction frequency. Stage 2 (Data Preprocessing): Raw data is cleaned by removing duplicates and missing values, categorical attributes are encoded using Label Encoding, and feature values are standardized using Z-score Normalization. Stage 3 (Feature Engineering): Meaningful behavioral patterns are extracted, including number of clicks per session, page visit frequency, time spent on pages, and navigation sequence patterns. Stage 4 (HDBSCAN Clustering): The HDBSCAN algorithm is applied to the normalized 14-dimensional feature matrix, discovering clusters of similar user activity, identifying noise points corresponding to anomalous sessions, and handling varying data densities without a predefined cluster count. Stage 5 (Output and Detection): The system produces behavioral cluster assignments, anomaly labels (Cluster -1), outlier scores, and membership probabilities for each session record.
System Architecture
The system architecture integrates four software modules. The UEBAPreprocessor module manages all data ingestion and transformation tasks. The HDBSCANClusterer module wraps the hdbscan library with project-specific functionality, providing fit(), predict_anomalies(), get_cluster_summary(), and dimensionality reduction methods via PCA and t-SNE. The ClusteringEvaluator module computes all internal cluster validity metrics and serializes results to a JSON file for dashboard display. The Streamlit application (app_ueba.py) provides seven navigation pages: Home, Data Overview, Data Preprocessing, HDBSCAN Clustering, Anomaly Detection, Evaluation Metrics, and Interactive Analysis.
IMPLEMENTATION
Software Requirements
The system is implemented in Python 3.10.11, leveraging its extensive ecosystem of data science and machine learning libraries. Key software components include: Streamlit (open-source framework for building interactive web applications without requiring frontend development expertise); the hdbscan Python package (provides HDBSCAN as a scikit-learn compatible estimator returning cluster labels, membership probabilities, and outlier scores); Scikit-learn (provides StandardScaler for normalization, PCA and t-SNE for dimensionality reduction, and clustering evaluation metrics); Pandas and NumPy (handle the complete data pipeline from raw CSV to processed arrays); and Plotly (provides interactive visualization capabilities for cluster scatter plots, anomaly score histograms, temporal trend charts, and country-level anomaly maps).
Hardware Requirements
The system requires a minimum of 8 GB RAM to comfortably process the 165,000-record dataset and execute HDBSCAN clustering, which is memory-intensive due to its construction of a condensed cluster hierarchy. A modern multi-core processor is recommended to accelerate t-SNE dimensionality reduction computations. Total storage requirements are minimal, with the complete project occupying less than 500 MB.
HDBSCAN Algorithm
HDBSCAN is the primary algorithm of the UEBA system. It extends DBSCAN by converting it into a hierarchical algorithm and extracting a flat clustering using cluster stability [2][4]. The algorithm operates in five steps: (1) compute core distances for all points based on the min_samples parameter; (2) build a minimum spanning tree of the mutual reachability graph; (3) construct the cluster hierarchy by iteratively removing edges from highest to lowest weight; (4) extract stable flat clusters using the excess of mass (EOM) method; and (5) label all points not belonging to stable clusters as noise, which is interpreted as anomalous behavior in the UEBA context. Key outputs include integer cluster labels (0, 1, 2, … or -1 for noise), membership probabilities (0 to 1), and outlier scores (higher values indicate more anomalous sessions) [4].
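Steps (1) and (2) can be made concrete with a small NumPy/SciPy sketch. It is an illustration of the idea only, not the hdbscan library's implementation: the five sample points are made up, and the convention that the core distance is the distance to the min_samples-th nearest neighbor, not counting the point itself, is an assumption of this sketch.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def mutual_reachability(X: np.ndarray, min_samples: int):
    """Return core distances and the mutual reachability distance matrix."""
    dist = cdist(X, X)
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column min_samples is the min_samples-th nearest other point.
    core = np.sort(dist, axis=1)[:, min_samples]
    # d_mreach(p, q) = max(core(p), core(q), d(p, q))
    mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(mreach, 0.0)
    return core, mreach


# Four tightly grouped points and one isolated point (hypothetical data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
core, mreach = mutual_reachability(X, min_samples=2)

# Step 2: minimum spanning tree over the mutual reachability graph.
mst = minimum_spanning_tree(mreach).tocoo()
weights = np.sort(mst.data)
k = int(np.argmax(mst.data))
heaviest_edge = (int(mst.row[k]), int(mst.col[k]))

print("core distances:", core)
print("sorted MST edge weights:", weights)
print("heaviest edge connects points:", heaviest_edge)
```

Removing the heaviest edge first, as in step (3), separates the isolated point from the dense group. A point that splits off this early and never joins a stable cluster is the kind of record that ends up labeled -1.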
Dimensionality Reduction
Principal Component Analysis (PCA) is used for fast 2D dimensionality reduction of the 14-feature normalized space, projecting data onto the two principal components with highest variance. This preserves global structure while enabling scatter plot rendering of cluster assignments. t-SNE (t-Distributed Stochastic Neighbor Embedding) provides a complementary visualization emphasizing local cluster structure over global structure. While computationally more expensive than PCA, t-SNE produces more visually distinct cluster separations, assisting analysts in understanding behavioral group boundaries.
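The PCA projection step can be sketched as follows. The 14-column matrix here is synthetic (two latent factors plus noise) and only stands in for the normalized feature matrix; t-SNE is omitted because it is slow and its output depends on additional settings that the paper does not report.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the normalized 14-feature matrix: two latent factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 14))
X = StandardScaler().fit_transform(latent @ mixing + 0.1 * rng.normal(size=(200, 14)))

# Project onto the two directions of highest variance for a 2D scatter plot.
pca = PCA(n_components=2, random_state=0)
coords = pca.fit_transform(X)

print("projected shape:", coords.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio indicates how much of the original spread the 2D plot retains. On real clickstream features the first two components may capture far less than in this synthetic case, so the scatter plot should be read as a visual aid and not as the space in which clustering was performed.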
TESTING
A structured test case suite was designed to validate each stage of the data processing and clustering pipeline. Table IX summarizes the seven test cases executed, their procedures, and outcomes.
TABLE IX. Test Cases
| TC ID | Test Scenario | Test Steps | Expected Result | Actual Result | Status |
|---|---|---|---|---|---|
| TC_01 | Data Loading Test | Load the E-shop CSV and verify column structure | Dataset loads with all 14 expected features | All features present; 165,473 records loaded | Pass |
| TC_02 | Feature Engineering | Split page2 into model_group and model_id | Two new columns created with correct values | model_group (A/B/C/P) and model_id (1-82) extracted | Pass |
| TC_03 | Label Encoding | Apply encoding to categorical features | All categorical columns converted to integers | Encoding applied correctly to 6 categorical features | Pass |
| TC_04 | Z-score Normalization | Apply StandardScaler to all 14 features | Mean ≈ 0 and Std ≈ 1 for all features | Normalization verified with describe() statistics | Pass |
| TC_05 | HDBSCAN Clustering | Fit HDBSCAN on normalized 14-feature matrix | Clusters assigned with -1 for noise/anomalies | Multiple clusters found with noise points labeled | Pass |
| TC_06 | Anomaly Detection | Filter rows where cluster == -1 | All noise points returned as anomalies | Anomaly records correctly identified and exported | Pass |
| TC_07 | Evaluation Metrics | Compute Silhouette, Davies-Bouldin, Calinski-Harabasz | Valid numeric scores returned for each metric | All three metrics computed and saved to JSON | Pass |
All seven test cases passed successfully, confirming the correctness of data loading, feature engineering, encoding, normalization, clustering, anomaly identification, and metric computation modules. The dataset was confirmed to contain 165,473 records with all 14 expected features intact after preprocessing.
RESULTS AND ANALYSIS
The UEBA system was executed on the complete E-shop Clothing Clickstream Dataset of 165,473 records. HDBSCAN clustering identified four distinct behavioral user groups corresponding to different browsing and purchasing interaction patterns. Sessions not assignable to any stable cluster were labeled as Cluster -1, representing anomalous behavior. The system detected an anomaly rate of approximately 6.81% of total sessions, identifying these records as potential security concerns warranting further investigation.
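The anomaly rate and cluster count reported above follow directly from the label column. The sketch below shows the calculation on eight hypothetical session records; the column names and values are assumptions for illustration and do not reproduce the reported 6.81% figure.

```python
import pandas as pd

# Hypothetical cluster assignments for eight sessions; -1 marks noise.
sessions = pd.DataFrame({
    "session_id": [101, 102, 103, 104, 105, 106, 107, 108],
    "cluster": [0, 0, 1, -1, 2, 3, 1, -1],
})

# Anomalies are the rows that no stable cluster absorbed.
anomalies = sessions[sessions["cluster"] == -1]
anomaly_rate = 100.0 * len(anomalies) / len(sessions)

# Behavioral groups are counted without the noise label.
n_clusters = sessions.loc[sessions["cluster"] != -1, "cluster"].nunique()

# Export the flagged records as CSV text for downstream investigation.
csv_text = anomalies.to_csv(index=False)

print(f"clusters: {n_clusters}, anomaly rate: {anomaly_rate:.2f}%")
print(csv_text)
```

Calling to_csv without a path returns the CSV as a string, which is convenient when the export is offered as a download from a web interface.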
The Streamlit application’s Clustering Results Summary dashboard displays the number of discovered clusters, total record count, anomaly count, and anomaly rate as interactive metric cards. The Cluster Interpretation dashboard presents per-cluster behavioral profiles, including representative session statistics such as average session duration, mean click frequency, and dominant product categories, enabling security analysts to characterize normal user groups and distinguish them from flagged anomalous sessions.
The 2D Cluster Visualization page employs PCA to project the 14-dimensional normalized feature space onto two principal components (PC1 and PC2), rendering a scatter plot where each point is color-coded by cluster assignment. This visualization confirms that the HDBSCAN-discovered clusters exhibit meaningful separation in reduced feature space, with noise points (Cluster -1) dispersed at the boundaries of the identified behavioral groups. PCA-based projection also reduces noise and redundancy, enabling more efficient cluster boundary visualization. The t-SNE projection provides a complementary view emphasizing local cluster cohesion and revealing finer subgroup structure within behavioral groups.
Evaluation metrics computed by the ClusteringEvaluator module and serialized to the evaluation_metrics.json file confirm the internal validity of the discovered clusters. The Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index are displayed on the Evaluation Metrics dashboard, providing quantitative evidence of cluster quality. These metrics collectively confirm that the density-based HDBSCAN approach produces well-separated, compact behavioral groups compared to the baseline K-Means partitioning, which requires specifying the number of clusters in advance and performs poorly on non-spherical cluster geometries characteristic of real-world clickstream data.
Figure 1. Streamlit application output
CONCLUSION
This paper presents a User and Entity Behavior Analytics (UEBA) system employing HDBSCAN clustering for unsupervised anomaly detection in cybersecurity contexts. The system processes over 165,000 e-shop clickstream records through a complete pipeline of preprocessing, feature engineering, clustering, and interactive visualization deployed as a Streamlit web application. The HDBSCAN algorithm proves highly effective for behavioral anomaly detection, automatically discovering behavioral clusters and natively identifying outlier sessions without requiring labeled training data. The assignment of outlier scores and membership probabilities provides nuanced anomaly characterization beyond binary classification, enabling analysts to prioritize investigation of the most suspicious sessions.
The Streamlit application delivers an intuitive interface for security analysts, enabling cluster exploration, individual session inspection, temporal anomaly trend analysis, and export of suspicious records for downstream investigation. Evaluation using internal clustering metrics confirms the quality of discovered behavioral groups, demonstrating that density-based unsupervised learning is a practical and effective approach for real-world cybersecurity behavioral analytics. The proposed system is scalable, data-driven, and requires no labeled attack data, making it applicable across banking, e-commerce, healthcare, and enterprise security operations environments.
FUTURE SCOPE
Several directions are identified for extending the proposed UEBA system. Real-time streaming integration using Apache Kafka is planned for continuous behavioral monitoring and instant anomaly alerting. Incorporation of deep learning autoencoders alongside HDBSCAN is envisioned for hybrid anomaly detection with improved sensitivity on complex behavioral sequences. Extension to multi-source data fusion combining web logs, authentication events, and file access patterns would enable comprehensive insider threat detection across heterogeneous data streams. Temporal sequence modeling using LSTM networks would enable detection of anomalies that only manifest across multiple sequential sessions. Explainable AI (XAI) integration using SHAP values is planned to provide feature-level explanations for each detected anomaly, improving analyst trust and interpretability. Finally, deployment as a containerized microservice using Docker and Kubernetes would enable enterprise-scale security operations adoption.
ACKNOWLEDGMENT
The authors express sincere gratitude to Dr. K. Subbarao, Guide and Head, Department of CSE-Data Science, St. Ann’s College of Engineering & Technology, Chirala, for his invaluable guidance, timely support, and encouragement throughout this project. The authors also thank the Principal, Dr. K. Jagadeesh Babu, the Department faculty and non-teaching staff, and the Management of St. Ann’s College of Engineering & Technology for providing an excellent research environment and laboratory facilities.
REFERENCES
[1] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. 2nd Int. Conf. Knowledge Discovery Data Mining (KDD-96), 1996, pp. 226–231.
[2] R. J. G. B. Campello, D. Moulavi, and J. Sander, “Density-based clustering based on hierarchical density estimates,” in Proc. Pacific-Asia Conf. Knowledge Discovery Data Mining (PAKDD), Lecture Notes in Computer Science, vol. 7819, Springer, 2013, pp. 160–172.
[3] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, 2009.
[4] L. McInnes, J. Healy, and S. Astels, “HDBSCAN: Hierarchical density-based clustering,” J. Open Source Softw., vol. 2, no. 11, p. 205, 2017.
[5] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.
[6] F. T. Liu, K. M. Ting, and Z. H. Zhou, “Isolation Forest,” in Proc. IEEE Int. Conf. Data Mining (ICDM), 2008, pp. 413–422.
[7] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, pp. 53–65, 1987.
[8] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-1, no. 2, pp. 224–227, 1979.
[9] T. Calinski and J. Harabasz, “A dendrite method for cluster analysis,” Commun. Statist., vol. 3, no. 1, pp. 1–27, 1974.
[10] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, “Estimating the support of a high-dimensional distribution,” Neural Comput., vol. 13, no. 7, pp. 1443–1471, 2001.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA: MIT Press, 2016.
[12] MITRE ATT&CK Framework. [Online]. Available: https://attack.mitre.org/
[13] UCI Machine Learning Repository, “Clickstream Data for Online Shopping,” [Online]. Available: https://archive.ics.uci.edu/ml/datasets/clickstream+data+for+online+shopping
