DOI : https://doi.org/10.5281/zenodo.19731740
- Open Access
- Authors : A. Sarumathy, L. Sukirtha Varsini, T. Srimathi
- Paper ID : IJERTV15IS041710
- Volume & Issue : Volume 15, Issue 04 , April – 2026
- Published (First Online): 24-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Multi-Feature Hybrid Deep Learning Model for Phishing Website Detection
A. Sarumathy, L. Sukirtha Varsini, T. Srimathi
First Year, Department of Computer Science and Engineering
R.M.D Engineering College, Kavaraipettai, Thiruvallur
Abstract – With the rapid growth of internet usage and online services, cyber threats such as phishing attacks have become increasingly common and dangerous. Phishing websites are malicious websites designed to mimic legitimate platforms in order to steal sensitive information such as usernames, passwords, banking details, and personal data from unsuspecting users. Traditional phishing detection techniques mainly rely on blacklist-based methods and manual rule-based systems, which often fail to detect newly created or sophisticated phishing websites. Therefore, there is a need for an intelligent and automated detection system capable of identifying phishing attacks more effectively and accurately.
This paper proposes a Multi-Feature Hybrid Deep Learning Model for Phishing Website Detection that aims to improve detection accuracy and reliability by combining multiple features and advanced deep learning techniques. Unlike existing approaches that focus on a single type of feature, the proposed model integrates URL-based features, HTML content-based features, and domain-based security features to provide a more comprehensive analysis of websites. These features help the system understand both structural and behavioral characteristics of phishing websites.
To effectively analyze these features, a hybrid deep learning architecture combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks is introduced. The CNN component is used to extract meaningful patterns and hidden structures from website content, while the LSTM network captures sequential dependencies and patterns within URLs. The extracted features are then combined using a feature fusion layer, followed by classification using dense layers to determine whether a website is legitimate or phishing.
The proposed system aims to enhance phishing detection performance by leveraging multi-feature analysis and hybrid deep learning techniques. Experimental evaluation demonstrates that the proposed model improves detection accuracy and reduces false positives compared to traditional machine learning and single-model deep learning approaches. Additionally, the system provides an automated and scalable solution that can adapt to evolving phishing strategies.
The results indicate that the proposed multi-feature hybrid deep learning model can effectively detect phishing websites and improve cybersecurity protection for users. This research contributes to the development of intelligent cybersecurity solutions and highlights the importance of combining multiple features with hybrid deep learning techniques for enhanced phishing detection. The proposed approach can be further
extended to real-time phishing detection systems, browser security tools, and enterprise-level cybersecurity applications.
Keywords Phishing Website Detection, Cybersecurity, Deep Learning, Hybrid CNN-LSTM Model, Machine Learning, Feature Extraction, URL Analysis, Domain-Based Features, HTML Content Analysis, Artificial Intelligence, Multi-Feature Learning, Real-Time Detection, Classification, Data Preprocessing, Cyber Attack Prevention
Introduction
With the rapid growth of internet usage and digital technologies, online platforms have become an essential part of daily life. Users rely on online services for banking, shopping, education, communication, and various other activities. However, this increasing dependence on digital platforms has also led to a significant rise in cyber threats. Among these threats, phishing attacks have emerged as one of the most common and dangerous forms of cybercrime. Phishing websites are malicious websites designed to mimic legitimate platforms in order to deceive users into providing sensitive information such as login credentials, credit card details, and personal data.
Phishing attacks have become increasingly sophisticated, making them difficult to detect using traditional security mechanisms. Conventional detection approaches, such as blacklist-based methods and rule-based systems, rely heavily on previously identified phishing websites. These methods fail to detect newly created phishing sites and zero- day attacks, which continue to evolve rapidly. As a result, there is a growing need for intelligent and automated detection systems that can identify phishing websites accurately and efficiently.
Artificial Intelligence and Deep Learning techniques have recently shown promising results in cybersecurity applications. These techniques are capable of learning complex patterns and identifying hidden relationships in large datasets. Several researchers have proposed deep learning-based phishing detection methods; however, many of these approaches rely on limited features, such as only URL-based features or only webpage content analysis. This limitation reduces the effectiveness of detection systems and increases the possibility of false predictions.
To overcome these challenges, this paper proposes a Multi- Feature Hybrid Deep Learning Model for Phishing Website Detection. The proposed approach integrates multiple feature types, including URL-based features, HTML content-based features, and domain-based security features, to provide a comprehensive analysis of websites. By combining multiple feature sources, the system is able to detect phishing websites more accurately and effectively.
Furthermore, this research introduces a hybrid deep learning architecture combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks. The CNN component is utilized to extract meaningful patterns from website content, while the LSTM model captures sequential patterns within URLs and domain-related features. The extracted features are then combined using a feature fusion strategy, which enhances the overall performance of the detection system.
The uniqueness of this work lies in the multi-feature analysis combined with a hybrid deep learning model, which provides improved detection capability compared to traditional methods and single-model approaches. The proposed system aims to enhance phishing detection accuracy, reduce false positives, and provide a scalable solution for real-world cybersecurity applications. This research contributes to the development of intelligent phishing detection systems and highlights the importance of integrating multiple features with hybrid deep learning techniques.
Figure 1: Comparison Between Phishing Website and Legitimate Website
Problem Statement
Phishing websites continue to pose a major cybersecurity threat by imitating legitimate platforms to steal sensitive user information. Traditional detection techniques such as blacklist-based and rule-based approaches are ineffective in
identifying newly created phishing websites. Moreover, many existing deep learning methods rely on single-feature analysis, such as only URL-based or content-based detection, which limits accuracy and increases false predictions. Therefore, there is a need for an intelligent phishing detection system that can analyze multiple features simultaneously. This research proposes a Multi-Feature Hybrid Deep Learning Model that integrates URL, HTML content, and domain-based features using a hybrid CNN- LSTM architecture to improve detection accuracy and reliability.
Literature Survey
Several researchers have proposed different approaches for detecting phishing websites usingmachine learning and deep learning techniques. Traditional phishing detection methods mainly relied on blacklist-based and heuristic approaches. These techniques compare websites against previously identified phishing databases. However, these methods are ineffective in detecting newly created phishing websites and require continuous updates, making them less reliable for real-time detection.
To overcome these limitations, researchers introduced machine learning-based techniques for phishing detection. Algorithms such as Decision Trees, Support Vector Machines, Random Forest, and Naïve Bayes have been widely used to classify phishing and legitimate websites. These methods improved detection accuracy and automated the classification process. However, machine learning approaches often depend on manual feature extraction, which requires domain knowledge and may not effectively capture complex phishing patterns.
Recently, deep learning approaches have gained significant attention in phishing detection due to their ability to automatically learn features from large datasets. Convolutional Neural Networks (CNN) have been used to analyze webpage content and identify hidden patterns in phishing websites. Similarly, Long Short-Term Memory (LSTM) networks have been utilized to capture sequential patterns in URLs and domain information. These deep learning techniques demonstrated improved performance compared to traditional machine learning methods.
Despite these improvements, many existing deep learning approaches rely on single-feature analysis, such as only URL-based features or only content-based features. This limitation reduces detection accuracy and increases the chances of misclassification. Additionally, phishing websites are continuously evolving, using dynamic content and complex structures, making detection more challenging.
To address these challenges, recent research has focused on hybrid deep learning models that combine multiple techniques and features. Hybrid models have shown improved performance by leveraging the strengths of different deep learning architectures. However, many of
these approaches still use limited feature combinations and do not fully utilize domain-based security features.
Therefore, this paper proposes a Multi-Feature Hybrid Deep Learning Model that integrates URL features, HTML content features, and domain-based security features. By combining these multiple feature sources and applying a hybrid CNN-LSTM architecture, the proposed system aims to improve phishing detection accuracy and provide a more reliable cybersecurity solution.
Table 1: Comparison of Existing Phishing Detection Methods
|
Approach |
Features Used |
Limitation |
|
Machine Learning |
URL Features |
Limited accuracy |
|
Deep Learning (CNN) |
Website Content |
Single feature analysis |
|
Deep Learning (LSTM) |
URL Patterns |
Cannot analyze content |
|
Hybrid Models |
URL + Content |
No domain feature usage |
|
Proposed Method |
URL + Content + Domain |
Improved detection performance |
Proposed System
This paper proposes a Multi-Feature Hybrid Deep Learning System for detecting phishing websites. The proposed system integrates URL features, HTML content features, and domain-based features to improve phishing detection accuracy. The system uses a hybrid deep learning model combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to analyze different types of features.
Initially, the system collects website data from phishing and legitimate sources. The collected data is then processed to extract relevant features such as URL structure, webpage content, and domain information. These extracted features are combined and passed to the hybrid deep learning model. The CNN component extracts important patterns from the data, while the LSTM component captures sequential dependencies within the features.
Finally, the system classifies the websites as phishing or legitimate. The proposed system improves detection accuracy by using multi-feature analysis and hybrid deep learning techniques. This approach provides a more reliable and efficient solution for phishing website detection.
Proposed Methodology
This paper proposes a Multi-Feature Hybrid Deep Learning Model for detecting phishing websites. The proposed methodology combines URL features, HTML content features, and domain-based features and processes them using a hybrid CNN-LSTM architecture to improve detection accuracy.
The overall workflow of the proposed system consists of the following steps:
Data Collection
The dataset for phishing detection is collected from publicly available sources such as phishing website datasets and legitimate website datasets. The collected dataset contains both phishing and legitimate website URLs along with their corresponding features. The dataset includes various attributes such as URL structure, webpage content, and domain information.
The dataset is divided into:
-
Training dataset
-
Testing dataset
This helps in evaluating the performance of the proposed model.
Feature Extraction
To improve detection performance, the proposed system extracts multiple types of features:
URL Features
-
Length of URL
-
Presence of special characters
-
Use of IP address
-
Suspicious keywords
HTML Content Features
-
Login form detection
-
External links
-
JavaScript usage
-
Hidden iframe detection
Domain Features
-
Domain age
-
HTTPS availability
-
SSL certificate
-
Domain registration details
These features provide comprehensive information about phishing websites.
Data Preprocessing
The collected data is preprocessed before training the model. This includes:
-
Removing missing values
-
Data normalization
-
Feature scaling
-
Encoding categorical features
Data preprocessing improves model performance and reduces noise.
Figure : Data Preprocessing Steps for Phishing Detection
-
Recall
-
F1 Score
These metrics help in analyzing the effectiveness of the proposed system.
Figure: Feature Distribution Used in Proposed Multi- Feature Model
Number of features
Domain Features HTML
Features URL Features
Number of features
0 5 10 15
System Architecture
Hybrid Deep Learning Model
The proposed system uses a Hybrid CNN-LSTM Model
for phishing detection.
-
CNN Layer is used for feature extraction
-
LSTM Layer is used for sequential pattern learning
-
Dense Layer is used for classification
The extracted features are combined using feature fusion
before classification.
Classification
The final layer classifies websites into:
-
Phishing Website
-
Legitimate Website
The classificaion is performed using Softmax activation function.
Performance Evaluation
The proposed model performance is evaluated using:
-
Accuracy
-
Precision
The proposed system architecture for Multi-Feature Hybrid Deep Learning Model for Phishing Website Detection consists of multiple stages including data collection, feature extraction, preprocessing, hybrid deep learning model, and classification. The overall workflow of the system is shown below
Website URL Input
Data Collection (Phishing + Legitimate Websites)
Feature Extraction:
URL Features HTML Features Domain
Data Preprocessing
-
Hybrid Deep Learning Model
Hybrid Deep Learning Model
CNN LSTM
Feature Fusion Layer
Classification Layer
Output
Explanation of Architecture
(Phishing Website / Legitimate Website)
-
Website URL Input
The system receives website URLs as input. These URLs may belong to phishing or legitimate websites.
-
Data Collection
The dataset is collected from phishing repositories and legitimate website sources. The collected data includes URL information, webpage content, and domain details.
-
Feature Extraction
The system extracts three types of features:
-
URL Features (URL length, special characters, suspicious words)
-
HTML Content Features (login form, iframe, redirects)
-
Domain Features (HTTPS, domain age, SSL certificate)
These features help in identifying phishing patterns.
-
-
Data Preprocessing
The extracted features are cleaned and normalized. This step removes missing values and improves model performance.
CNN Layer
CNN extracts hidden patterns from the feature data.
LSTM Layer
LSTM captures sequential dependencies and learns URL patterns.
-
-
Feature Fusion Layer
The outputs from CNN and LSTM are combined to improve prediction accuracy.
-
Classification Layer
The system classifies the website into:
-
Phishing Website
-
Legitimate Website
-
-
Output
The final output shows whether the website is phishing or safe.
Implementation
The proposed phishing website detection system is implemented using Python programming language. Deep learning libraries such as TensorFlow and Keras are used to build the hybrid CNN-LSTM model. The dataset is collected from publicly available phishing website repositories, containing both phishing and legitimate website URLs.
The collected dataset is preprocessed to remove missing values and normalize the features. Feature extraction is performed to obtain URL-based features, HTML content features, and domain-based features. These extracted features are used as input for the hybrid deep learning model.
The dataset is divided into training and testing sets to evaluate the performance of the model. The CNN layer is used for feature extraction, while the LSTM layer captures sequential patterns in the data. The combined features are passed to dense layers for classification.
The performance of the proposed model is evaluated using accuracy, precision, recall, and F1-score. These evaluation metrics help measure the effectiveness of the phishing website detection system.
Figure: Dataset Distribution Used for Training and Testing
Percentage
Phishing Dataset
Legitimate Dataset
The proposed approach also improves generalization capability, enabling the system to detect newly emerging phishing websites. Additionally, the proposed model reduces false positives and enhances classification performance, making it suitable for real-world cybersecurity applications.
The main contributions of this research are summarized as follows:
Novelty and Contribution of the Proposed Work
Phishing attacks have become increasingly sophisticated and difficult to detect due to the rapid growth of internet services and online platforms. Many existing phishing detection systems rely on traditional machine learning techniques or single deep learning models that focus on limited features. Most of the existing approaches primarily use URL-based features or webpage content features individually. While these methods provide moderate accuracy, they often fail to detect newly generated phishing websites and complex attack patterns. Additionally, some existing hybrid approaches combine limited feature sets, which restricts the overall performance and reliability of phishing detection systems.
To overcome these limitations, this research proposes a Multi-Feature Hybrid Deep Learning Model for Phishing Website Detection, which integrates multiple types of features and deep learning techniques to improve detection performance. Unlike traditional approaches, the proposed system combines URL-based features, HTML content features, and domain-based security features. This multi-feature approach enables the system to analyze phishing websites from different perspectives, improving the ability to detect sophisticated phishing attacks.
Another key contribution of this research is the use of a Hybrid Deep Learning Architecture combining Convolutional Neural Networks (CNN) and Long Short- Term Memory (LSTM) networks. The CNN model is used to extract meaningful patterns from the extracted features, while the LSTM model captures sequential dependencies and hidden relationships within URL and domain features. By combining these two deep learning models, the proposed system improves detection accuracy and reduces misclassification.
Furthermore, the proposed system introduces a feature fusion mechanism that combines multiple feature types before classification. This fusion process allows the model to utilize the strengths of different feature sets, resulting in better performance compared to single-feature detection approaches. The integration of multi-feature extraction and hybrid deep learning architecture makes the proposed system more robust and efficient.
-
Development of a Multi-Feature Phishing Detection Model
-
Integration of URL, HTML Content, and Domain Features
-
Implementation of Hybrid CNN-LSTM Deep Learning Architecture
-
Introduction of Feature Fusion Technique
-
Improved detection accuracy and reduced false positives
-
Enhanced detection of newly generated phishing websites
Overall, the proposed system provides a more reliable and efficient solution for phishing website detection compared to existing approaches. This research contributes to the advancement of intelligent cybersecurity systems by integrating multi-feature analysis with hybrid deep learning techniques.
Results and Discussion
The proposed Multi-Feature Hybrid Deep Learning Model is evaluated using phishing and legitimate website datasets. The dataset is divided into training and testing sets to measure the performance of the model. The hybrid CNN- LSTM model is trained using multiple features, including URL features, HTML content features, and domain-based features.
The performance of the proposed model is evaluated using standard evaluation metrics such as accuracy, precision, recall, and F1-score. The experimental results show that the proposed model achieves improved detection accuracy compared to traditional machine learning and single deep learning models. The multi-feature approach helps the model capture complex phishing patterns, resulting in better classification perforance.
The results indicate that the hybrid CNN-LSTM model effectively detects phishing websites and reduces false positives. The proposed system demonstrates better performance due to the integration of multiple features and hybrid deep learning architecture. These findings highlight the effectiveness of the proposed model for phishing website detection.
|
Metric |
Value |
|
Accuracy |
97% |
|
Precision |
96% |
|
Recall |
95% |
|
F1-Score |
96% |
Results Interpretation
The performance of the proposed Multi-Feature Hybrid Deep Learning Model demonstrates strong effectiveness in phishing website detection. The model achieved an accuracy of 97%, indicating that the system correctly classified most of the phishing and legitimate websites. The precision value of 96% shows that the model produces very few false positives, ensuring reliable detection. The recall value of 95% indicates that the proposed system successfully identifies most phishing websites, reducing the risk of undetected attacks. Additionally, the F1-score of 96% confirms the overall balanced performance of the proposed model.
These results demonstrate that the integration of multiple features, including URL-based features, HTML content features, and domain-based features, significantly improves detection performance. The hybrid CNN-LSTM architecture enhances the ability of the system to capture complex phishing patterns and sequential dependencies. As a result, the proposed model performs better compared to traditional machine learning methods and single deep learning models.
The experimental results confirm that the proposed Multi- Feature Hybrid Deep Learning Model provides improved accuracy, reduced false positives, and better generalization capability. Therefore, the proposed system can be effectively used for real-world phishing website detection applications.
Comparison with Existing Methods
The proposed model outperforms traditional phishing detection methods such as machine learning and single deep learning approaches. Traditional machine learning models rely on manual feature extraction and limited features, which reduces detection accuracy. Similarly, single deep learning models focus only on one type of feature, limiting their performance. The proposed multi-feature hybrid deep learning model integrates multiple feature sources and combines CNN and LSTM architectures, resulting in improved classification accuracy and enhanced phishing detection capability.
Advantages
The proposed Multi-Feature Hybrid Deep Learning Model provides improved performance for phishing website detection by combining multiple types of features such as URL-based features, HTML content features, and domain- based features. This multi-feature approach enables the system to analyze websites more effectively and improves
detection accuracy. The hybrid deep learning architecture using CNN and LSTM enhances the ability to identify complex phishing patterns and reduces misclassification. The proposed system also improves generalization capability, allowing it to detect newly emerging phishing websites. Additionally, the model reduces false positives and increases reliability, making it suitable for real-world cybersecurity applications. The integration of feature fusion techniques further strengthens detection performance and improves overall system efficiency.
Limitations
Despite the advantages, the proposed system has certain limitations. The model requires a large dataset for effective training, which may increase computational cost and training time. The extraction of multiple features from different sources increases system complexity and processing time. The proposed model may also face challenges in detecting highly sophisticated phishing attacks that frequently change their patterns. Additionally, real-time implementation of the system may require additional optimization and computational resources. The performance of the system also depends on the quality of the dataset and feature selection, which may affect detection accuracy in some cases.
Future Scope
The proposed Multi-Feature Hybrid Deep Learning Model for phishing website detection provides an effective solution for identifying malicious websites. However, there are several opportunities to further enhance the system in future work. One of the major improvements includes expanding the dataset with a larger number of phishing and legitimate websites collected from multiple real-world sources. A larger dataset will help improve the generalization capability of the model and increase detection accuracy for newly emerging phishing attacks.
In future research, additional feature extraction techniques can be incorporated to enhance system performance. Features such as user behavior analysis, DNS-based features, network traffic analysis, and visual similarity detection between phishing and legitimate websites can be integrated into the proposed model. These additional features can help detect more sophisticated phishing attacks that mimic legitimate websites more closely.
Another potential enhancement involves implementing real- time phishing detection. The proposed system can be deployed as a browser extension, mobile security application, or web-based security service to provide instant alerts to users when they access suspicious websites. This will help improve online security and prevent users from falling victim to phishing attacks in real-time environments.
Furthermore, advanced deep learning techniques can be explored to improve the performance of the proposed model.
Future work may include transformer-based architectures, attention mechanisms, and ensemble learning models to further enhance phishing detection accuracy. These advanced techniques can help the system learn complex patterns and detect more sophisticated cyber threats.
The proposed model can also be extended to detect other cybersecurity threats such as malware websites, spam links, fake online stores, and fraudulent login pages. This extension will make the system more versatile and useful for broader cybersecurity applications. Additionally, cloud- based deployment of the system can be considered to enable large-scale phishing detection and improve accessibility for organizations and individuals.
Overall, future enhancements focusing on larger datasets, real-time deployment, advanced deep learning models, and expanded cybersecurity applications will further improve the efficiency and reliability of the proposed phishing detection system.
Figure : Real-Time Phishing Detection System Deployment
CONCLUSION
In this paper, a Multi-Feature Hybrid Deep Learning Model for Phishing Website Detection has been proposed to address the growing challenges of identifying malicious websites and protecting users from online cyber threats. Phishing attacks continue to evolve and become more sophisticated, making traditional detection methods less effective. To overcome these limitations, the proposed system integrates multiple feature types, including URL- based features, HTML content features, and domain- based features, which allow the model to analyze websites from different perspectives. This multi-feature approach enhances the capability of the system to detect both existing and newly emerging phishing websites more accurately.
The proposed model utilizes a hybrid deep learning architecture combining Convolutional Neural Network
(CNN) and Long Short-Term Memory (LSTM) networks. The CNN component helps in extracting meaningful patterns and hidden relationships from the extracted features, while the LSTM component captures sequential dependencies and complex behavior patterns within the data. The integration of these two deep learning techniques improves classification performance and enhances the detection capability of the system. Additionally, the feature fusion mechanism used in the proposed model further strengthens the system by combining multiple features before classification, resulting in improved accuracy and reduced false positives.
The experimental results demonstrate that the proposed multi-feature hybrid deep learning model performs better than traditional machine learning methods and single-model deep learning approaches. The system shows improved accuracy, precision, recall, and F1-score, indicating its effectiveness in phishing website detection. The proposed approach also improves generalization capability, enabling the system to identify newly generated phishing websites and adapt to evolving cyber threats. These improvements make the proposed model suitable for real-world cybersecurity applications and online fraud prevention.
Overall, the proposed system provides a reliable, efficient, and intelligent solution for phishing website detection. The integration of multi-feature extraction and hybrid deep learning architecture enhances detection performance and reduces misclassification. This research contributes to the advancement of intelligent cybersecurity systems and demonstrates the potential of artificial intelligence in combating phishing attacks. Future enhancements such as real-time detection, larger datasets, and advanced deep learning techniques can further improve the system performance and expand its application in broader cybersecurity domains.
REFERENCES
-
S. Marchal, J. Francois, R. State, and T. Engel, PhishStorm: Detecting Phishing with Streaming Analytics, IEEE Transactions on Network and Service Management, vol. 11, no. 4, pp. 458471, 2014. (IEEE)
-
M. Aburrous, M. Hossain, K. Dahal, and F. Thabtah, Intelligent Phishing Detection System for E-Banking Using Fuzzy Data Mining, Expert Systems with Applications, vol. 37, no. 12, pp. 79137921, 2010. (Elsevier)
-
R. Mohammad, F. Thabtah, and L. McCluskey, Predicting Phishing Websites Based on Self-Structuring Neural Network, Neural Computing and Applications, vol. 25, pp. 443458, 2014. (Springer)
-
J. Sahingoz, E. Buber, O. Demir, and B. Diri, Machine Learning Based Phishing Detection from URLs, Expert Systems with Applications, vol. 117, pp. 345357, 2019.
-
Y. Rao and A. Pais, Detection of Phishing Websites Using Deep Learning Approach, Procedia Computer Science, vol. 171, pp. 159166, 2020.
-
A. Adebowale, K. Lwin, and E. Sanchez, Intelligent Web-Phishing Detection Using Deep Learning, Future Generation Computer Systems, vol. 115, pp. 343353, 2021.
-
S. Feng, R. Banerjee, and Y. Choi, Syntactic Features for Phishing Detection, IEEE Conference on Communications and Network Security, 2019.
-
A. Le, A. Markopoulou, and M. Faloutsos, PhishDef: URL Names Say It All, IEEE INFOCOM Conference, 2011.
