Fake Job Posting Detection Using Machine Learning and Natural Language Processing

Yahya Shaikh; Shakila Siddavatam

doi:10.17577/IJERTCONV14IS020085

NCRTCS - 2026 (Volume 14 – Issue 02)


Open Access
Authors : Yahya Shaikh, Shakila Siddavatam
Paper ID : IJERTCONV14IS020085
Volume & Issue : Volume 14, Issue 02, NCRTCS – 2026
Published (First Online) : 21-04-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Yahya Shaikh
Department of Computer Science, Abeda Inamdar Senior College, Pune, India

Shakila Siddavatam
Department of Computer Science, Abeda Inamdar Senior College, Pune, India

Abstract – The rapid expansion of online recruitment platforms has significantly improved accessibility to employment opportunities; however, it has also led to a substantial rise in fake job postings. These fraudulent advertisements are designed to mislead job seekers, often resulting in financial loss, identity theft, and erosion of trust in digital hiring systems. Existing manual verification methods used by job portals are inefficient, time-consuming, and unable to scale with the growing volume and sophistication of online job scams. This research proposes an automated fake job posting detection system that uses Machine Learning (ML) and Natural Language Processing (NLP) techniques to classify job advertisements as genuine or fraudulent. The system is trained on the publicly available Kaggle Fake Job Postings dataset. Preprocessing steps (text normalization, tokenization, stopword removal, and lemmatization) followed by TF-IDF vectorization are applied to extract meaningful features from job descriptions. Multiple classification algorithms, including Logistic Regression, Random Forest, and XGBoost, are implemented and evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. To address the limitations of black-box models, the proposed system incorporates SHAP (SHapley Additive exPlanations) to provide transparent and interpretable predictions. The final model is deployed using a Flask-based backend integrated with a React frontend, enabling real-time job post analysis through a user-friendly web interface. The outcome of this work demonstrates that an explainable and scalable ML-based solution can significantly enhance online job security and assist job seekers in identifying fraudulent job advertisements.

Keywords – Fake Job Posting Detection, Machine Learning, Natural Language Processing, Recruitment Fraud, Explainable AI, SHAP

INTRODUCTION

Online recruitment platforms have become a primary medium for job searching due to their accessibility, efficiency, and global reach. However, the rapid growth of these platforms has also led to a significant increase in fake job postings that exploit job seekers through deceptive advertisements offering unrealistic salaries or remote work opportunities, or misusing well-known company names. Such fraudulent job postings are often used to collect sensitive personal information, demand illegal payments, or commit identity theft, resulting in financial loss and erosion of trust in digital hiring systems [1], [2].

Existing methods used by job portals to detect fake job postings largely rely on manual moderation and rule-based filtering techniques. These approaches are inefficient, time-consuming, and unable to scale with the continuously increasing volume and evolving complexity of online recruitment fraud. Furthermore, many automated detection systems lack transparency and operate as black-box models, limiting interpretability and user confidence in their predictions [3]. Therefore, there is a critical need for an automated, scalable, and explainable fake job posting detection system using Machine Learning and Natural Language Processing techniques to enhance online recruitment security and protect job seekers.

This research proposes an automated Fake Job Posting Detection System using Machine Learning and Natural Language Processing techniques. The system processes job advertisement data through text preprocessing, feature extraction using TF-IDF, and classification using algorithms such as Logistic Regression, Random Forest, and XGBoost. To address interpretability issues, SHAP-based explainable AI techniques are integrated to justify prediction outcomes. The solution is implemented using a Flask-based backend and a React-based frontend, enabling real-time job post analysis through a user-friendly interface. The proposed system offers a scalable, accurate, and transparent approach to identifying fraudulent job postings and can be integrated into existing recruitment platforms to enhance digital job security.
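The text preprocessing stage mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the stopword list is a tiny hand-picked sample, and `crude_lemma` is a rough suffix-stripping stand-in for a real lemmatizer such as the one NLTK provides. All function names here are hypothetical.

```python
import re

# Tiny illustrative stopword list (assumption); a real system would likely use NLTK's list.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "in", "is", "are",
             "at", "we", "you", "your", "with", "on"}

# Crude suffix rules standing in for a real lemmatizer (assumption, not the paper's method).
SUFFIX_RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))


def normalize(text):
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def crude_lemma(token):
    """Strip the first matching suffix, keeping at least three characters of stem."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)] + replacement
    return token


def preprocess(text):
    """Normalize, tokenize on whitespace, remove stopwords, reduce tokens."""
    tokens = normalize(text).split()
    return [crude_lemma(t) for t in tokens if t not in STOPWORDS]


print(preprocess("Earn $5000 weekly! No experience required. Apply at http://scam.example"))
```

The token list produced by `preprocess` is the kind of input a TF-IDF vectorizer would consume. Note that "no" is deliberately kept out of the stopword list here, since phrases such as "no experience" may carry signal; that choice is an assumption of this sketch.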
LITERATURE REVIEW

Online recruitment platforms have become an essential medium for connecting job seekers with employment opportunities due to their efficiency and global reach. However, the increasing reliance on these platforms has also led to a significant rise in fake job postings, creating serious concerns related to fraud, identity theft, and loss of trust in digital hiring systems. This literature review examines existing research on fake job posting detection and highlights the challenges addressed by prior studies.

Early research in this domain primarily focused on identifying fraudulent job postings using traditional machine learning algorithms such as Logistic Regression, Naïve Bayes, and Support Vector Machines. These studies analyzed textual features, job metadata, and company-related attributes to classify postings as genuine or fake. Although these methods demonstrated moderate accuracy, their effectiveness was limited by feature sparsity, dataset imbalance, and the inability to capture complex linguistic patterns present in deceptive job advertisements [1].

Subsequent studies introduced ensemble learning techniques such as Random Forest and Gradient Boosting to improve classification performance. Research utilizing the Kaggle Fake Job Postings dataset emphasized the importance of preprocessing techniques including text normalization, stopword removal, lemmatization, and TF-IDF vectorization. These approaches significantly enhanced predictive accuracy but largely relied on black-box models, which reduced transparency and user trust in the system's decisions [2].

Recent advancements have shifted toward explainable artificial intelligence to address interpretability concerns. Studies integrating SHAP and LIME techniques provided insights into feature importance and model decision-making processes. While these methods improved transparency, their integration into real-time, user-oriented applications remains limited [3].

This research builds upon existing work by proposing an automated, scalable, and explainable fake job posting detection system. By combining machine learning, natural language processing, and explainable AI techniques within a deployable web-based framework, the proposed system addresses accuracy, interpretability, and practical usability, thereby contributing to safer online recruitment environments.

Research Gap

Although existing studies have applied machine learning and ensemble techniques to detect fake job postings, most approaches primarily emphasize classification accuracy while giving limited attention to interpretability and practical deployment. Many proposed models function as black-box systems, thereby reducing transparency and user trust in prediction outcomes. Furthermore, prior research predominantly focuses on offline dataset evaluation rather than implementing scalable, real-time web-based detection frameworks.

Therefore, there is a need for an automated, scalable, and explainable fake job posting detection system that integrates machine learning, natural language processing, and interpretable AI techniques within a deployable architecture to enhance usability and trust in online recruitment platforms.

METHODOLOGY (DEVELOPMENT PROCESS)

Design of Research

To develop an effective and scalable solution for detecting fake job postings, this research adopts a design-and-development methodology. The approach focuses on building, evaluating, and refining a machine learning-driven detection system using structured and unstructured job advertisement data. The research integrates theoretical concepts from existing fraud detection studies with practical implementation of Natural Language Processing and classification algorithms. A systematic process of data preprocessing, feature extraction, model training, evaluation, and deployment is followed to ensure accuracy and reliability. The methodology emphasizes explainability and real-time usability to address the limitations of conventional manual and rule-based detection systems used in online recruitment platforms.
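As a rough illustration of the feature extraction and model training steps in this process, the sketch below computes TF-IDF vectors and fits a logistic regression classifier using only the Python standard library. It is a teaching sketch under stated assumptions, not the authors' code: the four training documents are invented toy examples rather than rows from the Kaggle dataset, the smoothed IDF formula is the one scikit-learn uses by default, and the optimizer is plain stochastic gradient descent without regularisation.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency per term: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised TF-IDF vector as a sparse dict; terms unseen in training are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(vec, weights, bias):
    """Estimated probability that the posting is fraudulent."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Stochastic gradient descent on the logistic loss (no regularisation)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


# Toy, hand-written examples (1 = fraudulent, 0 = genuine); not taken from the Kaggle dataset.
train_docs = [
    "earn money fast no experience pay registration fee".split(),
    "work from home earn cash send bank details".split(),
    "software engineer python experience required team".split(),
    "accountant position bachelor degree required office".split(),
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
X = [tfidf(d, idf) for d in train_docs]
weights, bias = train_logreg(X, train_labels)

suspicious = tfidf("earn cash fast pay fee".split(), idf)
ordinary = tfidf("python engineer degree required".split(), idf)
print(round(predict_proba(suspicious, weights, bias), 2))
print(round(predict_proba(ordinary, weights, bias), 2))
```

In a real pipeline the same roles would more likely be filled by library components (for example scikit-learn's `TfidfVectorizer` and `LogisticRegression`), with a held-out test split used for evaluation.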
Information Gathering
- Secondary Data
  
  Secondary data was collected through an extensive review of scholarly journals, conference papers, and online research articles related to fake job detection, recruitment fraud, and machine learning-based text classification. Publicly available datasets, particularly the Kaggle Fake Job Postings dataset, were analyzed to understand data attributes and fraud patterns.
- Technical Research
  
  Technical research involved evaluating machine learning algorithms, text preprocessing techniques, feature extraction methods, and explainable AI frameworks. Best practices for model evaluation, deployment, and web-based integration were studied to ensure system scalability, transparency, and real-time performance in detecting fraudulent job advertisements.
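The evaluation metrics named in this paper (accuracy, precision, recall, F1-score, and ROC-AUC) can be computed as in the following sketch. The labels and scores are invented for illustration and are not results from the study. ROC-AUC is computed here by its pairwise definition: the probability that a randomly chosen fraudulent posting receives a higher score than a randomly chosen genuine one, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (fraudulent = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 3 fraudulent and 7 genuine postings with model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.1, 0.05, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

Because fraudulent postings are typically a small minority, accuracy alone can look high even when many frauds are missed, which is why precision, recall, and ROC-AUC are reported alongside it.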
Architecture of the System

The proposed system follows a modular architecture designed for scalability and real-time operation. Data preprocessing and model training are performed using Python-based machine learning libraries. The trained model is deployed using a Flask backend that handles prediction requests and communication between system components. A React-based frontend provides a user-friendly interface for submitting job postings and viewing classification results. The architecture supports text preprocessing, feature extraction using TF-IDF, model inference, and explainability through SHAP visualizations. This layered architecture ensures efficient data flow, ease of maintenance, and seamless integration with online recruitment platforms for detecting fake job postings.
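To make the prediction request flow more concrete, the sketch below shows the kind of validation and response shaping a backend route might perform. It is framework-free so that it runs without dependencies; in the described architecture a Flask route would wrap similar logic. The field names, the response keys, the 0.5 threshold, and `dummy_score` (a stand-in for the trained model) are all assumptions made for illustration.

```python
import json

# Assumed request fields; the paper lists title, description, requirements and company details.
TEXT_FIELDS = ("title", "company_profile", "description", "requirements")


def handle_predict(raw_body, score_fn, threshold=0.5):
    """Validate a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}

    # Concatenate whichever text fields were supplied as non-empty strings.
    parts = [payload[f].strip() for f in TEXT_FIELDS if isinstance(payload.get(f), str)]
    text = " ".join(p for p in parts if p)
    if not text:
        return 400, {"error": "at least one text field is required"}

    probability = float(score_fn(text))
    label = "fake" if probability >= threshold else "real"
    return 200, {"label": label, "fraud_probability": round(probability, 4)}


def dummy_score(text):
    """Hypothetical stand-in for preprocessing + TF-IDF + model inference."""
    return 0.9 if "fee" in text.lower() else 0.1


status, body = handle_predict(
    '{"title": "Data entry", "description": "Pay a registration fee to start"}',
    dummy_score,
)
print(status, body)
```

Keeping the scoring function as a parameter separates request handling from the model, so the classifier can be retrained or swapped without changing the API contract.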

Figure 1: System Architecture

Technologies Used

The Fake Job Posting Detection system is developed using a modern and reliable technology stack to ensure accuracy, scalability, performance, and security. The selected technologies support efficient text processing, machine learning inference, and secure web-based deployment. Table 1 presents the key technologies utilized at various levels of the system architecture.

Table 1: Technology Stack for Fake Job Posting Detection System

Component	Technology Used
Frontend	React.js, HTML, CSS, JavaScript
Backend	Python, Flask Framework
Machine Learning	Scikit-learn
Natural Language Processing	NLTK, TF-IDF Vectorization
Database	PostgreSQL
Explainability	SHAP (SHapley Additive exPlanations)
Authentication	JSON Web Token (JWT)
API Communication	REST API
Web Scraping	Selenium, BeautifulSoup
Deployment	Flask API
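Table 1 lists JSON Web Tokens (JWT) for authentication. As a hedged illustration of what that involves, the sketch below signs and verifies an HS256 token using only the standard library. It is not the system's actual authentication code: the secret, user identifier, and lifetime are hypothetical, and a production service would normally rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id, secret, ttl_seconds=3600, now=None):
    """Create an HS256-signed token with subject, issued-at and expiry claims."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": user_id, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the claims if signature and expiry are valid, otherwise None."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        expected = _sign(header_b64 + "." + claims_b64, secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
        if not isinstance(header, dict) or not isinstance(claims, dict):
            return None
        if header.get("alg") != "HS256" or claims.get("exp", 0) <= now:
            return None
    except (ValueError, TypeError):
        return None
    return claims


token = issue_token("user-42", b"demo-secret", ttl_seconds=60, now=1000)
print(verify_token(token, b"demo-secret", now=1030))
```

The signature is checked with a constant-time comparison before any claim is trusted, and the accepted algorithm is pinned to HS256 so that a token cannot select a weaker one through its own header.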

User Interface (UI) & Screenshots

The Fake Job Posting Detection system is designed with an easy-to-operate and user-friendly interface to allow smooth interaction for users. The user interface is responsive in nature and can be accessed from both desktop and mobile devices. The primary design principles of the interface include simplicity, clarity, and effectiveness, which help users easily analyze job postings and understand prediction results.
1. User Interface Overview
  
  The system provides the following user interfaces:
  - Homepage:
    
    Provides an overview of the application, including system description, features, and options for user registration and login, as shown in Figure 2.
  - Sign-Up Page:
    
    Allows new users to register in the system to access the fake job posting detection functionality, as illustrated in Figure 3.
  - Login Page:
    
    A common login interface for registered users to securely access the application, as depicted in Figure 4.
  - Job Analysis Dashboard:
    
    Enables users to enter job posting details such as job title, job description, requirements, and company information for analysis. The Job Analysis Dashboard is shown in Figure 5.
  - Prediction Result Interface:
    
    Displays the result of the analysis by classifying the job posting as real or fake along with prediction confidence. The Prediction Result Interface is illustrated in Figure 6.
  - Explainability Interface:
    
    Presents SHAP-based explanations highlighting important words and features that influence the model's prediction, as shown in Figure 7.
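For a linear model such as Logistic Regression, the per-word contributions shown by an explainability view have a simple closed form when features are treated as independent: the SHAP value of feature i is w_i (x_i - E[x_i]) in log-odds space, and the contributions plus a base value add up to the model output. The sketch below illustrates this with hypothetical weights and TF-IDF values; it is not output from the trained system and does not use the SHAP library itself.

```python
def linear_shap(weights, bias, x, background):
    """SHAP values in log-odds space for a linear model, assuming independent features.

    weights/x/background rows are dicts keyed by feature (word); missing entries count as 0.
    Returns (base_value, contributions) with base_value + sum(contributions) == model output.
    """
    features = sorted(weights)
    means = {f: sum(row.get(f, 0.0) for row in background) / len(background) for f in features}
    base_value = bias + sum(weights[f] * means[f] for f in features)
    contributions = {f: weights[f] * (x.get(f, 0.0) - means[f]) for f in features}
    return base_value, contributions


# Hypothetical model coefficients and TF-IDF values, for illustration only.
weights = {"fee": 2.0, "earn": 1.5, "experience": -0.5, "degree": -1.0}
bias = -0.5
background = [
    {"fee": 0.0, "earn": 0.2, "experience": 0.6, "degree": 0.4},
    {"fee": 0.4, "earn": 0.6, "experience": 0.2, "degree": 0.0},
]
posting = {"fee": 0.7, "earn": 0.5, "experience": 0.1}

base_value, contributions = linear_shap(weights, bias, posting, background)
log_odds = bias + sum(weights[f] * posting.get(f, 0.0) for f in weights)

# Rank words by absolute influence, as an explainability view might display them.
top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
for word, value in top:
    print(f"{word:12s} {value:+.2f}")
```

Positive values push the prediction toward "fake" and negative values toward "real"; the additivity property is what lets an interface present the contributions as a faithful breakdown of a single prediction.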
2. UI Screenshots

Table 2 lists the figures that illustrate the key user interface screens of the Fake Job Posting Detection system.

Figure No.	Description
Figure 2	Homepage displaying system overview, features, and register/login options
Figure 3	Sign up page for new user registration
Figure 4	Login page for registered users
Figure 5	Job analysis dashboard for entering job posting details
Figure 6	Prediction result page showing real or fake classification & Explainability screen displaying SHAP-based feature influence
Figure 7	User dashboard displaying analyzed job postings & Analysis history or result summary screen

Table 2: Description of System Interface Screens and Database Components

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

DISCUSSION

Strengths of the System
- Automated Detection of Fraudulent Job Postings:
  
  The system effectively identifies fake job advertisements using machine learning and natural language processing techniques, reducing dependence on manual verification.
- Improved User Safety
- Lightweight and Efficient Model
- Explainable Predictions
- Scalable Web-Based Solution
- Enhanced Trust in Online Recruitment Platforms:
  
  The system contributes to rebuilding user trust by providing reliable and interpretable job authenticity assessments.

Challenges and Limitations
- Dataset Dependency
- Language Constraint:
  
  The current implementation supports only English-language job postings, limiting its applicability in multilingual environments.
- Domain Generalization:
  
  Prediction accuracy may drop for job postings from platforms or regions that differ significantly from the training data.
- No Direct Integration with Live Job APIs

Future Scope
- Integration with Live Job Platforms:
  
  The system can be connected to APIs of platforms such as LinkedIn and Indeed for real-time monitoring of job postings.
- Multilingual Support:
  
  Extending the system to support multiple languages will improve global usability.
- Advanced Deep Learning Models:
  
  Future versions may incorporate transformer-based models such as BERT or RoBERTa for improved semantic understanding.
- Browser and Mobile Extensions:
  
  Developing browser plugins or mobile applications will enable users to instantly verify job authenticity.

CONCLUSION

Online recruitment platforms have significantly simplified job searching; however, they have also created opportunities for fraudulent job postings that exploit job seekers. Manual moderation systems are inadequate to address the scale and evolving sophistication of job scams. To address this issue, this research proposed and implemented a Fake Job Posting Detection system using Machine Learning and Natural Language Processing techniques. The system analyzes job-related textual features and classifies postings as real or fake using a lightweight and interpretable Logistic Regression model. The integration of SHAP explainability ensures transparency in decision-making, enabling users to understand the factors influencing predictions. Although the system has certain limitations, such as dataset dependency and language constraints, it represents a reliable and scalable solution to a critical real-world problem. With further enhancements such as live platform integration, multilingual support, and advanced models, the proposed system has the potential to significantly improve online recruitment safety and user trust.

REFERENCES

[1] Raj, M., et al., Fake Job Posting Detection Using Ensemble Machine Learning Models, International Journal of Data Science, 2023.
[2] Pillai, S., Fake Job Identification Using Bidirectional LSTM and Word2Vec, Journal of Intelligent Computing, 2023.
[3] Ullah, A., Jamjoom, A., Ensemble-Based Approaches for Job Scam Detection, IEEE Access, 2023.
[4] Kaggle, Real or Fake Job Posting Prediction Dataset, https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction
[5] Pedregosa, F., et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, Vol. 12, pp. 2825-2830, 2011.
[6] Lundberg, S. M., Lee, S.-I., A Unified Approach to Interpreting Model Predictions, Advances in Neural Information Processing Systems (NeurIPS), 2017.
[7] Bird, S., Klein, E., Loper, E., Natural Language Processing with Python, O'Reilly Media, 2009.