
Fake Job Posting Detection Using Machine Learning and Natural Language Processing

DOI: 10.17577/IJERTCONV14IS020085



Yahya Shaikh Department of Computer Science Abeda Inamdar Senior College Pune, India

Shakila Siddavatam Department of Computer Science Abeda Inamdar Senior College Pune, India

Abstract – The rapid expansion of online recruitment platforms has significantly improved accessibility to employment opportunities; however, it has also led to a substantial rise in fake job postings. These fraudulent advertisements are designed to mislead job seekers, often resulting in financial loss, identity theft, and erosion of trust in digital hiring systems. Existing manual verification methods used by job portals are inefficient, time-consuming, and unable to scale with the growing volume and sophistication of online job scams. This research proposes an automated fake job posting detection system using Machine Learning (ML) and Natural Language Processing (NLP) techniques to classify job advertisements as genuine or fraudulent. The system is trained on the publicly available Kaggle Fake Job Postings dataset. Preprocessing techniques such as text normalization, tokenization, stopword removal, lemmatization, and TF-IDF vectorization are applied to extract meaningful features from job descriptions. Multiple classification algorithms, including Logistic Regression, Random Forest, and XGBoost, are implemented and evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. To address the limitations of black-box models, the proposed system incorporates SHAP (SHapley Additive exPlanations) to provide transparent and interpretable predictions. The final model is deployed using a Flask-based backend integrated with a React frontend, enabling real-time job post analysis through a user-friendly web interface. This work demonstrates that an explainable and scalable ML-based solution can significantly enhance online job security and help job seekers identify fraudulent job advertisements.

Keywords – Fake Job Posting Detection, Machine Learning, Natural Language Processing, Recruitment Fraud, Explainable AI, SHAP

  1. INTRODUCTION

    Online recruitment platforms have become a primary medium for job searching due to their accessibility, efficiency, and global reach. However, the rapid growth of these platforms has also led to a significant increase in fake job postings that exploit job seekers through deceptive advertisements offering unrealistic salaries or remote work opportunities, or impersonating well-known companies. Such fraudulent postings are often used to collect sensitive personal information, demand illegal payments, or commit identity theft, resulting in financial loss and erosion of trust in digital hiring systems [1], [2]. Existing methods used by job portals to detect fake job postings rely largely on manual moderation and rule-based filtering. These approaches are inefficient, time-consuming, and unable to scale with the continuously increasing volume and evolving complexity of online recruitment fraud. Furthermore, many automated detection systems lack transparency and operate as black-box models, limiting interpretability and user confidence in their predictions [3]. Therefore, there is a critical need for an automated, scalable, and explainable fake job posting detection system based on Machine Learning and Natural Language Processing techniques to enhance online recruitment security and protect job seekers.

      1. Problem Statement

        The rapid expansion of online recruitment platforms has improved accessibility to employment opportunities but has also led to a substantial increase in fake job postings. These fraudulent advertisements are crafted to closely resemble legitimate job listings, making them difficult for job seekers to identify. Such postings are often used to collect sensitive personal information, demand illegal payments, or commit identity theft, resulting in financial loss and reduced trust in digital recruitment systems [1], [2]. Existing verification mechanisms used by job portals rely largely on manual moderation and rule-based filtering, which are inadequate for handling the high volume and evolving complexity of online job fraud. These approaches lack scalability, adaptability, and transparency. Moreover, many automated detection systems operate as black-box models, limiting interpretability and user confidence [3]. This research addresses the need for an automated, scalable, and explainable fake job posting detection system using Machine Learning and Natural Language Processing techniques to enhance online recruitment security and protect job seekers.

      2. Significance

        The increasing occurrence of fake job postings on online recruitment platforms poses serious threats to job seekers, including financial fraud, identity theft, and misuse of personal information [1], [2]. Existing verification methods used by job portals are largely manual or rule-based, making them ineffective for large-scale and dynamic fraud detection. The absence of an intelligent, automated, and transparent detection mechanism reduces user trust in digital hiring systems [3]. The significance of this research lies in providing a scalable and accurate solution using Machine Learning and Natural Language Processing techniques. By incorporating explainable AI methods, the system enhances transparency and reliability of predictions. This research contributes to improving online recruitment security and supports the development of safer and more trustworthy digital employment platforms.

      3. Proposed Solution

    This research proposes an automated Fake Job Posting Detection System using Machine Learning and Natural Language Processing techniques. The system processes job advertisement data through text preprocessing, feature extraction using TF-IDF, and classification using algorithms such as Logistic Regression, Random Forest, and XGBoost. To address interpretability issues, SHAP-based explainable AI techniques are integrated to justify prediction outcomes. The solution is implemented using a Flask-based backend and a React-based frontend, enabling real-time job post analysis through a user-friendly interface. The proposed system offers a scalable, accurate, and transparent approach to identifying fraudulent job postings and can be integrated into existing recruitment platforms to enhance digital job security.
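    The text preprocessing stage that precedes TF-IDF can be illustrated with a minimal standard-library sketch. It assumes postings arrive as raw strings that may contain HTML entities, tags and links; the short `STOPWORDS` set and the regular expressions are illustrative assumptions standing in for NLTK's English stop-word list. Lemmatization is left out so the sketch has no corpus downloads; with NLTK it would be applied to each surviving token (e.g. via `WordNetLemmatizer`, which needs the WordNet corpus).

```python
import html
import re

# Small illustrative stop-word list (assumption); the paper's stack uses NLTK's English list.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in", "is",
    "it", "of", "on", "or", "our", "the", "this", "to", "we", "with", "you", "your",
}
TAG_RE = re.compile(r"<[^>]+>")                # HTML tags left in scraped descriptions
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")   # alphabetic tokens, keeps contractions


def preprocess(text: str) -> list[str]:
    """Normalise, tokenise and drop stop words from one job-posting string."""
    text = html.unescape(text)            # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)           # strip markup
    text = URL_RE.sub(" ", text.lower())   # lower-case, then remove links
    tokens = TOKEN_RE.findall(text)        # tokenise; digits and symbols are dropped
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


# Example: preprocess("Earn $5000/week from HOME! Apply at https://scam.example")
```

    Dropping digits is a design choice made here for simplicity; salary figures can themselves be a fraud signal, so a real pipeline might keep them as tokens or as separate numeric features.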

  2. LITERATURE REVIEW

    Online recruitment platforms have become an essential medium for connecting job seekers with employment opportunities due to their efficiency and global reach. However, the increasing reliance on these platforms has also led to a significant rise in fake job postings, creating serious concerns related to fraud, identity theft, and loss of trust in digital hiring systems. This literature review examines existing research on fake job posting detection and highlights the challenges addressed by prior studies.

    Early research in this domain primarily focused on identifying fraudulent job postings using traditional machine learning algorithms such as Logistic Regression, Naïve Bayes, and Support Vector Machines. These studies analyzed textual features, job metadata, and company-related attributes to classify postings as genuine or fake. Although these methods demonstrated moderate accuracy, their effectiveness was limited by feature sparsity, dataset imbalance, and the inability to capture complex linguistic patterns present in deceptive job advertisements [1].

    Subsequent studies introduced ensemble learning techniques such as Random Forest and Gradient Boosting to improve classification performance. Research utilizing the Kaggle Fake Job Postings dataset emphasized the importance of preprocessing techniques including text normalization, stopword removal, lemmatization, and TF-IDF vectorization. These approaches significantly enhanced predictive accuracy but largely relied on black-box models, which reduced transparency and user trust in these systems' decisions [2].

    Recent advancements have shifted toward explainable artificial intelligence to address interpretability concerns. Studies integrating SHAP and LIME techniques provided insights into feature importance and model decision-making processes. While these methods improved transparency, their integration into real-time, user-oriented applications remains limited [3].
This research builds upon existing work by proposing an automated, scalable, and explainable fake job posting detection system. By combining machine learning, natural language processing, and explainable AI techniques within a deployable web-based framework, the proposed system addresses accuracy, interpretability, and practical usability, thereby contributing to safer online recruitment environments.

    Research Gap

    Although existing studies have applied machine learning and ensemble techniques to detect fake job postings, most approaches primarily emphasize classification accuracy while giving limited attention to interpretability and practical deployment. Many proposed models function as black-box systems, thereby reducing transparency and user trust in prediction outcomes. Furthermore, prior research predominantly focuses on offline dataset evaluation rather than implementing scalable, real-time web-based detection frameworks.

    Therefore, there is a need for an automated, scalable, and explainable fake job posting detection system that integrates machine learning, natural language processing, and interpretable AI techniques within a deployable architecture to enhance usability and trust in online recruitment platforms.

  3. METHODOLOGY (DEVELOPMENT PROCESS)

      1. Design of Research

        To develop an effective and scalable solution for detecting fake job postings, this research adopts a design-and-development-based methodology. The approach focuses on building, evaluating, and refining a machine learning-driven detection system using structured and unstructured job advertisement data. The research integrates theoretical concepts from existing fraud detection studies with a practical implementation of Natural Language Processing and classification algorithms. A systematic process of data preprocessing, feature extraction, model training, evaluation, and deployment is followed to ensure accuracy and reliability. The methodology emphasizes explainability and real-time usability to address the limitations of conventional manual and rule-based detection systems used in online recruitment platforms.
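        The feature extraction, training and evaluation steps above can be sketched with scikit-learn as a side-by-side comparison of TF-IDF pipelines. This is a minimal sketch, not the configuration used in this study: it assumes one combined text string per posting with label 1 for fraudulent, and the n-gram range, `class_weight="balanced"` and forest size are illustrative assumptions. XGBoost is omitted because it ships as the separate `xgboost` package; its scikit-learn-compatible `XGBClassifier` could occupy the same `"clf"` slot.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def build_pipelines(seed=42):
    """Candidate TF-IDF + classifier pipelines; all hyperparameters are illustrative."""
    return {
        "logistic_regression": Pipeline([
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
            # class_weight="balanced" up-weights the rarer fraudulent class (assumed setting)
            ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
        ]),
        "random_forest": Pipeline([
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
            ("clf", RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                           random_state=seed)),
        ]),
    }


def compare_models(texts, labels, test_size=0.25, seed=42):
    """Fit each pipeline on a stratified split; label 1 = fraudulent, 0 = genuine."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    results = {}
    for name, pipe in build_pipelines(seed).items():
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        prob_fake = pipe.predict_proba(X_test)[:, 1]  # column 1 = class label 1
        results[name] = {
            "accuracy": float(accuracy_score(y_test, pred)),
            "precision": float(precision_score(y_test, pred, zero_division=0)),
            "recall": float(recall_score(y_test, pred, zero_division=0)),
            "f1": float(f1_score(y_test, pred, zero_division=0)),
            "roc_auc": float(roc_auc_score(y_test, prob_fake)),
        }
    return results
```

        Stratifying the split keeps the fraudulent share equal in training and test data. Because fraudulent postings are a minority class, accuracy alone can look high even for a model that misses most scams, which is why precision, recall, F1-score and ROC-AUC are reported alongside it.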

      2. Information Gathering

        • Secondary Data

          Secondary data was collected through an extensive review of scholarly journals, conference papers, and online research articles related to fake job detection, recruitment fraud, and machine learning-based text classification. Publicly available datasets, particularly the Kaggle Fake Job Postings dataset, were analyzed to understand data attributes and fraud patterns.

        • Technical Research

        Technical research involved evaluating machine learning algorithms, text preprocessing techniques, feature extraction methods, and explainable AI frameworks. Best practices for model evaluation, deployment, and web-based integration were studied to ensure system scalability, transparency, and real-time performance in detecting fraudulent job advertisements.
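        Turning the Kaggle dataset into one text string and one label per posting can be sketched with the standard `csv` module. The column names (`title`, `company_profile`, `description`, `requirements`, `benefits`, `fraudulent`) and the file name `fake_job_postings.csv` are assumptions based on the public release of the dataset; the categorical and binary metadata columns it also contains are ignored here.

```python
import csv

# Free-text columns assumed from the public Kaggle "fake_job_postings.csv" file.
TEXT_FIELDS = ("title", "company_profile", "description", "requirements", "benefits")


def combine_text(row: dict) -> str:
    """Join a posting's free-text fields, skipping empty or missing values."""
    parts = ((row.get(field) or "").strip() for field in TEXT_FIELDS)
    return " ".join(p for p in parts if p)


def read_postings(lines):
    """Yield (text, label) pairs from CSV lines; 'fraudulent' is 1 for fake, 0 for genuine."""
    for row in csv.DictReader(lines):
        yield combine_text(row), int(row["fraudulent"])


# Usage (file name assumed):
# with open("fake_job_postings.csv", newline="", encoding="utf-8") as fh:
#     texts, labels = zip(*read_postings(fh))
```

        Opening the file with `newline=""` matters because job descriptions often contain line breaks inside quoted CSV fields, which the `csv` module only parses correctly in that mode.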

      3. Architecture of the System

    The proposed system follows a modular architecture designed for scalability and real-time operation. Data preprocessing and model training are performed using Python-based machine learning libraries. The trained model is deployed using a Flask backend that handles prediction requests and communication between system components. A React-based frontend provides a user-friendly interface for submitting job postings and viewing classification results. The architecture supports text preprocessing, feature extraction using TF-IDF, model inference, and explainability through SHAP visualizations. This layered architecture ensures efficient data flow, ease of maintenance, and seamless integration with online recruitment platforms for detecting fake job postings.

    Figure 1: System Architecture

      4. Technologies Used

        The Fake Job Posting Detection system is developed using a modern and reliable technology stack to ensure accuracy, scalability, performance, and security. The selected technologies support efficient text processing, machine learning inference, and secure web-based deployment. Table 1 presents the key technologies utilized at various levels of the system architecture.

        Table 1: Technology Stack for Fake Job Posting Detection System

| Component | Technology Used |
|---|---|
| Frontend | React.js, HTML, CSS, JavaScript |
| Backend | Python, Flask Framework |
| Machine Learning | Scikit-learn |
| Natural Language Processing | NLTK, TF-IDF Vectorization |
| Database | PostgreSQL |
| Explainability | SHAP (SHapley Additive exPlanations) |
| Authentication | JSON Web Token (JWT) |
| API Communication | REST API |
| Web Scraping | Selenium, BeautifulSoup |
| Deployment | Flask API |
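        The TF-IDF vectorization listed in Table 1 can be illustrated from scratch. This sketch assumes scikit-learn's default weighting (`TfidfVectorizer` with `smooth_idf=True` and `norm="l2"`): each term count is multiplied by idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number containing term t, and each document vector is then scaled to unit length. Sparse vectors are represented as plain dictionaries for readability.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with raw term counts, smoothed IDF and L2 row normalisation."""
    n = len(docs)
    # Document frequency: count each term once per document
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Example: tfidf([["pay", "fee", "fee"], ["pay", "salary"]])
```

        A term that appears in every posting receives the minimum idf of 1, so generic words contribute little, while words concentrated in a few postings (for instance, repeated requests for a "fee") dominate the vector.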

      5. User Interface (UI) & Screenshots

        The Fake Job Posting Detection system is designed with an easy-to-operate and user-friendly interface to allow smooth interaction for users. The user interface is responsive in nature and can be accessed from both desktop and mobile devices. The primary design principles of the interface include simplicity, clarity, and effectiveness, which help users easily analyze job postings and understand prediction results.

        1. User Interface Overview

          The system provides the following user interfaces:

          • Homepage:

            Provides an overview of the application, including system description, features, and options for user registration and login, as shown in Figure 2.

          • Sign-Up Page:

            Allows new users to register in the system to access the fake job posting detection functionality. The Sign-Up Page is illustrated in Figure 3.

          • Login Page:

            A common login interface for registered users to securely access the application, as depicted in Figure 4.

          • Job Analysis Dashboard:

            Enables users to enter job posting details such as job title, job description, requirements, and company information for analysis. The Job Analysis Dashboard is shown in Figure 5.

          • Prediction Result Interface:

            Displays the result of the analysis by classifying the job posting as real or fake along with prediction confidence. The Prediction Result Interface is illustrated in Figure 6.

          • Explainability Interface:

          Presents SHAP-based explanations highlighting the important words and features that influence the model's prediction, as shown in Figure 7.
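          For a linear model such as Logistic Regression, the SHAP values behind an explanation screen have a closed form. Lundberg and Lee [6] show that, when features are treated as independent, the attribution of feature i is w_i (x_i − E[x_i]) on the model's log-odds output, so the attributions sum to the difference between this posting's score and the average score. The sketch below assumes sparse dictionaries keyed by term, with `weights` taken from the fitted model and `background_mean` from the mean TF-IDF value of each term over training postings; both are illustrative assumptions rather than this system's exact SHAP configuration.

```python
def linear_shap(weights, x, background_mean):
    """SHAP values for a linear (log-odds) model with independent features.

    phi_i = w_i * (x_i - E[x_i]); terms missing from a sparse dict count as 0.
    The intercept cancels out, so it is not needed.
    """
    return {t: w * (x.get(t, 0.0) - background_mean.get(t, 0.0)) for t, w in weights.items()}


def top_terms(phi, k=5):
    """Terms with the largest absolute contribution, as a UI might highlight them."""
    return sorted(phi.items(), key=lambda item: abs(item[1]), reverse=True)[:k]
```

          Positive values push a posting toward "fake" and negative values toward "real"; ranking by absolute value lets the interface show the strongest evidence in either direction.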

        2. UI Screenshots

    The following figures illustrate the key user interface screens of the Fake Job Posting Detection system.

    Table 2: Description of System Interface Screens and Database Components

| Figure No. | Description |
|---|---|
| Figure 2 | Homepage displaying system overview, features, and register/login options |
| Figure 3 | Sign-up page for new user registration |
| Figure 4 | Login page for registered users |
| Figure 5 | Job analysis dashboard for entering job posting details |
| Figure 6 | Prediction result page showing real or fake classification, and explainability screen displaying SHAP-based feature influence |
| Figure 7 | User dashboard displaying analyzed job postings, and analysis history or result summary screen |

    Figures 2 to 7: User interface screenshots of the Fake Job Posting Detection system

    4. DISCUSSION

        1. Strengths of the System

          • Automated Detection of Fraudulent Job Postings:

            The system identifies fake job advertisements using machine learning and natural language processing techniques, reducing dependence on manual verification.

          • Improved User Safety:

            By flagging fraudulent job postings, the system helps protect job seekers from identity theft, financial scams, and misleading recruitment practices.

          • Lightweight and Efficient Model:

            The use of Logistic Regression ensures fast prediction with low computational overhead, making the system suitable for real-time deployment.

          • Explainable Predictions:

            The integration of SHAP provides transparent explanations for classification results, allowing users to understand why a job posting is labeled as real or fake.

          • Scalable Web-Based Solution:

            The Flask and React-based architecture supports easy scalability and integration with external platforms.

          • Enhanced Trust in Online Recruitment Platforms:

            The system contributes to rebuilding user trust by providing reliable and interpretable job authenticity assessments.
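          The low inference cost claimed for Logistic Regression follows from its form: a prediction is one sparse dot product followed by a sigmoid. In the sketch below, `weights` maps each vocabulary term to its coefficient and `bias` is the intercept; with scikit-learn these would be assembled from the fitted vectorizer's `get_feature_names_out()` and the classifier's `coef_[0]` and `intercept_[0]`, which is an assumption about how the model is exported rather than a description of this system.

```python
import math


def predict_fake_probability(weights, bias, tfidf_vector):
    """Logistic regression on a sparse TF-IDF vector.

    Only the posting's non-zero terms are visited, so the cost grows with the
    number of distinct terms in the posting rather than with the vocabulary size.
    """
    z = bias + sum(weights.get(term, 0.0) * value for term, value in tfidf_vector.items())
    # Numerically stable sigmoid: avoid math.exp overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

          Terms unseen during training simply contribute nothing, so a posting with new vocabulary still receives a score without any special handling.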

        2. Challenges and Limitations

          • Dataset Dependency:

            The model is trained on historical datasets, which may not capture newly emerging scam patterns without periodic retraining.

          • Language Constraint:

            The current implementation supports only English-language job postings, limiting its applicability in multilingual environments.

          • Domain Generalization:

            Job postings from platforms or regions significantly different from the training data may reduce prediction accuracy.

          • No Direct Integration with Live Job APIs:

            The system currently relies on manual input rather than real-time data from job portals.

        3. Future Scope

      • Integration with Live Job Platforms:

        The system can be connected to APIs of platforms such as LinkedIn and Indeed for real-time monitoring of job postings.

      • Multilingual Support:

        Extending the system to support multiple languages will improve global usability.

      • Advanced Deep Learning Models:

        Future versions may incorporate transformer-based models such as BERT or RoBERTa for improved semantic understanding.

      • Browser and Mobile Extensions:

      Developing browser plugins or mobile applications will enable users to instantly verify job authenticity.

    5. CONCLUSION

      Online recruitment platforms have significantly simplified job searching; however, they have also created opportunities for fraudulent job postings that exploit job seekers. Manual moderation systems are inadequate to address the scale and evolving sophistication of job scams. To address this issue, this research proposed and implemented a Fake Job Posting Detection system using Machine Learning and Natural Language Processing techniques. The system analyzes job-related textual features and classifies postings as real or fake using a lightweight and interpretable Logistic Regression model. The integration of SHAP explainability ensures transparency in decision-making, enabling users to understand the factors influencing predictions. Although the system has certain limitations, such as dataset dependency and language constraints, it represents a reliable and scalable solution to a critical real-world problem. With further enhancements such as live platform integration, multilingual support, and advanced models, the proposed system has the potential to significantly improve online recruitment safety and user trust.

    6. REFERENCES

  1. Raj, M., et al., Fake Job Posting Detection Using Ensemble Machine Learning Models, International Journal of Data Science, 2023.

  2. Pillai, S., Fake Job Identification Using Bidirectional LSTM and Word2Vec, Journal of Intelligent Computing, 2023.

  3. Ullah, A., Jamjoom, A., Ensemble-Based Approaches for Job Scam Detection, IEEE Access, 2023.

  4. Kaggle, Real or Fake Job Posting Prediction Dataset, https://www.kaggle.com/shivamb/real-or-fake-fake-jobposting-prediction

  5. Pedregosa, F., et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, Vol. 12, pp. 2825-2830, 2011.

  6. Lundberg, S. M., Lee, S.-I., A Unified Approach to Interpreting Model Predictions, Advances in Neural Information Processing Systems (NeurIPS), 2017.

  7. Bird, S., Klein, E., Loper, E., Natural Language Processing with Python, O'Reilly Media, 2009.