
A Comparative Study of Artificial Intelligence Models for Predicting Campus Placement Outcomes in Higher Education Institutions

DOI: 10.17577/IJERTCONV14IS020123



Sneha Nanaso Nandre

Class: FY MSc Computer Application
Institute: MIT Arts Commerce Science College
Department: Computer Application

Prof. Amit Vilasrao Tale

Institute: MIT Arts Commerce Science College
Department: Computer Application

Abstract – Campus placement outcomes are widely recognized as an important measure of graduate employability and the overall effectiveness of higher education institutions. The ability to anticipate placement results at an early stage allows institutions to proactively identify students who may benefit from targeted academic reinforcement or skill development initiatives. In this study, a comparative analysis of multiple artificial intelligence and machine learning techniques is conducted to predict campus placement status and to estimate the likelihood of individual student placement.

The proposed framework utilizes a set of academically and professionally relevant attributes, including cumulative grade point average, internship exposure, project involvement, technical proficiency, communication ability, and the presence of academic backlogs. These features are commonly regarded as influential factors in employability assessment. Several classification models, namely Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, and Gradient Boosting, are implemented and systematically evaluated. Model effectiveness is assessed using widely adopted performance metrics such as accuracy, precision, recall, and F1-score to ensure a balanced evaluation.

Experimental findings reveal that ensemble-based learning techniques, particularly Random Forest and Gradient Boosting, consistently outperform individual classifiers in terms of predictive accuracy and stability. In contrast, Logistic Regression demonstrates comparatively lower accuracy but offers transparent probability estimation and model interpretability. The comparative results provide practical insights into the trade-offs between predictive performance and explainability, supporting the development of reliable, data-driven campus placement prediction systems for higher education institutions.

Keywords: AI Models, Campus Placement Prediction, Machine Learning, Logistic Regression, Higher Education Analytics

  1. INTRODUCTION

    Campus placement outcomes are widely regarded as a key indicator of graduate employability and institutional performance in higher education systems. With the rapid expansion of higher education and increasing competition in the employment market, institutions are under continuous pressure to improve placement success rates and align academic training with industry requirements. In this context, predictive analytics has gained significant attention as an effective means to identify students who are likely to succeed in recruitment processes, as well as those who may require early academic guidance or targeted skill-development interventions [1], [3].

    Higher education institutions generate large volumes of structured student data, including academic performance records, internship experience, project involvement, professional certifications, and communication competencies. When systematically analyzed, these datasets provide valuable insights into the factors that influence employability. Machine learning and data mining techniques have demonstrated strong potential in extracting hidden patterns from such data and supporting evidence-based decision-making in placement planning and student training programs [2], [6], [9].

    Previous research has explored the use of various classification algorithms for campus placement prediction, including Logistic Regression, Decision Trees, and Support Vector Machines [3], [4]. More recent studies highlight the effectiveness of ensemble learning approaches, such as Random Forest and boosting techniques, which achieve higher predictive accuracy by modeling complex feature interactions and reducing overfitting [1], [8]. Despite their comparatively lower predictive performance, interpretable models like Logistic Regression remain important for understanding the contribution of individual academic and skill-based factors to placement decisions [4], [5].

    Motivated by these observations, the present study conducts a comprehensive comparative analysis of traditional machine learning and ensemble-based models to identify the most effective and practically applicable approach for predicting campus placement outcomes in higher education institutions.

  2. LITERATURE REVIEW

Initial research efforts in campus placement prediction predominantly employed statistical modeling techniques, particularly Logistic Regression, to estimate the probability of student placement based on academic indicators such as grade point average and backlog history. These approaches were valued for their mathematical simplicity and interpretability, enabling institutions to understand how individual variables influence placement outcomes. However, classical statistical models were constrained by strong linearity assumptions and limited capacity to represent complex interactions among multiple employability factors, which restricted their predictive effectiveness in real-world educational datasets [4], [5].

With the advancement of data mining and machine learning, researchers began adopting supervised classification algorithms such as Decision Trees, Support Vector Machines, and Naïve Bayes classifiers to enhance prediction accuracy. These models demonstrated improved performance by capturing nonlinear relationships between academic performance, internship exposure, technical skill development, and placement success. Studies applying such techniques reported higher accuracy than traditional statistical methods, particularly in datasets with heterogeneous feature distributions [2], [6], [4]. Nevertheless, single-model classifiers were often sensitive to noise, feature imbalance, and overfitting, limiting their generalizability across institutions.

Recent literature increasingly emphasizes ensemble learning techniques, including Random Forest and Gradient Boosting, as robust solutions for campus placement prediction. By aggregating multiple weak learners, ensemble models effectively reduce variance and improve predictive stability. Empirical studies consistently report that Random Forest-based approaches outperform individual classifiers in placement prediction tasks, particularly when handling high-dimensional and correlated features [3], [8].

Comparative studies further indicate that while ensemble methods achieve superior predictive accuracy, interpretable models such as Logistic Regression remain essential for probability estimation and decision support. However, many existing works rely on limited datasets, evaluate a narrow range of algorithms, or fail to analyze the trade-off between accuracy and interpretability. These limitations highlight the need for a comprehensive comparative evaluation of traditional and ensemble machine learning models under a unified experimental framework, which forms the primary motivation of the present study [1], [9].

Problem Statement

While machine learning has been increasingly explored for forecasting campus placement outcomes, its practical adoption within higher education institutions remains limited. A major challenge lies in the absence of a unified predictive framework that can reliably assess placement likelihood using a combination of academic achievement and employability-related skills. Much of the existing research concentrates on isolated algorithms or narrowly defined datasets, offering limited insight into how different models perform under comparable conditions. Additionally, many studies emphasize predictive accuracy without adequately addressing model transparency, which is essential for informed academic interventions. These shortcomings necessitate the development of a comprehensive and methodologically consistent framework that evaluates both traditional and ensemble machine learning approaches to identify a dependable and interpretable solution for campus placement prediction.

Research Objectives

This study seeks to establish a robust machine learning framework for analyzing and predicting campus placement outcomes based on student academic and skill-based attributes. The specific objectives are as follows:

  1. To design a structured dataset that captures critical academic performance indicators and employability-related characteristics influencing placement outcomes.

  2. To enhance dataset quality through systematic preprocessing and feature refinement techniques that support reliable model learning.

  3. To develop and train a diverse set of machine learning models, including both classical classifiers and ensemble-based methods, for placement outcome prediction.

  4. To assess and contrast model performance using multiple evaluation measures that collectively reflect prediction accuracy and consistency.

  5. To determine the most suitable predictive approach by examining the balance between model effectiveness and interpretability for institutional decision-making.

    Scope of the Study

    The scope of this research is confined to the application of supervised machine learning techniques for predicting campus placement outcomes using structured student data. The analysis incorporates academic performance metrics, experiential learning indicators, and skill-related attributes that are directly associated with employability. A comparative evaluation of selected classification and ensemble models is conducted to determine their relative effectiveness under identical experimental settings.

    The proposed framework is intended to support academic institutions in identifying students who may benefit from targeted guidance or training initiatives. However, the study does not consider external variables such as employer-specific hiring policies, institutional branding, or fluctuations in labor market demand.

    Research Gap

    Despite notable advancements in placement prediction research, several unresolved issues persist. Existing studies frequently lack methodological consistency in model comparison, often evaluating a limited number of algorithms or omitting ensemble techniques that have shown promise in related domains. Furthermore, there is insufficient emphasis on integrating predictive outputs with interpretable insights that can guide academic planning and student mentoring.

    Another gap lies in the limited exploration of placement probability estimation alongside binary classification, which restricts the usefulness of predictions for personalized intervention strategies. Addressing these gaps requires a holistic evaluation framework that combines predictive accuracy with interpretability across multiple machine learning paradigms. The present study responds to this need by offering a structured and comparative analysis that advances both methodological rigor and practical applicability.

  3. METHODOLOGY

      This research employs a structured machine learning-driven methodology to predict campus placement outcomes using student academic and employability-related attributes. The adopted framework ensures methodological consistency, reproducibility, and objective comparison across predictive models. The overall process follows a standard supervised learning pipeline widely accepted in educational data mining and predictive analytics.

      3.1 Dataset Construction and Description

      A structured dataset was prepared using student-level academic records and employability indicators that are commonly associated with placement outcomes. The selected attributes reflect both academic performance and skill-based competencies relevant to recruitment processes. Each instance in the dataset corresponds to an individual student, while each attribute represents a measurable employability-related factor. The dataset is organized in tabular format to support supervised classification tasks.

      Table I Description of dataset attributes used for campus placement prediction.

      Feature Name           Description
      CGPA                   Cumulative Grade Point Average
      Internships            Internship experience
      Projects               Number of academic projects
      Technical Skills       Technical proficiency score
      Communication Skills   Communication ability score
      Backlogs               Number of academic backlogs
      Placement Status       Placement outcome (Placed / Not Placed)

      3.2 Data Preprocessing

      To improve data consistency and model reliability, a comprehensive preprocessing pipeline was applied. Incomplete records were addressed using suitable imputation techniques, while duplicate entries were eliminated to prevent bias. Categorical variables were encoded into numerical form to ensure compatibility with machine learning algorithms. Numerical attributes were normalized to maintain uniform feature scales, which is particularly important for distance- and margin-based classifiers. Additionally, outlier analysis was conducted to minimize the influence of extreme values that could adversely affect model learning.

      3.3 Feature Selection Strategy

      Feature selection was performed to identify the most informative attributes influencing placement outcomes. Correlation analysis, supported by domain knowledge, was used to remove redundant and weakly contributing features. This step reduced dimensionality, enhanced computational efficiency, and lowered the risk of model overfitting. Academic performance indicators and skill-based attributes were retained due to their strong relevance to employability assessment.
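The preprocessing and correlation-based feature screening described above can be sketched as follows. This is a minimal illustration assuming scikit-learn and pandas; the column names and synthetic records are stand-ins for the study's actual dataset, and the 0.9 correlation threshold is an assumed choice, not one stated in the paper.

```python
# Sketch of imputation, duplicate removal, encoding, normalization, and
# correlation screening on a synthetic student table (illustrative only).
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "CGPA": rng.uniform(5.0, 10.0, n),
    "Internships": rng.integers(0, 4, n),
    "Projects": rng.integers(0, 6, n),
    "TechnicalSkills": rng.integers(1, 11, n),
    "CommunicationSkills": rng.integers(1, 11, n),
    "Backlogs": rng.integers(0, 5, n),
    "Certification": rng.choice(["Yes", "No"], n),   # hypothetical categorical
})
df.loc[rng.choice(n, 10, replace=False), "CGPA"] = np.nan  # simulate gaps

# 1) Imputation: fill missing numeric values with the column median.
df["CGPA"] = df["CGPA"].fillna(df["CGPA"].median())

# 2) Duplicate removal, so repeated records are not over-weighted.
df = df.drop_duplicates()

# 3) Categorical encoding: map Yes/No to 1/0 for model compatibility.
df["Certification"] = (df["Certification"] == "Yes").astype(int)

# 4) Normalization: rescale every feature to [0, 1] so distance- and
#    margin-based classifiers are not dominated by the CGPA scale.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# 5) Correlation screening: drop one feature of any pair whose absolute
#    Pearson correlation exceeds an assumed threshold of 0.9.
corr = scaled.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
selected = scaled.drop(columns=to_drop)
print("retained features:", list(selected.columns))
```

On the independent synthetic features above no column is dropped; with real student data, strongly correlated attribute pairs would be pruned at this step.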

        3.4 Model Development and Training

          To enable a comprehensive comparative evaluation, five supervised machine learning models were implemented: Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, and Gradient Boosting. Logistic Regression was selected for its interpretability and probability estimation capability, while Decision Tree and Support Vector Machine models were employed to capture nonlinear decision boundaries. Ensemble-based approaches, including Random Forest and Gradient Boosting, were used to improve predictive stability and generalization by aggregating multiple learners.

          The dataset was divided into training and testing subsets using an 80:20 split ratio. All models were trained and evaluated on identical data partitions to ensure fairness and consistency in performance comparison.
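The 80:20 hold-out split described above can be sketched as follows, assuming scikit-learn; `X` and `y` are synthetic stand-ins for the student feature matrix and the placement label.

```python
# Minimal sketch of an 80:20 train/test split with stratification, so the
# placed/not-placed ratio is preserved in both partitions.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))     # six employability-related features
y = rng.integers(0, 2, 200)       # 1 = placed, 0 = not placed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
print(len(X_train), len(X_test))  # 160 40
```

Fixing `random_state` gives every model the identical partition, which is what makes the comparison in this study fair.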

        3.5 Model Evaluation Metrics

          Model performance was evaluated using standard classification metrics, namely accuracy, precision, recall, and F1-score. These measures collectively assess overall prediction correctness, class-wise reliability, sensitivity to positive outcomes, and balanced performance. The evaluation framework facilitates identification of models that offer both high predictive accuracy and practical interpretability, which is essential for deployment in academic decision-support systems.
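The comparative protocol, five classifiers trained on one shared split and scored with the four metrics, can be sketched as below. This assumes scikit-learn; the data is synthetic, so the printed scores will not match the study's Table II.

```python
# Hedged sketch of the comparative evaluation loop: same partitions for
# every model, four metrics per model.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=400, n_features=6, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=7)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=7),
    "SVM": SVC(kernel="rbf", random_state=7),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=7),
    "Gradient Boosting": GradientBoostingClassifier(random_state=7),
}
results = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = (accuracy_score(y_te, pred), precision_score(y_te, pred),
                     recall_score(y_te, pred), f1_score(y_te, pred))

for name, (acc, prec, rec, f1) in results.items():
    print(f"{name:20s} acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```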

        3.6 Proposed Workflow of the Placement Prediction System

      The complete methodological workflow of the proposed campus placement prediction system is summarized as follows:

      Figure 1 Overall workflow of the proposed campus placement prediction methodology.

      The workflow illustrates the sequential processing stages involved in transforming raw student data into meaningful placement predictions. The dataset attributes are summarized in Table I, while the comparative performance results of the implemented machine learning models are presented in Table II.

    4. EXPERIMENTAL RESULTS AND DISCUSSION

      This section presents a comprehensive evaluation of the machine learning models developed for campus placement prediction. The dataset was divided into training and testing subsets using an 80:20 split ratio to ensure unbiased assessment of generalization performance. Five supervised learning models, namely Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, and Gradient Boosting, were trained using identical data partitions and evaluated using accuracy, precision, recall, and F1-score.

        4.1 Model-wise Behavioural Analysis

          1. Logistic Regression

            Figure 2 illustrates the probability curve produced by the Logistic Regression model with respect to a standardized academic feature (e.g., CGPA). The sigmoid-shaped curve demonstrates how placement probability increases smoothly as the feature value improves. Student data points are overlaid to show actual placement outcomes. This visualization highlights the interpretability of Logistic Regression, as it provides explicit probability estimates that are useful for academic decision-support and early intervention planning. However, the linear decision boundary limits its ability to capture complex relationships among multiple employability factors.

            Figure 2 Logistic Regression-Based Placement Probability Curve Illustrating Model Interpretability
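The sigmoid behaviour shown in Figure 2 can be reproduced in miniature as follows: a one-feature logistic model maps a CGPA-like score to a placement probability. This assumes scikit-learn; the data is synthetic, not the study's.

```python
# Illustrative one-feature logistic model: probability rises smoothly
# with the standardized academic score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
cgpa = rng.uniform(5.0, 10.0, 300).reshape(-1, 1)
# Higher scores are more likely to carry the positive (placed) label.
placed = (cgpa.ravel() + rng.normal(0.0, 1.0, 300) > 7.5).astype(int)

clf = LogisticRegression().fit(cgpa, placed)
probs = [clf.predict_proba([[v]])[0, 1] for v in (6.0, 7.5, 9.0)]
for v, p in zip((6.0, 7.5, 9.0), probs):
    print(f"CGPA {v}: estimated placement probability {p:.2f}")
```

The explicit probabilities are what make this model useful for early-intervention thresholds, even when its raw accuracy trails the ensembles.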

          2. Decision Tree

            Figure 3 depicts the hierarchical decision regions learned by the Decision Tree classifier using two representative features. The axis-aligned splits illustrate how the model partitions the feature space into distinct decision regions. While Decision Trees are intuitive and easy to interpret, the sharp boundaries indicate sensitivity to data variations, which can lead to overfitting and reduced generalization performance.

            Figure 3 Decision Tree Classification Regions Showing Hierarchical Feature Splits
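The overfitting sensitivity noted above can be demonstrated directly: an unconstrained tree memorises its training data, while capping the depth trades training fit for steadier generalisation. This is a sketch assuming scikit-learn, on synthetic data with deliberate label noise.

```python
# Compare an unconstrained tree with a depth-limited one on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=6, flip_y=0.15,
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=3)

deep = DecisionTreeClassifier(random_state=3).fit(X_tr, y_tr)        # no limit
shallow = DecisionTreeClassifier(max_depth=3, random_state=3).fit(X_tr, y_tr)

# The deep tree fits the training set perfectly, noise included; the
# shallow tree cannot, which is exactly the regularising trade-off.
print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```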

          3. Support Vector Machine (SVM)

            Figure 4 presents the nonlinear decision boundary generated by the Support Vector Machine. The curved margin demonstrates the model's capability to separate placed and non-placed students in a higher-dimensional feature space. Compared to linear models, SVM effectively captures nonlinear patterns, resulting in improved accuracy; however, its interpretability is lower, and performance is sensitive to kernel and hyperparameter selection.

            Figure 4 Support Vector Machine Decision Boundary Demonstrating Nonlinear Class Separation
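The kernel effect behind Figure 4 can be sketched on a deliberately non-linearly-separable dataset: an RBF-kernel SVM recovers the curved boundary that a linear SVM cannot. scikit-learn is assumed; `make_circles` is a synthetic stand-in for the study's feature space, and `gamma=2.0` is an assumed hyperparameter.

```python
# Linear vs RBF SVM on concentric circles: only the kernelised model
# can bend its decision boundary around the inner class.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=5)
linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear accuracy:", round(linear.score(X, y), 2))
print("rbf accuracy:   ", round(rbf.score(X, y), 2))
```

The gap between the two scores is the text's point about kernel choice: the same classifier family behaves very differently depending on kernel and hyperparameters.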

          4. Random Forest

            Figure 5 illustrates feature importance scores derived from the Random Forest model. The results indicate that academic performance indicators such as CGPA contribute more significantly to placement prediction than individual skill metrics, although both play an important role. By aggregating multiple decision trees, Random Forest reduces variance and improves robustness, leading to superior predictive performance compared to single-tree models.

            Figure 5 Random Forest Feature Importance Analysis for Campus Placement Prediction
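The importance reading in Figure 5 can be reproduced in sketch form: when the label is driven mainly by a CGPA-like score, the forest's `feature_importances_` assigns that feature the largest weight. scikit-learn is assumed; the data and feature names are synthetic.

```python
# Random Forest feature importances on data where the outcome depends
# chiefly on the first (CGPA-like) feature.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(9)
n = 500
cgpa = rng.uniform(5.0, 10.0, n)
skills = rng.normal(size=(n, 4))          # weakly relevant stand-ins
X = np.column_stack([cgpa, skills])
y = (cgpa > 7.5).astype(int)              # outcome driven mainly by CGPA

rf = RandomForestClassifier(n_estimators=200, random_state=9).fit(X, y)
for name, imp in zip(["CGPA", "s1", "s2", "s3", "s4"], rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```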

          5. Gradient Boosting

            Figure 6 shows the decision regions formed by the Gradient Boosting model. The visualization reflects the sequential error-correction mechanism of boosting, where successive learners focus on misclassified samples. This results in refined decision boundaries and improved classification accuracy, particularly in regions where other models struggle.

            Figure 6 Gradient Boosting Decision Regions Illustrating Sequential Error Correction
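The stage-wise refinement described above can be measured directly via `staged_predict`, which exposes the ensemble's predictions after each boosting stage. This is a sketch assuming scikit-learn, on synthetic data.

```python
# Track test accuracy as boosting stages accumulate: successive learners
# fit the mistakes of the current ensemble, refining the boundary.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=11)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=11)

gb = GradientBoostingClassifier(n_estimators=100, random_state=11).fit(X_tr, y_tr)
# Predictions after 1, 2, ..., 100 stages.
staged = [accuracy_score(y_te, p) for p in gb.staged_predict(X_te)]
print("after 1 stage:", round(staged[0], 3),
      "| after 100 stages:", round(staged[-1], 3))
```

Plotting `staged` against the stage index gives the error-correction curve that motivates the decision regions in Figure 6.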

        4.2 Quantitative Performance Comparison

          The quantitative performance of all models is summarized in Table II.

          Table II. Comparative Performance of Machine Learning Models

          Model                    Accuracy (%)   Precision   Recall   F1-score
          Logistic Regression      82.4           0.81        0.80     0.80
          Decision Tree            84.1           0.83        0.82     0.82
          Support Vector Machine   85.6           0.85        0.84     0.84
          Random Forest            89.2           0.88        0.88     0.88
          Gradient Boosting        90.1           0.89        0.89     0.89

          The results clearly indicate that ensemble-based methods outperform traditional classifiers across all evaluation metrics. Gradient Boosting achieved the highest accuracy (90.1%), followed by Random Forest (89.2%). Logistic Regression exhibited the lowest accuracy but maintained strong interpretability and stable probability estimation.

        4.3 Accuracy Visualization and Discussion

      Figure 7 presents a bar chart comparing the accuracy of all implemented models. The visualization clearly shows the performance gap between ensemble approaches and standalone classifiers. The superior accuracy of Gradient Boosting and Random Forest can be attributed to their ability to reduce overfitting and effectively model complex interactions among academic and skill-based features.

      Figure 7 Accuracy Comparison of Machine Learning Models for Campus Placement Prediction

      Overall, the experimental findings demonstrate that Gradient Boosting is the most effective model for campus placement prediction, while Logistic Regression remains valuable for explainability and probability-based decision-making. The results suggest that a combined analytical strategy, leveraging ensemble models for prediction and interpretable models for insight, offers a practical and effective solution for real-world deployment in higher education institutions.

    5. CONCLUSION

      This study conducted a comparative evaluation of multiple machine learning and ensemble models for predicting campus placement outcomes using academic and employability-related student attributes. The experimental analysis demonstrated that ensemble-based approaches, particularly Gradient Boosting and Random Forest, consistently outperform traditional classifiers in terms of predictive accuracy and robustness. Among all evaluated models, Gradient Boosting achieved the highest overall performance, highlighting its effectiveness in capturing complex relationships among placement-related factors.

      Although Logistic Regression exhibited lower accuracy, it provided transparent probability estimation and strong interpretability, making it valuable for academic decision-support and early intervention planning. Decision Tree and Support Vector Machine models showed moderate performance, indicating their usefulness in handling nonlinear patterns but with limitations when compared to ensemble techniques.

      Overall, the findings suggest that combining high-accuracy ensemble models with interpretable classifiers offers a balanced and practical framework for campus placement prediction. The proposed approach can support higher education institutions in identifying students requiring targeted academic or skill-based interventions, thereby improving employability outcomes. Future research may extend this work by incorporating larger datasets, additional behavioral features, and real-time predictive analytics.

    6. REFERENCES

[1] V. S. Agrawal and S. S. Kadam, "Predictive analysis of campus placement using machine learning algorithms," Journal of IoT and Machine Learning, 2024.
[2] P. Manimaran, R. Kumar, and S. Devi, "Predicting the eligibility of placement for students using data mining techniques," International Journal of Health Sciences, 2022.
[3] V. N. Rao and P. Dhanalakshmi, "Campus placement prediction using machine learning," International Journal of Intelligent Systems and Applications in Engineering, 2022.
[4] C. K. Sekhar and K. S. Kumar, "Undergraduate student campus placement determination using logistic regression," International Journal of Intelligent Systems and Applications in Engineering, 2022.
[5] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY, USA: Springer, 2009.
[6] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Burlington, MA, USA: Morgan Kaufmann, 2011.
[7] S. B. Kotsiantis, "Supervised machine learning: A review of classification techniques," Informatica, vol. 31, no. 3, pp. 249-268, 2007.
[8] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[9] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Waltham, MA, USA: Morgan Kaufmann, 2012.