
A Comparative Study of Artificial Intelligence Models for Predicting Campus Placement Outcomes in Higher Education Institutions

DOI: 10.17577/IJERTCONV14IS020123



Sneha Nanaso Nandre

Class: FY MSc Computer Application
Institute: MIT Arts Commerce Science College
Department: Computer Application

Prof. Amit Vilasrao Tale

Institute: MIT Arts Commerce Science College
Department: Computer Application

Abstract – Campus placement outcomes are widely recognized as an important measure of graduate employability and the overall effectiveness of higher education institutions. The ability to anticipate placement results at an early stage allows institutions to proactively identify students who may benefit from targeted academic reinforcement or skill development initiatives. In this study, a comparative analysis of multiple artificial intelligence and machine learning techniques is conducted to predict campus placement status and to estimate the likelihood of individual student placement.

The proposed framework utilizes a set of academically and professionally relevant attributes, including cumulative grade point average, internship exposure, project involvement, technical proficiency, communication ability, and the presence of academic backlogs. These features are commonly regarded as influential factors in employability assessment. Several classification models, namely Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, and Gradient Boosting, are implemented and systematically evaluated. Model effectiveness is assessed using widely adopted performance metrics such as accuracy, precision, recall, and F1-score to ensure a balanced evaluation.

Experimental findings reveal that ensemble-based learning techniques, particularly Random Forest and Gradient Boosting, consistently outperform individual classifiers in terms of predictive accuracy and stability. In contrast, Logistic Regression demonstrates comparatively lower accuracy but offers transparent probability estimation and model interpretability. The comparative results provide practical insights into the trade-offs between predictive performance and explainability, supporting the development of reliable, data-driven campus placement prediction systems for higher education institutions.

Keywords: AI Models, Campus Placement Prediction, Machine Learning, Logistic Regression, Higher Education Analytics

  1. INTRODUCTION

    Campus placement outcomes are widely regarded as a key indicator of graduate employability and institutional performance in higher education systems. With the rapid expansion of higher education and increasing competition in the employment market, institutions are under continuous pressure to improve placement success rates and align academic training with industry requirements. In this context, predictive analytics has gained significant attention as an effective means to identify students who are likely to succeed in recruitment processes, as well as those who may require early academic guidance or targeted skill-development interventions [1], [3].

    Higher education institutions generate large volumes of structured student data, including academic performance records, internship experience, project involvement, professional certifications, and communication competencies. When systematically analyzed, these datasets provide valuable insights into the factors that influence employability. Machine learning and data mining techniques have demonstrated strong potential in extracting hidden patterns from such data and supporting evidence-based decision-making in placement planning and student training programs [2], [6], [9].

    Previous research has explored the use of various classification algorithms for campus placement prediction, including Logistic Regression, Decision Trees, and Support Vector Machines [3], [4]. More recent studies highlight the effectiveness of ensemble learning approaches, such as Random Forest and boosting techniques, which achieve higher predictive accuracy by modeling complex feature interactions and reducing overfitting [1], [8]. Despite their comparatively lower predictive performance, interpretable models like Logistic Regression remain important for understanding the contribution of individual academic and skill-based factors to placement decisions [4], [5].

    Motivated by these observations, the present study conducts a comprehensive comparative analysis of traditional machine learning and ensemble-based models to identify the most effective and practically applicable approach for predicting campus placement outcomes in higher education institutions.

  2. LITERATURE REVIEW

Initial research efforts in campus placement prediction predominantly employed statistical modeling techniques, particularly Logistic Regression, to estimate the probability of student placement based on academic indicators such as grade point average and backlog history. These approaches were valued for their mathematical simplicity and interpretability, enabling institutions to understand how individual variables influence placement outcomes. However, classical statistical models were constrained by strong linearity assumptions and limited capacity to represent complex interactions among multiple employability factors, which restricted their predictive effectiveness in real-world educational datasets [4], [5].

With the advancement of data mining and machine learning, researchers began adopting supervised classification algorithms such as Decision Trees, Support Vector Machines, and Naïve Bayes classifiers to enhance prediction accuracy. These models demonstrated improved performance by capturing nonlinear relationships between academic performance, internship exposure, technical skill development, and placement success. Studies applying such techniques reported higher accuracy than traditional statistical methods, particularly in datasets with heterogeneous feature distributions [2], [6], [4]. Nevertheless, single-model classifiers were often sensitive to noise, feature imbalance, and overfitting, limiting their generalizability across institutions.

Recent literature increasingly emphasizes ensemble learning techniques, including Random Forest and Gradient Boosting, as robust solutions for campus placement prediction. By aggregating multiple weak learners, ensemble models effectively reduce variance and improve predictive stability. Empirical studies consistently report that Random Forest-based approaches outperform individual classifiers in placement prediction tasks, particularly when handling high-dimensional and correlated features [3], [8].

Comparative studies further indicate that while ensemble methods achieve superior predictive accuracy, interpretable models such as Logistic Regression remain essential for probability estimation and decision support. However, many existing works rely on limited datasets, evaluate a narrow range of algorithms, or fail to analyze the trade-off between accuracy and interpretability. These limitations highlight the need for a comprehensive comparative evaluation of traditional and ensemble machine learning models under a unified experimental framework, which forms the primary motivation of the present study [1], [9].

Problem Statement

While machine learning has been increasingly explored for forecasting campus placement outcomes, its practical adoption within higher education institutions remains limited. A major challenge lies in the absence of a unified predictive framework that can reliably assess placement likelihood using a combination of academic achievement and employability-related skills. Much of the existing research concentrates on isolated algorithms or narrowly defined datasets, offering limited insight into how different models perform under comparable conditions. Additionally, many studies emphasize predictive accuracy without adequately addressing model transparency, which is essential for informed academic interventions. These shortcomings necessitate the development of a comprehensive and methodologically consistent framework that evaluates both traditional and ensemble machine learning approaches to identify a dependable and interpretable solution for campus placement prediction.

Research Objectives

This study seeks to establish a robust machine learning framework for analyzing and predicting campus placement outcomes based on student academic and skill-based attributes. The specific objectives are as follows:

  1. To design a structured dataset that captures critical academic performance indicators and employability-related characteristics influencing placement outcomes.

  2. To enhance dataset quality through systematic preprocessing and feature refinement techniques that support reliable model learning.

  3. To develop and train a diverse set of machine learning models, including both classical classifiers and ensemble-based methods, for placement outcome prediction.

  4. To assess and contrast model performance using multiple evaluation measures that collectively reflect prediction accuracy and consistency.

  5. To determine the most suitable predictive approach by examining the balance between model effectiveness and interpretability for institutional decision-making.

    Scope of the Study

    The scope of this research is confined to the application of supervised machine learning techniques for predicting campus placement outcomes using structured student data. The analysis incorporates academic performance metrics, experiential learning indicators, and skill-related attributes that are directly associated with employability. A comparative evaluation of selected classification and ensemble models is conducted to determine their relative effectiveness under identical experimental settings.

    The proposed framework is intended to support academic institutions in identifying students who may benefit from targeted guidance or training initiatives. However, the study does not consider external variables such as employer-specific hiring policies, institutional branding, or fluctuations in labor market demand.

    Research Gap

    Despite notable advancements in placement prediction research, several unresolved issues persist. Existing studies frequently lack methodological consistency in model comparison, often evaluating a limited number of algorithms or omitting ensemble techniques that have shown promise in related domains. Furthermore, there is insufficient emphasis on integrating predictive outputs with interpretable insights that can guide academic planning and student mentoring.

    Another gap lies in the limited exploration of placement probability estimation alongside binary classification, which restricts the usefulness of predictions for personalized intervention strategies. Addressing these gaps requires a holistic evaluation framework that combines predictive accuracy with interpretability across multiple machine learning paradigms. The present study responds to this need by offering a structured and comparative analysis that advances both methodological rigor and practical applicability.

  3. METHODOLOGY

      This research employs a structured machine learning-driven methodology to predict campus placement outcomes using student academic and employability-related attributes. The adopted framework ensures methodological consistency, reproducibility, and objective comparison across predictive models. The overall process follows a standard supervised learning pipeline widely accepted in educational data mining and predictive analytics.

      3.1 Dataset Construction and Description

      A structured dataset was prepared using student-level academic records and employability indicators that are commonly associated with placement outcomes. The selected attributes reflect both academic performance and skill-based competencies relevant to recruitment processes. Each instance in the dataset corresponds to an individual student, while each attribute represents a measurable employability-related factor. The dataset is organized in tabular format to support supervised classification tasks.

      Table I Description of dataset attributes used for campus placement prediction.

      Feature Name           Description
      CGPA                   Cumulative Grade Point Average
      Internships            Internship experience
      Projects               Number of academic projects
      Technical Skills       Technical proficiency score
      Communication Skills   Communication ability score
      Backlogs               Number of academic backlogs
      Placement Status       Placement outcome (Placed / Not Placed)

      3.2 Data Preprocessing

      To improve data consistency and model reliability, a comprehensive preprocessing pipeline was applied. Incomplete records were addressed using suitable imputation techniques, while duplicate entries were eliminated to prevent bias. Categorical variables were encoded into numerical form to ensure compatibility with machine learning algorithms. Numerical attributes were normalized to maintain uniform feature scales, which is particularly important for distance- and margin-based classifiers. Additionally, outlier analysis was conducted to minimize the influence of extreme values that could adversely affect model learning.

      3.3 Feature Selection Strategy

      Feature selection was performed to identify the most informative attributes influencing placement outcomes. Correlation analysis, supported by domain knowledge, was used to remove redundant and weakly contributing features. This step reduced dimensionality, enhanced computational efficiency, and lowered the risk of model overfitting. Academic performance indicators and skill-based attributes were retained due to their strong relevance to employability assessment.
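The preprocessing and correlation-based feature screening described above can be sketched as follows. This is a minimal illustration assuming scikit-learn and pandas; the column names and synthetic records are stand-ins for the study's actual dataset, and the 0.9 correlation threshold is an assumed choice, not one stated in the paper.

```python
# Sketch of imputation, duplicate removal, encoding, normalization, and
# correlation screening on a synthetic student table (illustrative only).
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "CGPA": rng.uniform(5.0, 10.0, n),
    "Internships": rng.integers(0, 4, n),
    "Projects": rng.integers(0, 6, n),
    "TechnicalSkills": rng.integers(1, 11, n),
    "CommunicationSkills": rng.integers(1, 11, n),
    "Backlogs": rng.integers(0, 5, n),
    "Certification": rng.choice(["Yes", "No"], n),   # hypothetical categorical
})
df.loc[rng.choice(n, 10, replace=False), "CGPA"] = np.nan  # simulate gaps

# 1) Imputation: fill missing numeric values with the column median.
df["CGPA"] = df["CGPA"].fillna(df["CGPA"].median())

# 2) Duplicate removal, so repeated records are not over-weighted.
df = df.drop_duplicates()

# 3) Categorical encoding: map Yes/No to 1/0 for model compatibility.
df["Certification"] = (df["Certification"] == "Yes").astype(int)

# 4) Normalization: rescale every feature to [0, 1] so distance- and
#    margin-based classifiers are not dominated by the CGPA scale.
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

# 5) Correlation screening: drop one feature of any pair whose absolute
#    Pearson correlation exceeds an assumed threshold of 0.9.
corr = scaled.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
selected = scaled.drop(columns=to_drop)
print("retained features:", list(selected.columns))
```

On the independent synthetic features above no column is dropped; with real student data, strongly correlated attribute pairs would be pruned at this step.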

        3.4 Model Development and Training

          To enable a comprehensive comparative evaluation, five supervised machine learning models were implemented: Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, and Gradient Boosting. Logistic Regression was selected for its interpretability and probability estimation capability, while Decision Tree and Support Vector Machine models were employed to capture nonlinear decision boundaries. Ensemble-based approaches, including Random Forest and Gradient Boosting, were used to improve predictive stability and generalization by aggregating multiple learners.

          The dataset was divided into training and testing subsets using an 80:20 split ratio. All models were trained and evaluated on identical data partitions to ensure fairness and consistency in performance comparison.
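The 80:20 hold-out split described above can be sketched as follows, assuming scikit-learn; `X` and `y` are synthetic stand-ins for the student feature matrix and the placement label.

```python
# Minimal sketch of an 80:20 train/test split with stratification, so the
# placed/not-placed ratio is preserved in both partitions.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))     # six employability-related features
y = rng.integers(0, 2, 200)       # 1 = placed, 0 = not placed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
print(len(X_train), len(X_test))  # 160 40
```

Fixing `random_state` gives every model the identical partition, which is what makes the comparison in this study fair.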

        3.5 Model Evaluation Metrics

          Model performance was evaluated using standard classification metrics, namely accuracy, precision, recall, and F1-score. These measures collectively assess overall prediction correctness, class-wise reliability, sensitivity to positive outcomes, and balanced performance. The evaluation framework facilitates identification of models that offer both high predictive accuracy and practical interpretability, which is essential for deployment in academic decision-support systems.
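The comparative protocol, five classifiers trained on one shared split and scored with the four metrics, can be sketched as below. This assumes scikit-learn; the data is synthetic, so the printed scores will not match the study's Table II.

```python
# Hedged sketch of the comparative evaluation loop: same partitions for
# every model, four metrics per model.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=400, n_features=6, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=7)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=7),
    "SVM": SVC(kernel="rbf", random_state=7),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=7),
    "Gradient Boosting": GradientBoostingClassifier(random_state=7),
}
results = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = (accuracy_score(y_te, pred), precision_score(y_te, pred),
                     recall_score(y_te, pred), f1_score(y_te, pred))

for name, (acc, prec, rec, f1) in results.items():
    print(f"{name:20s} acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```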

        3.6 Proposed Workflow of the Placement Prediction System

      The complete methodological workflow of the proposed campus placement prediction system is summarized as follows:

      Figure 1 Overall workflow of the proposed campus placement prediction methodology.

      The workflow illustrates the sequential processing stages involved in transforming raw student data into meaningful placement predictions. The dataset attributes are summarized in Table I, while the comparative performance results of the implemented machine learning models are presented in Table II.

    4. EXPERIMENTAL RESULTS AND DISCUSSION

      This section presents a comprehensive evaluation of the machine learning models developed for campus placement prediction. The dataset was divided into training and testing subsets using an 80:20 split ratio to ensure unbiased assessment of generalization performance. Five supervised learning models, namely Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, and Gradient Boosting, were trained using identical data partitions and evaluated using accuracy, precision, recall, and F1-score.

        4.1 Model-wise Behavioural Analysis

          1. Logistic Regression

            Figure 2 illustrates the probability curve produced by the Logistic Regression model with respect to a standardized academic feature (e.g., CGPA). The sigmoid-shaped curve demonstrates how placement probability increases smoothly as the feature value improves. Student data points are overlaid to show actual placement outcomes. This visualization highlights the interpretability of Logistic Regression, as it provides explicit probability estimates that are useful for academic decision-support and early intervention planning. However, the linear decision boundary limits its ability to capture complex relationships among multiple employability factors.

            Figure 2 Logistic Regression-Based Placement Probability Curve Illustrating Model Interpretability
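The sigmoid behaviour shown in Figure 2 can be reproduced in miniature as follows: a one-feature logistic model maps a CGPA-like score to a placement probability. This assumes scikit-learn; the data is synthetic, not the study's.

```python
# Illustrative one-feature logistic model: probability rises smoothly
# with the standardized academic score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
cgpa = rng.uniform(5.0, 10.0, 300).reshape(-1, 1)
# Higher scores are more likely to carry the positive (placed) label.
placed = (cgpa.ravel() + rng.normal(0.0, 1.0, 300) > 7.5).astype(int)

clf = LogisticRegression().fit(cgpa, placed)
probs = [clf.predict_proba([[v]])[0, 1] for v in (6.0, 7.5, 9.0)]
for v, p in zip((6.0, 7.5, 9.0), probs):
    print(f"CGPA {v}: estimated placement probability {p:.2f}")
```

The explicit probabilities are what make this model useful for early-intervention thresholds, even when its raw accuracy trails the ensembles.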

          2. Decision Tree

            Figure 3 depicts the hierarchical decision regions learned by the Decision Tree classifier using two representative features. The axis-aligned splits illustrate how the model partitions the feature space into distinct decision regions. While Decision Trees are intuitive and easy to interpret, the sharp boundaries indicate sensitivity to data variations, which can lead to overfitting and reduced generalization performance.

            Figure 3 Decision Tree Classification Regions Showing Hierarchical Feature Splits
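The overfitting sensitivity noted above can be demonstrated directly: an unconstrained tree memorises its training data, while capping the depth trades training fit for steadier generalisation. This is a sketch assuming scikit-learn, on synthetic data with deliberate label noise.

```python
# Compare an unconstrained tree with a depth-limited one on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=6, flip_y=0.15,
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=3)

deep = DecisionTreeClassifier(random_state=3).fit(X_tr, y_tr)        # no limit
shallow = DecisionTreeClassifier(max_depth=3, random_state=3).fit(X_tr, y_tr)

# The deep tree fits the training set perfectly, noise included; the
# shallow tree cannot, which is exactly the regularising trade-off.
print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```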

          3. Support Vector Machine (SVM)

            Figure 4 presents the nonlinear decision boundary generated by the Support Vector Machine. The curved margin demonstrates the model's capability to separate placed and non-placed students in a higher-dimensional feature space. Compared to linear models, SVM effectively captures nonlinear patterns, resulting in improved accuracy; however, its interpretability is lower, and performance is sensitive to kernel and hyperparameter selection.

            Figure 4 Support Vector Machine Decision Boundary Demonstrating Nonlinear Class Separation
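The kernel effect behind Figure 4 can be sketched on a deliberately non-linearly-separable dataset: an RBF-kernel SVM recovers the curved boundary that a linear SVM cannot. scikit-learn is assumed; `make_circles` is a synthetic stand-in for the study's feature space, and `gamma=2.0` is an assumed hyperparameter.

```python
# Linear vs RBF SVM on concentric circles: only the kernelised model
# can bend its decision boundary around the inner class.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=5)
linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear accuracy:", round(linear.score(X, y), 2))
print("rbf accuracy:   ", round(rbf.score(X, y), 2))
```

The gap between the two scores is the text's point about kernel choice: the same classifier family behaves very differently depending on kernel and hyperparameters.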

          4. Random Forest

            Figure 5 illustrates feature importance scores derived from the Random Forest model. The results indicate that academic performance indicators such as CGPA contribute more significantly to placement prediction than individual skill metrics, although both play an important role. By aggregating multiple decision trees, Random Forest reduces variance and improves robustness, leading to superior predictive performance compared to single-tree models.

            Figure 5 Random Forest Feature Importance Analysis for Campus Placement Prediction
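The importance reading in Figure 5 can be reproduced in sketch form: when the label is driven mainly by a CGPA-like score, the forest's `feature_importances_` assigns that feature the largest weight. scikit-learn is assumed; the data and feature names are synthetic.

```python
# Random Forest feature importances on data where the outcome depends
# chiefly on the first (CGPA-like) feature.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(9)
n = 500
cgpa = rng.uniform(5.0, 10.0, n)
skills = rng.normal(size=(n, 4))          # weakly relevant stand-ins
X = np.column_stack([cgpa, skills])
y = (cgpa > 7.5).astype(int)              # outcome driven mainly by CGPA

rf = RandomForestClassifier(n_estimators=200, random_state=9).fit(X, y)
for name, imp in zip(["CGPA", "s1", "s2", "s3", "s4"], rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```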

          5. Gradient Boosting

            Figure 6 shows the decision regions formed by the Gradient Boosting model. The visualization reflects the sequential error-correction mechanism of boosting, where successive learners focus on misclassified samples. This results in refined decision boundaries and improved classification accuracy, particularly in regions where other models struggle.

            Figure 6 Gradient Boosting Decision Regions Illustrating Sequential Error Correction
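The stage-wise refinement described above can be measured directly via `staged_predict`, which exposes the ensemble's predictions after each boosting stage. This is a sketch assuming scikit-learn, on synthetic data.

```python
# Track test accuracy as boosting stages accumulate: successive learners
# fit the mistakes of the current ensemble, refining the boundary.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=11)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=11)

gb = GradientBoostingClassifier(n_estimators=100, random_state=11).fit(X_tr, y_tr)
# Predictions after 1, 2, ..., 100 stages.
staged = [accuracy_score(y_te, p) for p in gb.staged_predict(X_te)]
print("after 1 stage:", round(staged[0], 3),
      "| after 100 stages:", round(staged[-1], 3))
```

Plotting `staged` against the stage index gives the error-correction curve that motivates the decision regions in Figure 6.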

        4.2 Quantitative Performance Comparison

          The quantitative performance of all models is summarized in Table II.

          Table II. Comparative Performance of Machine Learning Models

          Model                    Accuracy (%)   Precision   Recall   F1-score
          Logistic Regression      82.4           0.81        0.80     0.80
          Decision Tree            84.1           0.83        0.82     0.82
          Support Vector Machine   85.6           0.85        0.84     0.84
          Random Forest            89.2           0.88        0.88     0.88
          Gradient Boosting        90.1           0.89        0.89     0.89

          The results clearly indicate that ensemble-based methods outperform traditional classifiers across all evaluation metrics. Gradient Boosting achieved the highest accuracy (90.1%), followed by Random Forest (89.2%). Logistic Regression exhibited the lowest accuracy but maintained strong interpretability and stable probability estimation.

        4.3 Accuracy Visualization and Discussion

      Figure 7 presents a bar chart comparing the accuracy of all implemented models. The visualization clearly shows the performance gap between ensemble approaches and standalone classifiers. The superior accuracy of Gradient Boosting and Random Forest can be attributed to their ability to reduce overfitting and effectively model complex interactions among academic and skill-based features.

      Figure 7 Accuracy Comparison of Machine Learning Models for Campus Placement Prediction

      Overall, the experimental findings demonstrate that Gradient Boosting is the most effective model for campus placement prediction, while Logistic Regression remains valuable for explainability and probability-based decision-making. The results suggest that a combined analytical strategy, leveraging ensemble models for prediction and interpretable models for insight, offers a practical and effective solution for real-world deployment in higher education institutions.

    5. CONCLUSION

      This study conducted a comparative evaluation of multiple machine learning and ensemble models for predicting campus placement outcomes using academic and employability-related student attributes. The experimental analysis demonstrated that ensemble-based approaches, particularly Gradient Boosting and Random Forest, consistently outperform traditional classifiers in terms of predictive accuracy and robustness. Among all evaluated models, Gradient Boosting achieved the highest overall performance, highlighting its effectiveness in capturing complex relationships among placement-related factors.

      Although Logistic Regression exhibited lower accuracy, it provided transparent probability estimation and strong interpretability, making it valuable for academic decision-support and early intervention planning. Decision Tree and Support Vector Machine models showed moderate performance, indicating their usefulness in handling nonlinear patterns but with limitations when compared to ensemble techniques.

      Overall, the findings suggest that combining high-accuracy ensemble models with interpretable classifiers offers a balanced and practical framework for campus placement prediction. The proposed approach can support higher education institutions in identifying students requiring targeted academic or skill-based interventions, thereby improving employability outcomes. Future research may extend this work by incorporating larger datasets, additional behavioral features, and real-time predictive analytics.

    6. REFERENCES

[1] V. S. Agrawal and S. S. Kadam, "Predictive analysis of campus placement using machine learning algorithms," Journal of IoT and Machine Learning, 2024.
[2] P. Manimaran, R. Kumar, and S. Devi, "Predicting the eligibility of placement for students using data mining techniques," International Journal of Health Sciences, 2022.
[3] V. N. Rao and P. Dhanalakshmi, "Campus placement prediction using machine learning," International Journal of Intelligent Systems and Applications in Engineering, 2022.
[4] C. K. Sekhar and K. S. Kumar, "Undergraduate student campus placement determination using logistic regression," International Journal of Intelligent Systems and Applications in Engineering, 2022.
[5] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY, USA: Springer, 2009.
[6] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Burlington, MA, USA: Morgan Kaufmann, 2011.
[7] S. B. Kotsiantis, "Supervised machine learning: A review of classification techniques," Informatica, vol. 31, no. 3, pp. 249-268, 2007.
[8] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[9] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Waltham, MA, USA: Morgan Kaufmann, 2012.