DOI : 10.17577/IJERTCONV14IS020123- Open Access

- Authors : Sneha Nanaso Nandre, Prof. Amit Vilasrao Tale
- Paper ID : IJERTCONV14IS020123
- Volume & Issue : Volume 14, Issue 02, NCRTCS – 2026
- Published (First Online) : 21-04-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
A Comparative Study of Artificial Intelligence Models for Predicting Campus Placement Outcomes in Higher Education Institutions
Sneha Nanaso Nandre
Class: FY MSc Computer Application
Institute: MIT Arts Commerce Science College
Department: Computer Application
Prof. Amit Vilasrao Tale
Institute: MIT Arts Commerce Science College
Department: Computer Application
Abstract – Campus placement outcomes are widely recognized as an important measure of graduate employability and the overall effectiveness of higher education institutions. The ability to anticipate placement results at an early stage allows institutions to proactively identify students who may benefit from targeted academic
reinforcement or skill development initiatives. In this study, a comparative analysis of multiple artificial intelligence and machine learning techniques is conducted to predict campus placement status and to estimate the likelihood of individual student placement.
The proposed framework utilizes a set of academically and professionally relevant attributes, including cumulative grade point average, internship exposure, project involvement, technical proficiency, communication ability, and the presence of academic backlogs. These features are commonly regarded as influential factors in employability assessment. Several classification models, namely Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, and Gradient Boosting, are implemented and systematically evaluated. Model effectiveness is assessed using widely adopted performance metrics such as accuracy, precision, recall, and F1-score to ensure a balanced evaluation.
Experimental findings reveal that ensemble-based learning techniques, particularly Random Forest and Gradient Boosting, consistently outperform individual classifiers in terms of predictive accuracy and stability. In contrast, Logistic Regression demonstrates comparatively lower accuracy but offers transparent probability estimation and model interpretability. The comparative results provide practical insights into the trade-offs between predictive performance and explainability, supporting the development of reliable, data-driven campus placement prediction systems for higher education institutions.
Keywords: AI Models, Campus Placement Prediction, Machine Learning, Logistic Regression, Higher Education Analytics
-
INTRODUCTION
Campus placement outcomes are widely regarded as a key indicator of graduate employability and institutional performance in higher education systems. With the rapid
expansion of higher education and increasing competition in the employment market, institutions are under continuous pressure to improve placement success rates and align academic training with industry requirements. In this context, predictive analytics has gained significant attention as an effective means to identify students who are likely to succeed in recruitment processes, as well as those who may require early academic guidance or targeted skill-development interventions [1], [3].
Higher education institutions generate large volumes of structured student data, including academic performance records, internship experience, project involvement, professional certifications, and communication competencies. When systematically analyzed, these datasets provide valuable insights into the factors that influence employability. Machine learning and data mining techniques have demonstrated strong potential in extracting hidden patterns from such data and supporting evidence-based decision-making in placement planning and student training programs [2], [6], [9].
Previous research has explored the use of various classification algorithms for campus placement prediction, including Logistic Regression, Decision Trees, and Support Vector Machines [3], [4]. More recent studies highlight the effectiveness of ensemble learning approaches, such as Random Forest and boosting techniques, which achieve higher predictive accuracy by modeling complex feature interactions and reducing overfitting [1], [8]. Despite their comparatively lower predictive performance, interpretable models like Logistic Regression remain important for understanding the contribution of individual academic and skill-based factors to placement decisions [4], [5].
Motivated by these observations, the present study conducts a comprehensive comparative analysis of traditional machine learning and ensemble-based models to identify the most effective and practically applicable approach for predicting campus placement outcomes in higher education institutions.
-
LITERATURE REVIEW
Initial research efforts in campus placement prediction predominantly employed statistical modeling techniques, particularly Logistic Regression, to estimate the probability of student placement based on academic indicators such as grade point average and backlog history. These approaches were valued for their mathematical simplicity and interpretability, enabling institutions to understand how individual variables influence placement outcomes. However, classical statistical models were constrained by strong linearity assumptions and limited capacity to represent complex interactions among multiple employability factors, which restricted their predictive effectiveness in real-world educational datasets [4], [5].
With the advancement of data mining and machine learning, researchers began adopting supervised classification algorithms such as Decision Trees, Support Vector Machines, and Naïve Bayes classifiers to enhance prediction accuracy. These models demonstrated improved performance by capturing nonlinear relationships between academic performance, internship exposure, technical skill development, and placement success. Studies applying such techniques reported higher accuracy than traditional statistical methods, particularly in datasets with heterogeneous feature distributions [2], [6], [4]. Nevertheless, single-model classifiers were often sensitive to noise, feature imbalance, and overfitting, limiting their generalizability across institutions.
Recent literature increasingly emphasizes ensemble learning techniques, including Random Forest and Gradient Boosting, as robust solutions for campus placement prediction. By aggregating multiple weak learners, ensemble models effectively reduce variance and improve predictive stability. Empirical studies consistently report that Random Forest-based approaches outperform individual classifiers in placement prediction tasks, particularly when handling high-dimensional and correlated features [3], [8].
Comparative studies further indicate that while ensemble methods achieve superior predictive accuracy, interpretable models such as Logistic Regression remain essential for probability estimation and decision support. However, many existing works rely on limited datasets, evaluate a narrow range of algorithms, or fail to analyze the trade-off between accuracy and interpretability. These limitations highlight the need for a comprehensive comparative evaluation of traditional and ensemble machine learning models under a unified experimental framework, which forms the primary motivation of the present study [1], [9].
Problem Statement
While machine learning has been increasingly explored for forecasting campus placement outcomes, its practical adoption within higher education institutions remains limited. A major challenge lies in the absence of a unified predictive framework that can reliably assess placement likelihood using a combination of academic achievement and employability-related skills. Much of the existing research concentrates on isolated algorithms or narrowly defined datasets, offering limited insight into how different models perform under comparable conditions. Additionally, many studies emphasize predictive accuracy without adequately addressing model transparency, which is essential for informed academic interventions. These shortcomings necessitate the development of a comprehensive and methodologically consistent framework that evaluates both traditional and ensemble machine learning approaches to identify a dependable and interpretable solution for campus placement prediction.
Research Objectives
This study seeks to establish a robust machine learning framework for analyzing and predicting campus placement outcomes based on student academic and skill-based attributes. The specific objectives are as follows:
-
To design a structured dataset that captures critical academic performance indicators and employability- related characteristics influencing placement outcomes.
-
To enhance dataset quality through systematic preprocessing and feature refinement techniques that support reliable model learning.
-
To develop and train a diverse set of machine learning models, including both classical classifiers and ensemble-based methods, for placement outcome prediction.
-
To assess and contrast model performance using multiple evaluation measures that collectively reflect prediction accuracy and consistency.
-
To determine the most suitable predictive approach by examining the balance between model effectiveness and interpretability for institutional decision-making.
Scope of the Study
The scope of this research is confined to the application of supervised machine learning techniques for predicting campus placement outcomes using structured student data. The analysis incorporates academic performance metrics, experiential learning indicators, and skill-related attributes that are directly associated with employability. A comparative evaluation of selected classification and ensemble models is conducted to determine their relative effectiveness under identical experimental settings.
The proposed framework is intended to support academic institutions in identifying students who may benefit from targeted guidance or training initiatives. However, the study does not consider external variables such as employer-specific hiring policies, institutional branding, or fluctuations in labor market demand.
Research Gap
Despite notable advancements in placement prediction research, several unresolved issues persist. Existing studies frequently lack methodological consistency in model comparison, often evaluating a limited number of algorithms or omitting ensemble techniques that have shown promise in related domains. Furthermore, there is insufficient emphasis on integrating predictive outputs with interpretable insights that can guide academic planning and student mentoring.
Another gap lies in the limited exploration of placement probability estimation alongside binary classification, which restricts the usefulness of predictions for personalized intervention strategies. Addressing these gaps requires a holistic evaluation framework that combines predictive accuracy with interpretability across multiple machine learning paradigms. The present study responds to this need by offering a structured and comparative analysis that advances both methodological rigor and practical applicability.
-
METHODOLOGY
This research employs a structured, machine learning-driven methodology to predict campus placement outcomes using student academic and employability-related attributes. The adopted framework ensures methodological consistency, reproducibility, and objective comparison across predictive models. The overall process follows a standard supervised learning pipeline widely accepted in educational data mining and predictive analytics.
3.1 Dataset Construction and Description
A structured dataset was prepared using student-level academic records and employability indicators that are commonly associated with placement outcomes. The selected attributes reflect both academic performance and skill-based competencies relevant to recruitment processes. Each instance in the dataset corresponds to an individual student, while each attribute represents a measurable employability-related factor. The dataset is organized in tabular format to support supervised classification tasks.
Table I. Description of dataset attributes used for campus placement prediction.
Feature Name | Description
CGPA | Cumulative Grade Point Average
Internships | Internship experience
Projects | Number of academic projects
Technical Skills | Technical proficiency score
Communication Skills | Communication ability score
Backlogs | Number of academic backlogs
Placement Status | Placement outcome (Placed / Not Placed)
-
Data Preprocessing
To improve data consistency and model reliability, a comprehensive preprocessing pipeline was applied. Incomplete records were addressed using suitable imputation techniques, while duplicate entries were eliminated to prevent bias. Categorical variables were encoded into numerical form to ensure compatibility with machine learning algorithms. Numerical attributes were normalized to maintain uniform feature scales, which is particularly important for distance- and margin-based classifiers. Additionally, outlier analysis was conducted to minimize the influence of extreme values that could adversely affect model learning.
-
Feature Selection Strategy
Feature selection was performed to identify the most informative attributes influencing placement outcomes. Correlation analysis, supported by domain knowledge, was used to remove redundant and weakly contributing features. This step reduced dimensionality, enhanced computational efficiency, and lowered the risk of model overfitting. Academic performance indicators and skill-based attributes were retained due to their strong relevance to employability assessment.
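The correlation-based screening described above can be sketched as follows. This is an illustrative implementation, not the paper's actual code: the column names, the toy values, and the 0.9 redundancy threshold are all assumptions.

```python
# Hypothetical sketch of correlation-based redundancy screening.
# Column names and the 0.9 threshold are illustrative assumptions.
import pandas as pd

def drop_redundant_features(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute correlation exceeds the threshold."""
    corr = df.corr(numeric_only=True).abs()
    cols = corr.columns
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                to_drop.add(cols[j])  # keep the earlier feature, drop the later one
    return df.drop(columns=sorted(to_drop))

df = pd.DataFrame({
    "cgpa": [8.1, 6.5, 7.2, 9.0],
    "cgpa_scaled": [0.81, 0.65, 0.72, 0.90],  # perfectly correlated with cgpa
    "backlogs": [0, 3, 1, 0],
})
reduced = drop_redundant_features(df)
print(list(reduced.columns))  # the redundant cgpa_scaled column is removed
```

In practice the retained set would then be cross-checked against domain knowledge, as the paper notes, rather than trusting the correlation matrix alone.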
-
Model Development and Training
To enable a comprehensive comparative evaluation, five supervised machine learning models were implemented: Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, and Gradient Boosting. Logistic Regression was selected for its interpretability and probability estimation capability, while Decision Tree and Support Vector Machine models were employed to capture nonlinear decision boundaries. Ensemble-based approaches, including Random Forest and Gradient Boosting, were used to improve predictive stability and generalization by aggregating multiple learners.
The dataset was divided into training and testing subsets using an 80:20 split ratio. All models were trained and evaluated on identical data partitions to ensure fairness and consistency in performance comparison.
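A minimal sketch of this training protocol, assuming a scikit-learn implementation and a synthetic stand-in for the student dataset (the paper does not specify its toolchain or hyperparameters):

```python
# Sketch of the 80:20 split and five-model training protocol described above.
# The synthetic feature matrix stands in for the real student dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))  # 6 features standing in for CGPA, internships, ...
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Identical partitions for every model, matching the paper's protocol.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(kernel="rbf", probability=True, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Fixing `random_state` on both the split and the models is what makes the comparison reproducible across runs.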
-
Model Evaluation Metrics
Model performance was evaluated using standard classification metrics, namely accuracy, precision, recall,
and F1-score. These measures collectively assess overall prediction correctness, class-wise reliability, sensitivity to positive outcomes, and balanced performance. The evaluation framework facilitates identification of models that offer both high predictive accuracy and practical interpretability, which is essential for deployment in academic decision-support systems.
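For concreteness, the four metrics can be computed as follows with scikit-learn; the labels here are toy values, not the study's data.

```python
# Sketch of the evaluation step: the four metrics named above, computed
# from predicted vs. true labels (toy labels for illustration only).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # 0.75
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.75
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 0.75
print(f"F1-score:  {f1_score(y_true, y_pred):.2f}")         # 0.75
```

Here one placed student is missed (a false negative) and one non-placed student is flagged (a false positive), so precision and recall happen to coincide; on imbalanced placement data they typically diverge, which is why all four metrics are reported.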
-
Proposed Workflow of the Placement Prediction System
The complete methodological workflow of the proposed campus placement prediction system is summarized as follows:
Figure 1 Overall workflow of the proposed campus placement prediction methodology.
The workflow illustrates the sequential processing stages involved in transforming raw student data into meaningful placement predictions. The dataset attributes are summarized in Table I, while the comparative performance results of the implemented machine learning models are presented in Table II.
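The sequential stages above (imputation, encoding, scaling, then classification) can be expressed as a single composed pipeline. The following sketch assumes scikit-learn and invents column names based on Table I; it is not the authors' implementation.

```python
# Illustrative end-to-end pipeline: preprocessing stages chained to a classifier.
# Column names and the yes/no internship encoding are assumptions, not the paper's schema.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier

numeric = ["cgpa", "projects", "tech_skills", "comm_skills", "backlogs"]
categorical = ["internships"]  # assumed categorical, e.g. "yes"/"no"

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

pipeline = Pipeline([("prep", preprocess),
                     ("model", GradientBoostingClassifier(random_state=0))])

df = pd.DataFrame({
    "cgpa": [8.2, 6.1, 7.4, 9.1, 5.8, 8.8],
    "projects": [3, 1, 2, 4, 1, 3],
    "tech_skills": [8, 5, 6, 9, 4, 8],
    "comm_skills": [7, 6, 6, 8, 5, 7],
    "backlogs": [0, 2, 1, 0, 3, 0],
    "internships": ["yes", "no", "yes", "yes", "no", "yes"],
})
placed = [1, 0, 1, 1, 0, 1]

pipeline.fit(df, placed)
print(pipeline.predict(df))
```

Packaging the stages this way guarantees that exactly the same preprocessing is applied at training and prediction time, which matters for fair model comparison.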
-
-
EXPERIMENTAL RESULTS AND DISCUSSION
This section presents a comprehensive evaluation of the machine learning models developed for campus placement prediction. The dataset was divided into training and testing subsets using an 80:20 split ratio to ensure unbiased assessment of generalization performance. Five supervised learning models, namely Logistic Regression, Decision Tree, Support Vector Machine (SVM), Random Forest, and Gradient Boosting, were trained using identical data partitions and evaluated using accuracy, precision, recall, and F1-score.
-
Model-wise Behavioural Analysis
-
Logistic Regression
Figure 2 illustrates the probability curve produced by the Logistic Regression model with respect to a standardized academic feature (e.g., CGPA). The sigmoid-shaped curve demonstrates how placement probability increases smoothly as the feature value improves. Student data points are overlaid to show actual placement outcomes. This visualization highlights the interpretability of Logistic Regression, as it provides explicit probability estimates that are useful for academic decision-support and early intervention planning. However, the linear decision boundary limits its ability to capture complex relationships among multiple employability factors.
Figure 2 Logistic Regression-Based Placement Probability Curve Illustrating Model Interpretability
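The probability behaviour described above can be reproduced in a small sketch. The data, fitted coefficients, and the 7.5 CGPA threshold below are synthetic assumptions chosen only to make the sigmoid shape visible.

```python
# Sketch of Logistic Regression's probability output: the fitted sigmoid maps
# a CGPA-like score to a placement probability. Synthetic data, not the paper's.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
cgpa = rng.uniform(5.0, 10.0, size=300).reshape(-1, 1)
# Assumed ground truth: higher CGPA raises the placement chance around 7.5.
placed = (cgpa.ravel() + rng.normal(scale=1.0, size=300) > 7.5).astype(int)

clf = LogisticRegression().fit(cgpa, placed)
probs = {v: clf.predict_proba([[v]])[0, 1] for v in (6.0, 7.5, 9.0)}
for value, p in probs.items():
    print(f"CGPA {value}: placement probability {p:.2f}")
```

Because the model is a single sigmoid over one feature, the probability is guaranteed to rise monotonically with CGPA, which is exactly the interpretability property the paper highlights.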
-
Decision Tree
Figure 3 depicts the hierarchical decision regions learned by the Decision Tree classifier using two representative features. The axis-aligned splits illustrate how the model partitions the feature space into distinct decision regions. While Decision Trees are intuitive and easy to interpret, the sharp boundaries indicate sensitivity to data variations, which can lead to overfitting and reduced generalization performance.
Figure 3 Decision Tree Classification Regions Showing Hierarchical Feature Splits
-
Support Vector Machine (SVM)
Figure 4 presents the nonlinear decision boundary generated by the Support Vector Machine. The curved margin demonstrates the model's capability to separate placed and non-placed students in a higher-dimensional feature space. Compared to linear models, SVM effectively captures nonlinear patterns, resulting in improved accuracy; however, its interpretability is lower, and performance is sensitive to kernel and hyperparameter selection.
Figure 4 Support Vector Machine Decision Boundary Demonstrating Nonlinear Class Separation
-
Random Forest
Figure 5 illustrates feature importance scores derived from the Random Forest model. The results indicate that academic performance indicators such as CGPA contribute more significantly to placement prediction than individual
skill metrics, although both play an important role. By aggregating multiple decision trees, Random Forest reduces variance and improves robustness, leading to superior predictive performance compared to single-tree models.
Figure 5 Random Forest Feature Importance Analysis for Campus Placement Prediction
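A sketch of how such importance scores are obtained, under the assumption of a scikit-learn Random Forest; the data are synthetic and deliberately constructed so that a CGPA-like feature dominates, mirroring the pattern reported above.

```python
# Sketch of the feature-importance readout behind Figure 5. Feature names are
# taken from Table I; the data are synthetic, with column 0 (CGPA) made dominant.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
names = ["CGPA", "Internships", "Projects", "TechSkills", "CommSkills", "Backlogs"]
X = rng.normal(size=(300, 6))
y = (2.0 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.3, size=300) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
for name, score in sorted(zip(names, forest.feature_importances_),
                          key=lambda pair: -pair[1]):
    print(f"{name:12s} {score:.3f}")
```

The `feature_importances_` scores sum to 1 and measure each feature's total impurity reduction across the forest, which is why they serve as a rough, model-internal ranking rather than a causal explanation.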
-
Gradient Boosting
Figure 6 shows the decision regions formed by the Gradient Boosting model. The visualization reflects the sequential error-correction mechanism of boosting, where successive learners focus on misclassified samples. This results in refined decision boundaries and improved classification accuracy, particularly in regions where other models struggle.
Figure 6 Gradient Boosting Decision Regions Illustrating Sequential Error Correction
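The sequential error correction described above can be observed directly via staged predictions, which score the ensemble after each added learner. The following sketch uses synthetic data and assumes a scikit-learn implementation.

```python
# Illustration of boosting's sequential refinement: test accuracy measured
# after each added learner via staged predictions (synthetic data).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 5))
y = (np.sin(X[:, 0]) + X[:, 1] ** 2 - X[:, 2] > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
staged = [accuracy_score(y_te, pred) for pred in gbm.staged_predict(X_te)]
print(f"after 1 tree:    {staged[0]:.3f}")
print(f"after 100 trees: {staged[-1]:.3f}")  # later stages typically correct earlier errors
```

Plotting `staged` against the number of trees is also a standard way to choose `n_estimators` and detect the point where additional learners stop helping.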
-
-
Quantitative Performance Comparison
The quantitative performance of all models is summarized in Table II.
Table II. Comparative Performance of Machine Learning Models
Model | Accuracy (%) | Precision | Recall | F1-score
Logistic Regression | 82.4 | 0.81 | 0.80 | 0.80
Decision Tree | 84.1 | 0.83 | 0.82 | 0.82
Support Vector Machine | 85.6 | 0.85 | 0.84 | 0.84
Random Forest | 89.2 | 0.88 | 0.88 | 0.88
Gradient Boosting | 90.1 | 0.89 | 0.89 | 0.89
The results clearly indicate that ensemble-based methods outperform traditional classifiers across all evaluation metrics. Gradient Boosting achieved the highest accuracy (90.1%), followed by Random Forest (89.2%). Logistic Regression exhibited the lowest accuracy but maintained strong interpretability and stable probability estimation.
-
Accuracy Visualization and Discussion
Figure 7 presents a bar chart comparing the accuracy of all implemented models. The visualization clearly shows the performance gap between ensemble approaches and standalone classifiers. The superior accuracy of Gradient Boosting and Random Forest can be attributed to their ability to reduce overfitting and effectively model complex interactions among academic and skill-based features.
Figure 7 Accuracy Comparison of Machine Learning Models for Campus Placement Prediction
Overall, the experimental findings demonstrate that Gradient Boosting is the most effective model for campus placement prediction, while Logistic Regression remains valuable for explainability and probability-based decision-making. The results suggest that a combined analytical strategy, leveraging ensemble models for prediction and interpretable models for insight, offers a practical and effective solution for real-world deployment in higher education institutions.
-
-
CONCLUSION
This study conducted a comparative evaluation of multiple machine learning and ensemble models for predicting campus placement outcomes using academic and employability-related student attributes. The experimental analysis demonstrated that ensemble-based approaches, particularly Gradient Boosting and Random Forest, consistently outperform traditional classifiers in terms of predictive accuracy and robustness. Among all evaluated models, Gradient Boosting achieved the highest overall performance, highlighting its effectiveness in capturing complex relationships among placement-related factors.
Although Logistic Regression exhibited lower accuracy, it provided transparent probability estimation and strong interpretability, making it valuable for academic decision-support and early intervention planning. Decision Tree and Support Vector Machine models showed moderate performance, indicating their usefulness in handling nonlinear patterns but with limitations when compared to ensemble techniques.
Overall, the findings suggest that combining high-accuracy ensemble models with interpretable classifiers offers a balanced and practical framework for campus placement prediction. The proposed approach can support higher education institutions in identifying students requiring targeted academic or skill-based interventions, thereby improving employability outcomes. Future research may extend this work by incorporating larger datasets, additional behavioral features, and real-time predictive analytics.
-
REFERENCES
[1] V. S. Agrawal and S. S. Kadam, "Predictive analysis of campus placement using machine learning algorithms," Journal of IoT and Machine Learning, 2024.
[2] P. Manimaran, R. Kumar, and S. Devi, "Predicting the eligibility of placement for students using data mining techniques," International Journal of Health Sciences, 2022.
[3] V. N. Rao and P. Dhanalakshmi, "Campus placement prediction using machine learning," International Journal of Intelligent Systems and Applications in Engineering, 2022.
[4] C. K. Sekhar and K. S. Kumar, "Undergraduate student campus placement determination using logistic regression," International Journal of Intelligent Systems and Applications in Engineering, 2022.
[5] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York, NY, USA: Springer, 2009.
[6] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed. Burlington, MA, USA: Morgan Kaufmann, 2011.
[7] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[8] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Waltham, MA, USA: Morgan Kaufmann, 2012.
