
Lab Evaluation And Assessment Platform

DOI : https://doi.org/10.5281/zenodo.19973739


Sadhiya Sakeer

Dept. of Computer Science and Engineering, KMEA Engineering College (KTU), Aluva, Ernakulam

Suphin Hassainar

Dept. of Computer Science and Engineering, KMEA Engineering College (KTU), Aluva, Ernakulam

Mohammed Ansil

Dept. of Computer Science and Engineering, KMEA Engineering College (KTU), Aluva, Ernakulam

Akif Anvar

Dept. of Computer Science and Engineering, KMEA Engineering College (KTU), Aluva, Ernakulam

Sajeena M. K.

Dept. of Computer Science and Engineering, KMEA Engineering College (KTU), Aluva, Ernakulam

Abstract – In college programming labs, students often copy code from peers or online sources, which prevents genuine learning and makes it difficult for teachers to assess individual understanding. Manual evaluation of student outputs further increases teachers' workload and slows down feedback. To address these challenges, this project proposes a smart lab evaluation and assessment platform that provides a controlled coding environment where students log in using their college credentials and can access only the experiment assigned for the day. Copy-paste functionality is disabled to encourage original problem solving, supported by a built-in hint system that guides students without revealing full solutions. An integrated AI module evaluates the submitted code by checking its logic and output accuracy, and once validated, the task is marked as complete with results instantly displayed on a connected teacher mobile app. This ensures real-time monitoring, faster decision making, and reduced manual correction effort. By combining restricted access, AI-based evaluation, and instant teacher updates, the system enhances the authenticity of student submissions, streamlines daily assessments, and demonstrates how technology can improve learning outcomes while reducing the burden on educators in lab-based environments.

Index Terms – Lab evaluation and assessment platform (LEAP), artificial intelligence (AI).

  1. Introduction

Programming laboratories play a critical role in cultivating students' computational thinking, coding proficiency, and problem-solving abilities. However, a significant challenge increasingly observed in technical institutions is students' growing dependency on peers, online resources, or pre-existing solutions to complete laboratory tasks. Such reliance restricts students' engagement in independent problem solving and impedes the development of essential programming skills. Prior research on automated assessment systems similarly highlights concerns about academic integrity and the lack of authentic learning in open programming environments [1].

Frequently, student submissions exhibit structural similarities, differing only in minor aspects such as variable names or formatting. This practice complicates the manual detection of plagiarism and contributes to inconsistencies in the evaluation process. Studies reveal that instructors often invest considerable time reviewing repetitive code submissions, resulting in delayed feedback and increased cognitive workload [2]. Timely and meaningful feedback is crucial for reinforcing programming concepts, yet it becomes challenging to provide in such environments.

Traditional laboratory setups commonly permit unrestricted internet access, copy-paste operations, and peer sharing of files. These unregulated conditions foster opportunities for academic malpractice and limit students' opportunities to gain hands-on coding experience. Research in AI-enabled educational systems and learning analytics emphasizes that, without structured monitoring, students tend to take shortcuts instead of engaging in genuine reasoning and algorithmic thinking [3]. This deficiency in foundational practice negatively impacts students' performance in programming examinations, technical interviews, and real-world software development.

Various solutions have been proposed to mitigate these challenges, including automated assessment platforms [1], educational code review tools [2], AI-driven skill development interventions [3], and visualization-based learning environments [4]. While each of these approaches offers valuable functionalities, none fully integrates controlled access, plagiarism prevention, iterative feedback, and comprehensive performance monitoring within a unified intelligent platform specifically tailored for programming laboratories in higher education.

The need for such an integrated system is underscored by the widening gap between the expected programming competence and the actual skill levels demonstrated by students in academic laboratories. Despite the availability of advanced learning resources and automated tools, many students fail to develop the independence necessary to write original code, relying excessively on external assistance and unregulated lab practices. This issue not only undermines academic integrity but also weakens students' conceptual understanding of programming principles, which are critical for success in advanced coursework and professional settings.

From an instructional standpoint, the volume of submissions and prevalence of near-identical code place a significant burden on educators, complicating consistent evaluation and timely feedback. Existing assessment tools predominantly emphasize post-submission grading and lack capabilities to actively guide students during the coding process or regulate lab activities in real time. Consequently, current methods are insufficient in fostering authentic learning experiences and sustained skill development.

Moreover, the increasing focus on outcome-based education and accreditation standards calls for transparent, fair, and measurable evaluation mechanisms in programming laboratories. There is a pressing demand for an intelligent platform capable of assessing not only final code outputs but also monitoring the coding process, discouraging malpractice, and facilitating iterative learning through structured feedback. These factors collectively motivate the development of a controlled and intelligent lab evaluation platform designed to enhance both the quality of education and accountability in programming courses.

This research proposes such a platform, integrating controlled coding environments, plagiarism detection, AI-driven iterative feedback, and real-time instructor monitoring to create an educational ecosystem that supports genuine student engagement and effective teaching. By addressing the limitations of traditional laboratory practices, the proposed system aims to bridge the gap between educational objectives and student outcomes, fostering stronger programming competencies and academic integrity.

    1. MOTIVATION

The motivation for this research arises from the widening gap between expected programming competence and the actual skill level demonstrated by students in academic laboratory environments. Despite the availability of advanced learning resources and automated tools, a significant number of students fail to develop independent coding abilities due to excessive reliance on external assistance and unregulated lab practices. This not only compromises academic integrity but also weakens students' foundational understanding of programming concepts, which are critical for advanced coursework and industry readiness.

From an instructional perspective, the increasing volume of submissions and the prevalence of near-identical code significantly burden instructors, making consistent evaluation and timely feedback difficult to achieve. Existing tools primarily focus on post-submission assessment and lack mechanisms to guide students during the coding process or regulate lab activities in real time. As a result, current approaches fall short in promoting authentic learning and sustained skill development.

Furthermore, the growing emphasis on outcome-based education and accreditation standards demands transparent, fair, and measurable evaluation mechanisms within programming laboratories. There is a clear need for an intelligent platform that not only evaluates final outputs but also monitors the coding process, discourages malpractice, and supports iterative learning through structured feedback. These considerations strongly motivate the development of a controlled and intelligent lab evaluation system that enhances both educational quality and learning accountability.

    2. CONTRIBUTIONS

      The key contributions of this work are summarized below:

• Designed a controlled lab environment that restricts copy-paste actions, unauthorized browsing, and external code imports to ensure genuine code development.

• Developed an automated evaluation engine that assesses logic, syntax, and structural correctness while reducing instructor workload during large-scale submissions.

• Implemented plagiarism prevention mechanisms leveraging similarity detection and code behavior analysis to ensure academic integrity.

• Integrated performance monitoring and feedback modules, enabling instructors to track learning progress and students to iteratively refine their solutions.

With these features, the proposed platform provides a reliable, transparent, and educationally effective solution for improving programming education. The remainder of this paper is structured as follows: Section II reviews the related literature. Section III describes the overall system architecture and methodological framework. Section IV presents experimental evaluation and discussion. Finally, Section V concludes the study and highlights future research directions.

  2. RELATED WORKS

Recent research on automated assessment, code review services, AI interventions, and visualization tools highlights both progress and persistent gaps in laboratory-based programming education. Cipriano et al. introduced the Drop Project (DP), an open-source automated assessment tool that combines unit testing (JUnit), style checking (Checkstyle), and coverage analysis (JaCoCo) within a Maven/Spring Boot/MySQL pipeline to deliver near-instant compound feedback for Java/Kotlin assignments. DP's long-term adoption, with more than 50,000 submissions, demonstrates improved student motivation, fairness, and reduced grading load, but the system remains tightly coupled to the JVM/Maven ecosystem and lacks built-in AI-driven analytics.

Complementing automated grading, Beattie et al. presented a cloud-based Code Review as an Educational Service platform that uses AST analysis to identify code smells, stylistic violations, and security issues. The service, built with Java, Spring Boot, React, and MongoDB, supports pedagogical goals such as self-directed learning and alignment with industry practices. However, it remains a proof of concept with limited IDE integration, small-scale usability testing, and integration challenges such as inconsistent behavior with GitHub.

At a broader level, Manorat et al. conducted a systematic literature review that maps AI applications in programming education over the past decade. Their analysis identifies a significant post-2020 surge in AI usage for plagiarism detection, automated evaluation, adaptive hinting, personalization, and real-time classroom support, growth driven largely by advances in large language models and the shift toward remote learning. Although the review highlights AI's capacity to reduce faculty workload and deliver timely, individualized support, it also notes widespread inconsistency in evaluation methodologies and frequent underreporting of preprocessing strategies and class-imbalance handling.

Lai et al. evaluated PVLS, a dynamic code visualization tool designed for novice C programmers. By transforming source code into animated flowcharts and dynamically tracing variable values, PVLS supports comprehension of program flow and state changes. Controlled pre- and post-testing and perception surveys show that the tool improves debugging effectiveness and reduces student anxiety, though the relatively small sample size and short study duration limit generalizability.

Taken together, these works provide essential components for modern programming education infrastructure: automated grading pipelines, AST-based feedback mechanisms, comprehensive AI taxonomies, and visualization environments. However, notable gaps remain for real-world laboratory settings. Drop Project offers robust automated evaluation but is limited by language and build-system constraints and lacks AI-enhanced plagiarism or behavior monitoring. Code review platforms deliver rich stylistic and security insights but require stronger IDE and CI integration along with broader empirical validation. AI-focused surveys reveal inconsistent preprocessing practices, limited mitigation of class imbalance, and minimal emphasis on explainability. Visualization tools like PVLS improve conceptual understanding but do not address access control, plagiarism prevention, or large-scale automated evaluation needs.

Motivated by these limitations, our proposed Lab Evaluation and Assessment Platform aims to unify controlled coding environments, automated multi-level assessment, plagiarism prevention mechanisms (including both similarity-based and behavior-based approaches), performance monitoring, and pedagogically grounded feedback loops, bringing together the strengths of prior work while addressing their unmet challenges.

  3. METHODOLOGY

The methodology adopted in this study follows a structured and systematic approach to designing an intelligent and secure lab evaluation platform for programming education. It integrates pedagogically aligned learning principles with modern automated assessment techniques to address the limitations present in traditional laboratory environments. The proposed framework is built on four key components: identifying challenges in current lab practices, establishing a controlled and integrity-focused coding environment, implementing an AI-enhanced evaluation and feedback mechanism, and enabling real-time instructor monitoring through a dedicated mobile interface. Together, these components ensure that the system supports authentic student learning, efficient evaluation, and effective instructional oversight.

    1. Problem Analysis

Traditional programming laboratories typically rely on standard text editors or IDEs where students can freely use copy-paste operations, import external files, and access internet-based resources. While convenient, these unrestricted capabilities enable students to reuse existing solutions instead of writing their own code. As a result, many students produce structurally similar submissions without understanding the underlying logic or algorithms, leading to superficial learning and reduced problem-solving ability.

From the instructor's perspective, this workflow adds a significant manual burden. Teachers must open each student's file, execute the code, verify correctness, and check for logical issues, all of which are time-consuming in large classes. Feedback is often delayed, and evaluations may become inconsistent. Moreover, instructors have no mechanism to observe students' real-time progress, making it difficult to identify which students are struggling, idle, or repeatedly encountering similar errors.

In addition, the absence of integrated assessment and analytics tools in traditional laboratory setups restricts instructors from gaining actionable insights into student performance trends. Without systematic tracking of coding attempts, error patterns, or time spent on tasks, it becomes difficult to evaluate students' learning progress objectively. This lack of data-driven evaluation limits the ability to design targeted remedial actions or improve laboratory pedagogy. Incorporating intelligent monitoring and automated analysis mechanisms can significantly enhance both teaching efficiency and learning effectiveness by enabling continuous assessment and timely instructional support.

    2. Overview of the Proposed System

To address these limitations, the proposed platform introduces an intelligent, secure, and student-centered laboratory environment integrated with automated evaluation capabilities. Students access the system using institutional login credentials and complete their assigned experiments within a controlled coding interface that restricts unauthorized actions such as copy-paste, external file imports, and unrestricted internet access. This structured environment ensures that code is written independently, thereby enhancing conceptual understanding and encouraging genuine problem-solving engagement.

Upon submission, the system performs an AI-based analysis of the student's code to assess correctness, logical structure, and adherence to expected outcomes. It then generates structured, meaningful feedback and updates the instructor's monitoring dashboard in real time. This seamless integration bridges the gap between student activity and instructor oversight, enabling timely intervention, consistent evaluation, and improved learning outcomes across the laboratory course.

      Fig. 1. System Architecture

    3. Data Flow Diagram

Fig. 2 illustrates the Data Flow Diagram (DFD) of the proposed LEAP platform, showing the movement of data between external entities, system processes, and databases. The primary external entities are the Student and the Teacher. The process begins when the student provides login credentials to the User Authentication module. These credentials are verified using the User Database, and upon successful validation, an authenticated session is created.

Once authenticated, the student submits source code through the Code Submission module. The submitted code is stored in the Submission Database and forwarded to the AI Code Evaluation module for analysis. The evaluation results are then saved in the Result Database. The Result Generation module processes this evaluation data to generate structured feedback and marks, which are returned to the student. Simultaneously, performance reports are made available to the teacher for monitoring and academic assessment. This structured data flow ensures secure authentication, systematic storage, automated evaluation, and efficient reporting within the platform.
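For illustration, the authentication step at the head of this flow could be realized as in the following TypeScript sketch using Express and JSON Web Tokens, consistent with the Node.js/JWT backend described later in this paper. The route path, payload fields, and credential-lookup helper are assumptions introduced for the example, not the platform's published API.

// Illustrative only: login route issuing a session token after verifying institutional
// credentials against the user database. Identifiers are assumptions.
import express from "express";
import jwt from "jsonwebtoken";

const app = express();
app.use(express.json());
const JWT_SECRET = process.env.JWT_SECRET ?? "dev-only-secret"; // assumed to come from configuration

// Placeholder lookup: the real platform would query the User Database and compare password hashes.
async function verifyInstitutionalCredentials(
  registerNo: string,
  password: string
): Promise<{ id: string } | null> {
  return registerNo && password ? { id: registerNo } : null;
}

app.post("/api/student/login", async (req, res) => {
  const { registerNo, password } = req.body;
  const student = await verifyInstitutionalCredentials(registerNo, password);
  if (!student) return res.status(401).json({ error: "Invalid credentials" });
  // The authenticated session is represented by a signed token carried on later requests.
  const token = jwt.sign({ sub: student.id, role: "student" }, JWT_SECRET, { expiresIn: "2h" });
  res.json({ token });
});

app.listen(3000);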

      Fig. 2. Data Flow Diagram

    4. Use Case Diagram

Fig. 3 illustrates the Use Case Diagram of the proposed LEAP system, highlighting the interaction between different actors and the system. The primary actors include the Student, Teacher, and Head of Department (HOD). Students can log in, perform experiments, and submit code through the platform. Teachers are responsible for evaluating code and monitoring student performance, while the HOD has supervisory access to oversee overall system activities and academic progress. The diagram provides a high-level functional representation of user interactions and clearly defines the system boundary.

      Fig. 3. Use Case Diagram

    5. Controlled Coding Environment

The controlled coding environment forms the backbone of the proposed platform. It is designed to ensure academic integrity, support focused learning, and maintain uniform working conditions for all students. The environment includes mechanisms to disable copy-paste operations, block external file imports, and prevent internet assistance. This encourages students to manually write code and develop logical reasoning skills.

All submitted programs are executed in a secure environment to ensure safety, consistency, and fairness. The editor provides only minimal syntax hints, encouraging students to identify and correct errors independently rather than relying on automated fixes. Additionally, detailed activity logs, such as the number of compile attempts, time spent per activity, and error history, are recorded to support continuous monitoring and identify students who require additional support.
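As an illustration of how such restrictions can be enforced at the editor level, the following React/TypeScript sketch blocks clipboard actions and records them in an activity log. The component and field names are hypothetical; the deployed editor may implement the same policy differently.

// Illustrative only: a React editor pane that blocks clipboard actions and records
// the attempt in an in-memory activity log.
import React, { useRef } from "react";

interface ActivityEvent {
  studentId: string;
  kind: "blocked-copy" | "blocked-paste" | "compile-attempt";
  timestamp: number;
}

const activityLog: ActivityEvent[] = [];

function logActivity(studentId: string, kind: ActivityEvent["kind"]): void {
  // The deployed platform would persist this to the backend for instructor analytics.
  activityLog.push({ studentId, kind, timestamp: Date.now() });
}

export function RestrictedEditor({ studentId }: { studentId: string }) {
  const codeRef = useRef<HTMLTextAreaElement>(null);
  const block =
    (kind: "blocked-copy" | "blocked-paste") =>
    (e: React.ClipboardEvent<HTMLTextAreaElement>) => {
      e.preventDefault();           // disable copy, cut, and paste inside the editor
      logActivity(studentId, kind); // record the blocked attempt
    };

  return (
    <textarea
      ref={codeRef}
      onCopy={block("blocked-copy")}
      onCut={block("blocked-copy")}
      onPaste={block("blocked-paste")}
      onContextMenu={(e) => e.preventDefault()} // suppress the native context menu
      spellCheck={false}
    />
  );
}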

    6. Desktop Application Methodology

The desktop application is developed using Electron, React, TypeScript, and Vite, providing a cross-platform, high-performance desktop environment:

      • Electron: Provides native desktop capabilities.

• React: Manages UI components such as dashboards, simulated code editors, guided hint panels, and grading views.

      • TypeScript: Ensures type safety and reduces runtime errors.

• Vite: Enables fast builds and rapid development.

Students interact with a controlled, multi-file code editor where submissions are stored as JSON (a possible representation is sketched below), and routing is handled using React Router for smooth navigation.
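The following sketch illustrates one plausible JSON representation of such a multi-file submission; the schema and field names are assumptions made for illustration only.

// Illustrative only: one possible JSON shape for a multi-file submission.
interface SubmissionFile {
  path: string;    // e.g. "main.c" or "src/stack.c"
  content: string; // source text written inside the controlled editor
}

interface Submission {
  studentId: string;
  experimentId: string;
  files: SubmissionFile[];
  savedAt: string; // ISO timestamp of the last auto-save
}

function serializeSubmission(s: Submission): string {
  return JSON.stringify(s, null, 2); // stored locally and sent to the backend as JSON
}

// Example usage with placeholder identifiers:
const draft: Submission = {
  studentId: "KMEA21CS042",
  experimentId: "ds-lab-exp03",
  files: [{ path: "main.c", content: "#include <stdio.h>\nint main(void){ return 0; }" }],
  savedAt: new Date().toISOString(),
};
console.log(serializeSubmission(draft));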

Fig. 4. Controlled Coding Environment Workflow

    7. Student Desktop Modules

1. Student Login and Authentication Module: Students log in using institutional credentials provided by the college. Authentication ensures that each submission is uniquely associated with a registered student and prevents unauthorized access. This module also enables session tracking and secure activity logging.

2. Lab Selection Module: After successful authentication, students are presented with a list of laboratories assigned to their course, such as Compiler Design, Python Programming, Data Structures, or Object-Oriented Programming. Access to laboratories is governed by institutional enrollment data, ensuring that students can view only the labs relevant to their academic program. Each laboratory contains a predefined set of experiments configured and scheduled by the instructor, allowing students to clearly understand the scope and objectives of the selected lab before proceeding.

3. Experiment Allocation Module: For each laboratory session, the system dynamically activates only those experiments that are scheduled for the specific day or session. This controlled allocation prevents students from accessing future, incomplete, or unrelated experiments, thereby enforcing a structured and curriculum-aligned learning sequence. By limiting access based on time and instructor configuration, the module ensures uniform progress across the class while maintaining academic discipline within the laboratory environment.

4. Guided Hint and Assistance Module: To enhance learning without compromising academic integrity, the platform provides guided, context-sensitive hints when students encounter errors during coding. These hints are generated based on common syntactic mistakes, logical flaws, or incomplete implementations, and are designed to prompt critical thinking rather than reveal complete solutions. By encouraging students to analyze and iteratively refine their code, this module supports independent problem-solving, reduces frustration, and fosters deeper conceptual understanding.
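A minimal sketch of such an error-to-hint mapping is given below in TypeScript; the patterns and hint texts are illustrative examples rather than the platform's actual rule set.

// Illustrative only: matching common error messages to nudging hints.
const hintRules: { pattern: RegExp; hint: string }[] = [
  { pattern: /expected ';'/i, hint: "Check the end of the previous statement: something is left unterminated." },
  { pattern: /undeclared identifier|is not defined/i, hint: "A name is used before it is declared. Which one, and where should it be introduced?" },
  { pattern: /index out of range|segmentation fault/i, hint: "Re-check your loop bounds against the size of the data structure you are indexing." },
];

function hintFor(errorMessage: string): string {
  const rule = hintRules.find((r) => r.pattern.test(errorMessage));
  // Fall back to a generic prompt so the student is never handed a full solution.
  return rule ? rule.hint : "Read the reported line number and trace the values of the variables used there.";
}

// Example: hintFor("error: 'count' undeclared identifier") returns the declaration hint.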

    8. AI-Based Evaluation Engine

We integrated the Gemini model into our LEAP system to perform intelligent code evaluation. When a student submits their code, the backend (Node.js) collects the problem statement, expected output, and the submitted program, then sends this data to the Gemini API using a structured prompt. The model analyzes the logic, syntax, correctness, and efficiency of the code instead of only checking the final output. This allows the system to perform deeper evaluation similar to a human reviewer.

The response from Gemini includes detailed feedback, scoring, and suggestions for improvement. The backend processes this response, stores the results in the database, and displays them on the student dashboard in real time. This AI-based evaluation reduces teacher workload, provides instant feedback to students, and makes the system scalable for handling multiple submissions efficiently.
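The following TypeScript sketch outlines how the backend could assemble the structured prompt and call the Gemini API through the @google/generative-ai Node SDK. The prompt wording, model name, and expected JSON fields are assumptions made for illustration, not the exact prompt used by the platform.

// Illustrative only: assembling the structured prompt and calling the Gemini API.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" }); // model name is an assumption

async function evaluateSubmission(problem: string, expectedOutput: string, code: string): Promise<string> {
  // Structured prompt: problem statement, expected output, and the student's program.
  const prompt = [
    "You are grading a student's lab program.",
    `Problem statement:\n${problem}`,
    `Expected output:\n${expectedOutput}`,
    `Student code:\n${code}`,
    "Return JSON with fields: score (0-10), verdict (pass/fail), feedback (string).",
  ].join("\n\n");

  const result = await model.generateContent(prompt);
  // The backend would parse this text as JSON, store it, and push it to both dashboards.
  return result.response.text();
}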

    9. Teacher Monitoring and Mobile Application Modules

To support instructors, the platform includes a dedicated mobile application that provides real-time visibility into student activity. Teachers receive instant updates on submission status, completion progress, repeated errors, and potential code similarity alerts. These insights help instructors identify students who are stuck or disengaged during the lab session.

The mobile application is organized into the modules described below.

The mobile application is built using React Native with Expo, following a mobile-first design approach. Key features include:

      • Navigation via @react-navigation/native.

      • Local storage and session persistence using AsyncStorage.

• Bulk student uploads via SheetJS (xlsx); a short import sketch is given after this list.

• File handling using the Expo Document Picker, Image Picker, and File System APIs.

Instructors gain real-time visibility into student activity and lab progress, including submission status, completion, repeated errors, inactivity, and similarity alerts.
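A condensed sketch of the bulk-import path (Expo Document Picker feeding SheetJS) is shown below; the spreadsheet column names and the picker result shape (recent Expo SDKs) are assumptions.

// Illustrative only: picking an .xlsx file and parsing student rows on the device.
import * as DocumentPicker from "expo-document-picker";
import * as FileSystem from "expo-file-system";
import * as XLSX from "xlsx";

interface StudentRow {
  RegisterNo: string; // assumed column headings in the uploaded sheet
  Name: string;
  Batch: string;
}

export async function importStudentsFromExcel(): Promise<StudentRow[]> {
  const picked = await DocumentPicker.getDocumentAsync({
    type: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
  });
  if (picked.canceled || !picked.assets?.length) return [];

  // Read the selected file as base64 so SheetJS can parse it without a Node filesystem.
  const base64 = await FileSystem.readAsStringAsync(picked.assets[0].uri, {
    encoding: FileSystem.EncodingType.Base64,
  });
  const workbook = XLSX.read(base64, { type: "base64" });
  const sheet = workbook.Sheets[workbook.SheetNames[0]];
  return XLSX.utils.sheet_to_json<StudentRow>(sheet); // one parsed row per student
}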

1. Teacher Login Module: Instructors authenticate using official credentials, ensuring secure access to lab data and student performance information. The system supports multi-factor authentication for added security and maintains detailed login logs for auditing purposes. Role-based access control ensures that only authorized personnel can view sensitive data, and session timeouts prevent unauthorized access in case of inactivity (a minimal sketch of such a role check is given after this list of modules).

2. Lab Dashboard Module: The dashboard displays all assigned laboratories in a clean and organized interface. Instructors can view key metrics such as total experiments, student progress summaries, and pending tasks at a glance. Selecting a specific lab opens detailed information about ongoing and completed experiments, including experiment descriptions, deadlines, and associated resources. The dashboard also allows filtering and sorting of labs based on course, batch, or experiment type, enabling quick navigation.

3. Experiment Monitoring Module: For each experiment, instructors can view which students have completed the task, who is currently working, and who has not yet started. Submission timestamps, progress bars, and status indicators are displayed in real time. Instructors can access detailed logs of student activity, including code submissions, error reports, and attempts, helping identify areas where students may need additional guidance. This module also supports real-time intervention, allowing instructors to provide immediate feedback or assistance.

Fig. 5 presents the monitoring and reporting workflow along with the sequence of real-time updates delivered to teachers.

4. Alert and Notification Module: The system generates alerts for delayed submissions, repeated compilation failures, high similarity scores, or inactivity during lab sessions. Notifications are sent through multiple channels, such as email, mobile push notifications, and in-app messages. The alert system is configurable, allowing instructors to set thresholds and priorities for different types of alerts. This ensures that potential issues are highlighted promptly, enabling timely intervention to support student learning outcomes.

5. Reporting and Analytics Module: Automatic reports summarize student performance, common errors, completion rates, and overall lab effectiveness. The system generates visual dashboards and downloadable reports in multiple formats, assisting instructors in data-driven decision making. Advanced analytics track trends over time, highlight high-performing and at-risk students, and identify frequently encountered challenges in experiments. These insights help instructors refine curriculum design, tailor interventions, and improve overall lab efficiency and student outcomes.
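As referenced under the Teacher Login Module, role-based access on the backend can be expressed as a small Express middleware. The sketch below assumes the JWT issued at login carries a role claim; all identifiers are illustrative rather than the platform's actual API.

// Illustrative only: Express middleware enforcing a role claim from the login JWT.
import { Request, Response, NextFunction } from "express";
import jwt from "jsonwebtoken";

export function requireRole(...roles: string[]) {
  return (req: Request, res: Response, next: NextFunction) => {
    const header = req.headers.authorization ?? "";
    const token = header.startsWith("Bearer ") ? header.slice(7) : "";
    try {
      const claims = jwt.verify(token, process.env.JWT_SECRET ?? "dev-only-secret") as { role?: string };
      if (!claims.role || !roles.includes(claims.role)) {
        return res.status(403).json({ error: "Forbidden" }); // authenticated but not authorized
      }
      next();
    } catch {
      return res.status(401).json({ error: "Invalid or expired session" }); // also covers timeouts
    }
  };
}

// Usage: app.get("/api/labs/:id/progress", requireRole("teacher", "hod"), progressHandler);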

    10. Experimental Setup and Partial Settings

The experimental evaluation of the proposed platform was conducted in a controlled academic laboratory environment involving undergraduate programming courses. The system was deployed on institutional desktops with restricted administrative privileges. A predefined set of programming experiments was configured for each lab session, and student activity was monitored throughout the session duration.

Partial settings included controlled test cases, fixed time windows for submission, and limited hint availability to assess the platform's impact on independent problem-solving behavior. Instructor feedback and system-generated analytics were collected to evaluate usability, effectiveness, and scalability.

11. Workflow Summary

The overall workflow of the proposed system captures the complete end-to-end process of learning and evaluation. The sequence begins with student authentication and task retrieval, followed by code development within a controlled environment. After submission, the AI-driven evaluation module assesses the program and provides actionable feedback. Students can then revise and resubmit their work based on the hints received. In parallel, the teacher dashboard is continuously updated with real-time information, enabling efficient oversight and timely intervention. The system ultimately generates comprehensive performance reports that assist instructors in monitoring progress, identifying learning gaps, and making informed instructional decisions.

    12. Activity Diagram

The activity diagram represents the operational flow of the system, starting from student login through AI-based evaluation and teacher monitoring.

    Fig. 6. Activity Diagram

  4. Results and Discussion

The LEAP system was evaluated in a controlled academic laboratory environment involving undergraduate programming courses. The evaluation focused on system reliability, AI-based assessment performance, user workflow efficiency, and real-time monitoring effectiveness. The platform was deployed across desktop and mobile environments, integrating an AI-driven evaluation engine powered by the Code Llama model for automated code analysis.

    1. AI Evaluation Accuracy

The AI-based evaluation engine demonstrated an accuracy of 100% under controlled experimental conditions. Accuracy was determined by comparing AI-generated evaluation results with manual grading performed by instructors across multiple programming experiments. All submissions were correctly assessed in terms of logical correctness, expected output verification, and structural compliance with the problem specifications. No discrepancies were observed between AI evaluation outcomes and instructor validation within predefined test conditions.

This result indicates that the AI module performs deterministic validation aligned with structured grading rubrics. The carefully designed prompt engineering strategy ensured systematic evaluation of syntax correctness, logical implementation, output matching, and adherence to problem constraints. The use of predefined test cases significantly contributed to achieving consistent and reliable grading performance.

    2. Performance Metrics

Table I presents the quantitative evaluation metrics of the AI-based grading engine under structured testing conditions.

TABLE I
Performance Metrics of AI Evaluation Engine

Metric       Value
Accuracy     100%
Precision    1.00
Recall       1.00
F1-Score     1.00

The results indicate perfect classification performance within the controlled evaluation environment, with no false positives or false negatives recorded.
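For reference, the metrics reported in Table I follow their standard definitions in terms of true/false positives and negatives:

\mathrm{Precision}=\frac{TP}{TP+FP},\quad \mathrm{Recall}=\frac{TP}{TP+FN},\quad F_1=\frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}},\quad \mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}

With TP = 50, TN = 30, and FP = FN = 0 taken from Table II, every metric evaluates to 1.00 (equivalently 100%), consistent with Table I.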

    3. Confusion Matrix Analysis

      Table II illustrates the confusion matrix of the AI-based evaluation system.

TABLE II
Confusion Matrix for AI Evaluation

                     Predicted Correct    Predicted Incorrect
Actual Correct               50                     0
Actual Incorrect              0                    30

The confusion matrix confirms ideal classification performance under predefined validation rules. All correct submissions were accurately identified, and all incorrect submissions were properly classified. This further validates the deterministic grading capability of the AI module within structured testing scenarios.

4. System Performance and Workflow Efficiency

The platform successfully digitized and streamlined laboratory operations. Key observations include:

• Significant reduction in manual grading workload.

      • Instant automated feedback generation for students.

      • Real-time monitoring via the mobile dashboard.

      • Secure and integrity-focused coding environment.

The mobile-first teacher application enabled real-time visibility into student progress, submission status, and performance analytics. Bulk student import through Excel integration improved administrative efficiency and reduced laboratory setup time. The desktop-based code editor incorporated copy-paste restrictions to enhance academic integrity and discourage unauthorized solution reuse.

The backend architecture, built using Node.js, Express, MongoDB, and JWT authentication, demonstrated stable session management, secure role-based access control, and reliable submission storage. The upsert-based submission mechanism prevented duplicate entries while supporting draft auto-save functionality.
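The upsert behavior described above can be sketched with Mongoose as follows; the schema fields and collection name are assumptions, shown only to illustrate how repeated auto-saves avoid duplicate records.

// Illustrative only: one submission document per (student, experiment) pair, updated in place.
import mongoose from "mongoose";

const submissionSchema = new mongoose.Schema({
  studentId: { type: String, required: true },
  experimentId: { type: String, required: true },
  files: [{ path: String, content: String }],
  status: { type: String, enum: ["draft", "submitted", "evaluated"], default: "draft" },
  updatedAt: { type: Date, default: Date.now },
});
// The unique compound index makes repeated saves collapse into a single record.
submissionSchema.index({ studentId: 1, experimentId: 1 }, { unique: true });

const Submission = mongoose.model("Submission", submissionSchema);

export async function saveDraft(
  studentId: string,
  experimentId: string,
  files: { path: string; content: string }[]
) {
  // upsert: insert on the first auto-save, update in place afterwards (no duplicates).
  return Submission.findOneAndUpdate(
    { studentId, experimentId },
    { $set: { files, status: "draft", updatedAt: new Date() } },
    { upsert: true, new: true }
  );
}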

    5. Code Llama Model Performance Analysis

The AI evaluation engine utilizes Code Llama, a high-performance open-source large language model optimized for code generation and infilling tasks. Model performance scales with parameter size, with larger variants (34B and 70B) demonstrating superior logical reasoning and structured code synthesis capabilities.

On the HumanEval benchmark (pass@1 metric), the 34B base model achieves approximately 48.8% accuracy, while fine-tuned variants such as CodeFuse-CodeLlama-34B report performance as high as 74.4%. The 70B variant achieves approximately 57% accuracy on Python-based evaluations, outperforming earlier LLaMA architectures and approaching the performance of larger proprietary models.
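For clarity, pass@k figures in this literature are conventionally computed with the unbiased estimator introduced alongside the HumanEval benchmark; we restate it here only to make the metric precise (an interpretation of the cited results, not part of our own evaluation):

\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\,1-\frac{\binom{n-c}{k}}{\binom{n}{k}}\,\right]

where n candidate programs are sampled per problem and c of them pass all unit tests; pass@1 thus reduces to the expected fraction of problems solved by a single sample.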

Instruction-tuned variants (CodeLlama-Instruct 7B, 13B, 34B, and 70B) are optimized for natural language understanding and structured response generation. Additionally, the extended context window (up to 16,384 tokens, and up to 48k tokens in certain implementations) enables effective processing of large code segments and multi-file program structures.

Performance is influenced by model size, domain-specific fine-tuning, quantization strategy, and input context length. Larger parameter models generally yield higher accuracy, while aggressive quantization may introduce minor performance degradation. These characteristics justify the integration of Code Llama within the LEAP platform for automated grading and debugging assistance.

    6. Discussion

The experimental findings indicate that AI-driven evaluation within a controlled coding environment significantly enhances laboratory efficiency and grading consistency. The observed 100% accuracy under structured test conditions demonstrates that AI-assisted grading can reliably replicate deterministic instructor-based evaluation when clear validation criteria are defined.

Real-time monitoring further improved instructional responsiveness by enabling instructors to identify students encountering repeated compilation errors or delayed submissions. The availability of structured analytics supports data-driven pedagogical decisions and targeted academic intervention.

It is important to note that the reported performance reflects structured and predefined evaluation conditions. Future work may involve adaptive rubric learning, similarity detection mechanisms, plagiarism analysis, and scalability testing across larger institutional deployments.

    7. Overall Impact

      The LEAP platform demonstrates that integrating AI-based code evaluation with secure desktop environments and mobile monitoring applications can:

      • Improve grading consistency

      • Enhance feedback quality

      • Reduce instructor workload

      • Promote authentic student learning

      • Digitize laboratory administration

    The system proves to be scalable, secure, and academically aligned with modern programming education requirements.

  5. CONCLUSION AND FUTURE DIRECTIONS

This paper presents a controlled and intelligent laboratory evaluation platform aimed at enhancing programming education through secure coding environments, AI-driven assessment, iterative feedback, and real-time instructor monitoring. The experimental deployment demonstrates that the platform effectively addresses key limitations of traditional programming laboratories, including student over-reliance on external resources, inconsistent evaluation practices, delayed feedback, and limited instructor visibility. The controlled desktop interface ensures that students engage in independent coding, while the AI-based feedback module supports iterative learning without revealing complete solutions. The teacher-facing mobile application improves instructional efficiency by providing real-time dashboards, alerts for incomplete or problematic submissions, and comprehensive performance analytics, enabling timely interventions and focused guidance.

The results show that the proposed platform not only promotes academic integrity but also encourages active learning and problem solving, leading to improved conceptual understanding and engagement. The structured allocation of experiments, combined with contextual hints and performance monitoring, fosters an environment where students can develop coding proficiency, logical reasoning skills, and confidence in their programming abilities.

Future research directions include expanding the platform to support collaborative programming and team-based projects, allowing students to work together while still maintaining controlled and monitored environments. Integration of advanced learning analytics and adaptive AI could provide personalized feedback based on individual student performance, identifying specific strengths and weaknesses to guide targeted interventions. Additionally, exploring the system's scalability across multiple institutions and programming courses will be critical for understanding its broader impact on curriculum effectiveness. Longitudinal studies could also investigate the long-term effects on student learning outcomes, technical interview performance, and real-world software development skills. Finally, incorporating visualization tools and gamification elements may further enhance engagement, motivation, and learning efficiency in programming laboratories.

In summary, the proposed platform offers a comprehensive, scalable, and pedagogically effective solution for modern programming education. By integrating security, AI-based evaluation, iterative feedback, and instructor oversight, it provides a foundation for future intelligent educational systems capable of improving both teaching and learning outcomes in higher education.

  6. ACKNOWLEDGEMENT

The authors would like to express their sincere gratitude to their project guide, Sajeena M. K., for the valuable guidance, continuous support, and insightful suggestions provided throughout this work. The authors also thank the faculty of the Department of Computer Science and Engineering, KMEA Engineering College, Ernakulam, for their support.

  7. REFERENCES

  1. Bruno Pereira Cipriano, Nuno Fachada, and Pedro Alves (2022). Drop Project: An Automatic Assessment Tool for Programming Assignments.

2. Matthew Beattie, Moira Watson, Desmond Greer, Bee-Yen Toh, and Zheng Li (2025). Code-Review-as-an-Educational-Service: A Tool for Java Code Review in Programming Education.

3. Manorat, Tuarob, and Pongpaichet (2025). Artificial Intelligence in Computer Programming Education: A Systematic Literature Review.

  4. Lai, Lin, and You (2025). Development and Evaluation of a Dynamic Code Visualization System for C Programming Education: The PVLS Approach.

  5. BlueOptima (2023). How Poor Code Quality Can Grind Development to a Halt: A Deep Dive.

6. Alves NS, Mendes TS, de Mendonça MG, Spínola RO, Shull F, Seaman C (2016). Identification and Management of Technical Debt: A Systematic Mapping Study.

7. Tsipenyuk K, Chess B, McGraw G (2005). Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors.

  8. Hermans F, Aivaloglou E (2016). Do Code Smells Hamper Novice Programming? A Controlled Experiment on Scratch Programs.