
INTEPREP: AI Mock Interview Platform

DOI: https://doi.org/10.5281/zenodo.18901315


Aakarsh Srivastava

Department of Information Technology Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM) Lucknow, India

Abhay Jaiswal

Department of Information Technology Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM) Lucknow, India

Prabhat Kumar Yadav

Department of Information Technology Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM) Lucknow, India

Abstract – The disconnect between academic preparation and professional requirements often manifests acutely during job interviews. Interview anxiety, coupled with a lack of accessible, objective feedback mechanisms, significantly hinders candidate performance. This paper presents INTEPREP, an innovative web application designed to bridge this gap through realistic, AI-powered mock interview simulations. The system leverages a modern tech stack comprising React 18, Supabase serverless architecture, the browser-native Web Speech API, and a dedicated Large Language Model (LLM) gateway. By providing real-time speech-to-text transcription and utilizing advanced natural language processing to evaluate responses against industry standards (such as the STAR method), INTEPREP offers scalable, actionable insights into candidate content quality and verbal delivery. This research details the system's architectural design, data processing pipeline, and its potential to democratize elite-level interview coaching.

Keywords – AI, React, Supabase, LLM, real-time

  1. INTRODUCTION

    The employment interview remains the definitive gatekeeper in hiring processes globally. However, it is a deeply stressful event; studies indicate that significant interview anxiety can negatively skew performance, regardless of a candidate's actual qualifications [2]. Traditional preparation methods, such as peer practice or hired coaches, suffer from critical limitations: they are either operationally unscalable, financially inaccessible, or plagued by subjective, inconsistent feedback. INTEPREP addresses this fundamental "preparation gap." The objective is to develop a high-fidelity automated simulation that mimics the pressure of a real interview while providing the granular, data-driven analysis that human interviewers rarely offer. By integrating recent advancements in browser-based speech recognition with the cognitive capabilities of generative AI, this platform aims to provide a consistent, on-demand environment for behavioral and technical interview rehearsal across various industry domains.

  2. LITERATURE REVIEW

    The development of INTEPREP is grounded in three primary theoretical domains. These domains are as follows:

    1. Communication Apprehension and Performance: Research by [1] established a strong negative correlation between communication apprehension and interview success. The theoretical basis for mock interviews rests on "systematic desensitization", the concept that repeated, safe exposure to a stressor (the interview environment) reduces anxiety over time.

    2. Intelligent Tutoring Systems (ITS): The validity of using AI as a coach is supported by research in educational psychology. [3] demonstrated that well-designed Intelligent Tutoring Systems could achieve student outcomes comparable to human tutors. INTEPREP applies ITS principles to career development, moving beyond static question banks to dynamic, interactive assessment.

    3. Advances in ASR and NLP Assessment: Modern Automatic Speech Recognition (ASR) has achieved near-human accuracy in controlled environments [4]. Furthermore, Natural Language Processing (NLP) has matured from simple keyword matching to complex semantic analysis, enabling automated systems to evaluate response structure, sentiment, and technical depth in a manner previously reserved for human experts.

  3. METHODOLOGY

    The proposed system follows a modular pipeline from audio capture to feedback delivery. Each stage is designed to be computationally efficient to support real-time operation.

    1. System Overview

      INTEPREP operates on a decoupled, serverless client-server architecture designed for high concurrency and low latency. The frontend is a responsive single-page application built with React 18 and TypeScript, ensuring type safety and a dynamic user interface. The backend utilizes Supabase for authentication, data persistence (PostgreSQL), and Edge Functions to orchestrate AI operations without managing server infrastructure. The core intelligence is derived from an integration with a Large Language Model (LLM) gateway, which processes transcribed text to generate evaluations.

    2. Major Components

      1. Client Interface Layer (React/Redux): Manages the user experience, handles media permissions (microphone/camera access), displays real-time transcription streaming, and renders the final visualization dashboards.

      2. Speech Services Layer (Web Speech API): Utilizes the browser-native Speech Recognition interface for converting spoken language into text. It is configured for continuous results, providing immediate visual feedback to the user while they speak.

      3. Backend Orchestration Layer (Supabase):

        1. Auth: Handles secure user session management via JWTs.

        2. Database: Stores user profiles, session metadata, and historical performance scores, secured by Postgres Row Level Security (RLS) policies.

        3. Edge Functions: Serverless compute instances that act as secure middleware between the client and the AI model.

          1. AI Analysis Gateway: An external service that interfaces with high-parameter LLMs (e.g., GPT-4 series). It receives raw transcripts and context prompts, returning structured JSON evaluations.
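        The Speech Services Layer described above can be sketched in TypeScript. The `RecognitionLike` interface and `configureRecognition` helper below are illustrative stand-ins for the browser's `SpeechRecognition` object (exposed as `window.SpeechRecognition` or `window.webkitSpeechRecognition`), not the platform's actual code:

```typescript
// Minimal stand-in for the browser's SpeechRecognition interface.
interface RecognitionLike {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
}

// Configure a recognition instance for live mock-interview transcription:
// continuous capture with interim results so the user sees text while speaking.
function configureRecognition<T extends RecognitionLike>(
  rec: T,
  lang = "en-US",
): T {
  rec.continuous = true;     // keep listening across pauses within an answer
  rec.interimResults = true; // stream partial transcripts for live feedback
  rec.lang = lang;
  return rec;
}
```

        In a browser, the configured instance would also receive `onresult` handlers that append finalized segments to the transcript buffer shown in the client interface.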

    3. Processing Pipeline

      The data lifecycle during an interview session follows a strict linear progression:

      1. Initialization: The user selects a domain (e.g., "Software Engineering"). The system queries the database for a curated set of questions balanced by difficulty.

      2. Audio Capture & Transcription: As the user responds to a prompt, the browser's microphone stream is intercepted by the Web Speech API. Audio is processed locally or via browser-supported cloud services into text buffers.

      3. Buffering & Transmission: Upon detecting silence or manual completion, the finalized transcript string is securely transmitted via HTTPS to a Supabase Edge Function.

      4. Cognitive Evaluation: The Edge Function constructs a complex prompt containing: the original question, the user's transcribed answer, the target persona (e.g., "Senior Recruiter"), and evaluation criteria (e.g., STAR method compliance). This prompt is sent to the LLM Gateway.

      5. Structured Output Generation: The LLM analyzes the text and returns a structured JSON object containing scores (0-100) for dimensions like Relevance, Clarity, and Technical Depth, along with specific qualitative feedback strings.

      6. Persistence & Visualization: The JSON data is stored in Supabase and immediately returned to the React frontend to populate the Results Dashboard.
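
      Steps 4 and 5 of the pipeline can be sketched as follows. The `buildEvaluationPrompt` helper and the `EvaluationResult` shape are illustrative assumptions about how the Edge Function might assemble its prompt and constrain the LLM's output; they are not the platform's actual implementation:

```typescript
// Assumed shape of the structured JSON evaluation the LLM gateway returns
// (0-100 scores per dimension, as described in the pipeline).
interface EvaluationResult {
  relevance: number;      // 0-100
  clarity: number;        // 0-100
  technicalDepth: number; // 0-100
  feedback: string[];     // qualitative suggestions
}

// Assemble the evaluation prompt from question, transcript, persona,
// and evaluation criteria inside the Edge Function.
function buildEvaluationPrompt(
  question: string,
  transcript: string,
  persona = "Senior Recruiter",
  criteria = "STAR method compliance",
): string {
  return [
    `You are a ${persona} evaluating an interview answer.`,
    `Criteria: ${criteria}.`,
    `Question: ${question}`,
    `Answer: ${transcript}`,
    `Respond ONLY with JSON: {"relevance": 0-100, "clarity": 0-100, ` +
      `"technicalDepth": 0-100, "feedback": ["..."]}`,
  ].join("\n");
}
```

      Keeping prompt assembly server-side (in the Edge Function) also keeps the gateway API key off the client, which is the main reason for the middleware layer.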

    4. User Interaction Procedure

      From the user's perspective, the procedure follows four distinct phases:

      1. Setup Phase: User registration and selection of interview parameters (role, experience level).

      2. Simulation Phase: The user enters a video-enabled interface. Questions are presented visually and via Text-to-Speech (TTS). The user records their response, observing real-time transcription.

      3. Processing Phase: A brief waiting period (typically 3-5 seconds) between questions where data is pipelined to the AI gateway.

      4. Feedback Phase: Post-session, the user receives a comprehensive report breaking down their performance across tracked metrics, with actionable suggestions for improvement.

    5. System Flow Representation

      Fig 1. Overall system architecture of INTEPREP

  4. RESULTS AND DISCUSSION

    Initial development and testing of the INTEPREP platform have yielded promising technical metrics:

      1. ASR Accuracy: Utilizing the Chrome implementation of the Web Speech API, transcription accuracy in quiet environments exceeds 90% for native English speakers.

      2. System Latency: The critical path from finishing a spoken response to receiving AI feedback averages between 3 and 5 seconds, maintaining the illusion of a fluid conversation.

      3. Scalability: The serverless architecture (Supabase Edge Functions) successfully manages spikes in concurrent usage without manual infrastructure provisioning.

    Discussion: The core achievement of INTEPREP is the successful automation of the STAR (Situation, Task, Action, Result) evaluation framework. By prompting the LLM to specifically identify these components within transcripts, the system provides feedback that is pedagogically sound for behavioral interviews. A limitation identified is the reliance on browser-specific implementations of the Web Speech API, which can lead to inconsistent experiences across different browsers (e.g., Safari vs. Chrome). Furthermore, the current iteration assesses verbal content strictly through text; tone of voice and prosody are lost in transcription.
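
    Because the LLM cannot be trusted to always emit well-formed JSON, the Edge Function needs a defensive parse before persisting scores. A minimal sketch, assuming the 0-100 per-dimension score schema described in the pipeline (the function name and dimension keys are illustrative):

```typescript
// Parse and validate an LLM evaluation payload defensively: each dimension
// must be a number, and scores are clamped to the 0-100 range.
function parseEvaluation(raw: string): Record<string, number> | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // malformed JSON from the model
  }
  if (typeof data !== "object" || data === null) return null;
  const scores: Record<string, number> = {};
  for (const dim of ["relevance", "clarity", "technicalDepth"]) {
    const v = (data as Record<string, unknown>)[dim];
    if (typeof v !== "number") return null; // missing or non-numeric dimension
    scores[dim] = Math.min(100, Math.max(0, v)); // clamp to 0-100
  }
  return scores;
}
```

    Returning `null` rather than throwing lets the Edge Function retry the LLM call or surface a graceful error to the client instead of persisting a corrupt record.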

  5. CONCLUSION AND FUTURE WORK

    INTEPREP demonstrates the viability of combining serverless web architecture with generative AI to create scalable, high-fidelity training tools. By providing an accessible platform for repeated, low-stakes practice with objective feedback, it addresses a critical need in career development.

    Future Work focuses on increasing immersion and analysis depth:

    1. Multimodal Analysis: Integrating client-side computer vision libraries (like TensorFlow.js) to analyze video feeds for non-verbal cues such as eye contact, posture, and facial expressions during responses.

    2. Multilingual Support: Moving beyond English to support major global languages, requiring the integration of more robust, language-agnostic ASR services.

    3. Adaptive Difficulty: Implementing a feedback loop where the AI analyzes previous answers to dynamically adjust the difficulty and technical focus of subsequent questions within the same session.
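
    The adaptive-difficulty loop in item 3 could be driven by a simple rule over recent scores. The following sketch is one possible policy; the 70/40 thresholds and three-level scale are illustrative assumptions, not tuned values from the platform:

```typescript
type Difficulty = "easy" | "medium" | "hard";

// Choose the next question's difficulty from the average of the user's
// recent scores (0-100): step up when doing well, step down when struggling.
function nextDifficulty(
  recentScores: number[],
  current: Difficulty,
): Difficulty {
  if (recentScores.length === 0) return current; // no history yet
  const avg =
    recentScores.reduce((a, b) => a + b, 0) / recentScores.length;
  const order: Difficulty[] = ["easy", "medium", "hard"];
  const i = order.indexOf(current);
  if (avg >= 70) return order[Math.min(i + 1, order.length - 1)];
  if (avg < 40) return order[Math.max(i - 1, 0)];
  return current;
}
```

    Because the function is pure, it can run in an Edge Function between questions without adding noticeable latency to the 3-5 second processing window.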

REFERENCES

  1. Ayres, J., & Crosby, S. (1995). Two studies concerning the predictive validity of the personal report of communication apprehension in employment interviews. Communication Research Reports, 12(2), 145-151.

  2. McCarthy, J., & Goffin, R. (2004). Measuring job interview anxiety: Beyond weak knees and sweaty palms. Personnel Psychology, 57(3), 607-637.

  3. VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46(4), 197-221.

  4. Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., … & Zweig, G. (2017). The Microsoft 2017 conversational speech recognition system. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5934-5938).