
Intelligent Virtual Interview Assistant with Instant Performance Analysis

DOI : 10.17577/IJERTV15IS061076

Prof. M. D. Ingle

Department of Computer Engineering, JSPMS Jaywantrao Sawant College of Engineering, Pune, India

Pournima Mali

Department of Computer Engineering, JSPMS Jaywantrao Sawant College of Engineering, Pune, India

Jagruti More

Department of Computer Engineering, JSPMS Jaywantrao Sawant College of Engineering, Pune, India

Anand Magar

Department of Computer Engineering, JSPMS Jaywantrao Sawant College of Engineering, Pune, India

Kunal Patil

Department of Computer Engineering, JSPMS Jaywantrao Sawant College of Engineering, Pune, India

Abstract – The recruitment process increasingly relies on artificial intelligence to improve efficiency, scalability, and fairness. This paper presents an AI-Powered Virtual Interviewer with Real-Time Feedback, an intelligent interview platform that evaluates candidates through speech, text, and facial-expression analysis. The system integrates speech recognition, natural language processing, computer vision, and large language models to conduct adaptive interviews and provide instant performance feedback.

Keywords – Artificial Intelligence, Virtual Interviewer, Real-Time Feedback, Natural Language Processing, Speech Analysis, Facial Expression Recognition, Large Language Models, Recruitment Automation.

  1. INTRODUCTION

    In today’s competitive job market, interview performance plays a crucial role in determining career opportunities for students and job seekers. While candidates often possess the required technical knowledge, many struggle to communicate their skills effectively, manage nervousness, and respond confidently during interviews. Regular interview practice is essential for improving these skills; however, access to professional guidance and mock interview sessions is often limited by time, cost, and availability constraints.

    Recent advancements in Artificial Intelligence (AI), Natural Language Processing (NLP), Speech Recognition, and Computer Vision have created new opportunities for developing intelligent learning systems that can simulate real-world interview environments. These technologies enable automated analysis of spoken responses, communication patterns, facial expressions, and overall interview performance, providing users with valuable insights into their strengths and areas for improvement.

    This paper presents an AI-Powered Virtual Interview Practice System with Real-Time Feedback, designed to help users prepare for interviews through interactive mock interview sessions. The system generates interview questions, evaluates responses, analyzes communication skills, and provides instant personalized feedback. By creating a realistic and engaging practice environment, the platform enables users to improve their confidence, communication abilities, and interview readiness.

    The proposed system combines speech-to-text conversion, large language models, facial expression analysis, and performance analytics to deliver a comprehensive interview preparation experience. Unlike traditional mock interview methods, the system offers continuous practice opportunities, objective evaluation, and actionable recommendations, making interview preparation more accessible, efficient, and effective for learners.

    The primary objective of this work is to develop an intelligent interview practice platform that supports self-assessment, skill enhancement, and continuous improvement through AI-driven feedback and performance tracking.

    1. For Candidates:

      • No software installation is needed; the system runs entirely in-browser using a standard webcam and microphone.

      • Simple 3-step flow: register, start the interview, receive the feedback report.

      • Supports multiple languages and works on low-bandwidth connections (minimum 2 Mbps).

      • Feedback is displayed in plain language, without technical jargon, and with specific timestamps pointing to moments in the interview.

    2. Easy Interview Practice

      • Users can begin a mock interview with a single click. The AI interviewer automatically generates questions and guides users throughout the interview process. No manual configuration or specialized training is required.

    3. Self-Learning Support

      • The platform encourages self-learning by providing personalized recommendations and improvement tips. Users can practice repeatedly and monitor their communication development through continuous feedback and assessment.

  2. SOFTWARE AND HARDWARE REQUIREMENTS

    1. Software Specifications

      | Component | Specification |
      |---|---|
      | Operating System | Windows 10/11 or Ubuntu Linux |
      | Programming Language | Python 3.10+ |
      | Frontend | HTML, CSS, JavaScript, Streamlit |
      | Backend Framework | Flask / FastAPI |
      | Database | MongoDB / SQLite |
      | ML Frameworks | TensorFlow, PyTorch, Scikit-learn |
      | NLP Libraries | NLTK, SpaCy, Hugging Face |
      | CV Libraries | OpenCV, MediaPipe |
      | Speech Libraries | SpeechRecognition, Librosa, PyAudio |
      | Version Control | Git / GitHub |

    2. Hardware Specifications

      | Component | Specification |
      |---|---|
      | Processor | Intel Core i5 (8th Gen+) or AMD Ryzen 5 |
      | RAM | Minimum 8 GB (16 GB recommended) |
      | Storage | 256 GB SSD or higher |
      | GPU | NVIDIA GTX 1650+ (for DL inference) |
      | Webcam | HD camera for facial analysis |
      | Microphone | Noise-cancelling microphone |
      | Internet | Stable broadband connection |

  3. SYSTEM DESIGN AND PERFORMANCE EVALUATION

    1. The development of the AI-Powered Virtual Interviewer with Real-Time Feedback began with identifying the need for an accessible platform that helps users improve their communication and interview skills through regular practice. Many students and job-seekers face difficulties in expressing their thoughts confidently during interviews due to a lack of preparation and constructive feedback. To address this challenge, the project was designed to simulate a real interview environment where users can practice answering questions and receive immediate feedback on their performance.

    2. Units

      The AI-Powered Virtual Interviewer with Real-Time Feedback uses various measurement units to evaluate user performance and system effectiveness. Interview duration is measured in seconds (s) and minutes (min) to track the time taken by users to answer questions. Speech processing components analyze speaking rate in words per minute (WPM), which helps determine the user’s fluency and pace of communication.

      Performance metrics such as communication quality, grammar accuracy, vocabulary usage, confidence level, and overall interview performance are represented as percentage scores (%). Speech-to-text accuracy is also measured in percentage form to evaluate the effectiveness of voice recognition. System response time, which indicates the time required to process a user’s response and generate feedback, is measured in milliseconds (ms). Data storage and processing requirements are expressed in megabytes (MB) and gigabytes (GB) depending on the amount of interview data maintained by the system.
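As a minimal sketch of how two of these units could be derived, the snippet below computes a speaking rate in WPM from a transcript and its duration, and converts a 0 to 1 fraction into a percentage score. The function names and the whitespace-based word count are assumptions made for illustration; the paper does not describe how the system tokenizes speech.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Speaking rate in words per minute (WPM) for one spoken answer."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: words are separated by whitespace in the transcript.
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def to_percent(fraction: float) -> float:
    """Express a 0-1 fraction as a percentage score, clamped to 0-100."""
    return max(0.0, min(100.0, fraction * 100.0))


# Example: a 30-word answer spoken in 15 seconds.
answer = " ".join(["word"] * 30)
rate = words_per_minute(answer, duration_seconds=15.0)
print(f"{rate:.0f} WPM")  # prints "120 WPM"
```

Durations are kept in seconds here so that the same value can feed both the WPM calculation and any per-question timing shown to the user.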

    3. Equations

      The overall performance of a user during a mock interview can be calculated by combining multiple communication-related parameters such as fluency, grammar, confidence, and vocabulary usage. The overall communication score is computed as:

      $$CS = \frac{W_F \times F + W_G \times G + W_C \times C + W_V \times V}{W_F + W_G + W_C + W_V} \tag{1}$$

      where,

      $CS$ = Communication Score

      $F$ = Fluency Score

      $G$ = Grammar Score

      $C$ = Confidence Score

      $V$ = Vocabulary Score

      $W_F$ = Weight assigned to Fluency

      $W_G$ = Weight assigned to Grammar

      $W_C$ = Weight assigned to Confidence

      $W_V$ = Weight assigned to Vocabulary

      ‌Equation (1) calculates the weighted average of all communication parameters to generate a final communication score for the user. The resulting score is expressed as a percentage and is used to provide real-time feedback and performance analysis during interview-practice sessions.
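A small worked example may make Equation (1) concrete. The component scores and weights below are hypothetical, since the paper does not report the values it uses; the function simply evaluates the weighted average as written.

```python
def communication_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of component scores, following Eq. (1)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * value for name, value in scores.items())
    return weighted_sum / total_weight


# Hypothetical component scores (in percent) and weights; not taken from the paper.
scores = {"fluency": 80.0, "grammar": 70.0, "confidence": 60.0, "vocabulary": 90.0}
weights = {"fluency": 3, "grammar": 3, "confidence": 2, "vocabulary": 2}

cs = communication_score(scores, weights)
print(cs)  # (240 + 210 + 120 + 180) / 10 = 75.0
```

Because Equation (1) divides by the sum of the weights, the weights do not need to add up to 1; only their relative sizes matter.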

    4. Some Common Mistakes

      During interview-practice sessions, users often make several communication-related mistakes that negatively affect their overall performance. One of the most common is speaking too quickly or too slowly, which reduces the clarity and effectiveness of communication. Other frequent mistakes are listed below.

      • Excessive use of filler words such as “um,” “uh,” “like,” and “you know” makes responses appear less confident and reduces their clarity and professionalism. Research in automated interview analysis has shown that speech patterns and verbal fluency significantly influence communication effectiveness and interviewer perception [2].

      • Another common mistake is providing irrelevant or incomplete answers that fail to address the interview question properly. Many users struggle to organize their thoughts logically, resulting in responses that lack structure and coherence. Intelligent interview-training systems can identify such deficiencies and provide personalized recommendations for improvement [1], [4].

      • Grammar mistakes, limited vocabulary, and improper sentence construction are also frequently observed during interview practice sessions. Natural Language Processing techniques and transformer-based language models can effectively analyze textual responses and assess linguistic quality, helping users improve their communication skills [7], [8], [10].

      • Poor pronunciation and unclear speech delivery can further impact communication quality. Modern speech-recognition systems are capable of converting spoken responses into text and identifying speech-related issues with high accuracy, enabling real-time performance evaluation and feedback generation [6].

      • Lack of confidence is another major challenge faced by users during interviews. Long pauses, hesitation, nervousness, and inconsistent speaking patterns often reduce overall performance. Conversational coaching systems and AI-based interview trainers have demonstrated the ability to detect such behavioral indicators and assist users in developing greater confidence through repeated practice and constructive feedback [3], [4].

      • The proposed AI-Powered Virtual Interviewer with Real-Time Feedback automatically detects these common mistakes and provides personalized suggestions for improvement. By evaluating fluency, grammar, vocabulary usage, response relevance, and confidence, the system helps users strengthen their communication abilities and become better prepared for real-world interview situations [1], [2], [3].
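Two of the mistakes described above, filler words and an unsuitable speaking pace, lend themselves to simple rule-based checks. The sketch below is one possible illustration and not the system's actual detector: the filler list, the pace thresholds, and the function names are all assumptions, and the crude matching also counts legitimate uses of words such as "like".

```python
import re

# Assumed filler list and pace thresholds; the paper does not specify them.
FILLERS = ("um", "uh", "like", "you know")
SLOW_WPM, FAST_WPM = 110.0, 170.0


def filler_ratio(transcript: str) -> float:
    """Fraction of words in the transcript that belong to filler phrases."""
    text = transcript.lower()
    words = re.findall(r"[a-z']+", text)
    if not words:
        return 0.0
    filler_words = 0
    for phrase in FILLERS:
        # Whole-word match, so "um" does not match inside "summary".
        hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        filler_words += hits * len(phrase.split())
    return filler_words / len(words)


def pace_label(wpm: float) -> str:
    """Classify a speaking rate against the assumed thresholds."""
    if wpm < SLOW_WPM:
        return "too slow"
    if wpm > FAST_WPM:
        return "too fast"
    return "ok"


sample = "Um, I have, like, three years of experience with Python."
print(filler_ratio(sample))  # 2 filler words out of 10 -> 0.2
print(pace_label(190.0))     # too fast
```

A production detector would more likely rely on the language models cited above to separate filler usage from meaningful usage; the rule-based form is shown only because it is easy to inspect.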

  4. TECHNOLOGIES AND ARCHITECTURE

    1. Technology Stack

      TABLE I. AI VIRTUAL INTERVIEWER TECHNOLOGY STACK

      | Layer | Technology | Version | Function |
      |---|---|---|---|
      | User Interface | React.js / Flutter Web | React 18 | Candidate-facing interview portal |
      | Authentication | OAuth 2.0 / JWT | OAuth 2.0 | Secure session management |
      | Speech-to-Text | OpenAI Whisper | Whisper v3 | Real-time speech transcription |
      | Question Engine | GPT-4o (fine-tuned) | GPT-4o | Dynamic question generation |
      | Answer Evaluation | BERT + Rubric Scorer | BERT-large | Semantic answer quality scoring |
      | Sentiment Analysis | RoBERTa / DeepFace | RoBERTa-base | Facial & textual sentiment |
      | Backend | FastAPI (Python) | Python 3.11 | API orchestration |
      | Database | PostgreSQL + Redis | PG 16 / Redis 7 | Session storage & caching |
      | Feedback Engine | GPT-4o + custom prompts | GPT-4o | Structured post-interview report |
      | Deployment | Docker + AWS Lambda | Docker 24 | Scalable cloud deployment |

    2. Architecture Diagram

      Fig. 1 illustrates the architecture of the AI-Powered Virtual Interviewer with Real-Time Feedback. The system generates interview questions based on the selected domain and conducts an interactive interview session. User responses are analyzed using communication analysis, sentiment detection, grammar checking, and pronunciation evaluation modules. Based on the analysis, the system generates follow-up questions and provides real-time feedback. Finally, the analytics and reporting module creates performance reports that help users improve their communication skills and interview readiness.
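The flow described for Fig. 1 (analyze an answer, choose a follow-up question, give immediate feedback, then aggregate a report) can be sketched in a few lines. Everything named here is hypothetical: the `Analysis` fields stand in for the outputs of the four analysis modules, and a fixed lookup table stands in for follow-up generation, which the paper attributes to a large language model.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Per-answer scores (0-100) from the analysis modules; names are illustrative."""
    communication: float
    sentiment: float
    grammar: float
    pronunciation: float


# Hypothetical follow-up prompts keyed by the weakest dimension.
FOLLOW_UPS = {
    "communication": "Could you summarize that answer in two sentences?",
    "sentiment": "What did you enjoy most about that project?",
    "grammar": "Could you restate your main point more formally?",
    "pronunciation": "Could you repeat the key term and explain it slowly?",
}


def weakest_dimension(analysis: Analysis) -> str:
    """Name of the lowest-scoring dimension for one answer."""
    values = vars(analysis)
    return min(values, key=values.get)


def next_step(analysis: Analysis) -> tuple[str, str]:
    """Return (real-time feedback, follow-up question) for one analyzed answer."""
    weak = weakest_dimension(analysis)
    feedback = f"Focus on {weak}: scored {getattr(analysis, weak):.0f}/100."
    return feedback, FOLLOW_UPS[weak]


def session_report(history: list[Analysis]) -> dict[str, float]:
    """Average each dimension over the session for the final report."""
    if not history:
        return {}
    return {
        name: sum(getattr(a, name) for a in history) / len(history)
        for name in FOLLOW_UPS
    }


history = [Analysis(80, 70, 60, 90), Analysis(60, 70, 80, 90)]
feedback, question = next_step(history[0])
print(feedback)  # Focus on grammar: scored 60/100.
print(session_report(history))
```

Keeping the per-answer step separate from the session-level report mirrors the split in the text between real-time feedback and the analytics and reporting module.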

      Fig. 1. Architecture Diagram

  5. COMPARATIVE ANALYSIS OF EXISTING METHODS

    TABLE II. COMPARATIVE STUDY OF EXISTING AI INTERVIEW SYSTEMS

    | Author / Source | Year | Key Advantage | Limitation |
    |---|---|---|---|
    | Vanderbilt RASL Lab | 2023 | Multimodal interaction; specialized training | Limited to specific groups; no real-time feedback |
    | Lee et al. | 2024 | Hybrid AI improves accuracy | No stress/non-verbal analysis |
    | Zhang et al. (SimInterview) | 2025 | Multilingual; scalable | No emotional/non-verbal feedback |
    | Kumar & Sharma | 2025 | Feasibility demonstrated | Small sample; limited conversation flow |

  VIII. RESULTS AND DISCUSSION

    1. Dashboard Interface

      The dashboard interface serves as the central control panel of the proposed system. It provides a consolidated view of interview activities, candidate statistics, analysis summaries, and system operations. It enables administrators and users to monitor interview sessions and access different functionalities efficiently. Key metrics displayed include total interviews, completion rate, in-progress sessions, and average confidence score.

    2. Profile Management Interface

      The profile management interface handles candidate and administrator profile information. It allows users to manage personal details, interview history, account information, and performance records. The interface maintains organized candidate records and supports efficient retrieval of interview-related data. A performance radar chart displays multi-dimensional evaluation across technical, communication, consistency, structure, and confidence axes.

    3. Interview Configuration Interface

      The interview configuration interface allows administrators to manage interview settings and customize parameters. This module supports configuration of question domains (Java, Python, JavaScript, C++, Data Structures, Machine Learning, etc.), difficulty levels (Easy, Medium, Hard), experience levels (Fresher, 1-2 yrs, 3+ yrs), and time windows. The system generates a mission preview describing the interview style and coaching focus.

    4. Question Analysis Interface

      The question analysis interface evaluates candidate responses on a per-question basis, displaying strengths and weaknesses for each answer. Each answer is scored out of 10, with an option for deep analysis. The interface presents question-wise breakdowns covering answer structure, contextual relevance, and technical depth, enabling candidates to identify specific areas for improvement.

    5. System Performance

      The implementation results demonstrate that the proposed system successfully performs automated interview management and intelligent candidate evaluation. The integration of AI-based technologies such as speech processing, facial emotion recognition, and NLP improves the overall efficiency and reliability of interview analysis. The modular architecture supports scalability and future enhancements.

  6. ACKNOWLEDGMENT

    We sincerely thank our project guide, Prof. M. D. Ingle, for his valuable guidance, support, and encouragement throughout the development of the project AI-Powered Virtual Interviewer with Real-Time Feedback.

    We are grateful to our Project Coordinator, Prof. Pooja Barve, and Head of Department, Prof. S. B. Chaudhari, for their continuous support and motivation. We also thank the faculty members of the Department of Computer Engineering for their guidance and assistance.

  7. REFERENCES

  1. L. Hemamou, G. Felhi, V. Vandenbussche, J.-C. Martin, and C. Clavel, “HireNet: A hierarchical attention model for the automatic analysis of asynchronous video job interviews,” Proc. AAAI Conf. Artificial Intelligence, vol. 33, no. 1, pp. 661-668, 2019.

  2. I. Naim, M. I. Tanveer, D. Gildea, and M. E. Hoque, “Automated analysis and prediction of job interview performance,” IEEE Trans. Affective Computing, vol. 9, no. 2, pp. 191-204, Apr.-Jun. 2018.

  3. M. E. Hoque, M. Courgeon, J.-C. Martin, B. Mutlu, and R. W. Picard, “MACH: My automated conversation coach,” in Proc. ACM Int. Joint Conf. Pervasive and Ubiquitous Computing, 2013, pp. 697-706.

  4. L. Chen and K. Jokinen, “Spoken dialogue systems for job interview training,” in Proc. Workshop Spoken Dialogue Systems Technology, 2011, pp. 1-10.

  5. Y. Guo, Z. Zhang, and S. Zhao, “Automated interview question generation using retrieval-augmented large language models,” arXiv preprint arXiv:2312.04345, 2023.

  6. A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” in Proc. Int. Conf. Machine Learning, vol. 202, 2023, pp. 28492-28518.

  7. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL-HLT, 2019, pp. 4171-4186.

  8. Y. Liu et al., “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.