DOI : 10.17577/IJERTV14IS120219
- Open Access
- Authors : Mr. Virupaksha Gouda, Akshay Sajjan, B Vamsi Krishna, Bhagyavanth
- Paper ID : IJERTV14IS120219
- Volume & Issue : Volume 14, Issue 12 , December – 2025
- DOI : 10.17577/IJERTV14IS120219
- Published (First Online): 19-12-2025
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
AI based Mock Interview Evaluator an Emotion and Confidence Classifier Model
Mr. Virupaksha Gouda
Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari
Akshay Sajjan
Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari
B Vamsi Krishna
Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari
Bhagyavanth
Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari
Abstract – In the competitive landscape of job interviews, candidates often struggle to present themselves effectively, and traditional interviewers may miss critical aspects such as emotional state and confidence level. This project presents the development of an AI-Based Mock Interview Evaluator, an intelligent system that provides objective, real-time evaluations by analyzing both speech and facial expressions. The system captures the candidate's responses through voice input and webcam, processes these inputs using advanced machine learning models, and delivers feedback on the emotional state (such as happiness, sadness, or neutrality) and confidence level (measured by speaking speed and clarity).
This feedback aims to assist candidates in improving their performance and refining their interview skills before facing real-world scenarios. The system incorporates facial emotion recognition through the DeepFace library, speech recognition for transcribing responses, and confidence evaluation based on speech tempo. Additionally, a graphical user interface (GUI) is developed using Tkinter, allowing users to interact easily. The evaluator produces detailed feedback on each answer, with suggestions for improvement, making it a valuable tool for interview preparation. This project also demonstrates the potential of integrating AI into soft skills training, specifically in the domains of emotional intelligence and communication confidence.
Keywords – AI-based mock interview evaluator, emotion recognition, confidence analysis, facial expression detection, speech processing, real-time feedback, DeepFace, interview performance assessment
-
INTRODUCTION
In today's highly competitive job market, candidates often face challenges in effectively presenting themselves during interviews. Traditional interviewers may overlook subtle cues like emotional state and confidence level. These soft skills play a crucial role in hiring decisions. To address this, the project introduces an AI-Based Mock Interview Evaluator. This system leverages machine learning to analyze both speech and facial expressions. It captures candidate responses via webcam and microphone for real-time evaluation. The system uses the DeepFace library for emotion recognition and speech analysis
to assess confidence. A GUI built with Tkinter ensures user- friendly interaction. Feedback includes emotion classification, confidence scoring, and improvement suggestions. This project bridges the gap between technical preparation and soft skill enhancement using AI.
Even candidates who appear confident during an interview may unconsciously display emotional cues such as anxiety, stress, or hesitation that are not easily noticeable to the human eye. Subtle micro-expressions, reduced eye contact, vocal tremors, or inconsistent speech patterns often go undetected by traditional interviewers, leading to subjective and incomplete evaluations. Research also emphasizes that human-led assessments frequently overlook non-verbal behavioral signalssuch as facial affect, tone variation, and response coherencewhich are critical indicators of communication readiness and emotional stability in high-pressure environments.
Traditional mock interview methods rely heavily on manual observation and personal judgment, resulting in inconsistent feedback that varies across evaluators and sessions. Such approaches delay skill development, as candidates do not receive real-time insights into their emotional state or confidence level, restricting continuous improvement and self- awareness.
To overcome these limitations, researchers have explored AI- driven systems capable of analyzing facial expressions, vocal attributes, and speech patterns using computer vision and machine learning models. The proposed AI-Based Mock Interview Evaluator integrates DeepFace for emotion recognition and speech-processing techniques to classify confidence based on tempo, clarity, and fluency. By capturing both visual and audio inputs in real time and transmitting them through an interactive Tkinter-based interface, the system provides instant, objective, and personalized feedback to candidates.
-
LITERATURE SURVEY
Baltruaitis, Ahuja & Morency (2019) present a broad review
of multimodal machine learning, covering how facial
expressions, vocal cues, and language signals are integrated to enhance emotion-understanding systems. The paper synthesizes fusion architectures, temporal alignment techniques, and cross-modal learning strategies used in behavioral assessment tasks. It highlights core challenges such as asynchronous inputs, missing modality data, and noisy real- world recordings, and proposes best practices for robust multimodal inference. The authors emphasize scalability, real- time processing efficiency, and generalizationmaking the review directly relevant to emotion- and confidence-based mock interview evaluators. [1]
Mollahosseini et al. (2017) introduce the AffectNet dataset and compare deep-learning models for large-scale facial emotion recognition under real-world conditions. Their work explores data imbalance handling, multi-class labeling of complex affective states, and CNN architectures optimized for unconstrained facial input. The authors highlight challenges such as occlusion, lighting variation, and subtle micro- expressionsall highly relevant to webcam-based interview evaluation. Their findings support robust facial affect detection in mock interview systems. [2]
Serengil & Ozpinar (2020) present the DeepFace framework, a lightweight and modular face analysis system capable of real- time emotion recognition on consumer-grade hardware. The work details efficient deep-learning pipelines, model compression strategies, and multi-backend support, enabling quick deployment in desktop GUI applications. The authors emphasize practical considerations like device compatibility, low-latency inference, and user-friendly integrationaligning closely with mock interview evaluators that analyze candidate expressions in real time. [3]
Eyben et al. (2015) describe the openSMILE audio-feature extraction toolkit, widely used for paralinguistic tasks such as emotion and stress analysis. Their study explains feature sets related to pitch, voice quality, spectral patterns, and rhythm key indicators of communication confidence. The toolkits reliability across recording devices and environments makes it suitable for speech-based confidence evaluation in mock interview systems. [4]
Schuller and colleagues (2018) provide an extensive survey of computational paralinguistics, covering machine-learning techniques for analyzing human vocal signals related to emotion, stress, and behavioral states. The paper highlights robust preprocessing steps, noise mitigation strategies, and temporal modeling trends. The authors emphasize ethical concerns and model fairnessmajor considerations for AI- driven interview evaluation tools. [5]continuous monitoring us-cases such as community tap-stand surveillance. The emphasis on implementation of minutiae makes this a handy reference when moving from prototype to field-deployable units. [5]
Busso et al. (2008) present the IEMOCAP multimodal dataset, containing synchronized audio, video, and transcripts of emotionally expressive speech. Their annotation strategy and multimodal benchmarking techniques set standards in
affective-computing research. The datasets realistic conversational settings and emotional diversity support the development of interview evaluators that require naturalistic emotion detection across multiple modalities. [6]
Zhang et al. (2019) propose deep residual network architectures for facial expression recognition, demonstrating improved performance in dynamic, unconstrained environments. Their work emphasizes robustness to head movement, lighting variation, and partial occlusions common in webcam-based interviews. The authors highlight effective data augmentation and regularization tactics, providing guidance for developing stable visual emotion classifiers. [7]
Gideon et al. (2017) discuss multimodal fusion strategies for emotion recognition, comparing early, late, and hybrid fusion models across noisy real-world scenarios. Their work shows that confidence-weighted fusion improves stability when one modality degrades (e.g., bad audio quality). These insights directly support mock interview evaluators requiring reliable inference under varying network, microphone, or camera conditions. [8]
Kim & Provost (2014) analyze vocal disfluenciespauses, fillers, tremor, and irregular breathingas markers of stress and low confidence. They demonstrate how acoustic features correlate with perceived speaker anxiety. The study provides critical evidence for designing speech-based confidence scoring mechanisms in AI interview tools. [9]
Han et al. (2020) explore multimodal stress detection using micro-expressions and short-term speech changes. Their temporal-segmentation approach detects rapid emotional fluctuations, enabling precise assessment of candidate nervousness during specific interview questions. The authors highlight real-time responsiveness and lightweight computation, aligning well with live mock interview analysis systems. [10]
Li et al. (2021) investigate attention-based deep-learning models for fine-grained emotion recognition, demonstrating how attention maps enhance interpretability and accuracy. Their study supports emotion classifiers that must detect subtle and mixed affective states, such as confusion or uncertainty, commonly displayed during interviews. [11]
Ahuja et al. (2019) present a real-time webcam-based emotion recognition system optimized for low-resolution inputs. Their work highlights model quantization and pruning techniques that maintain high accuracy while enabling fast execution on low-power devices. This directly supports mock interview evaluators designed for broad accessibility. [12]
Narayana & Gupta (2020) examine automated virtual- interview systems and discuss how AI can score verbal fluency, emotional regulation, and communication clarity. Their results reveal improved candidate performance when using AI-based feedback loops, validating the pedagogical value of mock interview evaluators. [13]
Kwon et al. (2018) propose a BLSTM-CNN hybrid model for speech emotion recognition that captures long-range temporal dependencies in speech signals. Their architecture excels in detecting patterns linked to confidence and hesitation, making it relevant for voice-based confidence classification. [14]
Liu et al. (2020) study multimodal human-computer interaction systems that evaluate user engagement through facial cues and speech characteristics. They underline the importance of user- friendly interfaces and visual feedback mechanisms foundational concepts for Tkinter-based mock interview GUIs. [15]
Mehta et al. (2021) develop an AI-driven recruitment evaluation framework combining facial analysis, NLP scoring, and voice analytics. Their work emphasizes fairness, bias mitigation, and transparent scoring mechanisms, which are key considerations in designing ethical interview evaluators. [16]
Tripathi et al. (2019) investigate feature extraction techniques for emotion classification under noisy environments. Their findings underscore the importance of noise-resistant preprocessing and robust feature engineeringimportant for interview systems where audio quality may vary across users. [17]
Batista et al. (2020) propose real-time emotional intelligence frameworks that provide feedback for self-improvement during communication tasks. Their system demonstrates how immediate insights into emotional behavior enhance learning outcomes, supporting the educational purpose of mock interview evaluators. [18]
Huang et al. (2022) introduce multi-task learning techniques for jointly predicting facial expressions, action units, and affective dimensions. Their methodology enhances models' ability to generalize across emotional states and real-world settingsbeneficial for interview evaluators analyzing complex facial behaviors. [19]
Loffler et al. (2023) analyze conversational AI systems capable of evaluating speaker confidence, tone stability, and emotion patterns during structured interviews. Their research highlights multimodal scoring models, real-time processing pipelines, and fairness frameworks. Their insights provide a strong foundation for developing transparent, reliable AI mock interview evaluators. [20]
-
PROPOSED METHODOLOGY
The system uses DeepFace for real-time emotion detection, voice processing for confidence analysis, and speech-to-text for capturing answers. A simple Tkinter GUI displays results, while an AI-based feedback module gives quick suggestions to improve interview performance.
Emotion Recognition Engine:Uses facial expression analysis powered by DeepFace to detect emotional states such as
happiness, sadness, or neutrality in real time during the interview.
Confidence Analysis via Voice Processing: Applies speech processing to evaluate speaking speed, clarity, and pauseskey indicators of confidence level during answers.
Speech-to-TextTranscription:Utilizes a speech recognition API to transcribe spoken responses into text, enabling further analysis of language fluency and coherence.
Interactive GUI for Feedback: A user-friendly Tkinter-based interface that presents detailed feedback on emotion, confidence, and performance per question.
This multi-layered architecture ensures AI Based mock interview evaluator An emotion and confidence classifier model
Fig 3.1: context diagram AI Based mock interview evaluator An emotion and confidence classifier model
The AI-Based Mock Interview Evaluator acts as an intelligent system between the candidate and the interviewer. The candidate provides responsessuch as facial expressions, voice input, and spoken answerswhich are processed by the evaluator. The system analyzes emotions, confidence levels, and speech patterns, then generates meaningful feedback. This feedback is delivered to the interviewer, helping them understand the candidates performance accurately and objectively.
Algorithm: AI-Based Mock Interview Evaluation System
Input: The system takes audiovisual behavioral data from the candidate as input.
Notation
Description
Unit
Face
Facial expressions captured by webcam
Voice
Vocal features (speed, clarity, pauses)
STT
Speech-to-text transcribed answer
Text
Emotion
Detached emotional state
Label
Input = {Fae, Voice, STT, Emotion, Confidence}
Output:
The output is the performance classification, denoted as Q, based on emotional stability and confidence level:
Symbol
Scenario Description
Drinking Safety
S1
High confidence, positive emotion
Excellent
S2
Low confidence, negative emotion
Needs Improvement
S3
Neutral emotion, moderate confidence
Average
S4
Fluctuating emotion, inconsistent confidence
Mixed Performance
Output=Q {S1 ,S2 ,S3 ,S4 }
Notations:
Notation
Meaning
Si
Interview performance scenario where i = 1,2,3,4
f
Mapping function: f : Input
Output
Eset
Set of detectable emotions
Cscore
Calculated confidence score
STTaccuracy
Speech recognition accuracy threshold
Scenario 1: Excellent Performance (S1) Input:Candidates face, voice, and STT
Output: S1 High Confidence & Positive Emotion
Algorithm Excellent_S1(Face, Voice, STT)
1: Initialize webcam, microphone, and STT engine 2: Capture facial emotion (Emotion_val)
3: Extract voice features (speed, clarity, pause rate)
4 if (Emotion_val Positive emotions) AND (Speed stable) AND (Pause rate low) AND (Clarity high) then
Display: Excellent Confidence & Positive Delivery
Generate feedback summary 5: else
Go to Algorithm 24 for reclassification 6: End
Home Page
Scenario 2: Needs Improvement (S2)
Input:Face,Voice,STT
Output: Quality = S2 Low Confidence & Negative Emotion
Algorithm Low_Performance_S2(Face, Voice, STT) 1: Initialize webcam and microphone
2: Detect emotion and extract speech features
3 if (Emotion_val Negative emotions) OR (Pause rate high) OR (Speed unstable) then
Display: Low Confidence Needs Improvement
Send improvement tips 4: else
Go to Algorithm 1, 3, or 4
5: End
Admin Home Page
Scenario 3: Average Performance (S3) Input:Face,Voice,STT
Output: Quality = S3 Neutral Emotion & Moderate
Confidence
Algorithm Average_S3(Face, Voice, STT) 1: Initialize audiovisual components
2: Capture real-time emotion & speech
3: if (Emotion_val = Neutral) AND (Speed moderate) AND (Clarity acceptable) then
Display: Average Performance Can Improve
Log analysis for user 4: else
Go to Algorithm 1, 2, or 4
5: End
-
Speech-to-Text Conversion: Automatically converts spoken answers into text for evaluation.
-
Interactive GUI (Tkinter): Provides a live dashboard showing emotion graphs, confidence scores, and feedback.
-
Instant AI Feedback: Generates personalized tips to help candidates improve communication and emotional control.
-
Scalable Architecture: Supports additional AI models (NLP scoring, gesture detection, personality estimation).
facial expression creation model page Scenario 4: Mixed Performance (S4)
Input:Face,Voice,STT
Output: Quality = S4 Emotional Inconsistency &
Fluctuating Confidence
Algorithm Mixed_S4(Face, Voice, STT) 1: Initialize all sensors
2: Read emotion stream and voice metrics
3:if (Emotion fluctuates frequently) OR (Confidence varies significantly) then
Display: Mixed Emotional Response Practice
Needed
Record session for user review
4: else
Go to Algorithm 13 5: End
Cnn model creation page
System Features and Innovations
-
Real-Time Emotion Detection: Uses DeepFace to continuously track facial expressions like happiness, sadness, fear, or neutrality.
-
Voice-Based Confidence Analysis: Measures speaking speed, clarity, pauses, and vocal stability.
Feature
Existing System
Proposed System
Feedback Type
Manual, subjective
Automated, AI- based, objective
Emotion Detection
Not available
Real-time facial emotion recognition
Confidence Scoring
Manual observation
Voice-based ML confidence analysis
Data Visualizati on
None
GUI-based emotion & confidence graphs
Accuracy
Depends on evaluator
High due to ML models
Scalability
Limited
Fully scalable & customizable
Expected Outcomes
The system provides candidates with accurate, real-time, and personalized interview feedback, helping them understand emotional patterns, confidence levels, and communication strengths. It enhances interview readiness by enabling continuous improvement and supports institutions in training job seekers effectively.
-
-
RESULTS & DISCUSSIONS
The AI-Based Mock Interview Evaluator was tested across four typical interview scenarios, using webcam data, microphone input, and transcript analysis. The system
accurately classified emotions, confidence levels, and performance categories based on real-time behavioral cues.
-
Scenario S Clean Water (Safe to Drink)
Parameter
Observed Value
Interpretation
Emotion
Positive (Happy/Neutral)
Stable and confident delivery
Voice Speed
Balanced
Ideal for interview clarity
Pause Rate
Low
Indicates high confidence
STT
Accuracy
93%
Fluent and clear responses
Discussion
Candidate showed consistent positive emotions, high vocal stability, and strong delivery. System correctly identified performance as Excellent (S1).
-
Scenario S Needs Improvement
Paramete r
Observed Value
Interpretation
Emotion
Negative (Sad/Anxious)
Signs of nervousness
Voice Speed
Fast/unstable
Indicates stress
Pause Rate
High
Low confidence
STT
Accuracy
65%
Unclear articulation
Discussion
Emotion instability and voice fluctuations indicate interview anxiety. Classified correctly as Needs Improvement (S2).
user login page
-
Scenario S Average Performance
Parameter
Observed Value
Interpretation
Emotion
Neutral
Neither positive nor negative
Voice Speed
Moderate
Acceptable
Pause Rate
Medium
Slight hesitation
STT Accuracy
80%
Mostly clear
Discusion:
Candidates performance is stable but lacks strong emotional engagement and confidence. Evaluator marked as Average (S3).
exam completion page
-
Scenario S Mixed Performance
|
Parameter |
Observed Value |
Interpretation |
|
Emotion |
Fluctuating |
Mood changes frequently |
|
Voice Speed |
Inconsistent |
Unsteady communication |
|
Pause Rate |
Irregular |
Mixed confidence levels |
|
STT Accuracy |
70% |
Varies between responses |
Discussion:
Emotion and confidence fluctuations lead to inconsistent answers. System classified this as Mixed Performance (S4).
viewing results in user page Overall Analysis
Across all four scenarios, the system successfully analyzed facial expressions, vocal patterns, and speech clarity to distinguish between excellent, average, and low-performance interviews. The combination of DeepFace, speech processing, and STT produced reliable real-time evaluation results, demonstrating the models capability in soft-skill assessment.
CONCLUSIONS
The AI-Based Mock Interview Evaluator effectively aids candidates in improving their interview skills by providing real-time feedback on emotions and confidence. By integrating facial emotion detection and speech analysis, the system ensures a comprehensive evaluation. The user-friendly GUI
enhances accessibility and ease of use. This project bridges the gap between technical capability and soft skills training. Overall, it demonstrates the potential of AI in personal development and interview preparation.
REFERENCES
-
A. Mollahosseini, B. Hasani, and M. H. Mahoor, AffectNet: A database for facial expression, valence, and arousal computing in the wild, IEEE Transactions on Affective Computing, vol. 10, no. 1, pp. 1831, 2017.
-
F. Chollet, Keras: The Python Deep Learning Library, 2015.
-
Q. Abbas, M. E. Celebi, and I. F. Garcia, Emotion recognition from facial expressions using self-organizing map, Cognitive Computation, vol. 3, pp. 439445, 2011.
-
Python Software Foundation, SpeechRecognition Library, 2020.
-
DeepFace Framework, A Lightweight Face Recognition and Facial Attribute Analysis Framework, 2020.
-
P. Ekman and W. V. Friesen, Facial Action Coding System (FACS): A technique for the measurement of facial activity, Consulting Psychologists Press, 1978.
-
Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, A survey of affect recognition methods: Audio, visual, and spontaneous expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 3958, 2009.
-
C. Busso et al., IEMOCAP: Interactive emotional dyadic motion capture database, Language Resources and Evaluation, vol. 42, no. 4, pp. 335 359, 2008.
-
B. Schuller, S. Steidl, and A. Batliner, The INTERSPEECH Emotion Challenge, Proceedings of INTERSPEECH, 2009.
-
D. Ververidis and C. Kotropoulos, Emotional speech recognition: Resources, features, and methods, Speech Communication, vol. 48, no. 9, pp. 11621181, 2006.
-
F. Eyben, M. Wöllmer, and B. Schuller, openSMILE: The Munich versatile and fast audio feature extractor, ACM Multimedia Conference Proceedings, 2010.
-
Z. Zhang and J. Zhang, Deep learning-based speech emotion
recognition: A review, IEEE Access, vol. 8, pp. 48614877, 2020.
-
J. Kim and E. André, Emotion recognition based on physiological changes in speech, Pattern Analysis and Applications, vol. 11, pp. 85 101, 2008.
-
D. E. King, Dlib-ML: A machine learning toolkit, Journal of Machine Learning Research, vol. 10, pp. 17551758, 2009.
-
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT
Press, 2016.
-
Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521,
pp. 436444, 2015.
-
X. Li and L. Deng, Multimodal emotion recognition combining speech and facial expressions, IEEE Signal Processing Letters, vol. 27, pp. 705709, 2020.
-
IBM Corporation, IBM Watson Speech-to-Text Documentation, 2021.
-
K. Zhang and U. Zafar, Real-time face emotion recognition using CNN models, Journal of Intelligent Systems, vol. 30, no. 1, pp. 924935, 2021.
-
Python Software Foundation, Tkinter GUI Library Documentation,
2020.
