
AI-Based Mock Interview Evaluator: An Emotion and Confidence Classifier Model

DOI: 10.17577/IJERTV14IS120219

Mr. Virupaksha Gouda

Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari

Akshay Sajjan

Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari

B Vamsi Krishna

Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari

Bhagyavanth

Department of Computer Science & Engineering, Ballari Institute of Technology & Management, Ballari

Abstract – In the competitive landscape of job interviews, candidates often struggle to present themselves effectively, and traditional interviewers may miss critical aspects such as emotional state and confidence level. This project presents the development of an AI-Based Mock Interview Evaluator, an intelligent system that provides objective, real-time evaluations by analyzing both speech and facial expressions. The system captures the candidate's responses through voice input and webcam, processes these inputs using advanced machine learning models, and delivers feedback on the emotional state (such as happiness, sadness, or neutrality) and confidence level (measured by speaking speed and clarity).

This feedback aims to assist candidates in improving their performance and refining their interview skills before facing real-world scenarios. The system incorporates facial emotion recognition through the DeepFace library, speech recognition for transcribing responses, and confidence evaluation based on speech tempo. Additionally, a graphical user interface (GUI) is developed using Tkinter, allowing users to interact easily. The evaluator produces detailed feedback on each answer, with suggestions for improvement, making it a valuable tool for interview preparation. This project also demonstrates the potential of integrating AI into soft skills training, specifically in the domains of emotional intelligence and communication confidence.

Keywords – AI-based mock interview evaluator, emotion recognition, confidence analysis, facial expression detection, speech processing, real-time feedback, DeepFace, interview performance assessment

  1. INTRODUCTION

    In today's highly competitive job market, candidates often face challenges in effectively presenting themselves during interviews. Traditional interviewers may overlook subtle cues like emotional state and confidence level. These soft skills play a crucial role in hiring decisions. To address this, the project introduces an AI-Based Mock Interview Evaluator. This system leverages machine learning to analyze both speech and facial expressions. It captures candidate responses via webcam and microphone for real-time evaluation. The system uses the DeepFace library for emotion recognition and speech analysis

to assess confidence. A GUI built with Tkinter ensures user-friendly interaction. Feedback includes emotion classification, confidence scoring, and improvement suggestions. This project bridges the gap between technical preparation and soft skill enhancement using AI.

Even candidates who appear confident during an interview may unconsciously display emotional cues such as anxiety, stress, or hesitation that are not easily noticeable to the human eye. Subtle micro-expressions, reduced eye contact, vocal tremors, or inconsistent speech patterns often go undetected by traditional interviewers, leading to subjective and incomplete evaluations. Research also emphasizes that human-led assessments frequently overlook non-verbal behavioral signals, such as facial affect, tone variation, and response coherence, which are critical indicators of communication readiness and emotional stability in high-pressure environments.

Traditional mock interview methods rely heavily on manual observation and personal judgment, resulting in inconsistent feedback that varies across evaluators and sessions. Such approaches delay skill development, as candidates do not receive real-time insights into their emotional state or confidence level, restricting continuous improvement and self-awareness.

To overcome these limitations, researchers have explored AI-driven systems capable of analyzing facial expressions, vocal attributes, and speech patterns using computer vision and machine learning models. The proposed AI-Based Mock Interview Evaluator integrates DeepFace for emotion recognition with speech-processing techniques that classify confidence based on tempo, clarity, and fluency. By capturing both visual and audio inputs in real time and presenting results through an interactive Tkinter-based interface, the system provides instant, objective, and personalized feedback to candidates.

  2. LITERATURE SURVEY

Baltrušaitis, Ahuja & Morency (2019) present a broad review of multimodal machine learning, covering how facial expressions, vocal cues, and language signals are integrated to enhance emotion-understanding systems. The paper synthesizes fusion architectures, temporal alignment techniques, and cross-modal learning strategies used in behavioral assessment tasks. It highlights core challenges such as asynchronous inputs, missing modality data, and noisy real-world recordings, and proposes best practices for robust multimodal inference. The authors emphasize scalability, real-time processing efficiency, and generalization, making the review directly relevant to emotion- and confidence-based mock interview evaluators. [1]

Mollahosseini et al. (2017) introduce the AffectNet dataset and compare deep-learning models for large-scale facial emotion recognition under real-world conditions. Their work explores data imbalance handling, multi-class labeling of complex affective states, and CNN architectures optimized for unconstrained facial input. The authors highlight challenges such as occlusion, lighting variation, and subtle micro-expressions, all highly relevant to webcam-based interview evaluation. Their findings support robust facial affect detection in mock interview systems. [2]

Serengil & Ozpinar (2020) present the DeepFace framework, a lightweight and modular face analysis system capable of real-time emotion recognition on consumer-grade hardware. The work details efficient deep-learning pipelines, model compression strategies, and multi-backend support, enabling quick deployment in desktop GUI applications. The authors emphasize practical considerations like device compatibility, low-latency inference, and user-friendly integration, aligning closely with mock interview evaluators that analyze candidate expressions in real time. [3]

Eyben et al. (2015) describe the openSMILE audio-feature extraction toolkit, widely used for paralinguistic tasks such as emotion and stress analysis. Their study explains feature sets related to pitch, voice quality, spectral patterns, and rhythm, key indicators of communication confidence. The toolkit's reliability across recording devices and environments makes it suitable for speech-based confidence evaluation in mock interview systems. [4]

Schuller and colleagues (2018) provide an extensive survey of computational paralinguistics, covering machine-learning techniques for analyzing human vocal signals related to emotion, stress, and behavioral states. The paper highlights robust preprocessing steps, noise mitigation strategies, and temporal modeling trends. The authors emphasize ethical concerns and model fairness, major considerations for AI-driven interview evaluation tools. [5]

Busso et al. (2008) present the IEMOCAP multimodal dataset, containing synchronized audio, video, and transcripts of emotionally expressive speech. Their annotation strategy and multimodal benchmarking techniques set standards in affective-computing research. The dataset's realistic conversational settings and emotional diversity support the development of interview evaluators that require naturalistic emotion detection across multiple modalities. [6]

    Zhang et al. (2019) propose deep residual network architectures for facial expression recognition, demonstrating improved performance in dynamic, unconstrained environments. Their work emphasizes robustness to head movement, lighting variation, and partial occlusions common in webcam-based interviews. The authors highlight effective data augmentation and regularization tactics, providing guidance for developing stable visual emotion classifiers. [7]

    Gideon et al. (2017) discuss multimodal fusion strategies for emotion recognition, comparing early, late, and hybrid fusion models across noisy real-world scenarios. Their work shows that confidence-weighted fusion improves stability when one modality degrades (e.g., bad audio quality). These insights directly support mock interview evaluators requiring reliable inference under varying network, microphone, or camera conditions. [8]

Kim & Provost (2014) analyze vocal disfluencies (pauses, fillers, tremor, and irregular breathing) as markers of stress and low confidence. They demonstrate how acoustic features correlate with perceived speaker anxiety. The study provides critical evidence for designing speech-based confidence scoring mechanisms in AI interview tools. [9]

    Han et al. (2020) explore multimodal stress detection using micro-expressions and short-term speech changes. Their temporal-segmentation approach detects rapid emotional fluctuations, enabling precise assessment of candidate nervousness during specific interview questions. The authors highlight real-time responsiveness and lightweight computation, aligning well with live mock interview analysis systems. [10]

    Li et al. (2021) investigate attention-based deep-learning models for fine-grained emotion recognition, demonstrating how attention maps enhance interpretability and accuracy. Their study supports emotion classifiers that must detect subtle and mixed affective states, such as confusion or uncertainty, commonly displayed during interviews. [11]

    Ahuja et al. (2019) present a real-time webcam-based emotion recognition system optimized for low-resolution inputs. Their work highlights model quantization and pruning techniques that maintain high accuracy while enabling fast execution on low-power devices. This directly supports mock interview evaluators designed for broad accessibility. [12]

Narayana & Gupta (2020) examine automated virtual-interview systems and discuss how AI can score verbal fluency, emotional regulation, and communication clarity. Their results reveal improved candidate performance when using AI-based feedback loops, validating the pedagogical value of mock interview evaluators. [13]

    Kwon et al. (2018) propose a BLSTM-CNN hybrid model for speech emotion recognition that captures long-range temporal dependencies in speech signals. Their architecture excels in detecting patterns linked to confidence and hesitation, making it relevant for voice-based confidence classification. [14]

Liu et al. (2020) study multimodal human-computer interaction systems that evaluate user engagement through facial cues and speech characteristics. They underline the importance of user-friendly interfaces and visual feedback mechanisms, foundational concepts for Tkinter-based mock interview GUIs. [15]

    Mehta et al. (2021) develop an AI-driven recruitment evaluation framework combining facial analysis, NLP scoring, and voice analytics. Their work emphasizes fairness, bias mitigation, and transparent scoring mechanisms, which are key considerations in designing ethical interview evaluators. [16]

Tripathi et al. (2019) investigate feature extraction techniques for emotion classification under noisy environments. Their findings underscore the importance of noise-resistant preprocessing and robust feature engineering, important for interview systems where audio quality may vary across users. [17]

    Batista et al. (2020) propose real-time emotional intelligence frameworks that provide feedback for self-improvement during communication tasks. Their system demonstrates how immediate insights into emotional behavior enhance learning outcomes, supporting the educational purpose of mock interview evaluators. [18]

Huang et al. (2022) introduce multi-task learning techniques for jointly predicting facial expressions, action units, and affective dimensions. Their methodology enhances models' ability to generalize across emotional states and real-world settings, beneficial for interview evaluators analyzing complex facial behaviors. [19]

    Loffler et al. (2023) analyze conversational AI systems capable of evaluating speaker confidence, tone stability, and emotion patterns during structured interviews. Their research highlights multimodal scoring models, real-time processing pipelines, and fairness frameworks. Their insights provide a strong foundation for developing transparent, reliable AI mock interview evaluators. [20]

  3. PROPOSED METHODOLOGY

    The system uses DeepFace for real-time emotion detection, voice processing for confidence analysis, and speech-to-text for capturing answers. A simple Tkinter GUI displays results, while an AI-based feedback module gives quick suggestions to improve interview performance.

Emotion Recognition Engine: Uses facial expression analysis powered by DeepFace to detect emotional states such as happiness, sadness, or neutrality in real time during the interview.
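As an illustration of this component, the sketch below shows one way a single webcam frame could be analyzed with the DeepFace library; the fallback label and the handling of different return formats are assumptions for robustness, not code taken from the paper.

```python
# Minimal sketch: dominant-emotion detection for one webcam frame with DeepFace.
# Assumes the deepface and opencv-python packages are installed; error handling
# and GUI wiring are simplified for illustration.
import cv2
from deepface import DeepFace

def detect_emotion(frame):
    """Return the dominant emotion label ("happy", "sad", "neutral", ...) for a BGR frame."""
    try:
        result = DeepFace.analyze(frame, actions=["emotion"], enforce_detection=False)
        # Recent DeepFace versions return a list of per-face dictionaries.
        face = result[0] if isinstance(result, list) else result
        return face["dominant_emotion"]
    except ValueError:
        return "neutral"  # assumed fallback when no face is detected

cap = cv2.VideoCapture(0)          # default webcam
ok, frame = cap.read()
if ok:
    print("Detected emotion:", detect_emotion(frame))
cap.release()
```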

Confidence Analysis via Voice Processing: Applies speech processing to evaluate speaking speed, clarity, and pauses, key indicators of confidence level during answers.
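A minimal sketch of how speaking speed and pause rate could be combined into a single confidence estimate is given below; the tempo band (120-170 words per minute) and the 30% pause threshold are illustrative assumptions rather than values specified in the paper.

```python
# Sketch of a voice-based confidence heuristic: speaking rate from the transcript
# length and audio duration, pause ratio from low-energy frames.
import numpy as np

def confidence_score(samples: np.ndarray, sample_rate: int, transcript: str,
                     frame_ms: int = 30, silence_rms: float = 0.01) -> float:
    """Return a rough 0-1 confidence estimate for one spoken answer."""
    duration_min = len(samples) / sample_rate / 60.0
    words_per_min = len(transcript.split()) / max(duration_min, 1e-6)

    # Fraction of short frames whose RMS energy is below the silence threshold.
    frame_len = int(sample_rate * frame_ms / 1000)
    usable = len(samples) // frame_len * frame_len
    frames = samples[:usable].reshape(-1, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    pause_ratio = float((rms < silence_rms).mean())

    speed_component = 1.0 if 120 <= words_per_min <= 170 else 0.5  # steady tempo
    pause_component = 1.0 - min(pause_ratio / 0.3, 1.0)            # penalise long silences
    return round(0.5 * speed_component + 0.5 * pause_component, 2)
```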

Speech-to-Text Transcription: Utilizes a speech recognition API to transcribe spoken responses into text, enabling further analysis of language fluency and coherence.
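The transcription step could be implemented with the SpeechRecognition package, as sketched below; the Google Web Speech backend is only one possible choice, since the paper does not name a specific recognition API.

```python
# Sketch: capture one spoken answer from the microphone and transcribe it.
# Assumes the SpeechRecognition and PyAudio packages are installed.
import speech_recognition as sr

def transcribe_answer(max_seconds: int = 30) -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=1)   # calibrate noise floor
        audio = recognizer.listen(source, phrase_time_limit=max_seconds)
    try:
        return recognizer.recognize_google(audio)                 # transcribed answer text
    except sr.UnknownValueError:
        return ""                                                 # speech was unintelligible
```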

    Interactive GUI for Feedback: A user-friendly Tkinter-based interface that presents detailed feedback on emotion, confidence, and performance per question.
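A stripped-down Tkinter sketch of such a feedback panel is shown below; the labels and layout are hypothetical and do not reproduce the project's actual interface.

```python
# Minimal Tkinter sketch: display emotion, confidence, and a suggestion for one answer.
import tkinter as tk

def show_feedback(emotion: str, confidence: float, suggestion: str) -> None:
    root = tk.Tk()
    root.title("Mock Interview Feedback")
    tk.Label(root, text=f"Detected emotion: {emotion}").pack(padx=20, pady=5)
    tk.Label(root, text=f"Confidence score: {confidence:.2f}").pack(padx=20, pady=5)
    tk.Label(root, text=f"Suggestion: {suggestion}", wraplength=320).pack(padx=20, pady=10)
    root.mainloop()

show_feedback("neutral", 0.72, "Maintain a steadier pace and reduce long pauses.")
```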

This multi-layered architecture allows the AI-based mock interview evaluator to function as an integrated emotion and confidence classifier model.

Fig 3.1: Context diagram of the AI-based mock interview evaluator (emotion and confidence classifier model)

The AI-Based Mock Interview Evaluator acts as an intelligent system between the candidate and the interviewer. The candidate provides responses, such as facial expressions, voice input, and spoken answers, which are processed by the evaluator. The system analyzes emotions, confidence levels, and speech patterns, then generates meaningful feedback. This feedback is delivered to the interviewer, helping them understand the candidate's performance accurately and objectively.

    Algorithm: AI-Based Mock Interview Evaluation System

    Input: The system takes audiovisual behavioral data from the candidate as input.

Notation | Description | Unit
Face | Facial expressions captured by webcam |
Voice | Vocal features (speed, clarity, pauses) |
STT | Speech-to-text transcribed answer | Text
Emotion | Detected emotional state | Label

Input = {Face, Voice, STT, Emotion, Confidence}
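For clarity, the input record above could be represented as a small data container; the sketch below is hypothetical, and the field types are assumptions rather than details from the paper's implementation.

```python
# Hypothetical container mirroring the Input set {Face, Voice, STT, Emotion, Confidence}.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CandidateInput:
    face: List[str]            # per-frame emotion labels from the webcam
    voice: Dict[str, float]    # vocal features, e.g. {"speed": 140.0, "clarity": 0.8, "pause_ratio": 0.1}
    stt: str                   # speech-to-text transcribed answer
    emotion: str               # detected (dominant) emotional state, e.g. "neutral"
    confidence: float          # calculated confidence score in [0, 1]
```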

    Output:

    The output is the performance classification, denoted as Q, based on emotional stability and confidence level:

Symbol | Scenario Description | Performance Label
S1 | High confidence, positive emotion | Excellent
S2 | Low confidence, negative emotion | Needs Improvement
S3 | Neutral emotion, moderate confidence | Average
S4 | Fluctuating emotion, inconsistent confidence | Mixed Performance

Output = Q ∈ {S1, S2, S3, S4}

    Notations:

Notation | Meaning
Si | Interview performance scenario, where i = 1, 2, 3, 4
f | Mapping function f : Input → Output
Eset | Set of detectable emotions
Cscore | Calculated confidence score
STTaccuracy | Speech recognition accuracy threshold

Scenario 1: Excellent Performance (S1)

Input: Candidate's face, voice, and STT

Output: S1 (High Confidence & Positive Emotion)

Algorithm Excellent_S1(Face, Voice, STT)
1: Initialize webcam, microphone, and STT engine
2: Capture facial emotion (Emotion_val)
3: Extract voice features (speed, clarity, pause rate)
4: if (Emotion_val ∈ Positive emotions) AND (Speed stable) AND (Pause rate low) AND (Clarity high) then
       Display "Excellent Confidence & Positive Delivery"
       Generate feedback summary
5: else
       Go to Algorithms 2-4 for reclassification
6: End

Fig: Home page

    Scenario 2: Needs Improvement (S2)

Input: Face, Voice, STT

Output: Quality = S2 (Low Confidence & Negative Emotion)

Algorithm Low_Performance_S2(Face, Voice, STT)
1: Initialize webcam and microphone
2: Detect emotion and extract speech features
3: if (Emotion_val ∈ Negative emotions) OR (Pause rate high) OR (Speed unstable) then
       Display "Low Confidence: Needs Improvement"
       Send improvement tips
4: else
       Go to Algorithm 1, 3, or 4
5: End

Fig: Admin home page

Scenario 3: Average Performance (S3)

Input: Face, Voice, STT

Output: Quality = S3 (Neutral Emotion & Moderate Confidence)

Algorithm Average_S3(Face, Voice, STT)
1: Initialize audiovisual components
2: Capture real-time emotion & speech
3: if (Emotion_val = Neutral) AND (Speed moderate) AND (Clarity acceptable) then
       Display "Average Performance: Can Improve"
       Log analysis for user
4: else
       Go to Algorithm 1, 2, or 4
5: End

Fig: Facial expression creation model page

Scenario 4: Mixed Performance (S4)

Input: Face, Voice, STT

Output: Quality = S4 (Emotional Inconsistency & Fluctuating Confidence)

Algorithm Mixed_S4(Face, Voice, STT)
1: Initialize all sensors
2: Read emotion stream and voice metrics
3: if (Emotion fluctuates frequently) OR (Confidence varies significantly) then
       Display "Mixed Emotional Response: Practice Needed"
       Record session for user review
4: else
       Go to Algorithms 1-3
5: End
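Taken together, Scenarios S1-S4 amount to a single decision rule over the detected emotions and confidence scores. The sketch below is one minimal interpretation; the positive/negative emotion sets and the numeric thresholds are assumptions for illustration, not values specified in the paper.

```python
# Sketch: map per-question emotion labels and confidence scores to S1-S4.
from typing import List

POSITIVE = {"happy", "surprise"}
NEGATIVE = {"sad", "angry", "fear", "disgust"}

def classify_performance(emotions: List[str], confidences: List[float]) -> str:
    avg_conf = sum(confidences) / len(confidences)
    conf_spread = max(confidences) - min(confidences)
    emotion_changes = sum(a != b for a, b in zip(emotions, emotions[1:]))

    if emotion_changes > len(emotions) // 2 or conf_spread > 0.4:
        return "S4"  # fluctuating emotion, inconsistent confidence
    if avg_conf >= 0.7 and any(e in POSITIVE for e in emotions) \
            and not any(e in NEGATIVE for e in emotions):
        return "S1"  # high confidence, positive emotion
    if avg_conf < 0.4 or any(e in NEGATIVE for e in emotions):
        return "S2"  # low confidence, negative emotion
    return "S3"      # neutral emotion, moderate confidence

# Example: a mostly neutral candidate with moderate, stable confidence maps to S3.
print(classify_performance(["neutral", "neutral", "happy"], [0.55, 0.6, 0.58]))
```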

Fig: CNN model creation page

    System Features and Innovations

• Real-Time Emotion Detection: Uses DeepFace to continuously track facial expressions like happiness, sadness, fear, or neutrality.

• Voice-Based Confidence Analysis: Measures speaking speed, clarity, pauses, and vocal stability.

• Speech-to-Text Conversion: Automatically converts spoken answers into text for evaluation.

• Interactive GUI (Tkinter): Provides a live dashboard showing emotion graphs, confidence scores, and feedback.

• Instant AI Feedback: Generates personalized tips to help candidates improve communication and emotional control.

• Scalable Architecture: Supports additional AI models (NLP scoring, gesture detection, personality estimation).

Feature | Existing System | Proposed System
Feedback Type | Manual, subjective | Automated, AI-based, objective
Emotion Detection | Not available | Real-time facial emotion recognition
Confidence Scoring | Manual observation | Voice-based ML confidence analysis
Data Visualization | None | GUI-based emotion & confidence graphs
Accuracy | Depends on evaluator | High due to ML models
Scalability | Limited | Fully scalable & customizable

    Expected Outcomes

    The system provides candidates with accurate, real-time, and personalized interview feedback, helping them understand emotional patterns, confidence levels, and communication strengths. It enhances interview readiness by enabling continuous improvement and supports institutions in training job seekers effectively.

  4. RESULTS & DISCUSSIONS

The AI-Based Mock Interview Evaluator was tested across four typical interview scenarios, using webcam data, microphone input, and transcript analysis. The system

accurately classified emotions, confidence levels, and performance categories based on real-time behavioral cues.

1. Scenario S1: Excellent Performance

Parameter | Observed Value | Interpretation
Emotion | Positive (Happy/Neutral) | Stable and confident delivery
Voice Speed | Balanced | Ideal for interview clarity
Pause Rate | Low | Indicates high confidence
STT Accuracy | 93% | Fluent and clear responses

Discussion:

The candidate showed consistent positive emotions, high vocal stability, and strong delivery. The system correctly identified the performance as Excellent (S1).

2. Scenario S2: Needs Improvement

Parameter | Observed Value | Interpretation
Emotion | Negative (Sad/Anxious) | Signs of nervousness
Voice Speed | Fast/unstable | Indicates stress
Pause Rate | High | Low confidence
STT Accuracy | 65% | Unclear articulation

Discussion:

Emotion instability and voice fluctuations indicate interview anxiety. The system correctly classified this as Needs Improvement (S2).

Fig: User login page

3. Scenario S3: Average Performance

Parameter | Observed Value | Interpretation
Emotion | Neutral | Neither positive nor negative
Voice Speed | Moderate | Acceptable
Pause Rate | Medium | Slight hesitation
STT Accuracy | 80% | Mostly clear

Discussion:

The candidate's performance is stable but lacks strong emotional engagement and confidence. The evaluator marked it as Average (S3).

Fig: Exam completion page

4. Scenario S4: Mixed Performance

Parameter | Observed Value | Interpretation
Emotion | Fluctuating | Mood changes frequently
Voice Speed | Inconsistent | Unsteady communication
Pause Rate | Irregular | Mixed confidence levels
STT Accuracy | 70% | Varies between responses

Discussion:

Emotion and confidence fluctuations lead to inconsistent answers. The system classified this as Mixed Performance (S4).

Fig: Viewing results in the user page

Overall Analysis

Across all four scenarios, the system successfully analyzed facial expressions, vocal patterns, and speech clarity to distinguish between excellent, average, and low-performance interviews. The combination of DeepFace, speech processing, and STT produced reliable real-time evaluation results, demonstrating the model's capability in soft-skill assessment.

CONCLUSIONS

The AI-Based Mock Interview Evaluator effectively aids candidates in improving their interview skills by providing real-time feedback on emotions and confidence. By integrating facial emotion detection and speech analysis, the system ensures a comprehensive evaluation. The user-friendly GUI

enhances accessibility and ease of use. This project bridges the gap between technical capability and soft skills training. Overall, it demonstrates the potential of AI in personal development and interview preparation.

REFERENCES

1. A. Mollahosseini, B. Hasani, and M. H. Mahoor, "AffectNet: A database for facial expression, valence, and arousal computing in the wild," IEEE Transactions on Affective Computing, vol. 10, no. 1, pp. 18-31, 2017.
2. F. Chollet, "Keras: The Python Deep Learning Library," 2015.
3. Q. Abbas, M. E. Celebi, and I. F. Garcia, "Emotion recognition from facial expressions using self-organizing map," Cognitive Computation, vol. 3, pp. 439-445, 2011.
4. Python Software Foundation, "SpeechRecognition Library," 2020.
5. DeepFace Framework, "A Lightweight Face Recognition and Facial Attribute Analysis Framework," 2020.
6. P. Ekman and W. V. Friesen, "Facial Action Coding System (FACS): A technique for the measurement of facial activity," Consulting Psychologists Press, 1978.
7. Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, "A survey of affect recognition methods: Audio, visual, and spontaneous expressions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39-58, 2009.
8. C. Busso et al., "IEMOCAP: Interactive emotional dyadic motion capture database," Language Resources and Evaluation, vol. 42, no. 4, pp. 335-359, 2008.
9. B. Schuller, S. Steidl, and A. Batliner, "The INTERSPEECH Emotion Challenge," Proceedings of INTERSPEECH, 2009.
10. D. Ververidis and C. Kotropoulos, "Emotional speech recognition: Resources, features, and methods," Speech Communication, vol. 48, no. 9, pp. 1162-1181, 2006.
11. F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE: The Munich versatile and fast audio feature extractor," ACM Multimedia Conference Proceedings, 2010.
12. Z. Zhang and J. Zhang, "Deep learning-based speech emotion recognition: A review," IEEE Access, vol. 8, pp. 4861-4877, 2020.
13. J. Kim and E. André, "Emotion recognition based on physiological changes in speech," Pattern Analysis and Applications, vol. 11, pp. 85-101, 2008.
14. D. E. King, "Dlib-ML: A machine learning toolkit," Journal of Machine Learning Research, vol. 10, pp. 1755-1758, 2009.
15. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
16. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
17. X. Li and L. Deng, "Multimodal emotion recognition combining speech and facial expressions," IEEE Signal Processing Letters, vol. 27, pp. 705-709, 2020.
18. IBM Corporation, "IBM Watson Speech-to-Text Documentation," 2021.
19. K. Zhang and U. Zafar, "Real-time face emotion recognition using CNN models," Journal of Intelligent Systems, vol. 30, no. 1, pp. 924-935, 2021.
20. Python Software Foundation, "Tkinter GUI Library Documentation," 2020.