Emotion-Aware Intelligent Learning System Using Deep Residual Networks for Classroom Emotion Analysis

Roshni; Harendra Singh

doi:10.5281/zenodo.20567331

Volume 15, Issue 05 (May 2026)

Open Access
Authors : Roshni, Harendra Singh
Paper ID : IJERTV15IS052680
Volume & Issue : Volume 15, Issue 05 , May – 2026
Published (First Online): 06-06-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Roshni (1), Harendra Singh (2)

(1,2) Sanjeev Agrawal Global Educational (SAGE) University Bhopal

Abstract – Emotion-aware educational systems have gained significant importance in intelligent learning environments because student emotions directly affect learning performance, concentration, and engagement. This paper proposes an intelligent classroom emotion analysis framework using deep learning architectures for automatic facial emotion recognition. The proposed system utilizes Convolutional Neural Networks (CNN) and ResNet18 models for multi-class emotion classification in smart classroom environments. Unlike traditional FER systems, this work focuses on adaptive learning applications and intelligent educational analysis. Experimental evaluation is performed using accuracy, precision, recall, F1-score, confusion matrix, and ROC curve analysis. Results demonstrate that the proposed residual learning-based framework achieves 90% accuracy and significantly improves classroom emotion prediction capability.

Keywords: Smart Classroom, Emotion Analysis, Adaptive Learning, Deep Learning, ResNet18, Artificial Intelligence

INTRODUCTION

The rapid growth of artificial intelligence and deep learning technologies has transformed modern educational systems. Intelligent classrooms and adaptive learning systems are increasingly integrating emotion recognition technologies to monitor student engagement and learning behavior.

Student emotions such as happiness, frustration, confusion, and excitement directly influence learning outcomes. Traditional educational systems are unable to automatically identify student emotional states during classroom sessions. Therefore, emotion-aware artificial intelligence systems are becoming important components of next-generation smart learning environments. Facial Emotion Recognition (FER) systems use computer vision and deep learning techniques to identify human emotions from facial expressions. Recent deep learning architectures such as CNN and ResNet have significantly improved FER accuracy.

Aly and Alotaibi (2025) proposed a hybrid deep learning framework for real-time emotion detection in online learning environments. Ayat et al. (2026) demonstrated that AI-powered emotion analysis can improve adaptive STEM education systems. Sharma and Mansotra (2019) presented a student emotion recognition framework for classroom environments using deep learning. This paper focuses on emotion-aware intelligent classroom analysis using deep residual learning architectures. Unlike work that focuses mainly on comparative FER performance, this paper emphasizes educational applications and adaptive learning analysis.

RELATED WORK

Emotion recognition using deep learning and artificial intelligence has become an important research area in educational technologies, adaptive learning systems, and intelligent tutoring environments. Researchers have explored multiple deep learning architectures and AI-based frameworks to improve emotion classification accuracy and adaptive learning performance.

The following table summarizes important research contributions related to emotion recognition and intelligent learning systems.

Authors	Technique / Model	Application Area	Key Findings
Aly and Alotaibi (2025)	Hybrid Deep Learning	Online Learning	Improved real-time emotion detection performance
Ayat et al. (2026)	AI-Powered Emotion Analysis	STEM Education	Enhanced student engagement and adaptive learning
Bala et al. (2022)	Emotion-Based Learner Categorization	E-Learning	Improved personalized learning systems
Bangar et al.	Machine Learning FER	Student Well-Being	Effective student emotion monitoring
Devasenapathy et al. (2025)	Deep Learning-Based Classroom Analysis	Smart Classroom	Improved classroom interaction analysis
Ge (2026)	ML + Clustering Algorithm	Emotion Recognition	Enhanced feature extraction performance
Gürüler and Osman Devrim (2017)	Facial Emotion Recognition	E-Learning Systems	Improved learner interaction
Ilyas et al. (2025)	AI-Powered Classroom Analysis	Adaptive Education	Improved learning outcomes
Khandekar (2026)	Multimodal Emotion Recognition	STEM Education	Personalized adaptive learning
Mohana and Subashini (2024)	Systematic Review of FER	Computer Vision	Deep learning outperformed traditional ML
Professor A. (2025)	Facial Expression Detection	Adaptive Teaching	Improved intelligent teaching systems
Raju et al. (2024)	Federated Deep Learning + SMOTE	Emotion Prediction	Reduced class imbalance
Sharma and Mansotra (2019)	CNN-Based FER	Classroom Environment	Improved student emotion recognition
Wu et al. (2026)	Emotionally Intelligent AI	Smart Learning	Enhanced adaptive learning systems

The literature review indicates that deep learning architectures such as CNN and residual networks have significantly improved FER performance. However, many existing systems still suffer from issues such as class imbalance, limited feature extraction capability, overfitting, and poor recognition of visually similar emotions. Therefore, this work focuses on developing an efficient emotion-aware intelligent learning framework using CNN and ResNet18 architectures.

PROPOSED METHODOLOGY

The proposed intelligent classroom emotion recognition framework was designed to automatically identify and classify student emotions using deep learning architectures. The system integrates image preprocessing, feature extraction, deep residual learning, and performance evaluation modules to improve classroom emotion analysis.

The framework begins with facial image acquisition from the emotion dataset. The collected images are first passed through preprocessing stages that include image resizing, normalization, and tensor conversion. These preprocessing operations help improve model convergence and reduce noise present in raw images. After preprocessing, the images are fed into deep learning architectures for feature extraction and classification. Two deep learning models were used in this work: a Convolutional Neural Network (CNN) and ResNet18.
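The three preprocessing steps named above can be sketched in plain Python as follows. This is only an illustration: the paper does not state the target resolution, the interpolation method, or the normalization constants, so the 48 x 48 size, nearest-neighbour sampling, and the mean and standard deviation of 0.5 are all assumptions, as are the function names.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities (0..255)


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize with nearest-neighbour sampling (an assumed, deliberately simple choice)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img: Image, mean: float = 0.5, std: float = 0.5) -> Image:
    """Scale 0..255 pixels to 0..1, then standardise with the assumed mean and std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]


def preprocess(img: Image, size: int = 48) -> List[Image]:
    """Resize, normalise, and wrap as a channels-first [C][H][W] nested list."""
    return [normalize(resize_nearest(img, size, size))]


# Tiny 2 x 2 example image, upscaled to 4 x 4 for readability.
example = [[0, 255], [255, 0]]
tensor = preprocess(example, size=4)
```

With these assumed constants every pixel ends up in the range -1 to 1, which keeps input magnitudes comparable across images before they reach the first convolution layer.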

The CNN model extracts low-level and high-level spatial features from facial images using convolution operations, activation functions, and pooling layers. Convolution layers detect important facial patterns such as edges, textures, eyebrows, mouth movement, and eye regions. Pooling layers reduce feature dimensionality and improve computational efficiency.

Convolution Equation

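$S(i, j) = (I * K)(i, j)$

where I is the input image, K is the convolution kernel, and S is the resulting feature map.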
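To make the convolution operation concrete, the sketch below computes a valid-mode 2D convolution of a small image with a kernel, that is, the sum over m and n of image(i - m, j - n) times kernel(m, n). The function name and the toy image are illustrative assumptions. Note that most deep learning libraries skip the kernel flip and compute cross-correlation instead, which makes no practical difference when the kernel weights are learned.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid-mode 2D convolution: flip the kernel, then slide it over the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + m][j + n] * flipped[m][n]
                for m in range(kh)
                for n in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A horizontal-difference kernel responds where intensity changes (a vertical edge).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[1, -1]]
edges = conv2d_valid(image, kernel)
```

The output is non-zero only in the column where the dark region meets the bright region, which is the kind of edge response the convolution layers rely on when detecting facial contours.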

The Rectified Linear Unit (ReLU) activation function introduces non-linearity into the model and improves feature learning capability.

ReLU Activation Function

$f(x) = \max(0, x)$

The Softmax layer converts the extracted features into probability scores corresponding to different emotion classes.

Softmax Classification

$\sigma(z_i) = \frac{\exp(z_i)}{\sum_{j} \exp(z_j)}$
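A minimal sketch of the two functions follows, applied to the seven emotion classes used in this work. The logit values are hypothetical and chosen only for illustration; subtracting the maximum logit before exponentiation is a standard numerical-stability step that does not change the result.

```python
import math
from typing import List


def relu(x: float) -> float:
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # shifting by the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]
logits = [0.2, -1.0, 0.1, 2.5, 0.0, 1.1, 0.4]  # hypothetical network outputs
probs = softmax(logits)
predicted = EMOTIONS[probs.index(max(probs))]
```

The predicted class is simply the one with the largest probability, so the ordering of the logits decides the label while the Softmax output quantifies how confident the model is.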

Although CNN models provide effective feature extraction capability, deeper architectures often suffer from vanishing gradient problems. To overcome this limitation, the ResNet18 architecture was implemented in the proposed framework.

ResNet18 introduces skip connections and residual blocks that allow efficient propagation of gradients through deeper layers. Residual learning improves convergence capability and enhances feature extraction for complex emotional patterns.

Residual Learning Equation

$H(x) = F(x) + x$

where H(x) represents the output mapping of the block, F(x) denotes the residual mapping learned by its layers, and x is the block input carried forward unchanged by the identity (skip) connection.
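One way to see why the identity shortcut eases gradient propagation is to differentiate the residual mapping with respect to its input. The derivation below is a sketch that assumes a scalar training loss, written here as $\mathcal{L}$ (a symbol introduced only for this illustration), and uses $\mathrm{Id}$ for the identity matrix:

```latex
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial H}\,\frac{\partial H}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial H}\left(\frac{\partial F}{\partial x} + \mathrm{Id}\right)
```

Even if the term $\partial F / \partial x$ becomes very small in a deep stack of layers, the identity term still passes the upstream gradient to earlier layers, which is the mechanism usually credited for mitigating the vanishing gradient problem.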

The dataset used in this work consists of seven emotional classes including Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral. The dataset was divided into training, validation, and testing subsets to ensure proper model evaluation and generalization.
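The paper does not report the split proportions, so the sketch below assumes a 70/15/15 train, validation, and test split purely for illustration. It splits each emotion class separately (a stratified split) so that minority classes such as Disgust appear in every subset; the function name and the fixed random seed are likewise assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str],
    fractions: Tuple[float, float, float] = (0.7, 0.15, 0.15),  # assumed ratios
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Return sample indices for train/val/test, split separately per class."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]  # remainder goes to test
    return splits


# Toy label list with two of the seven classes.
labels = ["Happy"] * 20 + ["Sad"] * 20
splits = stratified_split(labels)
```

Because the test subset takes whatever remains after the training and validation slices, every sample is assigned to exactly one subset.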

The models were trained using the Adam optimization algorithm because of its efficient convergence capability and adaptive learning mechanism. Different learning rates were used for CNN and ResNet18 models to optimize performance.

Table 3.1 Parameters for CNN and ResNet18

Parameter	CNN	ResNet18
Optimizer	Adam	Adam
Learning Rate	0.001	0.0001
Epochs	15	15
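The settings in Table 3.1 can be written as a small configuration, and the Adam update rule can be sketched in a few lines to show what the learning rate controls. Only the optimizer name, learning rates, and epoch counts come from the table; the values beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the commonly used Adam defaults and are assumed here, as is the toy loss used in the example.

```python
import math
from typing import List, Optional

# Values taken from Table 3.1; everything else below is an assumed default.
CONFIG = {
    "CNN": {"optimizer": "Adam", "learning_rate": 1e-3, "epochs": 15},
    "ResNet18": {"optimizer": "Adam", "learning_rate": 1e-4, "epochs": 15},
}


class Adam:
    """Minimal Adam optimiser for a flat list of scalar parameters."""

    def __init__(self, lr: float, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: Optional[List[float]] = None  # running mean of gradients
        self.v: Optional[List[float]] = None  # running mean of squared gradients
        self.t = 0                            # step counter for bias correction

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if self.m is None or self.v is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# One step on the toy loss f(w) = w**2 (gradient 2w), starting from w = 1.0.
first_steps = {}
for name, cfg in CONFIG.items():
    optimizer = Adam(lr=cfg["learning_rate"])
    (w_new,) = optimizer.step([1.0], [2.0])
    first_steps[name] = 1.0 - w_new
```

On the first step the bias-corrected update has a magnitude close to the learning rate itself, regardless of the gradient scale, so the ResNet18 setting moves its weights roughly ten times more cautiously than the CNN setting.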

The proposed framework was designed not only for accurate emotion classification but also for intelligent educational applications such as adaptive learning systems, student engagement monitoring, and smart classroom analysis.

PERFORMANCE EVALUATION

The proposed intelligent emotion recognition framework was evaluated using multiple performance metrics to analyze classification capability, prediction consistency, and generalization performance. Performance evaluation is an important stage because it determines how effectively the deep learning models recognize and classify different emotional states.

Multiple evaluation metrics including accuracy, precision, recall, F1-score, confusion matrix analysis, and ROC-AUC analysis were used in this work. These metrics provide a detailed understanding of model performance from different perspectives.

Accuracy measures the overall correctness of the classification model by calculating the ratio of correctly predicted samples to the total number of samples.

Accuracy

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

Precision evaluates how accurately the model predicts positive emotion classes. Higher precision values indicate lower false positive predictions.

Precision

$\text{Precision} = \frac{TP}{TP + FP}$

Recall measures the capability of the model to correctly identify actual positive emotion classes. High recall values indicate better sensitivity toward emotional patterns.

Recall

$\text{Recall} = \frac{TP}{TP + FN}$

F1-score provides a balanced evaluation of precision and recall. It is particularly useful for multi-class emotion recognition tasks where class imbalance may exist.

F1-Score

$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
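In the multi-class setting, true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) are counted per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows one way to do this and to macro-average the per-class scores. The paper does not state whether its reported values are macro- or weighted-averaged, so macro averaging is an assumption here, and the 3 x 3 matrix is hypothetical data, not the paper's results.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k samples predicted as others
        fp = sum(cm[r][k] for r in range(n)) - tp   # other samples predicted as class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def overall_accuracy(cm: List[List[int]]) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def macro_average(results: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of a per-class metric."""
    return sum(r[key] for r in results) / len(results)


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
results = per_class_metrics(cm)
accuracy = overall_accuracy(cm)
macro_f1 = macro_average(results, "f1")
```

Macro averaging gives each emotion class equal weight, which is why it is often preferred when some classes have far fewer samples than others.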

The confusion matrix was used to visualize class-wise prediction performance and identify misclassification patterns among different emotional classes. Confusion matrix analysis helps determine which emotions are correctly recognized and which emotions exhibit overlap because of similar facial patterns. ROC-AUC analysis was also performed to evaluate the discriminative capability of the proposed models. High AUC values indicate better class separability and improved classification robustness. The proposed framework was evaluated using CNN, ResNet18, and an optimized deep learning model to compare the effectiveness of shallow and deep residual learning architectures.
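The two analyses described above can be sketched as follows. The confusion matrix is built from true and predicted labels, and a one-vs-rest AUC is computed per class as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The paper does not say how its multi-class ROC curves were derived, so the one-vs-rest scheme is an assumption, and the labels and probabilities are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]],
                    n_classes: int) -> List[float]:
    """Treat each class in turn as positive and all others as negative."""
    return [
        binary_auc([t == k for t in y_true], [row[k] for row in probs])
        for k in range(n_classes)
    ]


# Hypothetical 3-class example: true labels and predicted class probabilities.
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
y_pred = [max(range(3), key=row.__getitem__) for row in probs]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
aucs = one_vs_rest_auc(y_true, probs, n_classes=3)
```

The off-diagonal entry in the first row shows a class 0 sample mistaken for class 1, and the same confusion lowers the AUC of class 1, which illustrates how the two analyses expose the same overlap from different angles.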

EXPERIMENTAL RESULTS

The experimental analysis was performed to evaluate the effectiveness of CNN, ResNet18, and the proposed optimized framework for multi-class facial emotion recognition. The experiments demonstrate the importance of deep residual learning for improving emotion classification accuracy and feature extraction capability. The baseline CNN model achieved moderate classification performance because shallow architectures have limited capability for extracting highly discriminative facial features. Although CNN successfully identified basic emotional patterns, it struggled to accurately classify visually similar emotions such as fear and sadness.

ResNet18 significantly improved classification performance because of residual learning and skip connection mechanisms. Residual blocks enabled deeper feature extraction and improved gradient propagation during training. The model demonstrated better convergence capability and reduced overfitting compared to the baseline CNN architecture. The proposed optimized framework achieved the best overall performance because of enhanced feature representation learning and improved classification capability.
1. Comparative Accuracy Analysis
  
  Table 5.1 Accuracy comparison for baseline and proposed model
  
  Model	Accuracy
  CNN Baseline	62%
  ResNet18	78%
  Proposed Model	90%
  
  The results indicate that residual learning significantly improves emotion recognition capability. The proposed model achieved 90% classification accuracy, outperforming both CNN and ResNet18 architectures.
  
  Fig 1. Accuracy comparison for baseline and proposed model
2. Precision, Recall, and F1-Score Analysis
  
  Table 5.2 Performance Metrics comparison for baseline and proposed model
  
  Model	Accuracy	Precision	Recall	F1-score
  CNN	0.62	0.60	0.59	0.59
  ResNet18	0.78	0.77	0.76	0.76
  Proposed Model	0.90	0.89	0.88	0.88
  
  Fig 2. Comparison Metrics for baseline and proposed model
  
  The proposed model achieved higher precision and recall values, indicating better prediction consistency and lower false classification rates. The balanced F1-score values also demonstrate improved generalization performance across all emotional categories.
3. Confusion Matrix Analysis
  
  The confusion matrix analysis revealed that happiness and surprise emotions achieved the highest classification accuracy because of their distinctive facial expressions. However, slight confusion was observed between fear and sadness classes because of similar facial patterns.
  
  Fig 3. Confusion Matrix
  
  The proposed model reduced misclassification rates significantly compared to the baseline CNN model. This improvement confirms the effectiveness of residual learning and enhanced feature extraction capability.
4. ROC Curve Analysis
  
  ROC-AUC analysis demonstrated strong classification capability for all emotion classes. The proposed model achieved high AUC values, indicating improved class separability and robust prediction performance.
  
  Fig 4. ROC-AUC Analysis
  
  The ROC curves also confirmed that deep residual learning architectures provide better discriminative capability compared to shallow CNN architectures.
5. Educational Impact Analysis
  
  The experimental results indicate that the proposed framework can be effectively integrated into intelligent classroom systems and adaptive educational technologies. Emotion-aware systems can help monitor student engagement, identify learning difficulties, and improve personalized teaching strategies.
  
  The proposed system can support:
  - Smart classroom monitoring
  - Adaptive e-learning systems
  - Intelligent tutoring systems
  - Student engagement analysis
  - Emotion-aware educational analytics
    
  Overall, the results confirm that deep residual learning architectures significantly improve classroom emotion recognition performance and intelligent educational analysis.
  
  CNN models provide limited feature extraction capability for complex emotional patterns. ResNet18 improves classification performance through deep residual learning and skip connections.
  
  The proposed optimized framework achieved superior performance because of:
  - Better feature representation
  - Improved convergence
  - Reduced overfitting
  - Enhanced generalization capability
  
  The framework is suitable for real-time intelligent classroom systems and adaptive educational technologies.

CONCLUSION

This paper presented an emotion-aware intelligent classroom framework using CNN and ResNet18 architectures for facial emotion recognition. Experimental evaluation demonstrated that the proposed framework achieved 90% classification accuracy and outperformed baseline CNN models. The study confirms that deep residual learning can significantly improve emotion recognition performance in intelligent educational environments.

Future work may include:

REFERENCES

M. Aly and N. S. Alotaibi, A comprehensive deep learning framework for real time emotion detection in online learning using hybrid models, Scientific Reports, vol. 15, no. 1, 2025.
N. el Ayat, M. Boutalline, A. Tannouche, and H. Ouanan, Emotion-Aware Adaptive Learning: Enhancing Engagement and Performance in STEM Education Using AI-Powered Emotion Analysis, 2026.
M. M. Bala, H. Akkineni, and C. Srinivasulu, An Approach for Learner Categorization Based on Emotions in Intelligent Adaptive E-Learning Environment, Journal of Mobile Multimedia, vol. 18, no. 6, pp. 1709–1732, 2022.
D. Devasenapathy et al., Real-Time Classroom Emotion Analysis Using Machine and Deep Learning for Enhanced Student Learning, Journal of Intelligent Systems and Internet of Things, vol. 16, no. 2, pp. 82–101, 2025.
Y. S. Khandekar, Intelligent Multimodal Emotion Recognition Framework for Personalized and Adaptive STEM Education, International Journal for Research Trends and Innovation, vol. 11, 2026.
M. Mohana and P. Subashini, Facial Expression Recognition Using Machine Learning and Deep Learning Techniques: A Systematic Review, SN Computer Science, vol. 5, no. 4, 2024.
A. Professor, Facial Expression-Based Emotion Detection for Adaptive Teaching in Educational Environments, International Journal of Innovative Science and Research Technology, vol. 10, no. 1, 2025.
V. V. N. Raju et al., Enhancing emotion prediction using deep learning and distributed federated systems with SMOTE oversampling technique, Alexandria Engineering Journal, vol. 108, pp. 498–508, 2024.
A. Sharma and V. Mansotra, Deep learning based student emotion recognition from facial expressions in classrooms, International Journal of Engineering and Advanced Technology, vol. 8, no. 6, pp. 4691–4699, 2019.
X. Wu et al., A deep learning approach to emotionally intelligent AI for improved learning outcomes, Scientific Reports, vol. 16, no. 1, 2026.
