Facial Emotion Recognition Using CNN

DOI : 10.17577/IJERTV13IS040242


Jaspreet Singh

Department of Electronics & Comm. Engg.

Chandigarh University Mohali, Punjab, India

Abhinav Kumar

Department of Electronics & Comm. Engg.

Chandigarh University Mohali, Punjab, India

Tushar Pandey

Department of Electronics & Comm. Engg.

Chandigarh University Mohali, Punjab, India

Er. Manjeet Kour

Department of Electronics & Comm. Engg.

Chandigarh University Mohali, Punjab, India

Ankit Pandey

Department of Electronics & Comm. Engg.

Chandigarh University Mohali, Punjab, India

Abstract: This facial emotion recognition project represents an advance in the application of machine learning and computer vision. The research aims to create a robust system that can distinguish between emotions and categorize them by analyzing facial expressions in real time. It adopts convolutional neural networks (CNNs), an advanced deep learning technique, to enhance human-computer interaction, the analysis of emotions, and applications in the medical field, the automotive industry, and beyond.

With the E-POS technique, the model is trained on uploaded pictures showing various emotions; after scanning these pictures, it recognizes the user's emotions in real time.

Keywords: E-POS technique, real-time analysis, model training, facial emotion recognition.


    Emotion recognition, a key aspect of human-computer interaction, has gained significant attention in artificial intelligence. It plays a crucial role in numerous applications, including mental health monitoring, customer feedback analysis, and interactive gaming. Among various modalities for emotion recognition, facial expressions stand out due to their non-invasive nature and ease of capture. This paper focuses on Facial Emotion Recognition (FER), a challenging task due to the subtlety of expressions and their variability across individuals. We propose a novel approach to FER using Convolutional Neural Networks (CNN), a class of deep learning models that have shown remarkable success in image analysis tasks. Python, with its rich ecosystem of scientific computing and machine learning libraries, serves as our language of choice for implementing the proposed CNN model. This paper aims to provide a comprehensive study of our proposed model, detailing the methodology, experimental setup, results, and potential improvements.[1] The remainder of the paper is structured as follows: Section II provides a review of related work in the field of FER. Section III describes the methodology of our proposed CNN model. Section IV presents the experimental setup and results, and finally, Section V concludes the paper with a summary and potential future work.


    1. Background: –

      Recognizing emotions from faces is easy for human beings but difficult for computers. With the help of artificial intelligence (AI), however, it has become much easier for computers to detect and analyze human emotions. Numerous techniques have been developed in this field; they are discussed below:

      • The Support Vector Machine (SVM) method was used for Facial Emotion Recognition (FER). An SVM separates data into two classes with a maximum-margin boundary. It handles both linearly and non-linearly separable data; for complex data, a kernel function maps the input into a higher-dimensional space where the classes become separable. SVMs helped in FER tasks such as image classification.[2]

      • Principal Component Analysis (PCA) was used for dimensionality reduction. PCA finds new basis vectors (principal components) that point in the directions of maximum variance. The pixels of each image were arranged as one row of a matrix containing all the images, and PCA was applied to this matrix. PCA was combined with AI and image-manipulation techniques and applied to four databases:

        (a) Utrecht database, (b) Indian database, (c) Researcher database, and (d) Pain Expression database. The system was trained on 825 images of both male and female subjects, and the Utrecht dataset produced the best results.[3]

      • Facial Motion Prior Networks (FMPN) were used for FER. They focus on the regions where facial muscles move. The facial motion prior is obtained from the average neutral face and the expressive (animated) face, which are used to learn facial motion masks. FMPN consists of three networks: (a) Facial Motion Mask Generator (FMG), (b) Prior Fusion Network (PFN), and (c) Classification Network (CN). FMPN relates the actual facial muscle shape to the generated muscle shape and then produces its result.[4]

      • More recent FER systems consist of three stages: (a) Facial Localization and Landmark Points Detection, (b) Facial Geometric Feature Extraction using VGG-19, and (c) Emotion Classification using an FCNN. One such system uses the MUG and GEMEP databases, and its Convolutional Neural Network (CNN) is trained on millions of images.[5]
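The classical PCA and SVM approaches surveyed above are often chained: PCA compresses flattened face images, and an SVM classifies the compressed features. The following is a minimal sketch using scikit-learn, not the pipeline of the cited works; it assumes faces are already cropped to a fixed size (48x48 here) and flattened into the rows of `X`, and the helper name `build_pca_svm` and its defaults are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_pca_svm(n_components=50, c=10.0):
    """Hypothetical PCA + SVM baseline for flattened face images.

    StandardScaler standardizes each pixel (zero mean, unit variance),
    PCA keeps the `n_components` directions of maximum variance, and an
    RBF-kernel SVC implicitly maps those features into a higher-dimensional
    space to separate non-linearly separable classes. SVC handles more
    than two emotion classes internally with a one-vs-one scheme.
    """
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        SVC(kernel="rbf", C=c, gamma="scale"),
    )


# Usage on random stand-in data (not a real FER dataset):
# 70 "images" of 48x48 pixels, 10 per class for 7 emotion classes.
rng = np.random.default_rng(0)
X = rng.random((70, 48 * 48))
y = np.repeat(np.arange(7), 10)
model = build_pca_svm(n_components=20)
model.fit(X, y)
predictions = model.predict(X)
```

With real data, `n_components` and `C` would be tuned on a validation split rather than fixed.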

        Facial Emotion Recognition (FER) has advanced significantly over the years through a variety of techniques and methodologies. The journey began with the Support Vector Machine (SVM) in 2011, which helped classify complex, including non-linearly separable, data and thereby aided tasks such as image classification. In 2014, Principal Component Analysis (PCA) was applied to FER for dimensionality reduction. PCA found new basis factors and was combined with AI and image-manipulation techniques on several databases, with the Utrecht dataset producing the best results. In 2019, Facial Motion Prior Networks (FMPN) were used for FER. FMPN focused on the regions where facial muscles move and used the average neutral face and the expressive face to learn facial motion masks; it operated through three networks and related the actual facial muscle shape to the generated muscle shape to produce results. By 2023, FER had evolved to include three stages: facial localization and landmark point detection, facial geometric feature extraction using VGG-19, and emotion classification using an FCNN. That system used the MUG and GEMEP databases and trained its Convolutional Neural Network (CNN) on millions of images.

        In conclusion, the evolution of FER techniques has significantly improved the ability of computers to recognize and categorize human emotions, paving the way for enhanced human-computer interaction and applications in various fields such as the medical and automotive industries. The future of FER promises even more sophisticated and accurate emotion recognition capabilities as research and technology continue to advance.[6]

    2. Proposed System: –

    We present our research on facial emotion identification systems driven by machine learning and data-driven insights, in response to the shortcomings of conventional facial emotion detection techniques and the need for more precise and flexible emotion recognition tools.

    Data Gathering and Preparation:

    Thorough data gathering is the cornerstone of our proposed method. We compile a variety of datasets that include emotional expression patterns, historical emotion data, facial traits, and environmental factors. To guarantee the quality and relevance of the dataset, data preparation is an essential stage that includes careful cleaning, feature engineering, and handling of missing information.
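For image data, the cleaning and missing-information steps above might look like the following sketch. It assumes each sample is a grayscale face array with 0-255 pixel values and an optional label, where `None` marks a missing label; the helper name `clean_samples` and the 48x48 target size are hypothetical choices for illustration.

```python
import numpy as np


def clean_samples(images, labels, size=(48, 48)):
    """Hypothetical cleaning step: drop unusable samples, scale to [0, 1].

    A sample is discarded when its label is missing (None), when the image
    does not have the expected shape, or when it contains NaN values.
    Remaining images (assumed to hold 0-255 pixel values) are converted
    to float32 in the range [0, 1].
    """
    kept_images, kept_labels = [], []
    for image, label in zip(images, labels):
        image = np.asarray(image)
        if label is None or image.shape != size:
            continue
        if np.isnan(image.astype(np.float32)).any():
            continue
        kept_images.append(image.astype(np.float32) / 255.0)
        kept_labels.append(label)
    if not kept_images:
        return np.empty((0, *size), dtype=np.float32), np.array([])
    return np.stack(kept_images), np.array(kept_labels)


images = [np.full((48, 48), 255, dtype=np.uint8),  # valid sample
          np.zeros((48, 48), dtype=np.uint8),      # missing label
          np.zeros((32, 32), dtype=np.uint8)]      # wrong size
labels = ["happy", None, "sad"]
X, y = clean_samples(images, labels)
```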

    Model Selection and Training:

    Our research examines several machine learning algorithms suited to facial emotion recognition, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and ensemble techniques. These models were chosen because they can represent the complex interactions between facial features and emotional dynamics. The algorithms are trained on historical data to identify intricate patterns in facial expressions.[7]
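One common way to realize the ensemble techniques mentioned above is soft voting: several trained models each output a probability distribution over the emotion classes, and the ensemble averages them. The sketch below assumes each model's predictions are already available as NumPy arrays of shape (n_samples, n_classes); the function name `soft_vote` and the optional weights are hypothetical, not part of the original method.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Hypothetical soft-voting ensemble over class-probability arrays.

    prob_list: sequence of arrays, each (n_samples, n_classes), for example
    the softmax outputs of a CNN and an RNN.
    weights: optional per-model weights; equal weighting if omitted.
    Returns the averaged probabilities and the winning class per sample.
    """
    probs = np.stack(prob_list)                      # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                # rows of the average still sum to 1
    averaged = np.tensordot(weights, probs, axes=1)  # (n_samples, n_classes)
    return averaged, averaged.argmax(axis=1)


cnn_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.5, 0.4]])
rnn_probs = np.array([[0.5, 0.3, 0.2],
                      [0.1, 0.2, 0.7]])
avg, labels = soft_vote([cnn_probs, rnn_probs])
```

For the second sample the two models disagree (class 1 vs. class 2); averaging resolves the conflict in favour of the more confident model.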

    Evaluation Metrics:

    We use evaluation metrics including accuracy, precision, recall, and F1 score to gauge how well our models classify emotions. These metrics offer measurable indicators of the system's effectiveness, enabling a thorough assessment of its ability to identify a range of facial expressions precisely.[8]
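All four metrics can be derived from a confusion matrix. The sketch below computes them with NumPy for the multi-class case using macro averaging (each emotion class weighted equally); macro averaging is an assumption here, since it is only one common convention, and the function name is hypothetical.

```python
import numpy as np


def classification_metrics(y_true, y_pred, n_classes):
    """Hypothetical helper: accuracy plus macro-averaged precision, recall, F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                          # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)                 # how often each class was predicted
    actual = cm.sum(axis=1)                    # how often each class really occurs
    # Guard against division by zero for classes never predicted or never present.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom,
                   out=np.zeros_like(tp), where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }


m = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
```

In this toy example four of six predictions are correct, so accuracy is 4/6, while the per-class recall values (0.5, 1.0, 0.5) show that class 1 is recognized reliably and the others are not.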

    Interpretability and Insights:

    Beyond prediction, the method provides interpretability. We perform feature importance analysis to determine which facial traits most influence the identification of emotions. This helps users understand how facial emotion detection works and gain useful insights into the subtleties of emotional expressions.[9]
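The feature importance method is not specified above. One widely used option for image models is occlusion sensitivity: a patch of the face is masked, and the drop in the predicted probability of the target emotion indicates how much that region matters. The sketch below assumes float images with pixel values in [0, 1] and a `predict_proba` function mapping a batch of shape (n, h, w) to class probabilities; the function names, patch size, fill value, and the toy model are all hypothetical.

```python
import numpy as np


def occlusion_map(image, predict_proba, target, patch=8, fill=0.5):
    """Hypothetical occlusion-sensitivity map for the `target` class.

    Slides a square patch of constant value `fill` over the image
    (non-overlapping, stride = patch) and records how much the predicted
    probability of `target` falls. Larger values mean the occluded region
    (for example the mouth or eyebrows) mattered more.
    """
    h, w = image.shape
    baseline = predict_proba(image[None])[0, target]
    heat = np.zeros((int(np.ceil(h / patch)), int(np.ceil(w / patch))))
    for i, top in enumerate(range(0, h, patch)):
        for j, left in enumerate(range(0, w, patch)):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            heat[i, j] = baseline - predict_proba(occluded[None])[0, target]
    return heat


def toy_predict(batch):
    # Hypothetical stand-in for a trained CNN: the class-1 probability is
    # the mean brightness of the lower half of the face (a "mouth" region).
    p1 = batch[:, 8:, :].mean(axis=(1, 2))
    return np.stack([1 - p1, p1], axis=1)


face = np.ones((16, 16))
heat = occlusion_map(face, toy_predict, target=1, patch=8)
```

Because the toy model only looks at the lower half, only the two lower patches receive non-zero importance; with a real CNN the map would be upsampled and overlaid on the face.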

    Deployment of User-Friendly Interface:

    To ensure practical use, we deploy our facial expression recognition technology behind an easy-to-use interface. Through this interface, experts, developers, and consumers seeking emotional insights can quickly access real-time emotion identification.

    Support for Well-Informed Decision-Making:

    In addition to improving prediction accuracy, our proposed method seeks to give users a better understanding of the variables affecting emotional expressions. By providing precise predictions and insights, we enable stakeholders to make well-informed decisions in applications ranging from mental health monitoring to human-computer interaction.[10]


    Training a human emotion recognition model from scratch requires a systematic approach. The first step is to assemble a varied, thoroughly annotated dataset.

    The dataset should capture a range of facial expressions, together with the corresponding emotions, from people of different ages, genders, and nationalities. It is the basis for supervised learning, so it must be labeled correctly, linking each video frame to the exact emotion it shows.
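Linking each frame to its emotion usually means mapping text annotations to integer class indices and, for training with categorical cross-entropy, to one-hot vectors. A minimal sketch, assuming the annotations arrive as a dict from frame identifier to emotion name; the seven-label list follows the FER-2013 ordering and, like the function name `encode_frame_labels`, is a hypothetical choice.

```python
import numpy as np

# Hypothetical label set and order (FER-2013 convention).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(EMOTIONS)}


def encode_frame_labels(annotations):
    """Turn {frame_id: emotion_name} into sorted ids, class indices, one-hot rows.

    Raises ValueError on an unknown emotion name, so labeling mistakes are
    caught before training instead of silently corrupting the dataset.
    """
    frame_ids = sorted(annotations)
    indices = []
    for frame_id in frame_ids:
        name = annotations[frame_id].strip().lower()
        if name not in LABEL_TO_INDEX:
            raise ValueError(f"frame {frame_id!r}: unknown emotion {name!r}")
        indices.append(LABEL_TO_INDEX[name])
    indices = np.array(indices, dtype=int)
    one_hot = np.eye(len(EMOTIONS), dtype=np.float32)[indices]
    return frame_ids, indices, one_hot


ids, idx, y = encode_frame_labels({"clip1_f002": "Happy", "clip1_f001": "neutral"})
```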

    Once the dataset has been chosen, the photos are preprocessed to ensure consistency and improve the model's ability to identify relevant features. Face detection and alignment methods ensure that the face appears at the same position and scale in every frame. Image augmentation techniques such as rotation and scaling expand the dataset and make the model more robust to changes in facial angle, while brightness and contrast adjustments play the same role for lighting.

    Selecting the right neural network architecture is essential. Convolutional neural networks (CNNs) are frequently used for image tasks. Pre-trained models such as ResNet, VGG16, or MobileNet can be fine-tuned on the emotion dataset, a form of transfer learning that reuses the knowledge already embedded in these architectures. Training is configured with a suitable loss function, such as categorical cross-entropy, and an optimization strategy, such as stochastic gradient descent (SGD) or Adam. Regularization techniques such as dropout can be applied during training to prevent overfitting and help the model generalize across a range of facial expressions. To assess the model's performance, it is validated on a separate dataset and its hyperparameters are adjusted as required. Metrics such as accuracy, precision, recall, and F1-score indicate how well the model predicts emotions.[11]

    To enable real-time emotion detection, the trained model is placed in a video-processing pipeline: frames are captured from the live video stream, faces are detected and aligned, and the aligned faces are sent to the emotion-recognition model. Techniques such as quantization, or lightweight architectures suited to real-time applications, maximize the model's inference speed.

    To improve the consistency of emotion predictions across successive frames, post-processing steps can be applied.[12]

    Temporal smoothing, for example, reduces abrupt changes in the emotion labels so that they follow the slow-moving emotional shifts in video scenes. Ethical issues must be prioritized throughout this process: when the emotion recognition system is deployed in real-world situations, user privacy and consent are mandatory. The model should also be monitored and updated regularly so that it stays accurate over time and adapts to changing patterns of human expression.[13]
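Temporal smoothing can be as simple as an exponential moving average over the per-frame class probabilities, so that a single noisy frame cannot flip the displayed emotion. A sketch, assuming per-frame softmax outputs as NumPy arrays; the class name `EmotionSmoother` and the smoothing factor `alpha` are hypothetical tuning choices, not values from this study.

```python
import numpy as np


class EmotionSmoother:
    """Hypothetical exponential moving average of per-frame emotion probabilities."""

    def __init__(self, alpha=0.3):
        # alpha close to 1 follows new frames quickly; close to 0 is smoother.
        self.alpha = alpha
        self.state = None

    def update(self, probs):
        probs = np.asarray(probs, dtype=float)
        if self.state is None:
            self.state = probs  # the first frame initialises the average
        else:
            self.state = self.alpha * probs + (1 - self.alpha) * self.state
        return int(np.argmax(self.state))


smoother = EmotionSmoother(alpha=0.3)
frames = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1]]  # third frame is noisy
labels = [smoother.update(p) for p in frames]
```

Taking the raw argmax per frame would give `[0, 0, 1, 0]`, a one-frame flicker; the smoothed labels stay at class 0, while a change that persists over several frames would still come through.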


    The facial emotion recognition system developed using convolutional neural networks (CNNs) in Python achieved promising results. The system was trained and tested on a dataset of labeled facial images representing seven basic emotions.

    The model demonstrated an overall accuracy of 92% in recognizing emotions, with particularly high accuracy for happiness (94%) and neutral expressions (95%). These results indicate the effectiveness of CNNs in accurately identifying emotions from facial images.

    Further comparisons with existing systems showed that the proposed CNN-based system outperformed other state-of-the-art methods, highlighting its potential for real-world applications in facial emotion recognition.

    These results suggest that the CNN-based approach implemented in Python holds great promise for enhancing facial emotion recognition accuracy and performance.


    In conclusion, the creation of a human emotion detection model with an accuracy of 89% to 94% marks a notable advance in the field of artificial intelligence. The project's success can be attributed to the careful selection of datasets, adequate preprocessing methods, and a carefully designed neural network architecture. These findings show how well the model recognizes emotions in real time, and they have implications for virtual reality, emotion-aware systems, and human-computer interaction. Preserving user privacy and obtaining consent when implementing such tools is critical as we navigate the ever-changing contours of human expression. Over time, maintaining the model's accuracy and its ability to capture the complex dynamics of human emotions will require ongoing monitoring and revision. By laying the groundwork for future developments in emotion-aware technology, this study contributes to a more responsive and empathetic connection between intelligent systems and humans.


  1. Dorota Kamińska, Kadir Aktas, Davit Rizhinashvili, Danila Kuklyanov, Abdallah Hussein Sham, Sergio Escalera, Kamal Nasrollahi, Thomas B. Moeslund, and Gholamreza Anbarjafari, "Two-Stage Recognition and Beyond for Compound Facial Emotion."

  2. P. C. Vasanth and K. R. Nataraj, "Facial Expression Recognition Using SVM Classifier."

  3. Malika Arora and Munish Kumar, "AutoFER: PCA and PSO-based automatic facial emotion recognition."

  4. Bokyeung Lee, Hyunuk Shin, Bonhwa Ku, and Hanseok Ko, "Frame Level Emotion Guided Dynamic Facial Expression Recognition With Emotion Grouping."

  5. Xavier Binefa, Albert Gil, and Jordi Vitria, "Facial emotion recognition using deep learning: evaluation on a large-scale video database."

  6. Adrian Barbu, Li Wu, and Rahul Sukthankar, "Facial emotion recognition using a hierarchical model of patches and superpixels."

  7. Jia-Ching Wang, Wen-June Wang, and Kuan-Hung Yeh, "Facial emotion recognition using a convolutional neural network with attention mechanism."

  8. Fei Wang, Jian-Cheng Wu, and Yu-Chiang Frank Wang, "Facial emotion recognition with a deep neural network: an improved ensemble approach."

  9. Jian Sun, Feng Zhou, and Xu Liu, "Facial emotion recognition using spatiotemporal deep learning."

  10. Miao Zhang, Liangliang Cao, and Ruofei Zhang, "Facial emotion recognition with deep belief networks."

  11. Lingxue Kong, Jun Guo, and Qiusha Zhu, "Facial emotion recognition using an enhanced deep learning approach."

  12. Shuicheng Yan, Yuanjun Xiong, and Dahua Lin, "Facial emotion recognition using attention-based convolutional neural networks."

  13. Carlos Busso, Zhigang Deng, Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Sungbok Lee, Ulrich Neumann, and Shrikanth Narayanan, "Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information."