Recognizing Emotional State of the Human Using Facial and Acoustic Features

G.Saranya; G.Mary Amirtha Sagayee; S.Priyavarsha

doi:10.17577/IJERTCONV2IS05028

NCICCT - 2014 (Volume 2 - Issue 05)


Open Access
Authors : G.Saranya, G.Mary Amirtha Sagayee, S.Priyavarsha
Paper ID : IJERTCONV2IS05028
Volume & Issue : NCICCT – 2014 (Volume 2 – Issue 05)
Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


G.Saranya1

Post Graduate Student, Dept. of ECE Parisutham Institute of Technology and Science, Thanjavur.

Affiliated to Anna University, Chennai, India.

Email saranyaraji.91@gmail.com

G.Mary Amirtha Sagayee2

Professor & Head, Dept. of ECE Parisutham Institute of Technology and Science, Thanjavur.

Affiliated to Anna University, Chennai, India.

Email gmasagayee@gmail.com

The feature vector for each image is calculated. Using principal component analysis (PCA), the coefficients, scores and latent values for the image are computed.
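As an illustration only, the three PCA outputs named above can be sketched with NumPy. The function name `pca`, the toy data and the use of the singular value decomposition are assumptions made for this example; the paper does not say how its PCA was implemented.

```python
import numpy as np

def pca(features):
    """Return (coeff, score, latent) for a samples-by-features matrix."""
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coeff = vt.T                         # principal component coefficients
    score = centered @ coeff             # data projected onto the components
    latent = s ** 2 / (X.shape[0] - 1)   # variance along each component
    return coeff, score, latent

# Hypothetical toy data: 5 feature vectors of length 3.
data = np.array([[2.0, 0.5, 1.0],
                 [1.0, 1.5, 0.0],
                 [3.0, 0.0, 2.0],
                 [0.0, 2.0, 1.5],
                 [4.0, 1.0, 0.5]])
coeff, score, latent = pca(data)
```

Here each column of `coeff` is one principal direction, `score` holds the coordinates of every sample along those directions, and `latent` holds the corresponding variances in descending order.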

Independent Component Analysis (ICA) is used to extract the maximum information from the multiple visual channels. ICA maximizes the joint entropy and provides brain-like visual features for natural images. ICA can also be used for speech separation in the area of speech recognition. It is an unsupervised computational and statistical method for discovering hidden factors in data. The steps involved in ICA are given below.

Steps in ICA

Centering: shift the signals so that they have zero mean.
Sphering: make the signals uncorrelated.
Rotation: maximize an objective function.
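The three steps can be sketched for two mixed signals as follows. This is a minimal illustration, not the authors' implementation. The mixing matrix and sources are made up, and the objective used in the rotation step (the sum of absolute excess kurtosis, searched over a grid of angles) is an assumption, since the paper does not name its objective function.

```python
import numpy as np

def center(signals):
    """Centering: subtract each signal's mean so every row has zero mean."""
    return signals - signals.mean(axis=1, keepdims=True)

def sphere(centered):
    """Sphering: whiten the signals so their covariance is the identity."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return whitening @ centered

def kurtosis_objective(signals):
    """Assumed objective: sum of absolute excess kurtosis of each row."""
    return float(np.sum(np.abs(np.mean(signals ** 4, axis=1) - 3.0)))

def rotate(white, n_angles=180):
    """Rotation: grid-search the 2-D rotation that maximizes the objective."""
    best_signals, best_value = white, kurtosis_objective(white)
    for theta in np.linspace(0.0, np.pi / 2.0, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        candidate = np.array([[c, -s], [s, c]]) @ white
        value = kurtosis_objective(candidate)
        if value > best_value:
            best_signals, best_value = candidate, value
    return best_signals

rng = np.random.default_rng(0)
sources = rng.uniform(-1.0, 1.0, size=(2, 1000))   # hypothetical sources
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])        # hypothetical mixing matrix
mixed = mixing @ sources + np.array([[2.0], [-1.0]])

white = sphere(center(mixed))
separated = rotate(white)
```

Because a rotation is orthogonal, the signals stay uncorrelated after the last step; only their statistical independence changes. Practical ICA implementations replace the grid search with an iterative optimizer.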

The Gabor wavelet is used to compute the Gabor features of a grayscale image. The following parameters are selected: the wavelet scale, the filter orientation, the wavelength of the smallest-scale filter, the scaling factor between successive filters, the log-Gabor filter transfer function, the ratio of the angular interval between filter orientations to the standard deviation of the angular Gaussian function, the number of standard deviations of the noise energy beyond the threshold point, and the polarity. The feature vector for the image is then formed from the mean squared energy and the mean amplitude.
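A rough sketch of how the mean amplitude and mean squared energy could be collected from a small filter bank is given below. It uses a plain spatial Gabor kernel rather than the log-Gabor transfer function mentioned above, and the kernel size, wavelengths, number of orientations and function names are all assumptions chosen for illustration.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier

def gabor_features(image, wavelengths=(4.0, 8.0), n_orientations=4):
    """Mean amplitude and mean squared energy for each filter in the bank."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    features = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(15, wavelength, theta, sigma=0.5 * wavelength)
            # Circular convolution via the FFT; the kernel is zero-padded
            # to the image size.
            response = np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=image.shape))
            amplitude = np.abs(response)
            features.append(amplitude.mean())          # mean amplitude
            features.append((amplitude ** 2).mean())   # mean squared energy
    return np.array(features)

# Hypothetical 32x32 grayscale image with random intensities.
image = np.random.default_rng(0).random((32, 32))
features = gabor_features(image)   # 2 wavelengths x 4 orientations x 2 values
```

The resulting vector holds two numbers per filter, so its length grows with the number of scales and orientations selected.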

A. Expression Classification

The expression is classified by calculating the distance between the feature vectors of the images. A distance classifier based on the Euclidean distance is used. The mean value of the neutral expressions in the dataset is calculated, and the test image is subtracted from this mean neutral image to give the score value for each image.

(1)

The minimum distance between the test image and the stored images is found, and the expression associated with the closest image is produced as the output.
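A minimum-distance classifier of this kind can be sketched in a few lines. The feature vectors and the function name `classify` below are invented for illustration; real feature vectors would come from the PCA, ICA or Gabor stage.

```python
import math

def classify(test_vector, labelled_vectors):
    """Return the label and distance of the nearest training vector."""
    best_label, best_distance = None, math.inf
    for label, vector in labelled_vectors:
        distance = math.dist(test_vector, vector)   # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance

# Hypothetical 3-dimensional feature vectors for three expressions.
training = [("happy", [0.9, 0.1, 0.3]),
            ("sad",   [0.2, 0.8, 0.5]),
            ("anger", [0.7, 0.7, 0.1])]

label, distance = classify([0.8, 0.2, 0.3], training)
print(label)   # prints "happy"
```

`math.dist` requires Python 3.8 or later.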

VOICE MODULE

The human emotional state is detected from the voice. The voice signal is recorded through a microphone. The key features are extracted as Mel Frequency Cepstral Coefficients (MFCC) and Subband Based Cepstral (SBC) parameters. The expression can be classified with classifiers such as K-Nearest Neighbors (KNN), Hidden Markov Model (HMM), Gaussian Mixture Model (GMM), Support Vector Machine (SVM) and Artificial Neural Network (ANN). In the proposed approach the Gaussian mixture model is used as the classifier.

Fig 4. Work flow in MFCC (blocks: Framing & Windowing, FFT, Mel Frequency Wrapping, Mel Spectrum, Log, DCT)

MFCC is a powerful analytic tool in the field of recognition. It mimics the behavior of the human ear by applying cepstral analysis, and it is computed on speech frames. For speech recognition, between nine and thirteen coefficients are used. The work flow in MFCC is given in Fig 4. The speech signal is split into several frames, and windowing is applied to avoid unnatural discontinuities in the signal. The Fast Fourier Transform (FFT) converts each frame from the time domain to the frequency domain. The mel scale is a perceptual scale on which pitches are placed at equal perceived intervals. The Discrete Cosine Transform (DCT) is then performed to convert the signal back to the time domain. If the calculated score value is greater than 6.8, the sample is considered a perfect match.
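The MFCC work flow described above can be sketched with NumPy as follows. The frame length, hop size, FFT size, number of mel filters and the Hz-to-mel formula are common choices assumed here, not values taken from the paper, and the scoring against the 6.8 threshold is not reproduced because the paper does not define how the score is computed.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, mid, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, mid):
            bank[i - 1, k] = (k - left) / (mid - left)
        for k in range(mid, right):
            bank[i - 1, k] = (right - k) / (right - mid)
    return bank

def mfcc(signal, sample_rate, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    # Framing: split the signal into overlapping frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    # Windowing: taper each frame to avoid discontinuities at its edges.
    frames = frames * np.hamming(frame_len)
    # FFT: power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel frequency warping, then the logarithm of the mel spectrum.
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(mel_energy + 1e-10)
    # DCT-II of the log mel spectrum gives the cepstral coefficients.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                    np.arange(n_filters) + 0.5) / n_filters)
    return log_mel @ basis.T

# Hypothetical one-second 440 Hz test tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
coeffs = mfcc(tone, sample_rate)   # one row per frame, 13 coefficients each
```

Thirteen coefficients per frame sits at the upper end of the nine-to-thirteen range quoted in the text; passing `n_coeffs=9` gives the lower end.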

SBC is similar to MFCC, but it uses the wavelet packet transform instead of the FFT. The SBC parameters are derived from the subband energies. In SBC, if the calculated score value is greater than 21.5, the sample is considered a perfect match.

A. Expression Classification

A Gaussian Mixture Model (GMM) is represented as a mixture of Gaussian densities, that is, a linear combination of M Gaussians. The equation for the linear combination is given by

$$p(x) = \sum_{i=1}^{M} p_i \, b_i(x) \tag{2}$$

where $x$ is a D-dimensional random vector, $b_i(x)$, $i = 1, 2, \ldots, M$, are the component densities and $p_i$, $i = 1, 2, \ldots, M$, are the mixture weights. Each component density is a D-dimensional Gaussian function of the form

$$b_i(x) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) \right) \tag{3}$$

where $\mu_i$ denotes the mean vector and $\Sigma_i$ denotes the covariance matrix. The mixture weights satisfy the law of total probability,

$$\sum_{i=1}^{M} p_i = 1 \tag{4}$$
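To make the mixture density concrete, the sketch below evaluates a weighted sum of D-dimensional Gaussian component densities at a single point. The two-component model and its parameters are invented for illustration; in practice the weights, means and covariances would be estimated from training features, which is not shown here.

```python
import numpy as np

def gaussian_density(x, mean, cov):
    """D-dimensional Gaussian component density b_i(x)."""
    d = len(mean)
    diff = x - mean
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(cov, diff))

def gmm_density(x, weights, means, covs):
    """Mixture density: weighted sum of the component densities."""
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to 1")
    return sum(p * gaussian_density(x, m, c)
               for p, m, c in zip(weights, means, covs))

# Hypothetical two-component model in D = 2 dimensions.
weights = [0.6, 0.4]
means = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
covs = [np.eye(2), np.eye(2)]

density = gmm_density(np.array([0.0, 0.0]), weights, means, covs)
```

For classification, one such model would typically be fitted per emotion and a test feature vector assigned to the model giving the highest density.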
FUSION

The multimodal features are extracted and combined using feature-level fusion. This is a direct fusion method in which the feature vectors from the multiple modalities are concatenated to obtain a combined feature vector for the classification task.
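Feature-level fusion by concatenation can be sketched as below. The vector lengths and values are placeholders, and the function name `fuse` is an assumption for this example.

```python
import numpy as np

def fuse(face_features, voice_features):
    """Concatenate the per-modality feature vectors into one vector."""
    return np.concatenate([np.ravel(face_features), np.ravel(voice_features)])

# Hypothetical feature vectors from the face and voice modules.
face = np.array([0.8, 0.2, 0.3])
voice = np.array([12.5, -3.1])

combined = fuse(face, voice)   # length 3 + 2 = 5
```

The combined vector is then passed to a single classifier, so the two modalities are treated as one feature space.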
EXPERIMENTAL RESULTS

The human expression is recognized from facial expressions and from voice tone. The training dataset is created by saving various expressions made by five persons: 24 images for happy, 12 for sad, 13 for disgust, 11 for neutral and 13 for anger. During the recognition process the following steps are followed. A web camera records the human expressions. The various emotions are stored as frames in the desired location. The stored frames are compared with the images in the database to produce the results. The training process is shown in Fig 5.

Fig 5. Training process

The expression for each trained image is stored manually in the label file, as shown in Fig 6.

Fig 6. Label file

The trained images are loaded during the testing process. The emotion is tracked during testing to detect changes in the emotional state. Fig 7 shows the emotional tracking in real time.

Fig 7. Emotional tracking
CONCLUSION

The visual features of human emotional states are recognized to improve the performance of human-system recognition during non-verbal communication, which is useful in human-machine interaction. To obtain efficient results, training is performed over various expressions. The image under test is compared with the images in the main database, and the result is produced by retrieving the related expression from the main database. The testing phase is time-consuming, so the delay is large. In future work the delay can be reduced by using techniques that greatly reduce the dimensionality of the data that must be saved and retrieved for each image.

