Speaker Dependent Emotion Recognition System


Priyadarshini N S#1, Priyanka B M#2, Priyanka R#3, Pushpa K#4, Dr. Jayanth J*1

#Undergraduate student, ECE Dept, GSSSETW, Mysuru, India

*Professor, ECE Dept, GSSSETW, Mysuru, 570016

Abstract-

Emotion is an integral feature that creates a void between humans and humanoids, and emotion recognition plays an important role in filling this void. Though there are many other methods to recognize emotions, we have chosen speech as the basis for extracting emotions, as it is less affected by environmental constraints such as magnetic fields, light and other factors. Emotion recognition has already been implemented in many languages, but not in Kannada. In this paper, we have created a system in which Kannada speech is the input and emotion is the output. Praat is used to extract features from the speech signal given by the speaker. A GUI in MATLAB has been created to interface human speech with the system. The neural network takes the features extracted by the Praat software as test data, passes them to the trained feed-forward neural network and recognizes basic human emotions such as sad, happy and angry.

Keywords-

Emotion recognition, MATLAB, Praat, speech database, neural network training and validation, confusion matrix.

  1. ARCHITECTURE

    Human communication developed with the origin of speech, which dates back approximately to 500,000 BCE. Through speech, humans could express their emotions [4]. These emotions of happy, sad, angry, etc. play an important role in understanding one's situation or desire in a better way. Thus emotion can be defined as a strong feeling about one's circumstances. Before the invention of emotion recognition systems, machines could not efficiently identify a person's emotion and respond. By introducing this system, machines are able to respond efficiently to users' needs, increasing their utility. In this system we have created a GUI to record speech from the speaker. From the recorded speech the features are extracted. These extracted features are sent as test data to recognize the basic human emotions such as happy, sad and angry [3]. This recognized emotion is displayed on screen.

  2. METHODOLOGY

    Fig. 1. Block diagram

      Speech input: Emotion is identified based on the speech given by the speaker; the system takes Kannada speech as the input from the speaker.

      Pre-processing: The feature extraction process requires a speech signal without external noise. To remove this noise, the system uses spectral subtraction, which removes external non-periodic noises.
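The pre-processing step can be sketched as a minimal spectral subtraction in Python with NumPy. This is an illustration only, not the paper's implementation: it assumes the first few frames of the recording contain only noise, and the frame length and noise-frame count are illustrative choices.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=512, noise_frames=10):
    """Subtract an average noise magnitude spectrum estimated from the
    first `noise_frames` frames (assumed to be speech-free)."""
    # Split the signal into non-overlapping frames (simplified: no windowing/overlap)
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    # Estimate the noise magnitude from the leading, speech-free frames
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    # Subtract the noise magnitude, floor at zero, and keep the original phase
    clean_mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)
    clean = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spectra)), axis=1)
    return clean.reshape(-1)
```

A production version would use overlapping windowed frames with overlap-add reconstruction; the frame-wise version above only shows the core subtract-and-floor idea.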

      Feature extraction: In this process, features are extracted from the original dataset by decreasing the number of variables. As the number of features increases, the accuracy increases accordingly.

      Pitch: It represents the variation of a tone, giving prosodic information about an utterance.

      Intensity: The power carried by a sound wave per unit area in the direction perpendicular to that area.

      Jitter: The deviation from true periodicity of a presumably periodic signal.

      Shimmer: It relates to the amplitude variation of the sound wave.
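The two cycle-level measures above, jitter and shimmer, can be sketched from a sequence of pitch periods and peak amplitudes. The formulas below are the common relative ("local") definitions, assumed here rather than taken from the paper.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods,
    divided by the mean period (relative jitter)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes,
    divided by the mean amplitude (relative shimmer)."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)
```

A perfectly periodic signal with constant peaks gives zero for both measures; irregular voicing raises them.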

      1. Classification: The classifier differentiates the emotions in the speech. There are many classifiers, such as HMM, GMM, ANN and SVM. In this project an ANN classifier, specifically a feed-forward neural network, is used. ANN is a popular choice as it can capture nonlinear relationships between features and classes. It has 3 layers (input, hidden, output). In an ANN, generally 90% of the data is used for training and 10% for validation. ANN is a highly adaptable learning machine.

      2. Output: The desired emotion to be obtained is represented in the form of emojis, hence indicating the recognition of a particular emotion [4].
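The ANN classifier described above can be sketched in Python with scikit-learn. This is an illustrative stand-in, not the paper's MATLAB implementation: the data here is a random placeholder for the real 600-utterance corpus, and scikit-learn offers no Levenberg-Marquardt trainer, so its default Adam solver is used instead of TRAINLM.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for the real corpus:
# 600 utterances x 9 acoustic features, 3 emotion classes (0=sad, 1=happy, 2=angry)
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 9))
y = rng.integers(0, 3, size=600)

# One hidden layer of 6 neurons, matching the 9-input / 6-hidden / 3-output
# architecture described in the training section
clf = MLPClassifier(hidden_layer_sizes=(6,), max_iter=500, random_state=0)
clf.fit(X, y)
pred = clf.predict(X[:5])
```

With real features in place of the random arrays, `pred` would contain the recognized emotion labels.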

      TABLE I English transcription of the Kannada text.

  3. DATABASE

    Kannada is one of the Southern Dravidian languages, and its history is divided into three periods: Halegannada from 450 to 1200 CE, Nadugannada from 1200 to 1700, and Modern Kannada from 1700 to the present. Kannada is influenced by Sanskrit, and influences of Prakrit and Pali can also be seen in the language. We used read speech as our speech corpus. For analyzing emotion we considered 100 Kannada sentences. The total number of features in the dataset of 600 utterances (100 sentences * 3 emotions * 2 artists) was 9. The emotions proposed in the speech corpus are Angry, Sad and Happy.

    The Praat software is used to record the Kannada sentences. The recording factors considered here are a mono channel and a sampling frequency of 44.1 kHz. The recorded audio is saved in WAV file format to keep further feature extraction simple. Multiple sentences in Kannada recited by the speaker are collected, from which features such as the pitch parameters (mean pitch, SD pitch, min pitch, max pitch), duration, jitter and shimmer are extracted and tabulated to create the database.

    Pitch – degree of highness or lowness of tone.

    Intensity – degree of loudness.

    Jitter – deviation from the true periodicity.

    Shimmer – periodic variation between amplitude peaks.

    These features were chosen as they were already giving 90 percent accuracy, whereas adding other features to them increases the accuracy of the emotion recognition system by just 10 percent. Thus limiting our work to these features yields higher efficiency with fewer features and a smaller database. Random audio clips were played, and each clip was verified against the corresponding emotion with domain experts.

    The data samples from the speakers were recorded using the Praat software in a noiseless environment. The data collected from the speakers are audio samples. The sentences taken for training the system number roughly 100; a few sentences are given in TABLE I [7].

    Sent. id   Sentence
    1          Shale bahala dooradalede. (School is very far)
    2          Nanu oorige hoguthene. (I'm going to town)
    3          Swalpa mellage mathadi. (Talk in a low voice)
    4          Nanage sahaya madi. (Please help me)
    5          Navu adanu nodidevu. (We saw it)
    6          Shalege makkalu baralilla. (Children did not come to school)
    7          Ninna hesaru yenu. (What is your name?)


  4. TRAINING AND VALIDATION

    Once the database is created, it is used for training and validation using the MATLAB software. MATLAB has inbuilt tools for training the neural network, namely nntool (neural network toolbox), and for verifying validation and testing, namely nftool (neural fitting toolbox). Feed-forward back-propagation is the network type used for training the system, where the former updates the data in the forward direction and back-propagation helps to reduce the error by working as a feedback system. TRAINLM is used as the training function; it changes the weights and biases based on Levenberg-Marquardt optimization, and TRAINLM is the fastest among all the algorithms in the MATLAB toolbox.

    There are 3 types of layers as usual in a feed-forward neural network, namely the input layer, hidden layer and output layer. The input layer consists of 9 neurons, the output layer consists of 3 neurons, and a single hidden layer with 6 neurons is considered, and the system is trained. 70% of the dataset is used for training, 15% for validation and the remaining 15% for testing [4].
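The 70/15/15 partition described above can be sketched as follows. This is a generic illustration in Python with NumPy, not the split performed by the MATLAB toolbox, which handles it internally.

```python
import numpy as np

def split_dataset(n_samples, seed=0):
    """Shuffle sample indices and split them 70% / 15% / 15%
    into training, validation, and test index sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(0.70 * n_samples)
    n_val = int(0.15 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 600 samples, as in the database described earlier
train_idx, val_idx, test_idx = split_dataset(600)
```

For 600 utterances this yields 420 training, 90 validation and 90 test samples.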

  5. EXECUTION

    A Graphical User Interface (GUI) was created as a user-friendly method using the MATLAB software. The GUI includes 5 push buttons and one axes object. Two buttons were included for selecting whether the system should identify the emotion of a male or a female speaker; after this selection, push buttons for record, stop and play are available. The record and stop buttons perform the operation named on the button. The play button plays the denoised version of the recorded audio.

    Fig. 2. GUI for emotion detection.

    The GUI takes the real-time input from the speaker, which is saved in a temporary file for further analysis. The Praat software is invoked from the GUI through MATLAB to extract the required features from the real-time input. These extracted features are sent as test data to the already trained neural network. The neural network analyses and compares them against the trained dataset and recognizes the emotion. The result is then sent to the GUI for display [8].
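The GUI-to-Praat handoff is done through MATLAB in the paper; purely as an illustration, the same batch invocation can be sketched in Python, assuming Praat is on the PATH and a feature-extraction script prints one numeric value per feature. `extract_features.praat` is a hypothetical script name, not the one used in the project.

```python
import subprocess

def parse_feature_output(text):
    """Parse whitespace-separated numeric feature values printed by a Praat script."""
    return [float(tok) for tok in text.split()]

def extract_features(wav_path, praat_script="extract_features.praat"):
    """Run Praat in batch mode on a recorded WAV file and read back the
    feature values its script writes to the Info window (stdout).
    Assumes the `praat` binary is installed and on the PATH."""
    result = subprocess.run(
        ["praat", "--run", praat_script, wav_path],
        capture_output=True, text=True, check=True,
    )
    return parse_feature_output(result.stdout)
```

The returned list would then be fed to the trained network as a test vector.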

  6. SOFTWARE

    Two software packages were used throughout the project: MATLAB and PRAAT.

    1. MATLAB: A high-performance environment for technical computing. It has data structures, built-in editing and debugging tools, and supports object-oriented programming. It generates displays or outputs when commands are executed. It combines calculation and graphic plotting and is designed for scientific computing.

    2. PRAAT: A computer program with which you can analyse, synthesize and manipulate speech, and create high-quality pictures for your articles and theses. It is a freeware program for the analysis and reconstruction of acoustic speech.

  7. RESULT

The above project was verified using two methods: one with the inbuilt toolbox in MATLAB and the other by giving the dataset manually.

Fig. 3. Confusion matrix for male emotion detection.

The above figure shows that the system detecting male emotion in our experiment is 95.2% accurate. Similar results were obtained for female speech emotion. Here in MATLAB the nprtool is used to obtain the confusion matrix. The confusion matrices of male and female speech emotion recognition for training, validation and testing using nprtool are as shown in figures 3 and 4.

The confusion matrix obtained during the manual testing method for male emotion detection is given in TABLE II.

Fig. 4. Confusion matrix for female emotion detection.

TABLE II System accuracy for male emotion recognition.

              Sad    Angry   Happy   Overall
    Sad       100      0       0       100
    Angry       0    100       0       100
    Happy       5     15      80        80
    Overall                             93

Table II shows the emotion recognition for the male speaker when detecting the exact emotion in real time by giving random sentences of each emotion; the results obtained are written as percentages. When happy sentences were given to the system, the error was greater than for angry and sad sentences. The reason for the reduced accuracy on happy sentences is that in angry sentences the pitch is high and in sad sentences the pitch is low, but in happy sentences the pitch value lies between the other two emotions. Since the main feature we considered is pitch and its parameters, the accuracy is dependent on pitch.
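The overall figure in Table II follows from the per-class numbers on the diagonal. A quick check, assuming (as the table implies) an equal number of test sentences per emotion:

```python
import numpy as np

# Rows = true emotion, columns = predicted emotion, values in percent (Table II)
confusion = np.array([
    [100,   0,  0],   # sad
    [  0, 100,  0],   # angry
    [  5,  15, 80],   # happy
])

# Per-class accuracy is the diagonal; the overall figure is their mean,
# which is valid only when each emotion has the same number of test sentences
per_class = np.diag(confusion)
overall = per_class.mean()   # 93.33..., reported as 93 in Table II
```

The same calculation on Table III's diagonal (100, 100, 70) gives the 90 percent figure reported for female speech.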

The confusion matrix obtained during manual testing for female emotion detection is given in TABLE III.

TABLE III System accuracy for female emotion recognition.

              Sad    Angry   Happy   Overall
    Sad       100      0       0       100
    Angry       0    100       0       100
    Happy      10     20      70        70
    Overall                             90

Table III shows the emotion recognition for the female speaker in real time, obtained by giving 10 sentences of each emotion; the results are written as percentages. The result obtained during manual testing for male emotion recognition is around 93 percent, and for female emotion recognition it was around 90 percent.

The GUI result obtained after an angry male sentence is given as input is shown in figure 5; similarly, the GUI will display emotions of the other types.

Fig. 5 GUI after the recognition of emotion
