Speaker Verification System based on Type-2 Fuzzy Gaussian Mixture Models

Mrs. S. Gayathri

doi:10.17577/IJERTCONV5IS13102

ICONNECT - 2017 (Volume 5 - Issue 13)

Paper ID : IJERTCONV5IS13102
Published (First Online): 24-04-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Assistant Professor,

Department of Electronics and Communication Engineering, K.Ramakrishnan College of Technology,

Tiruchirappalli, Tamilnadu, India.

Abstract: This paper proposes the use of Type-2 Fuzzy Gaussian Mixture Models (T2 FGMMs) in a Speaker Verification system. The Type-2 Fuzzy Gaussian Mixture Model is an extension of the GMM based on Type-2 Fuzzy Sets (T2 FSs). It uses a Footprint Of Uncertainty (FOU) and interval secondary Membership Functions (MFs) to handle the GMM's uncertainty in estimating its parameters, the mean μ and the covariance matrix Σ. The proposed Speaker Verification system uses speech files from the TIMIT database in the training and testing phases. The test features are applied to the trained model, and the verification decision is made using a Generalized Linear Model (GLM). The experimental results show that T2 FGMMs yield a lower Equal Error Rate (EER) than GMMs, indicating that T2 FGMMs perform better than GMMs in a Speaker Verification system.

Keywords: Gaussian Mixture Model, Fuzzy Sets, Type-2 Fuzzy Gaussian Mixture Model, Footprint Of Uncertainty, Generalized Linear Model, Equal Error Rate.

INTRODUCTION

Speaker Verification is the task of deciding whether an unknown speaker is who they claim to be. It plays a major role in biometrics and security. Automatic Speaker Verification (ASV) systems are used for access control, as well as for voice telephony, voice mail, tele-banking, tele-shopping and the secure transfer of confidential information. In Speaker Verification, a GMM is used to model the distribution of the feature vectors of a speaker's utterances.

Gaussian Mixture Models (GMMs) are widely used in modeling because of their universal approximation ability: given enough mixture components, they can approximate any density function [5]. GMMs are used for clustering, object tracking, background subtraction, feature selection, signal analysis, learning and modeling [6]. GMM-based variants have been developed for specific applications, such as adapted GMMs [1], Mahalanobis distance based GMMs [6], wrapped GMMs [6] and Active curve axis GMMs (AcaGMMs) [6].
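A GMM with M components models a D-dimensional feature vector x as p(x) = sum over m of w_m N(x; μ_m, Σ_m), with non-negative weights w_m that sum to one. As a minimal stdlib-only Python sketch (not the author's MATLAB implementation; the function names are hypothetical), this density can be evaluated in the log domain for diagonal covariances, the configuration used later in the training phase:

```python
import math
from typing import Sequence

def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a multivariate Gaussian with diagonal covariance diag(var)."""
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    log_det = sum(math.log(vi) for vi in var)
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + quad)

def gmm_log_likelihood(x, weights, means, variances) -> float:
    """log p(x) = log sum_m w_m N(x; mu_m, diag(var_m)), evaluated with log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    top = max(terms)  # factor out the largest term to avoid underflow
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Working in the log domain with the log-sum-exp trick avoids underflow when 13-dimensional component densities become very small.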

Real-world problems often involve uncertainty in the system parameters due to noisy data. The sources of uncertainty in a Speaker Verification system can be grouped as follows [3]: (a) insufficient or noisy training data makes the model parameters uncertain, so the mapping of the model is also uncertain; (b) the relationship between the training data and unknown test data is uncertain because of limited prior information; (c) linguistic labels can be uncertain, since the same observation may mean different things to different people. All of these uncertainties can be regarded as fuzziness resulting from incomplete information, i.e., fuzzy observations, fuzzy models and fuzzy labels. The nature of the uncertainty in a Speaker Verification system can be categorized into three types [8]: (i) fuzziness (vagueness), which results from the imprecise boundaries of fuzzy sets; (ii) non-specificity (information-based imprecision), which is connected with the sizes (cardinalities) of relevant sets of alternatives; and (iii) strife (discord), which expresses conflicts among the various sets of alternatives.

The uncertainties in the GMM parameters can be handled by Type-2 Fuzzy Sets (T2 FSs) [5], which describe the fuzziness of the GMM parameters: the mean vector μ and the covariance matrix Σ [5]. T2 FSs can describe and estimate these uncertainties because their fuzzy Membership Functions (MFs) are three-dimensional [2]. In contrast, type-1 fuzzy sets cannot directly model such uncertainties, because their MFs are crisp and two-dimensional [2]. Type-2 MFs can evaluate randomness and fuzziness simultaneously by using a Footprint Of Uncertainty (FOU) and interval secondary MFs, as shown in Figure 1 [4].

Figure 1: The three-dimensional Type-2 Fuzzy Membership Function (T2 MF). (a) The primary membership, with the lower (thick dashed line) and upper (thick solid line) membership functions, which give the lower and upper bounds of the membership for a given input x; the shaded region is the Footprint Of Uncertainty (FOU). (b) The Gaussian secondary membership function. (c) The interval secondary membership function. (d) The mean with a uniform membership function.
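The interval (lower, upper) likelihoods that a T2 FGMM works with can be sketched for a single 1-D Gaussian. The interval forms here are assumptions modeled on the T2 FGMM literature [5]: an uncertain mean μ in [μ - kσ, μ + kσ] (UM) and an uncertain standard deviation in [kσ, σ/k] with 0 < k <= 1 (UV). The bounds are taken on the Gaussian density itself rather than on a peak-normalized membership function, and the names gauss_pdf, um_bounds and uv_bounds are hypothetical.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density N(x; mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def um_bounds(x, mu, sigma, k):
    """(lower, upper) density when the mean lies anywhere in [mu - k*sigma, mu + k*sigma]."""
    mu_lo, mu_hi = mu - k * sigma, mu + k * sigma
    # Upper bound: the admissible mean closest to x (x itself if it lies inside the interval).
    nearest = min(max(x, mu_lo), mu_hi)
    upper = gauss_pdf(x, nearest, sigma)
    # Lower bound: the admissible mean farthest from x.
    farthest = mu_hi if x <= mu else mu_lo
    lower = gauss_pdf(x, farthest, sigma)
    return lower, upper

def uv_bounds(x, mu, sigma, k):
    """(lower, upper) density when the std lies anywhere in [k*sigma, sigma/k], 0 < k <= 1."""
    s_lo, s_hi = k * sigma, sigma / k
    # For fixed x, N(x; mu, s^2) increases in s up to s = |x - mu| and decreases after it,
    # so the maximum over [s_lo, s_hi] is at the clipped point and the minimum at an endpoint.
    s_star = min(max(abs(x - mu), s_lo), s_hi)
    upper = gauss_pdf(x, mu, s_star)
    lower = min(gauss_pdf(x, mu, s_lo), gauss_pdf(x, mu, s_hi))
    return lower, upper
```

The interval width grows with k in UM and shrinks toward a point as k approaches 1 in UV. For a diagonal-covariance component with a box-shaped uncertainty region, applying these bounds per dimension and multiplying them gives bounds on the component density, since each dimension's parameter can be chosen independently.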

The main features of Type-2 Fuzzy Sets are [3]: T2 FSs can represent more uncertainties simultaneously by using primary and secondary Membership Functions (MFs); T2 FSs can efficiently handle the uncertainties covered by the Footprint Of Uncertainty (FOU) by propagating them; and different defuzzification techniques for T2 FSs may produce different results, giving additional flexibility in system design.

Based on T2 FSs, an extension of the GMM known as the Type-2 Fuzzy GMM (T2 FGMM) is obtained; it is the key part of the proposed Speaker Verification system. Section II describes the proposed Speaker Verification system using T2 FGMMs. Section III presents the experiments conducted in the training and testing phases of the proposed system, together with their results. Section IV gives the conclusion drawn from these observations and future directions.

PROPOSED SYSTEM
The verification decision is made using a Generalized Linear Model (GLM) [4]. The interval likelihoods obtained with the T2 FGMMs are specified by their upper and lower bound values. Weights are assigned to the upper and lower bounds, and the weighted bounds are linearly combined to give the score used for decision making. Finally, to perform verification, a threshold is fixed and the speaker's score is compared with it: if the score is greater than the threshold, the speaker is accepted; otherwise, the speaker is rejected.
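The GLM decision step can be sketched as a weighted linear combination of the lower and upper log-likelihood bounds followed by a threshold test. The paper does not state how the weights are obtained; as an assumption, this sketch fits them as a logistic-regression GLM on labelled development scores using plain batch gradient descent. All function names, the learning rate and the epoch count are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def glm_score(bounds, params):
    """Weighted linear combination of (lower, upper) log-likelihood bounds plus a bias."""
    lower, upper = bounds
    w_lower, w_upper, bias = params
    return w_lower * lower + w_upper * upper + bias

def fit_glm(samples, labels, lr=0.1, epochs=2000):
    """Fit (w_lower, w_upper, bias) as a logistic-regression GLM by batch gradient descent.
    samples: list of (lower, upper) bounds; labels: 1 = target speaker, 0 = impostor."""
    params = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (lower, upper), y in zip(samples, labels):
            err = sigmoid(glm_score((lower, upper), params)) - y  # d(log-loss)/d(score)
            grad[0] += err * lower
            grad[1] += err * upper
            grad[2] += err
        params = [p - lr * g / n for p, g in zip(params, grad)]
    return tuple(params)

def verify(bounds, params, threshold=0.0):
    """Accept the claimed identity when the GLM score exceeds the threshold."""
    return glm_score(bounds, params) > threshold
```

For example, fitting on the toy pairs (0.5, 1.5) and (0.8, 1.6) labelled as targets and (-2.0, -1.0) and (-2.5, -1.2) labelled as impostors gives weights under which only the target pairs score above 0. With the logistic link, a threshold of 0 on the linear score corresponds to a posterior of 0.5; the paper instead fixes the threshold from the scores themselves.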

EXPERIMENTS AND RESULTS

The proposed Speaker Verification system uses the TIMIT (Texas Instruments/Massachusetts Institute of Technology) database. It consists of speech files from 30 female speakers and 71 male speakers. The speech data for each speaker include 10 speech files, each about 2-3 seconds long. Speech signals from dialect region 1 (dr1) of the TIMIT database were used, in which each speaker utters different sentences, such as "Don't ask me to carry an oily rag like that" and "She had your dark suit in greasy wash water all year". The Speaker Verification system is implemented in MATLAB.

The first step is pre-processing the signal with Voice Activity Detection (VAD). VAD removes the silent portions at the beginning and end of the utterance and determines the silence, voiced and unvoiced regions present in the speech signal. First, the input speech is divided into a number of segments. Then, the energy and zero-crossing rate of each segment are calculated and compared with predefined thresholds. A segment with high energy and few zero crossings is labeled as voiced, whereas a segment with low energy and many zero crossings is labeled as unvoiced. The VAD output is then passed to feature extraction, which uses MFCCs.
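The energy and zero-crossing-rate rule described above can be sketched as follows. The frame length and hop (400 and 160 samples, i.e. 25 ms and 10 ms at TIMIT's 16 kHz sampling rate) and all three thresholds are illustrative assumptions, not values from the paper, and would need tuning on real data. A separate very-low-energy "silence" class is used to trim leading and trailing silence; the function names are hypothetical.

```python
def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of a frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def label_frames(signal, frame_len=400, hop=160,
                 silence_thr=1e-5, voiced_thr=1e-3, zcr_thr=0.25):
    """Label each frame as 'silence', 'voiced' or 'unvoiced' from its energy and ZCR."""
    labels = []
    for frame in frame_signal(signal, frame_len, hop):
        energy, zcr = short_time_energy(frame), zero_crossing_rate(frame)
        if energy < silence_thr:
            labels.append("silence")
        elif energy >= voiced_thr and zcr < zcr_thr:
            labels.append("voiced")    # high energy, few zero crossings
        else:
            labels.append("unvoiced")  # lower energy and/or many zero crossings
    return labels

def trim_silence(signal, frame_len=400, hop=160, **thresholds):
    """Drop leading and trailing silence frames and return the remaining samples."""
    labels = label_frames(signal, frame_len, hop, **thresholds)
    speech = [i for i, lab in enumerate(labels) if lab != "silence"]
    if not speech:
        return []
    return signal[speech[0] * hop: speech[-1] * hop + frame_len]
```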

The next step after VAD is feature extraction. Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract features from the TIMIT speakers. Here, the first five female speakers of dialect region 1 (dr1) are taken, and each speaker utters eight sentences. MFCC computation is based on short-term analysis, so one MFCC vector is computed from each frame. Thirteen Mel-frequency cepstral coefficients were extracted per frame.
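The MFCC front end can be sketched in pure Python as Hamming windowing, a power spectrum, a mel-spaced triangular filterbank, a logarithm and a DCT-II, keeping 13 coefficients per frame. The 512-point transform, 26 filters and 400/160-sample framing are common defaults assumed here, not settings reported in the paper. Pre-emphasis and liftering are omitted, and the naive DFT is used only for clarity (a real system would use an FFT).

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, defined on bins 0..n_fft//2."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):    # rising edge
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):   # falling edge
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def power_spectrum(frame, n_fft):
    """|DFT|^2 / n_fft for bins 0..n_fft//2; zero-padding to n_fft is implicit."""
    samples = list(enumerate(frame[:n_fft]))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2.0 * math.pi * k * n / n_fft) for n, v in samples)
        im = sum(v * math.sin(2.0 * math.pi * k * n / n_fft) for n, v in samples)
        spec.append((re * re + im * im) / n_fft)
    return spec

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return one n_ceps-dimensional MFCC vector per frame."""
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hamming window
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        # Log filterbank energies, floored to avoid log(0) on silent frames.
        log_e = [math.log(max(sum(f * p for f, p in zip(filt, spec)), 1e-10)) for filt in bank]
        # DCT-II of the log energies; keep the first n_ceps coefficients.
        features.append([sum(le * math.cos(math.pi * i * (j + 0.5) / n_filters)
                             for j, le in enumerate(log_e)) for i in range(n_ceps)])
    return features
```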

After feature extraction, the training and testing data are modeled using a GMM-UBM system.
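In a GMM-UBM system in the style of Reynolds et al. [1], a speaker model is commonly obtained by MAP-adapting the UBM means to the speaker's frames, and a test utterance is scored by the average log-likelihood ratio between the speaker model and the UBM. The sketch below assumes mean-only adaptation with a relevance factor of 16 and diagonal covariances; it illustrates that standard recipe and is not necessarily the exact adaptation used in this paper. The function names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + sum(math.log(v) for v in var) + quad)

def component_logs(x, weights, means, variances):
    """log(w_m) + log N(x; mu_m, diag(var_m)) for every component m."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)]

def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to one speaker's frames."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        logs = component_logs(x, weights, means, variances)
        total = log_sum_exp(logs)
        for m, lp in enumerate(logs):
            post = math.exp(lp - total)  # responsibility of component m for frame x
            counts[m] += post
            for d in range(dim):
                sums[m][d] += post * x[d]
    adapted = []
    for m in range(n_comp):
        alpha = counts[m] / (counts[m] + relevance)  # data-dependent adaptation coefficient
        ex = [s / counts[m] for s in sums[m]] if counts[m] > 0.0 else list(means[m])
        adapted.append([alpha * e + (1.0 - alpha) * mu for e, mu in zip(ex, means[m])])
    return adapted

def llr_score(frames, speaker_means, weights, means, variances):
    """Average per-frame log p(x | speaker) - log p(x | UBM); weights and variances are shared."""
    total = 0.0
    for x in frames:
        total += log_sum_exp(component_logs(x, weights, speaker_means, variances))
        total -= log_sum_exp(component_logs(x, weights, means, variances))
    return total / len(frames)
```

Components that see little speaker data (small soft counts) keep means close to the UBM, which is what makes the adapted model robust with only a few sentences per speaker.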
1. Training Phase: For the training phase, the first five female speakers of dr1 are taken, and each speaker utters eight sentences. A GMM with 10 mixture components of dimension 13 is used, with diagonal covariance matrices only. A Universal Background Model (UBM) is also developed using GMM adaptation. Next, the features that best characterize the speaker are selected and modeled using both GMM and T2 FGMM, and the GMM and T2 FGMM results are compared with each other.
2. Testing Phase: In the testing phase, the first three female speakers are treated as impostors, and each utters only two dr1 sentences. During testing, a score is calculated for each speaker, and the mean of the scores is used as the threshold. The T-norm scores are then combined and averaged, and the result is passed to the Detection Error Tradeoff (DET) function to obtain the performance curves. From a DET curve, the Equal Error Rate (EER, in %) is read off at the point where the false-alarm probability and the miss probability (both in %) are approximately equal.
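The T-norm and EER computations in the testing phase can be sketched as below. T-norm is assumed to take its usual form, standardizing each test score by the mean and standard deviation of the scores that the same test utterance obtains against a cohort of impostor models. The EER is approximated by sweeping a threshold over the observed scores rather than interpolating a DET curve. The function names are hypothetical.

```python
import statistics

def t_norm(score, cohort_scores):
    """Test normalization: standardize a score with the cohort (impostor-model) scores
    obtained for the same test utterance."""
    mu = statistics.mean(cohort_scores)
    sd = statistics.pstdev(cohort_scores)
    return (score - mu) / sd

def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER in percent: sweep the threshold over all observed scores and return
    the mean of the miss and false-alarm rates where the two are closest."""
    best_gap, best_eer = float("inf"), 0.0
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, best_eer = gap, (miss + false_alarm) / 2.0
    return 100.0 * best_eer
```

A DET plot shows the same miss/false-alarm trade-off on normal-deviate axes, and the EER is where that curve crosses the diagonal.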
  
TABLE I: PERFORMANCE EVALUATION OF GMM AND T2 FGMM IN TERMS OF EER

Model	Equal Error Rate (in %)
GMM	16.0
T2 FGMM-UM	14.2
T2 FGMM-UV	13.9

(UM: uncertain mean vector; UV: uncertain covariance matrix.)
The results in Table I show that the proposed Speaker Verification system achieves a lower Equal Error Rate (EER) than the GMM baseline: 14.2% for T2 FGMM-UM and 13.9% for T2 FGMM-UV, compared with 16.0% for GMM.

CONCLUSION AND FUTURE DIRECTIONS

The performance of the proposed speaker verification system based on Type-2 Fuzzy Gaussian Mixture Models (T2 FGMMs) is analyzed. A GMM is also developed in order to provide a comparative analysis between GMM and T2 FGMM. Because real-world data are noisy, Speaker Verification systems are subject to uncertainties. These uncertainties are modeled directly by the T2 FGMM, which uses a Footprint Of Uncertainty (FOU) and interval secondary membership functions. The mean and standard deviation values estimated by the GMM-based speaker model are used in the T2 FGMM to calculate the uncertain mean vector (UM) and the uncertain covariance matrix (UV) by creating a Membership Function (MF) for each. The uncertainty parameter k used in the T2 FGMM sets the intervals within which the parameters μ and Σ vary. The verification process is carried out over the mean and variance intervals, and the speaker acceptance/rejection decision is made using a Generalized Linear Model (GLM). The proposed method for Speaker Verification achieves a lower EER than the GMM baseline. Future work is to evaluate the proposed method under noisy conditions and to use hybrid modeling techniques to further improve verification.

REFERENCES

[1] Reynolds D. A., Quatieri T. F. and Dunn R. B., "Speaker Verification using adapted Gaussian Mixture Models," Digital Signal Processing, vol. 10, no. 1-3, pp. 19-41, 2000.
[2] Mendel J. M. and John R. I. B., "Type-2 Fuzzy Sets made simple," IEEE Transactions on Fuzzy Systems, vol. 10, no. 2, pp. 117-127, 2002.
[3] Zeng J. and Liu Z.-Q., "Type-2 Fuzzy Sets for handling uncertainty in pattern recognition," IEEE International Conference on Fuzzy Systems, pp. 6597-6602, 2006.
[4] Zeng J. and Liu Z.-Q., "Type-2 Fuzzy Sets for pattern recognition: the state-of-the-art," Journal of Uncertain Systems, vol. 1, no. 3, pp. 163-177, 2007.
[5] Zeng J., Xie L. and Liu Z.-Q., "Type-2 Fuzzy Gaussian Mixture Models," Pattern Recognition, vol. 41, no. 12, pp. 3636-3643, 2008.
[6] Zhaojie Ju and Honghai Liu, "Fuzzy Gaussian Mixture Models," Pattern Recognition, vol. 45, pp. 1146-1158, Sep. 2011.
[7] Jerry M. Mendel, "Advances in Type-2 Fuzzy Sets and systems," Information Sciences, vol. 177, pp. 84-110, 2007.
[8] Zeng J. and Liu Z.-Q., "Type-2 Fuzzy Markov Random Fields and their Application to Handwritten Chinese Character Recognition," IEEE Transactions on Fuzzy Systems, vol. 16, no. 3, June 2008.
[9] Bachu R. G., Kopparthi S., Adapa B. and Barkana B. D., "Separation of Voiced and Unvoiced using Zero crossing rate and Energy of the Speech Signal."
[10] L. R. Rabiner and M. R. Sambur, "An Algorithm for determining the end points of isolated utterances," Bell System Technical Journal, 1975.
[11] Tsang Ing Ren, Dimas Gabriel, Hector N. B. Pinheiro and George D. C. Cavalcanti, "Speaker Verification Using Type-2 Fuzzy Gaussian Mixture Models," 2012 IEEE International Conference on Systems, Man, and Cybernetics.
[12] Hector N. B. Pinheiro, Tsang Ing Ren, George D. C. Cavalcanti, Tsang Ing Jyh and Jan Sijbers, "Type-2 fuzzy GMM-UBM for text-independent speaker verification," 2013 IEEE International Conference on Systems, Man, and Cybernetics.
