
EEMD Based Novel Approach to Find Pitch Markers in Speech Signal

Priyanka Galhotra¹, Sheenam Mehta², R.S. Chauhan³

¹Department of ECE, MM Engineering College, MM University, Mullana, Ambala, Haryana, India

²Department of ECE, JMIT, Radaur, Kurukshetra University, Kurukshetra, Haryana, India

³Department of ECE, JMIT, Radaur, Kurukshetra University, Kurukshetra, Haryana, India

¹aku.priyanka@gmail.com, ²sheenammehta595@gmail.com

Abstract: This paper describes a novel approach to finding pitch markers (vocal tract excitation) using ensemble empirical mode decomposition (EEMD). EEMD is a method for time-frequency analysis of a signal; it decomposes the signal into components called intrinsic mode functions (IMFs), which are then used to extract the pitch excitation in the speech signal. The underlying idea is that each IMF is assumed to contain a single frequency at every instant of time, so one of the IMFs may contain the fundamental frequency, and information about the pitch markers can then be picked directly from that IMF. IMF 4 is used in this study. To locate the pitch markers accurately, zero-crossing points are determined in the IMF, and a threshold is then applied to separate voiced, unvoiced and silence segments. The proposed algorithm gives promising results.

Index Terms: EMD, EEMD, IMF, pitch markers, white noise.

  1. INTRODUCTION

    A pitch marker (PM) locates each vibration of the vocal cords. Pitch marking does not itself involve classifying speech into voiced or unvoiced regions, but it may use such pre-existing knowledge to locate pitch cycle markers. Broadly, there are two approaches to the analysis of speech: pitch-synchronous and pitch-asynchronous. In pitch-synchronous analysis, pitch markers are detected from the speech signal and used as anchor points for further processing; in pitch-asynchronous analysis, no such pitch markers are used. It has generally been observed that pitch-synchronous analysis performs better than pitch-asynchronous analysis [1-4].

    Voiced speech analysis consists of determining the frequency response of the vocal-tract system and the glottal pulses representing the excitation source. Although the source of excitation for voiced speech is a sequence of glottal pulses, the significant excitation of the vocal-tract system occurs within a glottal pulse. The significant excitation can be considered to occur at the instant of glottal closure, called the epoch. Many speech analysis situations depend on accurate estimation of the epoch locations within a glottal pulse. For example, knowledge of the epoch locations is useful for accurate estimation of the fundamental frequency (f0). Other potential applications of pitch period markers include analysis of jitter, prosody in speech [5], text-to-speech synthesis [6-7], analysis of voice quality, and pitch-synchronous speech analysis [8]. Normally, pitch markers are associated with the glottal closure instants (GCIs) of the glottal cycles. Most pitch marker extraction methods rely on the error signal derived from the speech waveform after removing the predictable portion (second-order correlations). The error signal is usually derived by performing linear prediction (LP) analysis of the speech signal [9]. The first contribution to the detection of epochs was due to Sobakin [10]. A slightly modified version was proposed by Strube [11]. In Strube's work, some predictor methods based on LP analysis for the determination of pitch markers were reviewed. Most pitch marker determination methods are based on the autocorrelation function or related measures, such as the autocorrelation method [12], the cepstral method [13] and the average magnitude difference function (AMDF) [14]. However, all of these techniques face some or all of the following problems: windowing effects, low time resolution and low frequency resolution. Later, group-delay-based methods [15-16] and a zero-frequency-resonator-based method [17-18] were developed. All of these except the zero-frequency-resonator-based method rely on short-term processing; only the zero-frequency-resonator-based algorithm can be used on long-duration signals.

    This work is an attempt to overcome some or all of these shortcomings. Empirical Mode Decomposition (EMD) [19] can be used to find the instantaneous pitch. The idea is that one of the Intrinsic Mode Functions (IMFs) contains the pitch information. To ensure that a unique IMF contains the pitch information, mode mixing [20] must be eliminated. This problem can be solved by Ensemble Empirical Mode Decomposition (EEMD) [21-23]. The proposed EEMD-based method for finding pitch markers can be applied to long-duration signals (up to 1 s) and determines the pitch markers as well as other methods used for this purpose.

  2. ENSEMBLE EMPIRICAL MODE DECOMPOSITION

    The Ensemble Empirical Mode Decomposition (EEMD) approach consists of sifting [24] an ensemble of white-noise-added copies of the signal and treating the mean as the final true result. White noise of finite, not infinitesimal, amplitude is necessary to force the ensemble to exhaust all possible solutions in the sifting process, thus making signals of different scales collate in the proper intrinsic mode functions (IMFs) dictated by the dyadic filter banks. As EMD is a time-space analysis method, the white noise is averaged out with a sufficient number of trials; the only persistent part that survives the averaging process is the signal, which is then treated as the true and more physically meaningful answer. The effect of the added white noise is to provide a uniform reference frame in the time-frequency space; therefore, the added noise collates the portions of the signal of comparable scale in one IMF. With this ensemble mean, one can separate scales naturally without any a priori subjective criterion selection, as required by the intermittence test in the original EMD algorithm. This approach takes full advantage of the statistical characteristics of white noise to perturb the signal in its true solution neighborhood and to cancel itself out after serving its purpose; it therefore represents a substantial improvement over the original EMD.

    1. EEMD Algorithm

      The EEMD algorithm is as follows [25]:

      • Add a white-noise series, n(t), to the targeted signal, x(t), to obtain x1(t) = x(t) + n(t). Added noise powers from 5 to 25 dB were used to investigate the EEMD performance.

      • Decompose the data x1(t) into IMFs using the EMD algorithm.

      • Repeat the two steps above until the pre-set number of trials is reached, each time with a different added white-noise series of the same power. This yields the IMF set cij(t), where i is the trial (iteration) number and j is the IMF scale.

      • Estimate the ensemble mean of the corresponding IMFs of the decompositions as the desired output.
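The ensemble-averaging loop in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_decompose` is a hypothetical stand-in for EMD sifting (a moving-average split into a fast and a slow mode), used only so that the example runs with the standard library; the function names, the seed and the fixed number of modes per trial are likewise assumptions.

```python
import math
import random

def moving_average(x, width):
    """Centered moving average; the window is clipped at the signal edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def toy_decompose(x):
    """Hypothetical stand-in for EMD sifting (NOT real EMD):
    returns [fast mode, slow mode] with fast + slow == x."""
    slow = moving_average(x, 9)
    fast = [a - b for a, b in zip(x, slow)]
    return [fast, slow]

def eemd(x, decompose, n_trials, noise_std, seed=0):
    """Ensemble mean of the modes of noise-added copies of x.
    Assumes every trial yields the same number of modes."""
    rng = random.Random(seed)
    sums = None
    for _ in range(n_trials):
        # Step 1: add a fresh white-noise series of the same power.
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        # Step 2: decompose the noisy copy into modes c_ij(t).
        modes = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(x) for _ in modes]
        for j, mode in enumerate(modes):
            for t, v in enumerate(mode):
                sums[j][t] += v
    # Final step: ensemble mean over the trials, mode by mode.
    return [[s / n_trials for s in row] for row in sums]

# Usage: a 200-sample sinusoid, 200 trials, noise standard deviation 0.5.
x = [math.sin(2 * math.pi * n / 40) for n in range(200)]
ensemble_modes = eemd(x, toy_decompose, n_trials=200, noise_std=0.5, seed=1)
```

With a real EMD routine passed as `decompose`, the number of IMFs can differ between trials, so a practical implementation would need to fix or align the IMF count before averaging.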

    $$\mathrm{EEMD\_c}_j(t) = \frac{1}{nt} \sum_{i=1}^{nt} c_{ij}(t) \qquad (1)$$

    where nt denotes the number of trials. Similar to EMD, an EEMD-based partial reconstruction of the ensemble IMFs can be defined as:

    $$\mathrm{REEMD}_k = \sum_{j \ge k} \mathrm{EEMD\_c}_j(t) \qquad (2)$$

    This method of determining IMFs using EEMD is applied to a small segment of the speech signal. The resulting IMFs are shown in Figure 1.

    Fig. 1: Speech signal and its corresponding IMFs

  3. FINDING PITCH MARKERS USING EEMD

    Step 1: A low-pass filter is first applied to the sample speech signal to eliminate spurious frequency components. The filter covers the 0-4 kHz frequency band.

    Step 2: EEMD is used to decompose the filtered signal into a finite and often small number of frequency modes called intrinsic mode functions (IMFs). It defines the true IMF components as the mean of an ensemble of trials, each obtained by adding white noise of finite variance to the original signal.

    Step 3: Select the IMF having the highest energy, proposed as the IMF containing the pitch information. It can be observed that IMF 4 contains the pitch information and has the highest fraction of energy and the lowest fluctuation and irregularity in its instantaneous frequency.

    Step 4: Find the zero crossings in the selected IMF. Zero crossings with a positive-to-negative transition are detected as candidates for pitch markers; for convenience, however, the positive-going zero crossings are used in this study.
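The zero-crossing detection of Step 4 can be sketched as below. This is an illustrative sketch only: the function name `zero_crossings`, the `direction` argument and the convention of reporting the index of the first sample after the crossing are assumptions, not details given in the paper.

```python
import math

def zero_crossings(x, direction="positive"):
    """Return indices i where x crosses zero between samples i-1 and i.

    direction="positive": negative-to-positive (positive-going) crossings.
    direction="negative": positive-to-negative (negative-going) crossings.
    """
    idx = []
    for i in range(1, len(x)):
        if direction == "positive" and x[i - 1] < 0 <= x[i]:
            idx.append(i)
        elif direction == "negative" and x[i - 1] >= 0 > x[i]:
            idx.append(i)
    return idx

# Usage: a 100 Hz sinusoid sampled at 8 kHz (period = 80 samples),
# with a small phase offset so no sample falls exactly on zero.
fs, f0 = 8000, 100
imf = [math.sin(2 * math.pi * f0 * n / fs - 0.3) for n in range(400)]
candidates = zero_crossings(imf, "positive")
```

For a clean sinusoid, consecutive candidates are one pitch period apart (here 80 samples), which is why an IMF dominated by the fundamental frequency yields one candidate per glottal cycle.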

    Step 5: Some of the detected zero crossings may also correspond to unwanted excitations such as glottal openings in voiced speech and bursts and frication in unvoiced speech. To determine the desired zero crossings for locating the pitch markers, a search-back process is applied to the detected zero crossings.

    Step 6: A threshold is then applied to the signal to locate the desired pitch markers and to eliminate the unwanted zero crossings from the silent and unvoiced parts.
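The paper does not state which quantity is thresholded in Step 6, so the following is only one plausible sketch: it assumes a local mean-square amplitude measured around each candidate and a threshold set relative to the largest such value. The function name `threshold_markers`, the window half-width and the ratio are all hypothetical choices.

```python
import math

def threshold_markers(signal, candidates, half_window=40, ratio=0.1):
    """Keep candidates whose local mean-square amplitude is at least
    `ratio` times the largest local value among all candidates.
    (Assumed criterion; the source does not specify the threshold.)"""
    def local_energy(i):
        lo = max(0, i - half_window)
        hi = min(len(signal), i + half_window + 1)
        seg = signal[lo:hi]
        return sum(v * v for v in seg) / len(seg)

    energies = [local_energy(i) for i in candidates]
    if not energies:
        return []
    limit = ratio * max(energies)
    return [i for i, e in zip(candidates, energies) if e >= limit]

# Usage: a strong ("voiced") half followed by a weak ("silent") half.
loud = [math.sin(2 * math.pi * n / 80) for n in range(200)]
quiet = [0.01 * math.sin(2 * math.pi * n / 80) for n in range(200, 400)]
kept = threshold_markers(loud + quiet, [50, 130, 250, 330])
```

A relative threshold avoids depending on the absolute recording level, but a fixed threshold or a separate voiced/unvoiced detector would fit the description in Step 6 equally well.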

    The proposed algorithm is shown as a flowchart in Figure 2, following the steps described above.

    Speech -> Low Pass Filter -> Apply EEMD (IMFs extraction) -> Select IMF 4 -> Find Zero Crossings -> Apply Search Back -> Apply Threshold -> Pitch Locations

    Fig. 2: Flow chart for the proposed algorithm

  4. RESULT AND DISCUSSION

    1. Experimental Setting

      According to the principle of EEMD, the added white noise populates the whole time-frequency space uniformly, with its constituent components of different scales separated by the filter bank. When a signal is added to this uniformly distributed white background, the parts of the signal at different scales are automatically projected onto the proper scales of reference established by the white noise. Each individual trial may of course produce very noisy results, since each noise-added decomposition consists of the signal and the added white noise. Because the noise is different in each trial, it cancels out in the ensemble mean of enough trials. The ensemble mean is treated as the true answer because, as more and more trials are added to the ensemble, the only persistent part is the signal.

      In this study, the noise standard deviation is 1.5 and the ensemble size (number of trials) is 1000. Both parameters can be varied, provided they are chosen in a suitable combination: the noise standard deviation can range from about 0.2 to 2.5, with the number of trials adjusted to give appropriate results.
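The trade-off between these two parameters can be made concrete with the standard statistical rule that the noise left in an ensemble mean falls as the added-noise standard deviation divided by the square root of the number of trials. The sketch below applies that rule to the values quoted above; the function name is an assumption, and the result is in whatever units the noise standard deviation is expressed in.

```python
import math

def residual_noise_std(noise_std, n_trials):
    """Standard deviation of the noise remaining in the ensemble mean,
    assuming independent noise in each trial: noise_std / sqrt(n_trials)."""
    return noise_std / math.sqrt(n_trials)

# Values used in this study: noise standard deviation 1.5, 1000 trials.
residual = residual_noise_std(1.5, 1000)
print(round(residual, 4))  # about 0.0474
```

Halving the residual noise therefore requires four times as many trials, which is why a larger noise standard deviation has to be paired with a larger ensemble.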

    2. Implementation of proposed algorithm

      EEMD is used to decompose the filtered signal into a finite and often small number of frequency modes called intrinsic mode functions (IMFs). It defines the true IMF components as the mean of an ensemble of trials, each obtained by adding white noise of finite variance to the original signal. The IMF having the highest energy is proposed as the IMF containing the pitch information. It can be observed that IMF 4 contains the pitch information and has the highest fraction of energy and the lowest fluctuation and irregularity in its instantaneous frequency. Zero crossings with a positive-to-negative transition are detected as candidates for pitch markers; for convenience, however, the positive-going zero crossings are used in this study. Some of the detected zero crossings may also correspond to unwanted excitations such as glottal openings in voiced speech and bursts and frication in unvoiced speech. To determine the desired zero crossings for locating the pitch markers, a search-back process is applied to the detected zero crossings. A threshold is then applied to the signal to locate the desired pitch markers and to eliminate the unwanted zero crossings from the silent and unvoiced parts. The result obtained by the proposed algorithm is shown in Figure 3.
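The highest-energy criterion for choosing the pitch-bearing IMF can be sketched as follows. This is a minimal illustration, assuming energy means the sum of squared samples; the function name `select_imf_by_energy` and the toy input are hypothetical. Note that the returned index is 0-based, so the paper's "IMF 4" would correspond to index 3.

```python
def select_imf_by_energy(imfs):
    """Return (index of the highest-energy IMF, energy fraction per IMF).
    Energy is taken as the sum of squared samples (an assumption)."""
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    best = max(range(len(imfs)), key=lambda j: energies[j])
    if total > 0:
        fractions = [e / total for e in energies]
    else:
        fractions = [0.0] * len(imfs)
    return best, fractions

# Usage with three toy "IMFs" of four samples each.
toy_imfs = [
    [0.1, -0.1, 0.1, -0.1],   # energy 0.04
    [1.0, -1.0, 1.0, -1.0],   # energy 4.0
    [0.5, 0.5, 0.5, 0.5],     # energy 1.0
]
best_index, energy_fractions = select_imf_by_energy(toy_imfs)
```

The paper also cites low fluctuation of the instantaneous frequency as a property of the selected IMF; that check is not covered by this energy-only sketch.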

  5. CONCLUSION

In this study, we have proposed a novel and effective approach, based on the Ensemble Empirical Mode Decomposition (EEMD) technique, for determining pitch markers in a speech signal. The basic principle of EEMD is simple, and the method can indeed separate signals of different scales without undue mode mixing. Adding white noise helps to establish a dyadic reference frame in the time-frequency or time-scale space, in which real data of a comparable scale can find a natural location to reside. EEMD utilizes all the statistical characteristics of the noise. Since the role of the added noise in EEMD is to facilitate the separation of different scales of the input data without making a real contribution to the IMFs of the data, EEMD is a truly noise-assisted data analysis (NADA) method that is effective in extracting pitch information from the speech signal. The truth defined by EEMD is given by the number of trials in the ensemble approaching infinity; in practice, the number of trials in the ensemble, N, has to be large. The proposed method for pitch marker detection is efficient and gives promising results.

Fig. 3: Results from the proposed algorithm for detection of pitch markers: (a) a segment of the speech signal, (b) the corresponding IMF 4 of the speech signal, (c) zero-crossing points in the IMF signal, (d) zero-crossing points after applying the threshold, and (e) pitch marker points corresponding to the speech segment.

REFERENCES

    1. A. K. Krishnamurthy and D. G. Childers, "Two-channel speech analysis," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 730-743, Aug. 1986.

    2. D. Y. Wong, J. D. Markel, and A. H. Gray, "Least squares glottal inverse filtering from the acoustic speech waveform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 350-355, Aug. 1979.

    3. D. G. Childers and C. K. Lee, "Voice quality factors: Analysis, synthesis and perception," J. Acoust. Soc. Amer., vol. 90, pp. 2394-2410, 1991.

    4. B. Yegnanarayana and R. N. J. Veldhuis, "Extraction of vocal-tract system characteristics from speech signals," IEEE Trans. Speech Audio Processing, vol. 6, pp. 313-327, July 1998.

    5. S. Harbeck, A. Kießling, R. Kompe, H. Niemann, and E. Nöth, "Robust pitch period detection using dynamic programming with an ANN cost function," Proc. EUROSPEECH, Madrid, vol. 2, pp. 1337-1340, Sept. 1995.

    6. V. Colotte and Y. Laprie, "Higher precision pitch marking for TD-PSOLA," Proc. XI European Signal Processing Conference (EUSIPCO), Toulouse, 2002.

    7. Y. Laprie and V. Colotte, "Automatic pitch marking for speech transformations via TD-PSOLA," European Signal Processing Conference (EUSIPCO), Rhodes, 1998.

    8. E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Communication, vol. 9, pp. 453-467, 1990.

    9. J. D. Markel and A. H. Gray, Linear Prediction of Speech. New York: Springer-Verlag, 1982.

    10. A. N. Sobakin, "Digital computer determination of formant parameters of the vocal tract from a speech signal," Soviet Phys.-Acoust., vol. 18, pp. 84-90, 1972.

    11. H. W. Strube, "Determination of the instant of glottal closures from the speech wave," J. Acoust. Soc. Amer., vol. 56, pp. 1625-1629, 1974.

    12. L. R. Rabiner, "On the use of autocorrelation analysis for pitch detection," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, no. 1, Feb. 1977.

    13. A. M. Noll, "Cepstrum pitch determination," J. Acoust. Soc. Amer., vol. 41, no. 2, pp. 293-309, 1967.

    14. M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg, and H. J. Manley, "Average magnitude difference function pitch extractor," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-22, pp. 353-362, Oct. 1974.

    15. K. Rao, S. Prasanna, and B. Yegnanarayana, "Determination of instants of significant excitation in speech using Hilbert envelope and group delay function," IEEE Signal Process. Letters, vol. 14, no. 10, pp. 762-765, 2007.

    16. S. Prasanna and A. Subramanian, "Finding pitch markers using first order Gaussian differentiator," in Third Int. Conf. on Intelligent Sensing and Inf. Process., 2005, pp. 140-145.

    17. L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal, "A comparative performance study of several pitch detection algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. 24, no. 5, pp. 399-417, 1976.

    18. B. Yegnanarayana and K. Sri Rama Murty, "Event-based instantaneous fundamental frequency estimation from speech signals," IEEE Trans. Audio, Speech and Language Processing, vol. 17, no. 4, May 2009.

    19. P. Flandrin, G. Rilling, and P. Goncalves, "Empirical mode decomposition as a filter bank," IEEE Signal Processing Letters, vol. 11, no. 2, pp. 112-114, 2004.

    20. G. Schlotthauer, M. E. Torres, and H. L. Rufiner, "Voice fundamental frequency extraction algorithm based on ensemble empirical mode decomposition and entropies," in Proc. 11th Int. Congr. of the IFMBE, Munich, 2009, pp. 984-987.

    21. G. Schlotthauer, M. E. Torres, and H. L. Rufiner, "A new algorithm for instantaneous F0 speech extraction based on ensemble empirical mode decomposition," in Proc. European Signal Processing Conference, Glasgow, Scotland, Aug. 2009.

    22. N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, C. C. Tung, and H. H. Liu, "The empirical mode decomposition method and the Hilbert spectrum for non-stationary time series analysis," Proc. Royal Society London, vol. 454A, pp. 903-995, 1998.

    23. Z. Wu and N. E. Huang, "A study of the characteristics of white noise using the empirical mode decomposition method," Proceedings of the Royal Society A, vol. 460, pp. 1597-1611, 2004.

    24. J. D. Markel, "The SIFT algorithm for fundamental frequency estimation," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 367-377, 1972.

    25. Z. Wu and N. E. Huang, "Ensemble empirical mode decomposition: a noise-assisted data analysis method," Advances in Adaptive Data Analysis, vol. 1, pp. 1-41, 2009.
