Multi-Level Denoising Filter Bank for Speech Signal


J. MUBEENA PARVEEN

Dept. of Electronics & Communication Engineering, SSN College of Engineering,

Chennai, Tamil Nadu, India. mubeena227@gmail.com

R. RAJAVEL

Dept. of Electronics & Communication Engineering, SSN College of Engineering,

Chennai, Tamil Nadu, India. rajavelr@ssn.edu.in

Abstract: Denoising techniques help in speech enhancement, but denoising methods based on spectral subtraction and minimum mean-square error (MMSE) estimation produce musical noise distortion. This paper proposes a new technique for musical noise reduction in speech signals. The noisy speech signal is first decomposed into non-uniform sub-bands using a gammatone perceptual filter model on the Equivalent Rectangular Bandwidth (ERB) scale. The sub-band signals are divided into frames, and noise estimation is carried out over each frame; if noise exists in a frame, the denoising algorithm is applied. Two denoising algorithms are used in this work: Minimum Mean-Square Error Short-Time Spectral Amplitude (MMSE STSA) estimation and Boll's Spectral Subtraction (Boll's SS). The enhanced speech signal is then reconstructed. The quality enhancement was measured using the Perceptual Evaluation of Speech Quality (PESQ) technique. Both MMSE STSA and Boll's spectral subtraction give better results than the existing spectral subtraction and MMSE techniques. MMSE STSA gives the larger improvement, with a PESQ value of 2.38 against 2.3 for Boll's SS.

Keywords: Gammatone filter bank, Perceptual filter bank, Boll Spectral Subtraction (Boll SS), MMSE STSA, Denoising.

  1. Introduction

    In speech communication systems, noise is added to the speech signal as it passes through the channel, and it is very difficult to obtain a completely noise-free speech signal at the receiver. Denoising techniques such as Minimum Mean-Square Error Short-Time Spectral Amplitude (MMSE STSA) estimation and Boll's Spectral Subtraction (Boll's SS) are therefore required to improve the quality of the speech signal.

    A filter bank decomposes the signal into its low-frequency and high-frequency components in the analysis block and then reconstructs it in the synthesis block [1].

    However, existing systems require slight modification to suppress undesired noise. An improved spectral subtraction method for reducing acoustic noise uses a weighting filter based on psychoacoustic properties to reduce the residual musical noise [2]. Among the many speech enhancement algorithms, spectral subtraction is one of the oldest and most popular because it is simple to implement [3]. It improves the quality of the signal by subtracting an estimate of the noise spectrum from the spectrum of the noisy signal. However, it introduces musical noise. The main challenge is therefore to design a speech enhancement algorithm that does not introduce any perceptible speech distortion. In this paper, the noisy speech signal is processed through a gammatone filter bank. The output of the gammatone filter bank is divided into frames, windowed, and converted to the frequency domain using the FFT. Noise estimation is then carried out continuously over each frame, and the required denoising method is applied. Finally, the IFFT and the overlap-add technique are applied to obtain the enhanced speech signal. In MMSE STSA, the a priori SNR is tracked using a decision-directed method, which is more efficient than the earlier MMSE method [4, 5]. Boll's spectral subtraction uses spectral averaging and residual noise reduction, which makes it more efficient than the earlier method [6]. Both methods reduce musical noise to a greater extent than the earlier methods, thereby improving the quality of the speech signal.

  2. Proposed Method

    A. Perceptual Filter Bank Model for Speech Enhancement

      Filtering techniques such as Wiener filtering suppress noise, but they do not completely eliminate musical noise. A speech enhancement algorithm based on a perceptual model can reduce this residual noise to some extent. In this technique, the degraded input waveform is decomposed into non-uniform sub-bands using a gammatone filter bank [7]. This allows the lower-frequency components to be recovered efficiently during synthesis. In this paper, the degraded input signal is decomposed using the gammatone filter bank of [8]. The impulse response of each gammatone filter is

      $$\mathrm{gt}(t) = A\,t^{\,n-1}\exp\!\left(-2\pi b\,B(f_c)\,t\right)\cos(2\pi f_c t) \tag{1}$$

      where $A$ is the magnitude normalization parameter, $n$ is the filter order, $f_c$ is the centre frequency of the filter, $B(f_c)$ is the filter bandwidth, and $b\,B(f_c)$ sets the decay rate of the filter envelope. The frequency resolution of human hearing for broadband signals is expressed on the Equivalent Rectangular Bandwidth (ERB) scale. A frequency $f$ in Hz is converted to its value on the ERB scale by

      $$\mathrm{ERB}(f) = 21.4\,\log_{10}(0.00437\,f + 1) \tag{2}$$
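      Equations (1) to (3) can be sketched in plain Python as follows: ERB-rate spacing of the centre frequencies, a sampled gammatone impulse response, and the sub-band split by convolution. This is an illustrative sketch, not the authors' MATLAB code. Several settings are assumptions, because the text does not give them. The range is 50 Hz to 7 kHz and the sampling rate is 16 kHz. The bandwidth rule is B(f_c) = 24.7(0.00437 f_c + 1) Hz (Glasberg and Moore), with b = 1.019. These are common choices in the gammatone literature. The amplitude A is simply chosen so that each impulse response peaks at 1.

```python
import math

# Assumed settings (not stated in the text): 16 kHz audio, channels from 50 Hz to 7 kHz.
FS = 16000
N_BANDS = 27        # 27 fourth-order gammatone filters, as stated in the text
ORDER = 4
B_FACTOR = 1.019    # assumed value of b in equation (1)


def hz_to_erb_rate(f_hz):
    """Equation (2): position of f_hz on the ERB-rate scale."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_rate_to_hz(erb):
    """Inverse of equation (2)."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def erb_bandwidth(fc_hz):
    """Assumed bandwidth B(fc) in Hz (Glasberg and Moore ERB)."""
    return 24.7 * (0.00437 * fc_hz + 1.0)


def center_frequencies(n_bands=N_BANDS, f_low=50.0, f_high=7000.0):
    """Centre frequencies equally spaced on the ERB-rate scale, so low
    frequencies receive more (narrower) channels than high frequencies."""
    e_low, e_high = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (n_bands - 1)
    return [erb_rate_to_hz(e_low + k * step) for k in range(n_bands)]


def gammatone_ir(fc_hz, fs=FS, order=ORDER, b=B_FACTOR, duration=0.05):
    """Sampled equation (1): t^(n-1) exp(-2 pi b B(fc) t) cos(2 pi fc t),
    with A chosen so that the peak absolute value is 1."""
    bw = erb_bandwidth(fc_hz)
    h = []
    for i in range(int(duration * fs)):
        t = i / fs
        h.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * bw * t)
                 * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(x) for x in h)
    return [x / peak for x in h]


def filter_band(y, h):
    """Equation (3): Y_k(n) = (y * gt_k)(n), direct convolution truncated to len(y)."""
    out = []
    for n in range(len(y)):
        acc = 0.0
        for m in range(min(n + 1, len(h))):
            acc += h[m] * y[n - m]
        out.append(acc)
    return out


def analyse(y, fs=FS):
    """Split y into N_BANDS sub-band signals Y_k(n)."""
    return [filter_band(y, gammatone_ir(fc, fs)) for fc in center_frequencies()]
```

      With these settings, the lowest channels are only a few tens of hertz apart and the highest are several hundred hertz apart. This is the non-uniform resolution that the filter bank relies on. Direct convolution in pure Python is slow, so `analyse` is only practical for short test signals.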

    B. Multi-Band Perceptual Filter Bank

      The degraded waveform $y(n)$ is processed by dividing it into separate bands $Y_{k,\mathrm{gt}}(n)$ using the gammatone filter bank [7]. The gammatone filter bank recovers the lower-frequency components efficiently because they are divided into more sub-bands than the high-frequency components. Each band is then enhanced individually using Boll's spectral subtraction or the MMSE STSA technique. The analysis filter bank consists of 27 fourth-order gammatone filters [8]. The output of the $k$th filter of the analysis gammatone filter bank is calculated using the relation

      $$Y_{k,\mathrm{gt}}(n) = y(n) * \mathrm{gt}_k(n) \tag{3}$$

      where $*$ denotes convolution and $\mathrm{gt}_k(n)$ is the impulse response of the $k$th gammatone filter.

      The output of the gammatone filter bank is divided into frames, windowed using a Hamming window, and converted to the frequency domain using the FFT [5]. The selected speech enhancement algorithm is then applied.

      Fig. 1. Denoising techniques: Method 1, MMSE STSA; Method 2, Boll spectral subtraction

    C. Perceptual Generalized Spectral Subtraction Using Boll's Method

      The spectral subtraction estimator $\hat{S}(e^{j\omega})$ is calculated using the relation [6]

      $$\hat{S}(e^{j\omega}) = \left[\,|X(e^{j\omega})| - \mu(e^{j\omega})\,\right] e^{j\theta_x(e^{j\omega})} \tag{4}$$

      or

      $$\hat{S}(e^{j\omega}) = H(e^{j\omega})\,X(e^{j\omega}) \tag{5}$$

      with

      $$H(e^{j\omega}) = 1 - \frac{\mu(e^{j\omega})}{|X(e^{j\omega})|} \tag{6}$$

      $$\mu(e^{j\omega}) = E\{|N(e^{j\omega})|\} \tag{7}$$

      where $H(e^{j\omega})$ is the spectral subtraction filter, $\mu(e^{j\omega})$ is the average noise magnitude taken during nonspeech activity, and $\theta_x(e^{j\omega})$ is the phase of the noisy spectrum $X(e^{j\omega})$. The spectral error $\varepsilon(e^{j\omega})$ can be calculated as

      $$\varepsilon(e^{j\omega}) = \hat{S}(e^{j\omega}) - S(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega})\,e^{j\theta_x(e^{j\omega})} \tag{8}$$

      The spectral error can be reduced in four ways: magnitude averaging, half-wave rectification, residual noise reduction, and additional signal attenuation during nonspeech activity. Using half-wave rectification, the estimator becomes

      $$\hat{S}(e^{j\omega}) = H_R(e^{j\omega})\,X(e^{j\omega}) \tag{9}$$

      where

      $$H_R(e^{j\omega}) = \frac{H(e^{j\omega}) + |H(e^{j\omega})|}{2} \tag{10}$$

      The residual noise reduction scheme is given as

      $$|\hat{S}_i(e^{j\omega})| = |\hat{S}_i(e^{j\omega})|, \quad \text{for } |\hat{S}_i(e^{j\omega})| \ge \max|N_R(e^{j\omega})| \tag{11}$$

      $$|\hat{S}_i(e^{j\omega})| = \min\{|\hat{S}_j(e^{j\omega})| : j = i-1,\ i,\ i+1\}, \quad \text{for } |\hat{S}_i(e^{j\omega})| < \max|N_R(e^{j\omega})| \tag{12}$$
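      The four error-reduction steps of Boll's method can be sketched on the magnitude spectra of one sub-band as follows. This is an illustrative plain-Python sketch, not the authors' implementation. It assumes a centred three-frame magnitude average, because the text does not give the averaging length. It takes the noise residual N_R as the half-wave rectified |N| - mu of the noise-only frames. It also approximates the integral of the speech-absence test T (equation (14)) by the mean of |S/mu| over the bins supplied. The -12 dB threshold and 20 log10 c = -30 dB are the values given in the text. Only magnitudes are processed. For synthesis, the noisy phase is reattached, as in equation (4).

```python
import math

T_THRESHOLD_DB = -12.0            # speech-absence threshold used with equation (14)
C_ATTEN = 10.0 ** (-30.0 / 20.0)  # 20 log10 c = -30 dB, equation (15)


def noise_statistics(noise_frames):
    """mu(k) of equation (7) and max|N_R(k)| from noise-only magnitude frames.
    The residual N_R is taken as the half-wave rectified |N| - mu (assumption)."""
    n_bins = len(noise_frames[0])
    mu = [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]
    max_nr = [max(max(f[k] - mu[k], 0.0) for f in noise_frames) for k in range(n_bins)]
    return mu, max_nr


def boll_enhance(frames, mu, max_nr, avg_len=3):
    """frames: magnitude spectra |X_i(k)| of one sub-band; returns |S_i(k)|."""
    n, n_bins = len(frames), len(mu)
    half = avg_len // 2

    # Magnitude averaging, then half-wave rectified subtraction (equations (6), (9), (10)).
    est = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        row = []
        for k in range(n_bins):
            x_avg = sum(frames[j][k] for j in range(lo, hi)) / (hi - lo)
            h = 1.0 - mu[k] / x_avg if x_avg > 0.0 else 0.0
            h_r = (h + abs(h)) / 2.0
            row.append(h_r * x_avg)
        est.append(row)

    # Residual noise reduction (equations (11) and (12)).
    out = []
    for i in range(n):
        row = []
        for k in range(n_bins):
            s = est[i][k]
            if s < max_nr[k]:
                s = min(est[j][k] for j in (i - 1, i, i + 1) if 0 <= j < n)
            row.append(s)
        out.append(row)

    # Attenuation during nonspeech activity (equations (14) and (15)).
    for i in range(n):
        ratio = sum(out[i][k] / max(mu[k], 1e-12) for k in range(n_bins)) / n_bins
        t_db = 20.0 * math.log10(ratio) if ratio > 0.0 else -math.inf
        if t_db < T_THRESHOLD_DB:
            out[i] = [C_ATTEN * frames[i][k] for k in range(n_bins)]
    return out
```

      In use, `noise_statistics` is run on frames judged to contain noise only, and `boll_enhance` is then applied to every frame of the sub-band. Frames that fail the T test are replaced by the noisy spectrum scaled by c, rather than set to zero.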

      where

      $$\hat{S}_i(e^{j\omega}) = H_R(e^{j\omega})\,X_i(e^{j\omega}) \tag{13}$$

      and $\max|N_R(e^{j\omega})|$ is the maximum value of the noise residual measured during nonspeech activity.

      The absence of speech is detected using

      $$T = 20\log_{10}\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\frac{\hat{S}(e^{j\omega})}{\mu(e^{j\omega})}\right|d\omega\right] \tag{14}$$

      When $T$ is less than -12 dB, the frame is classified as a nonspeech frame. With the output attenuation applied during nonspeech activity, the output spectral estimate is

      $$\hat{S}(e^{j\omega}) = \begin{cases} \hat{S}(e^{j\omega}), & T \ge -12\ \text{dB} \\ c\,X(e^{j\omega}), & T < -12\ \text{dB} \end{cases} \tag{15}$$

      where $20\log_{10} c = -30$ dB.

    D. Minimum Mean-Square Error Short-Time Spectral Amplitude Method

      The a posteriori SNR estimate $\gamma(f,\tau)$ is given by [4]

      $$\gamma(f,\tau) = \frac{|y_{DS}(f,\tau)|^2}{\lambda(f,\tau)} \tag{16}$$

      where $y_{DS}(f,\tau)$ is the noisy spectrum at frequency $f$ and frame $\tau$, and $\lambda(f,\tau)$ is the power spectrum of the estimated noise $z(f,\tau)$, given by

      $$\lambda(f,\tau) = E\left[|z(f,\tau)|^2\right]_{\tau-\tau_{th}}^{\tau+\tau_{th}} \tag{17}$$

      Here $\tau_{th}$ is a smoothing parameter for the frame window, and $E[\cdot]_a^b$ denotes an expectation operator from $a$ to $b$. The a priori SNR estimate is calculated as

      $$\xi(f,\tau) = \alpha\,\gamma(f,\tau-1)\,G^2(f,\tau-1) + (1-\alpha)\,P[\gamma(f,\tau)-1], \quad 0 < \alpha < 1 \tag{18}$$

      where $\alpha$ is the weighting factor of the decision-directed estimation, $G(f,\tau)$ is the spectral gain function, and the operator $P[\cdot]$ is defined as

      $$P[l] = \begin{cases} l, & l \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{19}$$

      The spectral gain function is

      $$G(f,\tau) = \Gamma(1.5)\,\frac{\sqrt{v(f,\tau)}}{\gamma(f,\tau)}\,\exp\!\left(-\frac{v(f,\tau)}{2}\right)\left[(1+v(f,\tau))\,I_0\!\left(\frac{v(f,\tau)}{2}\right) + v(f,\tau)\,I_1\!\left(\frac{v(f,\tau)}{2}\right)\right] \tag{20}$$

      where $\Gamma(\cdot)$ is the gamma function, and $I_0(\cdot)$ and $I_1(\cdot)$ denote the modified Bessel functions of zero and first order, respectively. The quantity $v(f,\tau)$ is given by

      $$v(f,\tau) = \frac{\xi(f,\tau)}{1+\xi(f,\tau)}\,\gamma(f,\tau) \tag{21}$$

      The noise-reduced output is estimated using

      $$y_{PROP}(f,\tau) = G(f,\tau)\,y_{DS}(f,\tau) \tag{22}$$

  3. Simulation

    A. MATLAB Simulation

      The simulation is carried out in MATLAB. MATLAB was chosen for its high performance in technical computing and its user-friendly environment for computation and programming. In the perceptual filter bank analysis stage, the noisy speech signal is decomposed into non-uniform sub-bands and the required denoising method is applied. The enhanced speech signal is then synthesized in the perceptual filter bank synthesis stage.

      Fig. 2. Block Diagram of Denoising Algorithm

    B. Speech Quality Measure

      Perceptual Evaluation of Speech Quality (PESQ) is an objective measure of speech quality and is used here to evaluate the quality of the enhanced speech signal. It takes the received audio stream (the degraded signal), compares it with the original signal, and predicts the quality of the signal [9]. In this work, PESQ is computed in MATLAB. PESQ scores range from -0.5, which indicates poor quality, to 4.5, which indicates high quality.

      Fig. 3. Block Diagram for Perceptual Evaluation of Speech Quality
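      The frame-level processing described for the gammatone filter bank output can be sketched for one sub-band signal as follows. The steps are Hamming window, FFT, per-bin enhancement, IFFT and overlap-add. This is a minimal sketch under several assumptions. The frames are 64 samples long with 50 % overlap, chosen only to keep the example small. The window is a periodic Hamming window, whose half-overlapped copies sum to the constant 1.08. A plain O(N^2) DFT stands in for the FFT, so that only the standard library is needed. `gain_fn` is a hypothetical hook where a per-bin gain from Boll's method or MMSE STSA would be plugged in.

```python
import cmath
import math


def hamming(n):
    """Periodic Hamming window; with a hop of n // 2 the shifted windows sum to 1.08."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft(x):
    """Plain O(N^2) DFT, standing in for the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, keeping the real part (the input frames are real)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def enhance_subband(y, gain_fn=None, frame_len=64):
    """Frame, window, transform, apply per-bin gains, invert and overlap-add.

    gain_fn (hypothetical hook) maps a list of bin magnitudes to a list of gains;
    the gains should satisfy G[k] == G[N - k] so that the output stays real.
    """
    hop = frame_len // 2
    win = hamming(frame_len)
    out = [0.0] * len(y)
    for start in range(0, len(y) - frame_len + 1, hop):
        spec = dft([y[start + i] * win[i] for i in range(frame_len)])
        gains = gain_fn([abs(c) for c in spec]) if gain_fn else [1.0] * frame_len
        # Scale each bin and keep the noisy phase, as in equation (4).
        frame = idft([g * c for g, c in zip(gains, spec)])
        for i, s in enumerate(frame):
            out[start + i] += s / 1.08   # undo the constant overlap-add window sum
    return out
```

      With unit gains, the interior samples (those covered by two frames) are reconstructed exactly. This is a useful check that any change in the output comes from the enhancement gains and not from the framing. The first and last half-frames are covered by only one window and would need padding in a full implementation.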

  4. Results and Discussion

    Method 1: The speech denoising algorithm based on the Minimum Mean-Square Error Short-Time Spectral Amplitude method with the gammatone filter bank reduces musical noise more effectively than the earlier MMSE method with the gammatone filter bank. It achieves a PESQ value of 2.38, compared with 2.25 for the earlier MMSE method.

    Fig. 4. MMSE STSA with the ERB perceptual filter bank (gammatone)

    Fig. 5. Perceptual Evaluation of Speech Quality (PESQ) for the MMSE STSA method
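    The MMSE STSA estimator used in Method 1 (equations (16) to (22)) can be sketched for a single frequency bin as follows. This is a plain-Python illustration under stated assumptions, not the authors' MATLAB implementation. The ICA-based noise estimate z(f,tau) of [4] is not reproduced, so the noise power track is taken as an input. Equation (17) is read as a centred moving average. The values alpha = 0.98, the a priori SNR floor of 1e-3 and the start value of 1 for the gamma*G^2 term of the first frame are common choices that the text does not give. Python's standard library has no modified Bessel functions, so exp(-x) I_n(x) is summed from its power series.

```python
import math


def scaled_bessel_i(order, x):
    """exp(-x) * I_order(x) for x >= 0, summed from the power series in the log
    domain so that large arguments do not overflow."""
    if x == 0.0:
        return 1.0 if order == 0 else 0.0
    log_half_x = math.log(x / 2.0)
    total, k = 0.0, 0
    while True:
        log_term = (-x + (2 * k + order) * log_half_x
                    - math.lgamma(k + 1) - math.lgamma(k + order + 1))
        term = math.exp(log_term)
        total += term
        if k > x and term < 1e-17 * total:
            return total
        k += 1


def mmse_stsa_gain(xi, gamma):
    """Equations (20) and (21): the MMSE STSA spectral gain G."""
    gamma = max(gamma, 1e-10)
    v = xi / (1.0 + xi) * gamma                      # equation (21)
    half_v = v / 2.0
    # exp(-v/2) is folded into the exponentially scaled Bessel terms.
    bracket = ((1.0 + v) * scaled_bessel_i(0, half_v)
               + v * scaled_bessel_i(1, half_v))
    return math.gamma(1.5) * math.sqrt(v) / gamma * bracket


def smooth_noise_power(z_power, tau_th=2):
    """Equation (17) read as a moving average of |z(f, tau)|^2 over frames
    tau - tau_th .. tau + tau_th, clipped at the signal edges."""
    n = len(z_power)
    out = []
    for t in range(n):
        lo, hi = max(0, t - tau_th), min(n, t + tau_th + 1)
        out.append(sum(z_power[lo:hi]) / (hi - lo))
    return out


def mmse_stsa_track(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Gains G(f, tau) for one frequency bin over successive frames.

    noisy_power: |y_DS(f, tau)|^2 per frame; noise_power: lambda(f, tau) per frame.
    """
    gains = []
    prev = 1.0                      # assumed start value for gamma(f, tau-1) G^2(f, tau-1)
    for p, lam in zip(noisy_power, noise_power):
        gamma = p / lam                                             # equation (16)
        xi = alpha * prev + (1.0 - alpha) * max(gamma - 1.0, 0.0)   # equations (18), (19)
        xi = max(xi, xi_min)                                        # floor (assumption)
        g = mmse_stsa_gain(xi, gamma)
        gains.append(g)
        prev = gamma * g * g
    return gains
```

    A typical call is `mmse_stsa_track(noisy, smooth_noise_power(z_power))`, and the output spectrum is obtained by multiplying each noisy bin by its gain, as in equation (22). The recursive term alpha*gamma*G^2 makes the a priori SNR change smoothly from frame to frame. This smoothing is what the decision-directed approach uses to limit musical noise.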

    Method 2: The speech denoising algorithm based on Boll's spectral subtraction with the gammatone filter bank gives better results than the earlier spectral subtraction method. It achieves a PESQ value of 2.3, compared with 2.2 for the earlier spectral subtraction method.

    Fig. 6. Boll's spectral subtraction with the ERB perceptual filter bank (gammatone)

    Fig. 7. Perceptual Evaluation of Speech Quality (PESQ) for Boll's spectral subtraction method

  5. Conclusion

This paper proposes a new technique for reducing musical noise in speech signals. It is based on MMSE STSA with a gammatone filter bank and on Boll's spectral subtraction with a gammatone filter bank. The noisy speech is decomposed into non-uniform sub-bands using gammatone filter banks. Each sub-band is processed by Boll's spectral subtraction or by the MMSE STSA technique, and the system is simulated in MATLAB. PESQ was calculated for each combination, and the experimental results were better than those of the existing methods. Both methods denoised the noisy signal and reconstructed the speech signal effectively. MMSE STSA with the gammatone perceptual filter bank gave better results than Boll's spectral subtraction in terms of PESQ (2.38 against 2.3).

  References

  1. M. A. Shafiq and S. Ejaz, "Real time implementation of multi-level perfect signal reconstruction filter bank," International Journal of Engineering & Technology, vol. 10, no. 4, pp. 40-47, 2010.

  2. R. M. Udrea, N. D. Vizireanu, and S. Ciochina, "An improved spectral subtraction method for speech enhancement using a perceptual weighting filter," Digital Signal Processing, vol. 18, no. 4, pp. 581-587, 2008.

  3. M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 208-211, April 1979.

  4. R. Okamoto, Y. Takahashi, H. Saruwatari, and K. Shikano, "MMSE STSA estimator with nonstationary noise estimation based on ICA for high-quality speech enhancement," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4778-4781, 2010.

  5. Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, 1985.

  6. S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 2, pp. 113-120, April 1979.

  7. N. Zoghlami and Z. Lachiri, "Application of perceptual filtering models to noisy speech signals enhancement," Journal of Electrical and Computer Engineering, Hindawi Publishing Corporation, pp. 1-12, 2012.

  8. V. Hohmann, "Frequency analysis and synthesis using a gammatone filterbank," Acta Acustica united with Acustica, vol. 88, no. 3, pp. 433-442, 2002.

  9. S. S. V. Sumanth Kotta and B. K. Kommineni, "Acoustic beamforming for hearing using multi microphone array by designing graphical user interface," Blekinge Institute of Technology, Sweden, pp. 1-72, 2012.
