Frequency Domain Implementation of Advanced Speech Enhancement System on TMS320C6713DSK




Zeeshan Hashmi Khateeb
Student, M.Tech 4th Semester, Department of Instrumentation Technology, Dayananda Sagar College of Engineering, Bangalore, India

Gopalaiah
Research Scholar and Associate Professor, Department of Instrumentation Technology, Dayananda Sagar College of Engineering, Bangalore, India

Abstract– Human speech communication typically takes place in complex acoustic backgrounds with environmental sound sources, competing voices and ambient noise. Background noise can substantially degrade a speech communication system by reducing signal quality and intelligibility and by increasing listener fatigue. Noise reduction is therefore a necessity in telecommunications and has become a subject of intense research in recent years. This project focuses on the implementation of a speech enhancement system based on background noise suppression using the TMS320C6713 DSK, which is built around a floating-point DSP. The speech enhancement system is evaluated in terms of signal-to-noise ratio and various other performance measures. The proposed algorithm continuously evaluates the noise in the noisy speech signal using a channel SNR estimator. This yields a more precise SNR estimate for the gain calculation, which further improves speech quality while providing sufficient noise suppression. Software implementation of the algorithm is done in MATLAB; the hardware implementation is performed on the DSK, with coding done in Code Composer Studio. Listening tests are performed to determine the subjective quality and intelligibility of speech enhanced by this method.

In this project, a TIA-127-B compliant narrow-band noise suppression (speech enhancement) system is simulated using MATLAB and implemented on the TMS320C6713 DSK for real-time demonstration.


    In modern hands-free speech communication environments, the speech signal is often superposed by background noise (see Fig. 1). This is particularly the case when the speaker is not located close to the microphone: the speech signal intensity decreases with growing distance, and background noise sources may even be captured at a higher level than the speech itself. The noise distorts the speech, and words become hard to understand. In order to improve intelligibility and reduce listener stress by increasing the signal-to-noise ratio, a noise reduction procedure, also called a speech enhancement algorithm, is applied.

    Historically, pre-processor single-channel speech enhancement algorithms have been considered in the context of robust speech coding, (see Fig. 2). These algorithms are designed to operate in an environment where only the noisy signal is available, and both facilitate the operation of the speech codec (coding and decoding) and improve the perceived sound quality at the end user.

    Acoustic background noise in mobile speech communication systems, while largely inevitable, can have a severely detrimental effect on speech intelligibility. Noise suppression is highly desirable in these systems. However, the process of reducing noise in a speech signal is associated with distortion of the processed signal, the severity of which is generally proportional to the amount of noise suppression applied.

    Fig.1. Speech signal superposed by background noise

    In a single-channel application, the noise suppression algorithm needs an additional module for the estimation of the noise and clean speech statistics. The underlying idea in all these algorithms is that the noise statistics can be estimated from signal segments, either in the time or in the frequency domain, in which the speech energy is low or speech is absent altogether.

    The classical noise suppression scheme is based on the idea of spectral subtraction. It is still widely used, mainly because of its simplicity. Spectral subtraction schemes are based on direct estimation of the short-time spectral magnitude of the clean speech. A drawback of this approach is musical noise: tones with the same duration as the algorithm's window length, with a different set of frequencies in every frame. Musical noise is a result of the variability of the estimated power spectrum.
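The spectral subtraction gain rule described above can be sketched as follows. This is a minimal Python illustration (the paper's simulations used MATLAB); the spectral-floor constant `beta` is an assumed value, not taken from the paper:

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power, beta=0.002):
    """Per-bin gain for power spectral subtraction.

    noisy_power / noise_power: per-bin power estimates for one frame.
    beta: spectral floor (illustrative value) limiting how far a bin can
    be attenuated; without it, over-subtraction leaves isolated tones
    that change from frame to frame -- the "musical noise".
    """
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        if py <= 0.0:
            gains.append(0.0)
            continue
        # Subtract the noise power estimate, clamped to the floor.
        ps = max(py - pn, beta * py)
        gains.append(math.sqrt(ps / py))
    return gains
```

A bin dominated by speech keeps a gain near one, while a noise-dominated bin is pushed toward the floor rather than to an erratic residual.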

    Fig 2. Configuration of Noise Suppression (NS) as a speech enhancement Pre-processor for speech codec




    The requirements of a noise reduction system in speech enhancement are:

    • Naturalness and Intelligibility of the enhanced signal

    • Improvement of signal-to-noise ratio

    • Short signal delay

    • Computational simplicity

    The quality of the enhanced signal is a diverse issue; it may be characterised by the terms intelligibility and naturalness. There are several methods for performing noise reduction, but all can be regarded as a kind of filtering. In our application, speech and noise are mixed in one signal channel; they reside in the same frequency band and may have similar correlation properties. Consequently, the filtering will inevitably affect both the speech and the noise, and distinguishing between them is a very challenging task. Sometimes speech components are detected as noise and are suppressed as well; especially fricatives and plosives are attenuated due to their noise-like properties.

    Furthermore, the residual noise should preserve the characteristics of the background noise in the recording environment. Typical single-channel noise reduction algorithms add a synthetic noise, the so-called musical noise, which sounds artificial and has a disturbing effect on the listener.

    Single channel noise reduction algorithms are based on the fact that the statistical properties of speech are only stationary over short periods of time whereas the noise often can be assumed to be stationary over much longer periods. Another aim for the algorithm design is the limitation of the signal delay because of its annoying effect in dialog situations.
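The stationarity assumption above suggests a simple recursive noise-estimate update that is frozen while speech is active. The following Python sketch is illustrative only; the smoothing constant `alpha` is an assumption, not a value from the paper:

```python
def update_noise_estimate(noise_est, frame_power, speech_present, alpha=0.9):
    """First-order recursive smoothing of the per-channel noise estimate.

    The estimate is updated only when the frame is classified as
    noise-only, reflecting the assumption that noise is stationary over
    much longer periods than speech. alpha (illustrative value) controls
    how slowly the estimate tracks the noise.
    """
    if speech_present:
        # Freeze the estimate while speech is active.
        return list(noise_est)
    return [alpha * n + (1.0 - alpha) * p
            for n, p in zip(noise_est, frame_power)]
```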

    The noise reduction algorithms can be split into two groups: time domain algorithms and those utilising some kind of transform, e.g. Fourier Transform. Whereas the filter calculation for time domain solutions generally relies on the usage of correlation estimates, there is a large variety of algorithms operating in the frequency domain.


    The noise suppression algorithm used by the EVRC is based on the spectral subtraction technique, in which the main emphasis is placed on spectral weighting. Fig. 3 shows the general principle of such a system.

    Fig.3. General principle of the EVRC NS system

    Firstly, the input signal y(n) is transformed block-wise from the time domain to the frequency domain, yielding the spectrum Y(k). Secondly, a set of gain factors G(k) is calculated. The actual spectral subtraction takes the form of a multiplication of the spectrum with these gain factors, resulting in the enhanced spectrum Q(k) = G(k)·Y(k). Lastly, this spectrum is transformed back from the frequency domain to the time domain, and the signal is reassembled block-wise to form the enhanced output yNS(n).
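The transform, gain multiplication and inverse transform for one block can be illustrated with a naive DFT in Python. A real implementation would use an FFT; this sketch is not the paper's code:

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    """Inverse DFT; returns the real part of the reconstructed block."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def enhance_block(y, gains):
    """One block: Y(k) = DFT{y(n)}, Q(k) = G(k)*Y(k), output = IDFT{Q(k)}."""
    Y = dft(y)
    Q = [g * Yk for g, Yk in zip(gains, Y)]
    return idft(Q)
```

With all gains equal to one the block is reconstructed unchanged, which is a useful sanity check for the analysis/synthesis framework.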

    1. TIA 127-B (Narrow Band) Speech Enhancement System

      The fundamental concept of a frequency domain solution is spectral weighting and block processing. The architecture of such a system is presented in Figure 4. Since in a single/multi channel approach the estimation of the noise and the weighting function can only be derived in frequency domain, the time domain input signal has to be transformed. The transformations are performed by means of standard analysis and synthesis systems operating on a frame-by-frame basis. It consists of three major components:

      • the analysis/synthesis framework for time domain / frequency domain transformation

      • the noise estimation

      • the weighting function.

    If the noise estimate equals the disturbing noise spectrum, the output signal spectrum Y(n, i) will be very similar to the noiseless speech spectrum S(n, i).

    Estimating the noise spectrum Nest(n, i) is one of the major tasks of a noise cancelling system. Based on the above-mentioned assumption that the noise part of the signal is stationary over longer periods of time than the speech part, an estimate of the noise is obtained by extracting slowly changing portions of the signal spectrum. The output frame is obtained by applying the inverse frequency transformation to the weighted enhanced spectrum Y(n, i), combined with the phase of the noisy input spectrum X(n, i).

    The Adaptive Noise Suppression System consists of the subsystems shown in Fig. 4.

    The input to the noise suppressor are the noisy speech samples s(n), which have been previously high-pass filtered. These are passed through a pre-emphasis filter and transformed into the frequency-domain values G(k). In the frequency domain, a filtering operation is performed by multiplying G(k) by the scalar gain values Y(k) to yield H(k). The filtered spectral values H(k) are transformed back into the time domain and passed through a de-emphasis filter to provide the noise-suppressed speech samples s′(n) to the speech coder.

    The channel energy estimator divides this spectrum into Nc channels and calculates an estimate of the signal energy in each one. The spectral deviation estimator calculates the difference between the current channel energies and an average long-term estimate. An estimated signal-to noise ratio is calculated by the SNR estimator, using the channel energy and background noise estimates. The SNR estimate is used to calculate the voice metric, which is a weighted sum which provides an estimate of the signal "quality". It is used mainly as an indication as to whether or not the current frame contains speech. When the input signal is deemed to contain no speech, the background noise estimator is updated. Under some conditions the SNR estimates are changed by the SNR modifier. Based on the (modified) SNR estimates and the background noise the gains for each channel are calculated by the channel gain calculator. These gains are then used to perform the filtering of the input signal.
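The channel SNR estimator and voice metric described above might be sketched as follows. The floor value and the uniform weights are illustrative assumptions; IS-127 uses quantized SNRs and a tabulated voice-metric mapping:

```python
import math

def channel_snr_db(channel_energy, noise_energy, floor_db=0.0):
    """Per-channel SNR estimate in dB, clamped at an assumed floor."""
    snrs = []
    for e, n in zip(channel_energy, noise_energy):
        if e <= 0.0 or n <= 0.0:
            snrs.append(floor_db)
            continue
        snrs.append(max(10.0 * math.log10(e / n), floor_db))
    return snrs

def voice_metric(snrs_db, weights=None):
    """Weighted sum of channel SNRs used as a crude speech/no-speech score.

    Uniform weights are an assumption made here for illustration.
    """
    if weights is None:
        weights = [1.0] * len(snrs_db)
    return sum(w * s for w, s in zip(weights, snrs_db))
```

A low voice metric indicates a noise-only frame, which is when the background noise estimate would be updated.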

    The overall gain factor for the current frame, γn, is calculated according to

    γn = max{ γmin, −10·log10[ 1 / ( Efloor · Σi En(m, i) ) ] } (1)



    Fig. 4. TIA 127-B (Narrow Band) Speech Enhancement System.

    A TIA/EIA/IS-127-B compliant speech enhancement system is a pre-processing block in the Enhanced Variable Rate Codec (EVRC), used to enhance the speech signal before it is encoded. The main components of the TIA/EIA/IS-127-B compliant speech enhancement system are:

    1. High Pass System

    2. Adaptive Noise Suppression System.

    The High Pass System comprises a 6th-order Butterworth filter implemented using three biquad filter sections.
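A cascade of biquad sections can be sketched as below. This Python illustration shows the structure only; the actual Butterworth coefficients of the IS-127 high-pass filter are not reproduced here, and the section coefficients are placeholders to be supplied by the designer:

```python
def biquad(x, b0, b1, b2, a1, a2):
    """Direct-form II transposed biquad section (coefficients are placeholders)."""
    y, z1, z2 = [], 0.0, 0.0
    for s in x:
        o = b0 * s + z1
        z1 = b1 * s - a1 * o + z2
        z2 = b2 * s - a2 * o
        y.append(o)
    return y

def highpass_cascade(x, sections):
    """Run the signal through each (b0, b1, b2, a1, a2) section in turn.

    A 6th-order filter uses three such second-order sections; cascading
    biquads is numerically better behaved than one direct 6th-order filter.
    """
    for b0, b1, b2, a1, a2 in sections:
        x = biquad(x, b0, b1, b2, a1, a2)
    return x
```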

    where γmin = −13 is the minimum overall gain in dB, Efloor = 1 is the noise floor energy, and En(m, i) is the estimated noise spectrum calculated during the previous frame, summed over the Nc channels in (1). The dB-scale channel gains are calculated as

    γdB(i) = µg·(σ(i) − σth) + γn,  0 ≤ i < Nc (2)

    where µg = 0.39 is the gain slope and σth is the SNR threshold, both constants. In Fig. 4 the gain curve for a single channel resulting from the following equation is plotted in comparison with the gain curve resulting from the spectral subtraction rule.

    σ′(i) = max( σth, σ(i) ) (3)

    To simulate the single-channel behaviour of EVRC-NS, (3) was used. As is evident, the gain curve for EVRC-NS is quite different from that of spectral subtraction. The channel gains are converted to linear scale according to



    γch(i) = min{ 1, 10^(γdB(i)/20) },  0 ≤ i < Nc (4)
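The gain rules of equations (1), (2) and (4) can be combined into a small gain calculator. This Python sketch uses the constants quoted in the text (γmin = −13 dB, µg = 0.39, Efloor = 1) but is otherwise an illustration, not the normative IS-127 procedure:

```python
import math

GAMMA_MIN = -13.0  # minimum overall gain in dB (gamma_min in the text)
MU_G = 0.39        # gain slope (mu_g in the text)
E_FLOOR = 1.0      # noise floor energy (E_floor in the text)

def overall_gain_db(noise_energy):
    """Eq. (1): overall frame gain from the summed channel noise estimate."""
    total = max(sum(noise_energy), 1e-12)  # guard against log of zero
    return max(GAMMA_MIN, -10.0 * math.log10(1.0 / (E_FLOOR * total)))

def channel_gain_linear(snr_db, snr_threshold_db, gamma_n_db):
    """Eqs. (2) and (4): dB-scale channel gain mapped to linear, capped at 1."""
    gain_db = MU_G * (snr_db - snr_threshold_db) + gamma_n_db
    return min(1.0, 10.0 ** (gain_db / 20.0))
```

A channel at or above the SNR threshold passes unattenuated (gain 1), while channels below it are attenuated along the µg slope down to the overall gain floor.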

    In our implementation, the input speech is presented to the noise suppressor in frames of 80 samples (10 ms frames at 8 kHz sampling). These samples, along with 24 samples of the previous frame, are multiplied by a smoothed trapezoidal window and transformed into the frequency domain by a 128-point FFT. In the frequency domain, the spectral values are grouped together to form 16 unequal frequency bands (similar to critical bands) referred to as channels.

    A scalar gain value is computed for each channel and applied to all the spectral values corresponding to that channel, including both positive and negative frequencies. The filtered values Y(k) are transformed back into the time domain using a 128-point IFFT and overlap-added with the last 48 noise-suppressed samples of the previous frame. The first 80 samples are then released to the speech coder. It is seen that the noise suppressor essentially operates as a time-adaptive filter.
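The framing and overlap-add schedule described above (80 new samples per frame, 24-sample input overlap, 128-point blocks, 48-sample output overlap) can be sketched as a block-processing skeleton. The smoothed trapezoidal window and the actual spectral processing are omitted; `enhance_block` is a caller-supplied placeholder, not the paper's code:

```python
FRAME = 80           # new samples per 10 ms frame at 8 kHz
OVERLAP_IN = 24      # input samples carried over from the previous frame
NFFT = 128           # block length (zero-padded before the transform)
TAIL = NFFT - FRAME  # 48-sample output tail overlap-added into the next frame

def process_stream(samples, enhance_block):
    """Block-processing skeleton around a caller-supplied enhance_block().

    enhance_block() stands in for windowing, FFT, gain weighting and
    IFFT of one 128-sample block.
    """
    out = []
    prev_in = [0.0] * OVERLAP_IN
    tail = [0.0] * TAIL
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        # 24 old + 80 new samples, zero-padded to the 128-point block size.
        block = prev_in + frame + [0.0] * (NFFT - OVERLAP_IN - FRAME)
        y = enhance_block(block)
        # Overlap-add the 48-sample tail of the previous block.
        y = [a + b for a, b in zip(tail, y[:TAIL])] + y[TAIL:]
        out.extend(y[:FRAME])   # release 80 samples to the speech coder
        tail = y[FRAME:]        # keep the last 48 samples for the next frame
        prev_in = frame[-OVERLAP_IN:]
    return out
```

Exactly 80 output samples are released per input frame, so the scheme keeps the signal delay short, one of the stated requirements.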


    1. Original Speech Sample corrupted by Noise

    2. Recovered speech sample obtained from EVRC Algorithm

    TABLE I: Objective Measures for Various Noise Types

    Table I presents the results of Correlation Coefficient, Segmental SNR, Log Spectral Distance, Vector Quantization based Minimum Mean Euclidean Distance values for various noise types and levels obtained by using the EVRC TIA-127-B speech enhancement system.


    A noise suppression algorithm based on the EVRC TIA/EIA/IS-127-B standard has been presented. The proposed algorithm continuously updates the noise estimate from the noisy speech in accordance with an estimated SNR. The spectral gain is modified with the SNR so that it better fits the new noise estimate, yielding higher speech quality.


